Used in: ,
Used in: ,
probs threshold
top k accuracy when num_class > 1
Used in: ,
Used in: ,
Used in: ,
Used in: ,
Used in: , , , , , , ,
number of embedding channels
temperature coefficient for softmax
Used in:
max number of high capsules Default: 5
max behaviour sequence length
high capsule embedding vector dimension
dynamic routing iterations, Default: 3
routing logits scale Default: 20
routing logits initial stddev Default: 1
squash power Default: 1
whether to use constant capsule number, Default: false
the initialization method for routing logits, Default: normal, available: zeros.
Used in:
task name for the task tower
label for the task, default is label_fields by order
metrics for the task
log train metrics for the task
loss for the task
num_class for multi-class classification loss
task specific mlp
training loss weights
related tower names
relation mlp
sample weight for the task
label name for indicating the sample space for the task tower
the loss weight for sample in the task space
the loss weight for sample out the task space
minimal loss weight used with the pareto front; must be >= 0 and < 1
Used in:
Used in:
Used in: ,
feature name.
feature input, e.g. item:item_id
embedding name, feature with same embedding name will share embedding
embedding dimension
number of hash size
number of id enumerators
id vocabulary list
id vocabulary dict
id value dimensions. Default is 0, or 1 when the feature is used in a sequence. If value_dim = 0, multi-valued ids are supported.
embedding pooling type, available is {sum | mean}
fg default value, i.e. the default value before bucketize
fg multi-value separator
embedding init function, e.g. "nn.init.uniform_,a=-0.01,b=0.01"
mask value in training progress
zero collision hash
id vocabulary file path
vocab file relative directory
dynamic embedding
default value when fg_mode = FG_NONE. With pai-fg, you do not need to set this parameter. With your own fg, if the data contains null values, you can set it to fill them.
out-of-vocab (OOV) id bucketize value when vocab_list or vocab_dict is used. When default_bucketize_value is set, the extra entries `default_value`=0 and <OOV>=1 are not added to vocab_list or vocab_dict.
value type after fg and before bucketize; specifying it can improve performance, e.g. fg_value_type = int64 when num_buckets is used
embedding param trainable or not
only used as fg dag intermediate result or not
embedding data type
embedding param constraints
max sequence length, only take effect when use it as sequence
sequence delimiter, only take effect when use it as sequence
specify sequence type fields in inputs. default is item side inputs.
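The interaction between vocab_list, `default_value` and the OOV bucketize value described above can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes that without default_bucketize_value index 0 is reserved for `default_value` and index 1 for <OOV>, with vocabulary entries starting at 2, and that with default_bucketize_value the vocabulary is indexed from 0. `build_vocab_index` and `bucketize_id` are hypothetical helper names.

```python
from typing import Dict, List, Optional


def build_vocab_index(
    vocab_list: List[str],
    default_value: str = "",
    default_bucketize_value: Optional[int] = None,
) -> Dict[str, int]:
    """Map vocabulary entries to bucket indices (hypothetical helper)."""
    if default_bucketize_value is not None:
        # Assumption: no reserved slots, entries are indexed from 0.
        return {v: i for i, v in enumerate(vocab_list)}
    # Assumption: default_value -> 0, <OOV> -> 1, vocabulary entries from 2.
    index = {default_value: 0, "<OOV>": 1}
    for i, v in enumerate(vocab_list):
        index[v] = i + 2
    return index


def bucketize_id(
    value: str,
    index: Dict[str, int],
    default_bucketize_value: Optional[int] = None,
) -> int:
    """Return the bucket for a value, falling back to the OOV bucket."""
    if value in index:
        return index[value]
    if default_bucketize_value is not None:
        return default_bucketize_value
    return index["<OOV>"]
```

With `vocab_list = ["a", "b"]` and no default_bucketize_value, `"a"` lands in bucket 2 and an unseen id lands in bucket 1 under these assumptions.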
Used in:
every layer size
Used in: ,
feature name, e.g. tag_feat
feature input, e.g. user:tag
embedding name, feature with same embedding name will share embedding
embedding dimension
value map for mapping input string values to float values
boundaries for bucketizing the numeric combine value
number of id enumerators for sparse combine value
embedding pooling type, available is {sum | mean}. Controls embedding bag aggregation. NOTE: distinct from 'combiner' which controls FG-level multi-value aggregation.
fg default value, i.e. the default value before bucketize
fg multi-value separator
fg normalizer, e.g. method=log10,threshold=1e-10,default=-10 method=zscore,mean=0.0,standard_deviation=10.0 method=minmax,min=2.1,max=2.2 method=expression,expr=sign(x)
embedding init function, e.g. "nn.init.uniform_,a=-0.01,b=0.01"
mask value in training progress
combine value combiner type, available is {sum | mean | min | max} Controls FG-level multi-value aggregation before bucketization.
default value when fg_mode = FG_NONE. With pai-fg, you do not need to set this parameter. With your own fg, if the data contains null values, you can set it to fill them.
embedding param trainable or not
only used as fg dag intermediate result or not
embedding data type
autodis embedding
mlp embedding
embedding param constraints
max sequence length, only take effect when use it as sequence
sequence delimiter, only take effect when use it as sequence
specify sequence type fields in inputs. default is item side inputs.
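The fg normalizer strings shown above are comma-separated `key=value` pairs. The sketch below parses such a string and applies the log10, zscore and minmax methods. The exact semantics are assumptions made for illustration: log10 is taken to return `default` when the input is not greater than `threshold`, and the expression method is not handled. `make_normalizer` is a hypothetical name.

```python
import math
from typing import Callable, Dict


def parse_normalizer(spec: str) -> Dict[str, str]:
    """Split 'method=zscore,mean=0.0,...' into a dict of string arguments."""
    return dict(item.split("=", 1) for item in spec.split(","))


def make_normalizer(spec: str) -> Callable[[float], float]:
    """Build a normalizer function from a spec string (illustrative only)."""
    args = parse_normalizer(spec)
    method = args["method"]
    if method == "log10":
        threshold = float(args["threshold"])
        default = float(args["default"])
        # Assumption: values at or below the threshold map to the default.
        return lambda x: math.log10(x) if x > threshold else default
    if method == "zscore":
        mean = float(args["mean"])
        std = float(args["standard_deviation"])
        return lambda x: (x - mean) / std
    if method == "minmax":
        lo = float(args["min"])
        hi = float(args["max"])
        return lambda x: (x - lo) / (hi - lo)
    raise ValueError(f"unsupported method in this sketch: {method}")
```

For example, `make_normalizer("method=zscore,mean=0.0,standard_deviation=10.0")(5.0)` evaluates to 0.5.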
Used in: ,
feature name, e.g. os_and_cate
feature input, e.g. [user:os, item:cate]
embedding name, feature with same embedding name will share embedding
embedding dimension
number of hash size
id vocabulary list
id vocabulary dict
id value dimensions, if value_dim = 0, it supports id with multi-value
embedding pooling type, available is {sum | mean}
fg default value
fg multi-value separator
embedding init function, e.g. "nn.init.uniform_,a=-0.01,b=0.01"
mask value in training progress
zero collision hash
id vocabulary file path
vocab file relative directory
dynamic embedding
default value when fg_mode = FG_NONE. With pai-fg, you do not need to set this parameter. With your own fg, if the data contains null values, you can set it to fill them.
out-of-vocab (OOV) id bucketize value when vocab_list or vocab_dict is used. When default_bucketize_value is set, the extra entries `default_value`=0 and <OOV>=1 are not added to vocab_list or vocab_dict.
embedding param trainable or not
only used as fg dag intermediate result or not
embedding data type
embedding param constraints
max sequence length, only take effect when use it as sequence
sequence delimiter, only take effect when use it as sequence
specify sequence type fields in inputs. default is item side inputs.
Used in: , ,
(message has no fields)
Used in: , ,
total number of steps or epochs for cosine annealing
minimum learning rate
warmup start learning rate
warmup steps or epochs
schedule by epoch or by step.
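The cosine annealing fields above can be read as the following schedule. This is a sketch under stated assumptions: the learning rate ramps linearly from the warmup start value to the base learning rate, then follows a half cosine down to the minimum learning rate over the remaining steps. The function and parameter names (`cosine_lr`, `base_lr`, `t_max`, `warmup_lr`, `warmup_size`) are hypothetical.

```python
import math


def cosine_lr(
    step: int,
    base_lr: float,
    t_max: int,
    min_lr: float = 0.0,
    warmup_lr: float = 0.0,
    warmup_size: int = 0,
) -> float:
    """Learning rate at a given step (or epoch) for cosine annealing."""
    if step < warmup_size:
        # Linear warmup from warmup_lr to base_lr.
        return warmup_lr + (base_lr - warmup_lr) * step / warmup_size
    # Assumption: the cosine phase spans the steps left after warmup.
    span = max(1, t_max - warmup_size)
    progress = min(1.0, (step - warmup_size) / span)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

The same function applies whether the schedule counts steps or epochs; only the unit of `step`, `t_max` and `warmup_size` changes.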
Used in: , ,
number of steps or epochs for the first cosine annealing period
factor to grow period length after each restart (1 = fixed period)
minimum learning rate
warmup start learning rate
warmup steps or epochs
schedule by epoch or by step.
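The restart behaviour controlled by the first period length and the period growth factor can be sketched as below. Warmup is left out to keep the example short, and the names `cosine_restart_lr`, `t_0` and `t_mult` are hypothetical; the assumption is that each period restarts at the base learning rate and is `t_mult` times longer than the previous one.

```python
import math


def cosine_restart_lr(
    step: int,
    base_lr: float,
    t_0: int,
    t_mult: int = 1,
    min_lr: float = 0.0,
) -> float:
    """Cosine annealing with warm restarts (illustrative sketch)."""
    start, period = 0, t_0
    # Walk forward through completed periods to find the current one.
    while step >= start + period:
        start += period
        period *= t_mult
    progress = (step - start) / period
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With `t_0 = 10` and `t_mult = 2`, restarts happen at steps 10 and 30; with `t_mult = 1` every period has the same length.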
Used in:
number of cross layers
Used in:
number of cross layers
Matrix decomposition with minimal rank.
Used in: ,
feature name.
custom operator name.
custom operator lib file name.
operator custom params.
custom operator is thread safe or not.
feature input, e.g. user:os
embedding name, feature with same embedding name will share embedding
embedding dimension
boundaries for bucketizing the numeric value
number of hash size for sparse value
number of id enumerators for sparse value
id vocabulary list for sparse value
id vocabulary dict
embedding pooling type, available is {sum | mean}
fg default value
fg multi-value separator
fg normalizer, e.g. method=log10,threshold=1e-10,default=-10 method=zscore,mean=0.0,standard_deviation=10.0 method=minmax,min=2.1,max=2.2 method=expression,expr=sign(x)
embedding init function, e.g. "nn.init.uniform_,a=-0.01,b=0.01"
value dimensions
mask value in training progress
zero collision hash
id vocabulary file path
vocab file relative directory
dynamic embedding
default value when fg_mode = FG_NONE. With pai-fg, you do not need to set this parameter. With your own fg, if the data contains null values, you can set it to fill them.
out-of-vocab (OOV) id bucketize value when vocab_list or vocab_dict is used. When default_bucketize_value is set, the extra entries `default_value`=0 and <OOV>=1 are not added to vocab_list or vocab_dict.
embedding param trainable or not
only used as fg dag intermediate result or not
embedding data type
autodis embedding
mlp embedding
embedding param constraints
max sequence length, only take effect when use it as sequence
sequence delimiter, only take effect when use it as sequence
specify sequence type fields in inputs. default is item side inputs.
Used in:
user and item tower output dimension
similarity method
similarity scaling factor
use in batch items as negative items.
loss weight for amm_i
loss weight for amm_u
Used in:
input feature group name
augmented feature group name
mlp config
Used in:
shared bottom MaskNet module
shared bottom mlp layer
mmoe expert mlp layer definition
mmoe gate module definition
number of mmoe experts
bayes task tower
Used in:
shared bottom mlp layer
mmoe expert mlp layer definition
mmoe gate module definition
number of mmoe experts
task tower
Used in:
Used in:
Used in:
seq encoder name
sequence feature name
mlp config for target attention score
maximum sequence length
Used in:
input feature group name
mlp config for target attention score
Used in:
if there is a dense feature group, dense_mlp must be set
whether to include sparse features after interaction
Used in:
user and item tower output dimension
similarity method
similarity scaling factor
use in batch items as negative items.
Used in:
user and item tower output dimension
similarity method
similarity scaling factor
use in batch items as negative items.
Used in:
mini batch size to use for training and evaluation.
dataset type.
[deprecated] please use fg_mode. Whether the input data is already feature-generate (fg) encoded. If fg_encoded = true, run fg offline first and set fg_encoded_multival_sep to split multi-value features.
separator for multi-val feature in fg encoded input data
labels
number of workers for parallel processing raw data
pin memory for faster cudaMemcpy
the input fields must be the same number and in the same order as data in csv files
delimiter of column features, only used for CsvDataset
for csv files, with header or not.
mini batch size to use for evaluation.
drop last batch less than batch_size
fg threads for each worker. If fg_threads = 0, the fg dag handler is disabled and fg runs in python.
when using OdpsDataset, whether to read data ordered by table partitions.
maxcompute storage api & tunnel quota name
mask probability for samples in training progress
mask probability for sampled negatives in training progress
force padding data into same data group with same batch_size
sample weights
fg run mode.
whether to shuffle data
shuffle buffer, for better performance. Even when a shuffle buffer is set, a full data shuffle before training is recommended, especially when model performance is poor.
maxcompute storage api data compression type, LZ4_FRAME | ZSTD | UNCOMPRESSED
sample cost field name
batch cost limit size
simplified input fields string format: input_name1:input_type1;input_name2:input_type2; type names follow ODPS conventions with aliases: BIGINT->INT64, INT->INT32
negative sampler
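The simplified input fields string described above (`input_name1:input_type1;input_name2:input_type2;`) can be parsed as in the sketch below. Only the two aliases named in the field description are applied; `parse_input_fields` is a hypothetical helper, and upper-casing the type names is an assumption made for illustration.

```python
from typing import List, Tuple

# Aliases stated in the field description: BIGINT->INT64, INT->INT32.
TYPE_ALIASES = {"BIGINT": "INT64", "INT": "INT32"}


def parse_input_fields(spec: str) -> List[Tuple[str, str]]:
    """Parse 'name:type;name:type;' into (name, canonical_type) pairs."""
    fields = []
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            # Skip the empty piece after a trailing ';'.
            continue
        name, type_name = item.split(":", 1)
        type_name = type_name.strip().upper()
        fields.append((name.strip(), TYPE_ALIASES.get(type_name, type_name)))
    return fields
```

For example, `parse_input_fields("user_id:BIGINT;age:INT;")` yields the pairs `("user_id", "INT64")` and `("age", "INT32")`.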
Used in:
Used in:
wide embedding init function, e.g. "nn.init.uniform_,a=-0.01,b=0.01"
Used in:
MC/ZCH features are not supported; use dynamicemb for delta dump. dump touched ids and their latest embedding every N training steps. Larger intervals retain a longer id window in memory; auto compaction reduces per-batch tensor buildup but unique ids still scale with the interval.
output directory. default is ${model_dir}/delta_embedding_dump
parquet file prefix
Used in:
DistanceLFU: evict_score = access_cnt / pow((current_iter - last_access_iter), decay_exponent)
Used in:
decay rate is access step
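The DistanceLFU formula above translates directly into code. The sketch below adds one assumption that the formula does not state: the distance is clamped to at least 1 so that a row accessed in the current iteration does not cause a division by zero. `distance_lfu_score` is a hypothetical name.

```python
def distance_lfu_score(
    access_cnt: int,
    current_iter: int,
    last_access_iter: int,
    decay_exponent: float = 1.0,
) -> float:
    """evict_score = access_cnt / (current_iter - last_access_iter) ** decay_exponent."""
    # Assumption: clamp the distance to avoid dividing by zero.
    distance = max(1, current_iter - last_access_iter)
    return access_cnt / pow(distance, decay_exponent)
```

A larger decay_exponent makes the score fall faster as a row goes unused, so frequently accessed but stale rows lose their advantage sooner.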
Used in:
hstu config
multi task tower config
max sequence length
item embedding mlp hidden dimension
enables loss averaging computation globally across all ranks (total rank) instead of locally (local rank).
timestamp of sequence is ascending or descending
concat all contextual features on channel dim as one token
Used in:
minimum frequency threshold for admission
determine how to initialize the embedding if the key is not admitted.
kv counter capacity, if not set, use embedding max_capacity
kv counter capacity for each bucket.
Used in: ,
the mode of initialization. NORMAL | TRUNCATED_NORMAL | UNIFORM | CONSTANT
the mean value for (truncated) normal distributions
the standard deviation for (truncated) normal distributions. default is sqrt(1 / embedding_dim)
the lower bound for uniform/truncated_normal distribution. default is -sqrt(1 / max_capacity)
the upper bound for uniform/truncated_normal distribution. default is sqrt(1 / max_capacity)
the constant value for constant initialization.
Used in: , , , , ,
arguments for initializing dynamic embedding vector values. default is uniform distribution, and absolute values of upper and lower bound are sqrt(1 / embedding_dim).
the initializer args for evaluation mode. default is constant initialization with value 0.0.
strategy to set the score for each indices in forward and backward per table. TIMESTAMP | STEP | CUSTOMIZED | LFU | NO_EVICTION
max number of embedding rows
percentage of embedding rows caching on gpu
init number of capacity
init table path
hash-table bucket capacity. default 128 (matches dynamicemb DEFAULT_BUCKET_CAPACITY). larger buckets trade probe cost for higher load factor.
Used in:
number of steps to evaluate.
how often progress is logged during eval
Used in: , ,
decay steps or epochs
decay rate
if true, decay the learning rate at discrete intervals
warmup start learning rate
warmup steps or epochs
minimum learning rate
schedule by epoch or by step.
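The exponential decay fields above can be combined as in the sketch below. The assumptions are that warmup is linear, that decay is counted from the end of warmup, and that the minimum learning rate acts as a floor. All names (`exponential_decay_lr`, `decay_size`, `decay_factor`, `staircase`) are hypothetical.

```python
import math


def exponential_decay_lr(
    step: int,
    base_lr: float,
    decay_size: int,
    decay_factor: float,
    staircase: bool = True,
    min_lr: float = 0.0,
    warmup_lr: float = 0.0,
    warmup_size: int = 0,
) -> float:
    """Exponentially decayed learning rate at a given step (or epoch)."""
    if step < warmup_size:
        return warmup_lr + (base_lr - warmup_lr) * step / warmup_size
    exponent = (step - warmup_size) / decay_size
    if staircase:
        # Decay at discrete intervals instead of continuously.
        exponent = math.floor(exponent)
    return max(min_lr, base_lr * decay_factor ** exponent)
```

With `staircase=True` the rate stays constant inside each interval of `decay_size` steps and drops by `decay_factor` at each boundary.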
Used in:
type of exporter [latest | best] when train_and_evaluation. latest: regularly exports the serving graph and checkpoints. best: exports the best model according to best_exporter_metric.
the metric used to determine the best checkpoint
whether a bigger metric value is better
mixed precision mode for inference/export [BF16 | FP16 | ""]. The export-time AMP intent is taken verbatim from this field; if it disagrees with train_config.mixed_precision a warning is logged. When set, the dense sub-graph is wrapped in torch.autocast before torch.export so that AOT Inductor captures dtype-promoting casts as a wrap_with_autocast HOP.
whether to use torch.backends.cudnn.allow_tf32
whether to use torch.backends.cuda.matmul.allow_tf32
Used in: ,
feature name, e.g. kv_os_click_count
expression, e.g. sigmoid(pv/(1+click))
variables in expression, e.g. ["item:pv", "item:click"]
embedding name, feature with same embedding name will share embedding
embedding dimension
boundaries for bucketizing the numeric expr value
fg multi-value separator
fill value when vector length mismatch, default is NaN.
embedding pooling type, available is {sum | mean}
fg default value
embedding init function, e.g. "nn.init.uniform_,a=-0.01,b=0.01"
mask value in training progress
if value_dim = 0, it supports multi-value
default value when fg_mode = FG_NONE. With pai-fg, you do not need to set this parameter. With your own fg, if the data contains null values, you can set it to fill them.
embedding param trainable or not
only used as fg dag intermediate result or not
embedding data type
autodis embedding
mlp embedding
embedding param constraints
max sequence length, only take effect when use it as sequence
sequence delimiter, only take effect when use it as sequence
specify sequence type fields in inputs. default is item side inputs.
Used in:
number of experts per task
number of experts for share
mlp network of experts per task
mlp network of experts for share
Strictly-typed subset of faiss.Kmeans(D, K, **kwargs) knobs. Unset fields fall back to faiss's own defaults (so it is safe to leave partially set). ``gpu`` is intentionally omitted — the fit is CPU-only (SidRqkmeans refuses a visible CUDA device).
Used in:
Used in:
Used in:
Suffix appended to each feature's embedding_name so groups with different suffixes use independent embedding tables. Empty == disabled.
Used in:
Used in:
input data is feature generate encoded, we do not do fg
input data is raw feature, we use python to run feature generate
input data is raw feature, we use fg_handler to run feature generate
input data is after feature generate but before do bucketize, we do bucketize only
Used in:
only need specify it when use CsvDataset and value dtype can not be inferred (all values in the column are null)
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Used in: ,
task tower mlp
sub task configs
Used in:
task name for the task
label for the task
bitmask for get actual label for binary classification task
loss for the task
support multi-class classification loss
metrics for the task
training loss weight for the task
log train metrics for the task
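One plausible reading of the label bitmask field above is that several binary labels are packed into one integer and each sub task extracts its own bit. The sketch below illustrates that reading only; the actual extraction rule is an assumption, and `task_label` is a hypothetical name.

```python
def task_label(raw_label: int, bitmask: int) -> int:
    """Return 1 if any bit selected by the mask is set in the raw label."""
    # Assumption: a non-zero bitwise AND means a positive label for the task.
    return 1 if (raw_label & bitmask) != 0 else 0
```

For a raw label of `0b0101`, a task with bitmask `0b0100` would see a positive label and a task with bitmask `0b0010` a negative one.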
Used in: , ,
Used in: ,
slice candidate dim to uih dim
padding candidate dim to uih dim
linear transform uih and candidate to same dim
Used in:
action encoder config
enable interleave target or not
action embedding mlp config
content encoder config
content embedding mlp config
Used in:
action encoder config
action embedding mlp config
content encoder config
content embedding mlp config
Used in: , ,
mlp for sequence embedding
mlp for sequence and contextual embedding
Used in:
input preprocessor with contextual features
input preprocessor with interleave targets
input preprocessor for sequence-only models (no candidate concat)
Used in:
(message has no fields)
Used in:
(message has no fields)
Used in:
mlp for uih seq embedding
mlp for candidate seq embedding
Used in:
l2 norm postprocessor
layer norm postprocessor
timestamp layer norm postprocessor
Used in:
(message has no fields)
Used in:
mlp hidden dimension
dropout ratio for contextual embedding
Used in:
buckets for position embedding
buckets for timestamp embedding
use timestamp encoding or not.
transform function for timestamp gap. sqrt | log
timestamp gap will div by time_bucket_increments
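The timestamp fields above suggest a pipeline of dividing the gap, transforming it and bucketing it. The sketch below is one way to read them and is not taken from the library: the order of operations, the clamping before sqrt and log, and the upper clamp to the number of timestamp buckets are all assumptions. `time_bucket` and its parameter names are hypothetical.

```python
import math


def time_bucket(
    gap_seconds: float,
    time_bucket_fn: str = "sqrt",
    time_bucket_increments: float = 60.0,
    num_time_buckets: int = 2048,
) -> int:
    """Map a timestamp gap to a bucket index (illustrative sketch)."""
    # The gap is divided by time_bucket_increments before the transform.
    scaled = max(0.0, gap_seconds) / time_bucket_increments
    if time_bucket_fn == "sqrt":
        value = math.sqrt(scaled)
    elif time_bucket_fn == "log":
        # Assumption: clamp to 1 so the logarithm is never negative.
        value = math.log(max(1.0, scaled))
    else:
        raise ValueError(f"unknown transform: {time_bucket_fn}")
    # Assumption: indices are capped at num_time_buckets.
    return min(int(value), num_time_buckets)
```

Both transforms compress large gaps, so recent interactions get fine-grained buckets while old ones share coarse buckets.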
Used in:
action embedding dim
bitmask of each action
thresholds for watch time to actions
bitmask for watch time to actions
action embedding weights init std
Used in:
mlp hidden dimension
Used in:
(message has no fields)
Used in:
time duration period units, e.g. 60 * 60 for hour of day.
time duration units per period, e.g. 24 for hour of day.
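The two time duration fields above can be combined to turn a timestamp into a periodic index, using the hour-of-day example from the descriptions (60 * 60 seconds per unit, 24 units per period). The formula is an assumption made for illustration, and `period_index` is a hypothetical name.

```python
def period_index(timestamp: int, period_units: int, units_per_period: int) -> int:
    """Index of the timestamp inside its period, e.g. hour of day."""
    # Assumption: integer-divide into units, then wrap around the period.
    return (timestamp // period_units) % units_per_period
```

With `period_units = 60 * 60` and `units_per_period = 24`, a timestamp 27 hours after the epoch maps to index 3.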
Used in:
action encoder config (optional - for models with action info)
action embedding mlp config (required if action_encoder is set)
Used in:
Clipping type: "norm", "value", or "none"
Max gradient value/norm threshold
Norm type for gradient norm clipping (2.0 for L2, inf for max)
Enable global gradient clipping for distributed training
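The three clipping types above behave as in the following pure-Python sketch over a flat list of gradient values. It mirrors the common convention of scaling by `max_value / (total_norm + 1e-6)` when the norm exceeds the threshold; that constant and the name `clip_gradients` are assumptions, and distributed (global) clipping is not covered.

```python
import math
from typing import List


def clip_gradients(
    grads: List[float],
    clip_type: str,
    max_value: float,
    norm_type: float = 2.0,
) -> List[float]:
    """Clip gradients by value or by norm; 'none' returns them unchanged."""
    if clip_type == "none":
        return list(grads)
    if clip_type == "value":
        # Clamp every element into [-max_value, max_value].
        return [max(-max_value, min(max_value, g)) for g in grads]
    if clip_type == "norm":
        if math.isinf(norm_type):
            total = max(abs(g) for g in grads)
        else:
            total = sum(abs(g) ** norm_type for g in grads) ** (1.0 / norm_type)
        # Scale all elements by the same factor, never scaling up.
        coef = min(1.0, max_value / (total + 1e-6))
        return [g * coef for g in grads]
    raise ValueError(f"unknown clip type: {clip_type}")
```

Norm clipping preserves the direction of the gradient, while value clipping can change it because each element is clamped independently.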
Used in:
Initial scale factor
Factor by which the scale is multiplied during update if no inf/NaN gradients occur for ``growth_interval`` consecutive iterations.
Factor by which the scale is multiplied during update if inf/NaN gradients occur in an iteration.
Number of consecutive iterations without inf/NaN gradients that must occur for the scale to be multiplied by ``growth_factor``.
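The four scaler fields above describe a simple update rule, sketched below as a small class. It follows the descriptions literally: the scale is multiplied by the backoff factor when inf/NaN gradients occur, and by the growth factor after `growth_interval` consecutive clean iterations. The class name `LossScale` and the default values are hypothetical.

```python
class LossScale:
    """Minimal dynamic loss scale tracker (illustrative sketch)."""

    def __init__(
        self,
        init_scale: float = 65536.0,
        growth_factor: float = 2.0,
        backoff_factor: float = 0.5,
        growth_interval: int = 2000,
    ) -> None:
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._clean_steps = 0  # consecutive iterations without inf/NaN

    def update(self, found_inf: bool) -> None:
        """Adjust the scale after one iteration."""
        if found_inf:
            self.scale *= self.backoff_factor
            self._clean_steps = 0
            return
        self._clean_steps += 1
        if self._clean_steps == self.growth_interval:
            self.scale *= self.growth_factor
            self._clean_steps = 0
```

A short growth_interval recovers the scale quickly after an overflow but risks repeated overflows; a long one is more conservative.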
Used in:
Used in:
Used in: , ,
stu config
dropout ratio after preprocessor
num stu layers
position encoder
input preprocessor
output postprocessor
Attention truncation: after this many full-sequence layers, drop UIH prefix tokens to keep only the last attn_truncation_tail_len per sample. Contextual prefix and targets survive. Must be in (0, attn_num_layers); 0 disables.
Trailing UIH cap; both fields must be > 0 to enable truncation.
MoT channel name. When non-empty, replaces the default `uih` prefix on UIH-side keys read from grouped_features (e.g. name="uih_click" -> uih_click.sequence, uih_click_action.sequence, uih_click_watchtime.sequence, uih_click_timestamp.sequence). Empty preserves the original uih.* keys (DlrmHSTU behavior). Channels with the same `embedding_name` on a feature share the underlying embedding table via EmbeddingGroup dedupe; per-channel tables multiply sparse-param + TBE + all-to-all cost by N.
Used in:
user and item tower output dimension; when 0 (default), no output Linear is applied -- the caller must size the user tower's STU output and the item tower's MLP output to match.
similarity method
similarity scaling factor
use in batch items as negative items.
Used in:
input feature group name (uih group)
HSTU config (STU, positional_encoder, input_preprocessor, output_postprocessor)
max sequence length
Weighted Random Sampling ItemID not in Batch and Sampling Hard Edge
Used in:
user data path schema => userid:int64 | weight:float
item data path schema => itemid:int64 | weight:float | attrs:string
hard negative edge path schema => userid:int64 | itemid:int64 | weight:float
number of negative sample
max number of hard negative sample
field names of attrs in train data or eval data
field name of item_id in train data or eval data
field name of user_id in train data or eval data
attribute delimiter of attrs string
number of negative samples for evaluator
only works on local
Weighted Random Sampling ItemID not with Edge and Sampling Hard Edge
Used in:
user data path schema => userid:int64 | weight:float
item data path schema => itemid:int64 | weight:float | attrs:string
positive edge path schema => userid:int64 | itemid:int64 | weight:float
hard negative edge path schema => userid:int64 | itemid:int64 | weight:float
number of negative sample
max number of hard negative sample
field names of attrs in train data or eval data
field name of item_id in train data or eval data
field name of user_id in train data or eval data
attribute delimiter of attrs string
number of negative samples for evaluator
field delimiter of input data
Used in: ,
feature name, e.g. item_id
feature input, e.g. item:item_id
embedding name, feature with same embedding name will share embedding
embedding dimension
number of hash size
number of id enumerators
id vocabulary list
id vocabulary dict
id value dimensions. Default is 0, or 1 when the feature is used in a sequence. If value_dim = 0, multi-valued ids are supported.
embedding pooling type, available is {sum | mean}
fg default value, i.e. the default value before bucketize
fg multi-value separator
fg multi-value with whether has weight
embedding init function, e.g. "nn.init.uniform_,a=-0.01,b=0.01"
mask value in training progress
zero collision hash
id vocabulary file path
vocab file relative directory
dynamic embedding
default value when fg_mode = FG_NONE. With pai-fg, you do not need to set this parameter. With your own fg, if the data contains null values, you can set it to fill them.
out-of-vocab (OOV) id bucketize value when vocab_list or vocab_dict is used. When default_bucketize_value is set, the extra entries `default_value`=0 and <OOV>=1 are not added to vocab_list or vocab_dict.
value type after fg and before bucketize; specifying it can improve performance, e.g. fg_value_type = int64 when num_buckets is used
embedding param trainable or not
only used as fg dag intermediate result or not
embedding data type
embedding param constraints
max sequence length, only take effect when use it as sequence
sequence delimiter, only take effect when use it as sequence
specify sequence type fields in inputs. default is item side inputs.
Used in:
task name for the task tower
label for the task, default is label_fields by order
metrics for the task
log train metrics for the task
loss for the task
num_class for multi-class classification loss
task specific mlp
training loss weights
intervention tower names
low_rank_dim
dropout_ratio
label name for indicating the sample space for the task tower
the loss weight for sample in the task space
the loss weight for sample out the task space
minimal loss weight used with the pareto front; must be >= 0 and < 1
Used in:
Used in:
Used in: ,
feature name, e.g. kv_os_click_count
query, e.g. ["a:0.5", "b:0.5"]
document, e.g. ["d:0.5", "b:0.5"]
embedding name, feature with same embedding name will share embedding
embedding dimension
boundaries for bucketizing the numeric expr value
fg multi-value separator
fg kv separator, default is :.
fg normalizer, e.g. method=log10,threshold=1e-10,default=-10 method=zscore,mean=0.0,standard_deviation=10.0 method=minmax,min=2.1,max=2.2 method=expression,expr=sign(x)
embedding pooling type, available is {sum | mean}
fg default value
embedding init function, e.g. "nn.init.uniform_,a=-0.01,b=0.01"
mask value in training progress
default value when fg_mode = FG_NONE. With pai-fg, you do not need to set this parameter. With your own fg, if the data contains null values, you can set it to fill them.
embedding param trainable or not
only used as fg dag intermediate result or not
embedding data type
autodis embedding
mlp embedding
embedding param constraints
max sequence length, only take effect when use it as sequence
sequence delimiter, only take effect when use it as sequence
specify sequence type fields in inputs. default is item side inputs.
Used in:
(message has no fields)
LFU: evict_score = access_cnt
Used in:
(message has no fields)
LRU: evict_score = 1 / pow((current_iter - last_access_iter), decay_exponent)
Used in:
decay rate is access step
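The LFU and LRU formulas above rank rows differently, which the sketch below makes concrete. Two things are assumptions rather than statements from the source: the row with the lowest score is treated as the eviction candidate, and the LRU distance is clamped to at least 1 to avoid dividing by zero. The function names and the sample rows are made up for illustration.

```python
def lfu_score(access_cnt: int) -> float:
    """LFU: evict_score = access_cnt."""
    return float(access_cnt)


def lru_score(current_iter: int, last_access_iter: int, decay_exponent: float = 1.0) -> float:
    """LRU: evict_score = 1 / (current_iter - last_access_iter) ** decay_exponent."""
    distance = max(1, current_iter - last_access_iter)  # assumption: clamp
    return 1.0 / pow(distance, decay_exponent)


# Hypothetical rows: id -> (access_cnt, last_access_iter).
rows = {"a": (50, 10), "b": (2, 99)}
current_iter = 100

# Assumption: the lowest score is evicted first.
lfu_victim = min(rows, key=lambda k: lfu_score(rows[k][0]))
lru_victim = min(rows, key=lambda k: lru_score(current_iter, rows[k][1]))
```

Here LFU would pick the rarely accessed row `b`, while LRU would pick row `a` because it has not been touched for 90 iterations.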
Used in: ,
feature name, e.g. kv_os_click_count
map input, e.g. item:kv_os_click_count
key input, e.g. user:os
embedding name, feature with same embedding name will share embedding
embedding dimension
boundaries for bucketizing the numeric lookup value
number of hash size for sparse lookup value
number of id enumerators for sparse lookup value
id vocabulary list for sparse lookup value
id vocabulary dict
embedding pooling type, available is {sum | mean}
lookup value combiner type, available is {sum | mean | min | max | count}
fg default value
fg multi-value separator
lookup map value is sparse or numeric, when need_discrete is true, combiner will be empty string
lookup value need key as prefix or not.
fg normalizer, e.g. method=log10,threshold=1e-10,default=-10 method=zscore,mean=0.0,standard_deviation=10.0 method=minmax,min=2.1,max=2.2 method=expression,expr=sign(x)
embedding init function, e.g. "nn.init.uniform_,a=-0.01,b=0.01"
lookup value dimensions
numeric lookup value separator
mask value in training progress
zero collision hash
id vocabulary file path
vocab file relative directory
dynamic embedding
default value when fg_mode = FG_NONE. With pai-fg, you do not need to set this parameter. With your own fg, if the data contains null values, you can set it to fill them.
out-of-vocab (OOV) id bucketize value when vocab_list or vocab_dict is used. When default_bucketize_value is set, the extra entries `default_value`=0 and <OOV>=1 are not added to vocab_list or vocab_dict.
value type after fg and before bucketize; specifying it can improve performance, e.g. fg_value_type = int64 when num_buckets is used
embedding param trainable or not
only used as fg dag intermediate result or not
embedding data type
autodis embedding
mlp embedding
embedding param constraints
max sequence length, only take effect when use it as sequence
sequence delimiter, only take effect when use it as sequence
specify sequence type fields in inputs. default is item side inputs.
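The lookup feature above takes a map input and a key input and combines the matched values. The sketch below shows the numeric case with the listed combiner types. The serialized map format (`key:value` pairs joined by commas) and the behaviour on a miss are assumptions made for illustration; `parse_map` and `lookup` are hypothetical helpers.

```python
from typing import Dict, List


def parse_map(raw: str, pair_sep: str = ",", kv_sep: str = ":") -> Dict[str, float]:
    """Parse 'k1:v1,k2:v2' into a dict of floats (assumed format)."""
    result = {}
    for pair in raw.split(pair_sep):
        if not pair:
            continue
        key, value = pair.split(kv_sep, 1)
        result[key] = float(value)
    return result


def lookup(raw_map: str, keys: List[str], combiner: str = "sum",
           default_value: float = 0.0) -> float:
    """Look up each key in the map and combine the matched values."""
    table = parse_map(raw_map)
    hits = [table[k] for k in keys if k in table]
    if combiner == "count":
        return float(len(hits))
    if not hits:
        # Assumption: fall back to the default value when nothing matches.
        return default_value
    if combiner == "sum":
        return sum(hits)
    if combiner == "mean":
        return sum(hits) / len(hits)
    if combiner == "min":
        return min(hits)
    if combiner == "max":
        return max(hits)
    raise ValueError(f"unknown combiner: {combiner}")
```

For a map `"ios:0.5,android:1.5"` and the single key `"ios"`, the lookup returns 0.5; with both keys and the mean combiner it returns 1.0.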
Used in: , , , ,
Used in:
Used in:
user feature group name
user history group name
capsule config
concat mlp config for user interests vector
Used in:
Used in: , , , , , , , , , , , , , , , , , , , , , , , , , , , ,
hidden units for each layer
ratio of dropout
activation function
use batch normalization
use bias
use layer normalization
Used in: , , , , , , ,
(message has no fields)
Used in:
mmoe expert module definition
mmoe gate module definition
number of mmoe experts
task tower
Used in: , ,
a list of global steps or epochs at which to switch learning
a list of learning rates corresponding to intervals
Whether to linearly interpolate learning rates for steps in [0, schedule_steps[0]].
schedule by epoch or by step.
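The manual step schedule above can be sketched as a piecewise-constant function. The indexing convention is an assumption: the i-th learning rate is taken to apply from the i-th switch point onward, the base learning rate is used before the first switch point, and with warmup enabled the rate ramps linearly from 0 to the first scheduled rate over [0, schedule_steps[0]]. `manual_step_lr` and its parameter names are hypothetical.

```python
import bisect
from typing import List


def manual_step_lr(
    step: int,
    base_lr: float,
    schedule_steps: List[int],
    learning_rates: List[float],
    warmup: bool = False,
) -> float:
    """Piecewise-constant learning rate with optional linear warmup."""
    # Number of switch points that are <= step.
    idx = bisect.bisect_right(schedule_steps, step)
    if idx == 0:
        if warmup:
            # Assumption: interpolate linearly toward the first scheduled rate.
            return learning_rates[0] * step / schedule_steps[0]
        return base_lr
    return learning_rates[idx - 1]
```

With `schedule_steps = [10, 20]` and `learning_rates = [0.1, 0.01]`, the rate is 0.1 on steps 10 to 19 and 0.01 from step 20 under this convention.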
Used in:
the ratio between aggregation dim and masked input dim
the dim of aggregation layer
the dim of hidden ffn layer
Used in:
Used in: ,
number of mask blocks
mask block
mlp layer on top of mask blocks
use parallel or serial mask blocks
Used in: ,
feature name, e.g. match_cate_brand_click_count
nested map input, e.g. user:match_cate_brand_click_count
first layer (primary) key input, e.g. item:cate or ALL
second layer (secondary) key input, e.g. item:brand or ALL
embedding name, feature with same embedding name will share embedding
embedding dimension
boundaries for bucketizing the numeric match value
number of hash size for sparse match value
number of id enumerators for sparse match value
id vocabulary list for sparse match value
id vocabulary dict
embedding pooling type, available is {sum | mean}
fg default value (preceded in the schema by a commented-out field: match value combiner type, available is {sum | mean | min | max | count}, `optional string combiner = 12 [default = "sum"];`)
fg multi-value separator
match map value is sparse or numeric, when need_discrete is true, combiner will be empty string
match value need pkey value as prefix or not.
match value need skey value as prefix or not.
fg normalizer, e.g. method=log10,threshold=1e-10,default=-10 method=zscore,mean=0.0,standard_deviation=10.0 method=minmax,min=2.1,max=2.2 method=expression,expr=sign(x)
embedding init function, e.g. "nn.init.uniform_,a=-0.01,b=0.01"
match value dimensions
mask value in training progress
zero collision hash
id vocabulary file path
vocab file relative directory
dynamic embedding
default value when fg_mode = FG_NONE. With pai-fg, you do not need to set this parameter. With your own fg, if the data contains null values, you can set it to fill them.
out-of-vocab (OOV) id bucketize value when vocab_list or vocab_dict is used. When default_bucketize_value is set, the extra entries `default_value`=0 and <OOV>=1 are not added to vocab_list or vocab_dict.
value type after fg and before bucketize; specifying it can improve performance, e.g. fg_value_type = int64 when num_buckets is used
embedding param trainable or not
only used as fg dag intermediate result or not
embedding data type
autodis embedding
mlp embedding
embedding param constraints
max sequence length, only take effect when use it as sequence
sequence delimiter, only take effect when use it as sequence
specify sequence type fields in inputs. default is item side inputs.
Used in: ,
(message has no fields)
Used in: ,
(message has no fields)
Used in: , , , ,
Used in:
SID generation models (600 is reserved for SidRqvae, arriving in the follow-up PR)
whether use pareto loss weight
Used in:
Used in:
Used in:
seq encoder name
sequence feature name
time windows len
mlp config for target attention score
Used in:
time windows len
mlp config for target attention score
Used in:
macro: calculates the score for each class and averages them. weighted: calculates the score for each class and computes a weighted average using their support.
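The difference between the macro and weighted averages can be shown with per-class scores and supports (the number of samples per class). The helper names and the sample numbers below are made up for illustration.

```python
from typing import List


def macro_average(scores: List[float]) -> float:
    """Unweighted mean of the per-class scores."""
    return sum(scores) / len(scores)


def weighted_average(scores: List[float], supports: List[int]) -> float:
    """Mean of the per-class scores weighted by class support."""
    total = sum(supports)
    return sum(s * n for s, n in zip(scores, supports)) / total


# Hypothetical two-class example: the second class is three times larger.
scores = [1.0, 0.5]
supports = [1, 3]
macro = macro_average(scores)
weighted = weighted_average(scores, supports)
```

In this example the macro average treats both classes equally, while the weighted average is pulled toward the score of the larger class.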
Weighted Random Sampling ItemID not in Batch
Used in:
sample data path schema => id:int64 | weight:float | attrs:string
number of negative sample
field names of attrs in train data or eval data
field name of item_id in train data or eval data
attribute delimiter of attrs string
number of negative samples for evaluator
field delimiter of input data
item id delimiter
Weighted Random Sampling ItemID not with Edge
Used in:
user data path schema => userid:int64 | weight:float
item data path schema => itemid:int64 | weight:float | attrs:string
positive edge path schema => userid:int64 | itemid:int64 | weight:float
number of negative sample
field names of attrs in train data or eval data
field name of item_id in train data or eval data
field name of user_id in train data or eval data
attribute delimiter of attrs string
number of negative samples for evaluator
field delimiter of input data
Used in:
small epsilon clamping the population mean label rate away from {0, 1}.
Used in: ,
feature name, e.g. overlap_ratio
query input name, e.g. user:query
title input name, e.g. item:title
overlap calculate method, available is {query_common_ratio | title_common_ratio | is_contain | is_equal}
embedding name, feature with same embedding name will share embedding
embedding dimension
boundaries for bucketizing the numeric expr value
fg normalizer, e.g. method=log10,threshold=1e-10,default=-10 method=zscore,mean=0.0,standard_deviation=10.0 method=minmax,min=2.1,max=2.2 method=expression,expr=sign(x)
embedding pooling type, available is {sum | mean}
fg multi-value separator (preceded in the schema by a commented-out field: fg default value, `optional string default_value = 11 [default = "0"];`)
embedding init function, e.g. "nn.init.uniform_,a=-0.01,b=0.01"
mask value in training progress
default value when fg_mode = FG_NONE. With pai-fg, you do not need to set this parameter. With your own fg, if the data contains null values, you can set it to fill them.
embedding param trainable or not
only used as fg dag intermediate result or not
embedding data type
autodis embedding
mlp embedding
embedding param constraints
max sequence length, only take effect when use it as sequence
sequence delimiter, only take effect when use it as sequence
specify sequence type fields in inputs. default is item side inputs.
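The fg normalizer strings shown above (`method=log10,...`, `method=zscore,...`, `method=minmax,...`) are comma-separated key=value pairs. The following is a minimal sketch of how such a string could be parsed and applied. The function names are hypothetical, and the exact semantics are assumptions: for log10, values not greater than `threshold` are mapped to `default`; the `expression` method is not handled.

```python
import math


def parse_normalizer(spec):
    """Split 'method=...,k=v,...' into the method name and a dict of raw params."""
    params = dict(kv.split("=", 1) for kv in spec.split(","))
    method = params.pop("method")
    return method, params


def normalize(x, spec):
    """Apply the normalizer described by spec to a single numeric value."""
    method, p = parse_normalizer(spec)
    if method == "log10":
        # Assumed semantics: fall back to `default` at or below `threshold`.
        if x > float(p["threshold"]):
            return math.log10(x)
        return float(p["default"])
    if method == "zscore":
        return (x - float(p["mean"])) / float(p["standard_deviation"])
    if method == "minmax":
        lo, hi = float(p["min"]), float(p["max"])
        return (x - lo) / (hi - lo)
    raise ValueError(f"unsupported normalizer method in this sketch: {method}")


print(normalize(100.0, "method=log10,threshold=1e-10,default=-10"))
print(normalize(5.0, "method=zscore,mean=0.0,standard_deviation=10.0"))
```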
Used in:
epnet hidden units
activation function for epnet
ppnet hidden units
activation function for ppnet
ratio of dropout
domain feature name, must be num bucket
domain number for each task
task tower
Used in:
extraction network
task tower
Used in: , , , , , , , , , , , ,
embedding sharding type constraints data_parallel | table_wise | column_wise | row_wise | table_row_wise | table_column_wise | grid_shard
embedding compute kernel constraints dense | fused | fused_uvm | fused_uvm_caching | key_value
Used in:
Used in:
seq encoder name
sequence feature name
pooling type, sum or mean
maximum sequence length
Used in: ,
Used in: ,
feature name, e.g. click_count
feature input, e.g. item:click_count
embedding name, feature with same embedding name will share embedding
embedding dimension
boundaries for bucketize numeric feature
raw feature of multiple dimensions
fg normalizer, e.g. method=log10,threshold=1e-10,default=-10 method=zscore,mean=0.0,standard_deviation=10.0 method=minmax,min=2.1,max=2.2 method=expression,expr=sign(x)
embedding pooling type, available is {sum | mean}
fg default value
fg multi-value separator
embedding init function, e.g. "nn.init.uniform_,a=-0.01,b=0.01"
mask value in training progress
default value when fg_mode = FG_NONE. When using pai-fg, you do not need to set this param; when using your own fg and the data contains null values, set this param to fill nulls.
embedding param trainable or not
only used as fg dag intermediate result or not
embedding data type
autodis embedding
mlp embedding
embedding param constraints
max sequence length, only take effect when use it as sequence
sequence delimiter, only take effect when use it as sequence
specify sequence type fields in inputs. default is item side inputs.
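Bucketizing a numeric feature by boundaries, as mentioned above, can be sketched with a binary search. This is only an illustration: the function name is hypothetical, and the edge handling (a value equal to a boundary falls into the bucket to its right) is an assumption rather than documented behaviour.

```python
import bisect


def bucketize(value, boundaries):
    """Return the bucket index of value for sorted boundaries.

    len(boundaries) boundaries produce len(boundaries) + 1 buckets.
    """
    return bisect.bisect_right(boundaries, value)


boundaries = [1, 10, 100]  # hypothetical boundaries for a click_count feature
bucket_ids = [bucketize(v, boundaries) for v in (0, 1, 5, 100, 1000)]
print(bucket_ids)
```

Each bucket index can then be used as an id for an embedding lookup of size `len(boundaries) + 1`.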
Used in: ,
Used in:
COSINE = 0; EUCLID = 1;
Used in: ,
Used in:
dimension of input embeddings
number of attention heads
dimension of hidden linear layers
dimension of attention mechanism
dropout probability for linear layers
maximum length of attention window
alpha for mha attention
use group normalization or layer normalization.
whether to recompute normed_x in backward
whether to recompute uvqk in backward
whether to recompute y in backward
whether to sort by sequence length when forwarding
sequence length of contextual feature. Sentinel: < 0 (default) = use input_preprocessor.contextual_seq_len().
Semi-Local Attention: local causal window size (0 = disabled)
Semi-Local Attention: global prefix length (0 = disabled)
attention output scaling divisor (denominator of the SiLU(QK)/N term). Sentinel: < 0 (default) = use runtime max_seq_len.
Used in:
seq encoder name
sequence feature name
multihead_attn_dim must be divisible by num_heads
self attention num heads
dropout for attn_output_weights
maximum sequence length
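The divisibility constraint between `multihead_attn_dim` and `num_heads` exists because the attention dimension is split evenly across heads. A minimal validation sketch follows; the helper name is hypothetical.

```python
def head_dim(multihead_attn_dim, num_heads):
    """Return the per-head dimension, or raise if the split is uneven."""
    if multihead_attn_dim % num_heads != 0:
        raise ValueError(
            f"multihead_attn_dim={multihead_attn_dim} is not divisible by "
            f"num_heads={num_heads}"
        )
    return multihead_attn_dim // num_heads


per_head = head_dim(64, 8)
print(per_head)
```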
Used in:
Used in:
Used in:
Suffix appended to each feature's embedding_name; same semantics as FeatureGroupConfig.embedding_name_suffix. Empty == disabled.
Used in:
sequence name
max sequence length, only take effect in fg
sequence delimiter
sequence primary key name for serving, default will be user:{sequence_name}
sub feature config
Used in:
Input embedding dimension (K-Means runs directly on raw embeddings, no encoder).
Per-layer cluster counts, e.g. [256, 256, 256]. List length is the number of residual quantization layers. Entries may differ per layer (non-uniform codebooks such as [256, 512, 1024] are supported — the FAISS backend fits a separate ``faiss.Kmeans`` per layer).
L2-normalize residuals before each layer.
Strictly-typed extra kwargs forwarded to faiss.Kmeans(D, K, **kwargs).
Target number of embeddings to reservoir-sample for the FAISS fit. Bounds host memory regardless of corpus size. 0 (the default) auto-derives it as max(K) * max_points_per_centroid (the largest per-layer codebook, for non-uniform codebooks) — exactly what FAISS subsamples to internally (default 256), so no training points are wasted.
Name of the item embedding feature inside the input Batch.
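The sample-target rule and the reservoir sampling mentioned above can be sketched in plain Python. This is an illustrative sketch, not the actual implementation: the function names are hypothetical, the sampler is the classic Algorithm R, and no FAISS call is made.

```python
import random


def derive_sample_target(cluster_counts, max_points_per_centroid=256, target=0):
    """Return the configured target, or max(K) * max_points_per_centroid when 0."""
    if target > 0:
        return target
    return max(cluster_counts) * max_points_per_centroid


def reservoir_sample(stream, k, rng):
    """Keep a uniform random sample of up to k items from a stream (Algorithm R)."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Replace an existing slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


auto_target = derive_sample_target([256, 512, 1024])
sample = reservoir_sample(range(1000), 10, random.Random(0))
print(auto_target, len(sample))
```

Memory stays bounded by `k` regardless of how many embeddings the stream yields, which matches the stated goal of bounding host memory independently of corpus size.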
Used in: , , , , ,
Used in:
seq encoder name
sequence feature name
maximum sequence length
Used in:
Used in:
Used in:
Used in:
Used in:
schema => itemid:int64 | weight:float | attrs:string
schema => src_id:int64 | dst_id:int64 | weight:float edge for train.
schema => src_id:int64 | dst_id:int64 | weight:float edge for retrieval beam search.
field names of attrs in train data or eval data
field name of item_id in train data or eval data
the number of negative samples per layer
attribute delimiter of attrs string
number of negative samples for evaluator
field delimiter of input data
The training process only trains a randomly selected proportion of nodes in the middle layers of the tree
The type of probability for selecting and retaining each layer in the middle layers of the tree
Used in: , , ,
task name for the task tower
label for the task
metrics for the task
log train metrics for task
loss for the task
num_class for multi-class classification loss
task specific mlp
training loss weights
sample weight for the task
label name for indicating the sample space for the task tower
the loss weight for sample in the task space
the loss weight for sample out the task space
use pareto front minimal loss weight, ge 0 and lt 1
Used in:
lower case to upper case
upper case to lower case
sbc case to dbc case
traditional Chinese to simplified Chinese
filter special chars
split Chinese text into chars separated by blanks
remove space
Used in:
if text_length is greater than max_length, normalization is skipped
stop char file path, default will use built-in stop char
text normalize options, default is TEXT_LOWER2UPPER & TEXT_SBC2DBC & TEXT_CHT2CHS & TEXT_FILTER
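A few of the normalize options above can be sketched as composable flags. Everything in this block is an assumption made for illustration: the `TextOption` names, the order in which options are applied, reading "sbc to dbc" as full-width to half-width conversion, and treating "special chars" as anything that is neither alphanumeric nor whitespace. Traditional-to-simplified conversion is omitted because it needs a mapping table.

```python
from enum import Flag, auto


class TextOption(Flag):
    LOWER2UPPER = auto()
    SBC2DBC = auto()
    FILTER = auto()
    REMOVE_SPACE = auto()


def full_to_half_width(text):
    """Map full-width ASCII variants and the ideographic space to half-width."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:
            out.append(chr(code - 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)


def normalize_text(text, options, max_length=None):
    """Apply the selected options; skip normalization for over-long text."""
    if max_length is not None and len(text) > max_length:
        return text
    if TextOption.SBC2DBC in options:
        text = full_to_half_width(text)
    if TextOption.LOWER2UPPER in options:
        text = text.upper()
    if TextOption.FILTER in options:
        text = "".join(ch for ch in text if ch.isalnum() or ch.isspace())
    if TextOption.REMOVE_SPACE in options:
        text = "".join(text.split())
    return text


opts = TextOption.SBC2DBC | TextOption.LOWER2UPPER | TextOption.FILTER
# Full-width "abc", a full-width comma, an ideographic space, then "d!".
normalized = normalize_text("\uff41\uff42\uff43\uff0c\u3000d!", opts)
print(normalized)
```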
Used in: ,
feature name, e.g. title_token
feature input, e.g. item:title
embedding name, feature with same embedding name will share embedding
embedding dimension
text normalizer
tokenizer vocabulary file path
vocab file relative directory
embedding pooling type, available is {sum | mean}
fg default value, default value before bucketize
tokenizer type, available is {bpe | sentencepiece}
embedding init function, e.g. "nn.init.uniform_,a=-0.01,b=0.01"
mask value in training progress
default value when fg_mode = FG_NONE. When using pai-fg, you do not need to set this param; when using your own fg and the data contains null values, set this param to fill nulls.
embedding param trainable or not
only used as fg dag intermediate result or not
embedding data type
embedding param constraints
max sequence length, only take effect when use it as sequence
sequence delimiter, only take effect when use it as sequence
specify sequence type fields in inputs. default is item side inputs.
When true, treat the tokenized output of a single text as a sequence of tokens (each token is one sequence element, value_dim=1) so the feature can be fed into sequence_encoder modules (pooling_encoder, self_attention_encoder, etc.) via EmbeddingCollection instead of being pooled by EmbeddingBagCollection. This is distinct from the `sequence_tokenize_feature` oneof entry, which interprets the input as a `sequence_delim`-separated list of texts.
Used in: , , , , ,
input feature group name
mlp config (optional; when unset the tower applies no projection)
Used in:
embedding part optimizer
dense part optimizer
number of steps to train models
number of epochs to train models
step interval for saving checkpoint
checkpoint to restore parameters from
checkpoint to restore parameters mapping, each line is {param name in current model}\\t{param name in old ckpt}
the frequency the loss and lr will be logged during training
profiling or not
use tensorboard or not.
epoch interval for saving checkpoint
the summaries to be saved in tensorboard, activated only when use_tensorboard=true, possible values are: "loss", "learning_rate", "parameter", "global_gradient_norm", "gradient_norm", "gradient" default values are ["loss", "learning_rate"]
whether to use torch.backends.cudnn.allow_tf32
whether to use torch.backends.cuda.matmul.allow_tf32
global embedding param constraints
mixed precision dtype.
grad_scaler dynamically estimates the scale factor each iteration.
gradient accumulation steps
dense gradient clipping config
maximum number of recent checkpoints to keep; 0 keeps all.
save every N seconds of consumed event-time (e.g. kafka message timestamp), aligned to the Unix epoch (not training epochs). 0 disables.
absolute event-time targets (Unix-epoch seconds); save once when consumed data crosses each. empty disables.
fraction of workers (0,1] that must pass a boundary/target before a timestamp checkpoint fires; default 0.5 (1.0 = all). outlier-robust.
Configuring this field dumps changed sparse embedding rows during CUDA training. Multi-GPU training writes one parquet shard per rank; column-wise embedding sharding is not supported.
TBD: qcomm config
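The event-time checkpoint fields above (an interval aligned to the Unix epoch, plus a worker fraction) can be pictured with the sketch below. It is a guess at the logic for illustration only: the function names are hypothetical, and treating a worker as having "passed" once its consumed timestamp is at or beyond the boundary is an assumption.

```python
def next_boundary(event_time, interval_secs):
    """Next multiple of interval_secs after event_time, aligned to the Unix epoch."""
    return (event_time // interval_secs + 1) * interval_secs


def quorum_reached(worker_event_times, boundary, quorum=0.5):
    """True when at least `quorum` of the workers have consumed past boundary."""
    passed = sum(1 for t in worker_event_times if t >= boundary)
    return passed / len(worker_event_times) >= quorum


boundary = next_boundary(7300, interval_secs=3600)
worker_times = [10790, 10805, 10900, 10700]  # hypothetical per-worker timestamps
fires_default = quorum_reached(worker_times, boundary)
fires_all = quorum_reached(worker_times, boundary, quorum=1.0)
print(boundary, fires_default, fires_all)
```

A quorum below 1.0 keeps a single slow or stalled worker from delaying the checkpoint, which is presumably what "outlier-robust" refers to.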
Used in: , , , ,
metric decay rate
metric decay steps; train_config.log_step_count_steps should divide decay_steps evenly.
Used in:
Mixture of Transducers: one HSTU per channel; per-candidate outputs are concatenated on the embedding dim. >= 2 entries requires every entry to set a unique non-empty `name` plus the matching `<name>` / `<name>_action` / `<name>_watchtime` / `<name>_timestamp` feature_groups. Candidate-side and contextual groups are shared across all channels.
multi task tower config
max sequence length
item embedding mlp hidden dimension
enables loss averaging computation globally across all ranks (total rank) instead of locally (local rank).
timestamp of sequence is ascending or descending
concat all contextual features on channel dim as one token
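The naming rule for multi-channel (Mixture of Transducers) configs can be checked up front. The sketch below is a hypothetical validator, not part of the library: it only encodes the rule stated above that, with two or more channels, every channel needs a unique non-empty `name` and the feature groups `<name>`, `<name>_action`, `<name>_watchtime` and `<name>_timestamp`.

```python
REQUIRED_SUFFIXES = ("", "_action", "_watchtime", "_timestamp")


def check_channels(channel_names, feature_group_names):
    """Return a list of problems; an empty list means the config looks consistent."""
    errors = []
    if len(channel_names) < 2:
        return errors  # the rule only applies to >= 2 channels
    if any(not name for name in channel_names):
        errors.append("every channel must set a non-empty name")
    if len(set(channel_names)) != len(channel_names):
        errors.append("channel names must be unique")
    for name in channel_names:
        if not name:
            continue
        for suffix in REQUIRED_SUFFIXES:
            if name + suffix not in feature_group_names:
                errors.append(f"missing feature group: {name + suffix}")
    return errors


# Hypothetical channel and feature group names.
groups = {"click", "click_action", "click_watchtime", "click_timestamp",
          "buy", "buy_action", "buy_timestamp"}
problems = check_channels(["click", "buy"], groups)
print(problems)
```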
Used in:
regularization coefficient lambda
variational_dropout dimension
Used in:
Used in:
wide embedding init function, e.g. "nn.init.uniform_,a=-0.01,b=0.01"
Used in:
Used in:
LinearCompressBlock output feature num
FactorizationMachineBlock output feature num
number of compressed features in optimized FM.
feature num mlp
Used in: ,
Used in: , , , , ,
zero collision size
evict interval steps
evict policy
lambda function string used to filter incoming ids before update/eviction. experimental feature. [input: Tensor] the function takes as input a 1-d tensor of unique id counts. [output1: Tensor] the function returns a boolean_mask or index array of corresponding elements in the input tensor that pass the filter. [output2: float, Tensor] the function returns the threshold that will be used to filter ids before update/eviction. all values <= this value will be filtered out.
Used in:
wide embedding init function, e.g. "nn.init.uniform_,a=-0.01,b=0.01"