The Protocol Buffers files changed in the following commits (only the last 100 relevant commits are shown):
| Commit: | 52002e2 | |
|---|---|---|
| Author: | Eric | |
| Committer: | GitHub | |
[feat] add model delta tracker (#546)
The documentation is generated from this commit.
| Commit: | 3d4d5a8 | |
|---|---|---|
| Author: | ShuQi | |
| Committer: | GitHub | |
[feat] SID: add SidRqkmeans model (FAISS-trained residual K-Means) (#539) Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| Commit: | 7886e4c | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] Kafka: event-time driven checkpointing from message timestamp (#541) Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| Commit: | fbd47be | |
|---|---|---|
| Author: | 天邑 | |
Merge remote-tracking branch 'origin/master' into feat/ultra-hstu-fp8

Conflicts: tzrec/version.py
| Commit: | da99c02 | |
|---|---|---|
| Author: | 天邑 | |
[fix] ULTRA-HSTU FP8: narrow the arch gate to SM90 + SM120-mode2

The previous "SM90+" gate was too permissive:

- SM100 (Blackwell datacenter) has no FP8 kernel in the wheel; the dispatcher routes there via _sm100.hstu_varlen_fwd_100 which doesn't even take quant_mode (cuda_hstu_attention.py:399-403).
- SM120 (Blackwell RTX) only handles quant_mode==2 (per-block, fwd-only, cuda_hstu_attention.py:282); for any other mode the wheel silently falls into the sm80 bf16/fp16 branch (line 308's `or major_version == 12`) -- the user gets non-FP8 attention with no warning.

Tighten _assert_fp8_capable to accept exactly (sm90, any mode) or (sm120, mode=2), and reject everything else loudly. Pass fp8_quant_mode into the helper so it can mode-check on sm120.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

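
The acceptance rule described in this commit can be sketched in a few lines of Python. This is a minimal illustration, not the actual `_assert_fp8_capable` from tzrec: the function name `check_fp8_capable`, its signature, and the error text are hypothetical, and the sketch assumes that a negative `fp8_quant_mode` means FP8 is disabled, so no check is needed.

```python
def check_fp8_capable(major: int, minor: int, fp8_quant_mode: int) -> None:
    """Hypothetical gate: allow FP8 only on (sm90, any mode) or (sm120, mode 2).

    `major` and `minor` are the CUDA compute capability numbers,
    e.g. (9, 0) for SM90 and (12, 0) for SM120.
    """
    # Assumption: a negative mode keeps attention in bf16/fp16, so skip the gate.
    if fp8_quant_mode < 0:
        return

    sm = major * 10 + minor

    # SM90 (Hopper): every FP8 mode is accepted.
    if sm == 90:
        return

    # SM120 (Blackwell RTX): only quant mode 2 is accepted.
    if sm == 120 and fp8_quant_mode == 2:
        return

    # Everything else is rejected loudly instead of silently falling back.
    raise RuntimeError(
        f"FP8 attention is not supported on sm{sm} "
        f"with fp8_quant_mode={fp8_quant_mode}"
    )
```

Under these assumptions, `check_fp8_capable(9, 0, 3)` and `check_fp8_capable(12, 0, 2)` return normally, while `check_fp8_capable(10, 0, 2)` and `check_fp8_capable(12, 0, 1)` raise `RuntimeError`.
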
| Commit: | 90d592a | |
|---|---|---|
| Author: | 天邑 | |
[feat] ULTRA-HSTU FP8: extend support from SM90 to SM90+ (Blackwell)

Relax the FP8 capability gate from "exactly SM90 (Hopper)" to "SM90+" so the same fp8_quant_mode>=0 path also runs on sm100 (Blackwell) and sm120 (Blackwell RTX). The wheel dispatches to its per-arch FP8 kernel internally (sm120/Blackwell RTX is forward-only and supports only quant_mode=2; that constraint surfaces from the wheel's own check, not tzrec's).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

| Commit: | f8ac3b3 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] support keep_checkpoint_max with async checkpoint pruning (#528) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| Commit: | b9dd23e | |
|---|---|---|
| Author: | 天邑 | |
[feat] ULTRA-HSTU: add STU.fp8_quant_mode proto field

Add an int32 `fp8_quant_mode` (default -1) to the STU message. -1 keeps attention in bf16/fp16; 0..5 select an FP8 mode forwarded to the CUTLASS (SM90/Hopper) kernel. Mirrors the wheel's quant_mode int and the existing scaling_seqlen=-1 sentinel style.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

| Commit: | f67ec93 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[refactor] HSTUMatch with STUStack + UIHPreprocessor + block-suffix candidates (#506) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| Commit: | 3a2a589 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[refactor] MatchTowerWoEG accepts feature_groups (plural); DSSMTower takes EmbeddingGroup (#510) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| Commit: | 9ae727f | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] metrics: NormalizedEntropy for binary classification (#507) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| Commit: | 98855ee | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[refactor] dataset/sampler: dynamic expand_factor + build_sampler_input + block-suffix combine (#505) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| Commit: | 37d7fc4 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] FeatureGroupConfig.embedding_name_suffix to break embedding sharing across groups (#504) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| Commit: | a446e12 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[bugfix] thread contextual_seq_len from preprocessor to STULayer (proto sentinel + truncation total_uih_len + AOTI-friendly SLA builder) (#501) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| Commit: | c7de8a9 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] stu.scaling_seqlen + drop autotune assert strip (#500) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| Commit: | fa39911 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] Adadelta + RMSprop sparse and dense optimizers (#499) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| Commit: | 5993c45 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[perf] AOTI export knobs: fp32 unbacked floats, sample-input autotune, TF32 from export_config (#498) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| Commit: | f2d0116 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] ULTRA-HSTU Mixture of Transducers (MoT) (#492) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| Commit: | 03ec5e6 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] ULTRA-HSTU mid-stack attention truncation (#488) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| Commit: | 401cb29 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] Semi-Local Attention + selective activation rematerialization (#486) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| Commit: | 8327341 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] support TokenizeFeature as token-level sequence input (#470) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| Commit: | b203ece | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add CUTLASS kernel backend for HSTU attention (#465) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| Commit: | 6673e00 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] integrate dynamicemb table fusion (wheel 20260407.97b80bf) (#466) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| Commit: | 8cadac0 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add concat_contextual_features option to DlrmHSTU (#459)

When enabled, all contextual features are concatenated on the channel dimension and projected as a single token instead of N separate tokens, reducing HSTU attention cost by shortening sequence length.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

| Commit: | a0d6e8a | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add per-task loss weight to FusionSubTaskConfig (#453) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| Commit: | 8c2bdec | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add CosineAnnealingLR and CosineAnnealingWarmRestartsLR schedules (#454) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| Commit: | 5351e3f | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add CombineFeature support (#447) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| Commit: | 1b33f24 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add label_smoothing support to BinaryCrossEntropy loss (#455)

Label smoothing helps with noisy click labels and improves generalization in ranking models. Smooths hard binary labels using the standard formula: label * (1 - eps) + 0.5 * eps, consistent with PyTorch CrossEntropyLoss.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

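
The smoothing formula quoted in this commit message can be illustrated with a short, dependency-free sketch. The helper name `smooth_binary_labels` and the range check on `eps` are assumptions made for illustration; this is not the loss implementation from the commit.

```python
from typing import List, Sequence


def smooth_binary_labels(labels: Sequence[float], eps: float) -> List[float]:
    """Apply label * (1 - eps) + 0.5 * eps to each hard binary label.

    With eps = 0 the labels are unchanged; with eps = 1 every label
    collapses to 0.5.
    """
    # Assumption: eps is restricted to [0, 1], as in common label-smoothing APIs.
    if not 0.0 <= eps <= 1.0:
        raise ValueError(f"eps must be in [0, 1], got {eps}")
    return [y * (1.0 - eps) + 0.5 * eps for y in labels]
```

For example, with `eps = 0.1` a negative label 0 moves up to about 0.05 and a positive label 1 moves down to about 0.95, so the loss never pushes predictions all the way to 0 or 1.
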
| Commit: | f93d73d | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add grad clipping for dense params (#424)
| Commit: | 1e7922b | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] support input fields str (#412)
| Commit: | 777f1e8 | |
|---|---|---|
| Author: | chengaofei | |
| Committer: | GitHub | |
[feat] support pepnet (#402)
| Commit: | 999cbe5 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add kafka dataset (#401)
| Commit: | 4d1dcc5 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] dlrm hstu support sequence timestamp is descending order (#395)
| Commit: | 0693de6 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[bugfix] make sequence related config optional (#390)
| Commit: | 4ca5ed5 | |
|---|---|---|
| Author: | chengaofei | |
| Committer: | GitHub | |
[feat] dlrm and wukong support only one sparse group (#385)
| Commit: | f5776db | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] support sequence cross features (#375)
| Commit: | 0f2f0dd | |
|---|---|---|
| Author: | chengaofei | |
| Committer: | GitHub | |
[feat] support pe ltr in train wrapper (#381)
| Commit: | f138b22 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] support initial_accumulator_value for FusedSparseAdagradOptimizer & add additional optimizer configuration options (#382)
| Commit: | b926ac1 | |
|---|---|---|
| Author: | chengaofei | |
| Committer: | GitHub | |
[feat] add wukong model (#372)
| Commit: | 93c6b46 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add AdmissionStrategy support for DynamicEmbedding (#362)
| Commit: | c8e5561 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | 天邑 | |
[feat] add time_bucket_increments for DlrmHSTU PositionEncoder (#359)
| Commit: | 9f6d8d7 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add time_bucket_increments for DlrmHSTU PositionEncoder (#359)
| Commit: | a7271a9 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] mtl tower of dlrm hstu support num_class > 1 (#352)
| Commit: | 3d2a4a8 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] support dynamic batch size with sample cost (#343)
| Commit: | c4a4944 | |
|---|---|---|
| Author: | Eric Ge | |
| Committer: | GitHub | |
[feat] mind dynamic routing support zero init (#342)
| Commit: | 7165db6 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add TMA support for hstu attn & rms_norm test (#336)
| Commit: | 906f0ce | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add global average loss option for DlrmHSTU (#334)
| Commit: | a16a85a | |
|---|---|---|
| Author: | Eric Ge | |
| Committer: | GitHub | |
[feat] TensorRT export (#318)
| Commit: | c8fbc3f | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] refactor dlrm hstu preprocess modules (#314)
| Commit: | 3348869 | |
|---|---|---|
| Author: | chengaofei | |
| Committer: | GitHub | |
[feat] support log training metric (#310)
| Commit: | bdf615f | |
|---|---|---|
| Author: | chengaofei | |
| Committer: | GitHub | |
[feat] support adamw optimizer and part optimizer and label smoothing (#297)
| Commit: | 5af80ff | |
|---|---|---|
| Author: | chengaofei | |
| Committer: | GitHub | |
[feat] export best model (#294)
| Commit: | cf2e73b | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add contextual_feature_to_pooling to DlrmHSTU preprocessors (#296)
| Commit: | a1247d9 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] make dlrmhstu watchtime feature optional (#290)
| Commit: | 77daba0 | |
|---|---|---|
| Author: | chengaofei | |
| Committer: | GitHub | |
[feat] support bool mask feature (#285)
| Commit: | b312efd | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add dynamicemb doc (#283)
| Commit: | 25dd248 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add tool to initialize dynamic embeddings from tables (#282)
| Commit: | 30865d1 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[bugfix] revert dynamic embedding hash bucket size and remove unused evict_strategy (#281)
| Commit: | 1a61fd1 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add dynamic embedding support (#279)
| Commit: | a75b5b7 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add kv dot product feature (#276)
| Commit: | a580a28 | |
|---|---|---|
| Author: | Eric Ge | |
| Committer: | GitHub | |
[feat] support xauc and grouped xauc (#252)
| Commit: | 2627da0 | |
|---|---|---|
| Author: | chengaofei | |
| Committer: | GitHub | |
[feat] add sequence self attention encoder (#251)
| Commit: | 3779c0e | |
|---|---|---|
| Author: | chengaofei | |
| Committer: | GitHub | |
[feat] add dcnv2 and xdeepfm net (#242)
| Commit: | ce67118 | |
|---|---|---|
| Author: | Eric Ge | |
| Committer: | GitHub | |
[feat] dcn_v1 (#235)
| Commit: | 74ef405 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] refine hstu ops & add triton tests for dlrm hstu (#231)
| Commit: | 525ce95 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add hstu rank model (#227)
| Commit: | be56886 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] oss dlrm hstu modules (#224)
| Commit: | df65528 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add use_ln option for MLP module & fix parse encoded sequence feature error msg (#223)
| Commit: | ee97015 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add fp16 embedding dtype and fix weight decay mode (#221)
| Commit: | d270e1f | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add mixed_precision bf16/fp16 and gradient accumulation support (#220)
| Commit: | 70208df | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] support feature only used as fg dag intermediate result (stub_type=true) (#218)
| Commit: | efda7f5 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] expr feature support value_dim & bump up pyfg to 0.7.1 (#216)
| Commit: | 91a4847 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add wide and deep model and wide_init_fn (#212)
| Commit: | 4c3ae6c | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] upgrade pyfg to 0.6.9 and refine expr/overlap feature doc (#199)
| Commit: | 10af7d3 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] support freeze embedding parameters (#206)
| Commit: | 4d3ac46 | |
|---|---|---|
| Author: | Eric Ge | |
| Committer: | GitHub | |
[feat] add binary focal loss (#208)
| Commit: | bc6bfbe | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add allow_tf32 flag and global embedding param constraint (#188)
| Commit: | 46947a6 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add masknet for dbmtl and refine masknet logic (#187)
| Commit: | 18ee4d6 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add max sequence length for sequence encoder (#184)
| Commit: | be8da4b | |
|---|---|---|
| Author: | Eric Ge | |
| Committer: | GitHub | |
[feat] write tensorboard log for model parameters (#181)
| Commit: | 4d59215 | |
|---|---|---|
| Author: | Eric Ge | |
| Committer: | GitHub | |
[feat] masknet (#179)
| Commit: | 49a7f73 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add fg value_type config and make num_buckets default value_dtype as string (#175)
| Commit: | 92ad14b | |
|---|---|---|
| Author: | Eric Ge | |
| Committer: | GitHub | |
[feat] optimize mind model (#157)

- optimize creation and scaling for the routing_logit tensor
- optimize the iteration of dynamic routing, capturing gradient after iteration
- adjust MindUserTower's MLP modules, the inner layers and output layers are extracted separately
- add bias hyper-parameter for the MLP module. For sequence feature, bias is not used

| Commit: | 75f3c47 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add kernel config and BaseModule (#151)
| Commit: | 4f833b4 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add regression and multi-classification metric (#149)
| Commit: | ec50459 | |
|---|---|---|
| Author: | chengaofei | |
| Committer: | GitHub | |
[feat] support dlrm model (#148)
| Commit: | caa27b5 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add custom feature and custom sequence feature (#144)
| Commit: | bf05427 | |
|---|---|---|
| Author: | iWelkin-coder | |
| Committer: | GitHub | |
[feat] Optimize HSTU training and sampling process (#93)
| Commit: | e8989ec | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add odps_data_compression config (#146)
| Commit: | 3fa13c4 | |
|---|---|---|
| Author: | chengaofei | |
| Committer: | GitHub | |
[feat] add rocket launching model (#129)
| Commit: | 62a90da | |
|---|---|---|
| Author: | Eric Ge | |
| Committer: | GitHub | |
[feat] add mind model (#119)
| Commit: | 457da32 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] eval and save checkpoint by epoch (#116)
| Commit: | ae00b33 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] support dataset shuffle (#114)
| Commit: | 3bee923 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] add vocab file for features (#97)
| Commit: | d382062 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] make default bucketize value configurable (#94)
| Commit: | 4bcbee6 | |
|---|---|---|
| Author: | iWelkin-coder | |
| Committer: | GitHub | |
[feat] add hstu (#55)
| Commit: | cdba485 | |
|---|---|---|
| Author: | Eric Ge | |
| Committer: | GitHub | |
[feat] add dual augmented two-tower model (#83)
| Commit: | 1b48405 | |
|---|---|---|
| Author: | chengaofei | |
| Committer: | GitHub | |
[feat] add task space for mtl loss (#82)
| Commit: | 00a24e4 | |
|---|---|---|
| Author: | Hongsheng Jin | |
| Committer: | GitHub | |
[feat] refactor embedding group input tile and dense embedding collection (#75)

* refactor embedding group input tile and dense embedding collection
* fix tests
* refactor proto and add tests
* refactor proto and add tests
* fix tests
* fix tests
* fix tests
* add docs

| Commit: | dfd2051 | |
|---|---|---|
| Author: | Eric Ge | |
| Committer: | GitHub | |
[feat] support Autodis and MLP embedding for raw features (#73)