Adding UniSRec model implemented on lightweight class hierarchy with pytorch preprocessing #306
Open
TOPAPEC wants to merge 18 commits into
Standalone sequential recommender package, mimics ModelBase interface without touching existing rectools code. FlatSASRec - plain ID-embedding SASRec encoder. UniSRec - pretrained text embeddings + PCA/BN adaptor, 3-phase training (ID emb -> adaptor only -> full finetune). Uses lightweight rank_topk instead of TorchRanker, reuses SASRecDataPreparator for the data pipeline. 30 tests, smoke scripts for both models. Fix: NaN*0=NaN in IEEE 754 breaks attention padding masking via multiplication, switched to masked_fill.
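The masking fix mentioned above can be illustrated without PyTorch. The sketch below is a plain-Python analogue, not the PR's code: the helper names are hypothetical, and the lists stand in for attention tensors. It shows why multiplying by a 0/1 mask cannot remove a NaN, while overwriting the masked positions (what `masked_fill` does on tensors) can.

```python
import math


def mask_by_multiplication(values, keep):
    """Zero out padded positions by multiplying with a 0/1 mask (hypothetical helper)."""
    return [v * (1.0 if k else 0.0) for v, k in zip(values, keep)]


def mask_by_fill(values, keep, fill=0.0):
    """Overwrite padded positions with `fill`, ignoring whatever value was there."""
    return [v if k else fill for v, k in zip(values, keep)]


scores = [0.5, math.nan, 1.5]  # the middle position is padding that became NaN
keep = [True, False, True]

leaky = mask_by_multiplication(scores, keep)  # NaN * 0.0 is still NaN in IEEE 754
clean = mask_by_fill(scores, keep)            # the NaN is replaced outright

print(leaky)  # [0.5, nan, 1.5]
print(clean)  # [0.5, 0.0, 1.5]
```

Once a NaN survives masking, it propagates through every later sum or softmax that touches that position, which is why the overwrite form is the safer choice.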
New config options:
- ffn_type: conv1d / linear_gelu / linear_relu + ffn_expansion
- optimizer: adam / adamw
- scheduler: cosine_warmup (with warmup_ratio, min_lr_ratio)
- loss: softmax / BCE / gBCE / sampled_softmax (with gbce_t)
- patience: early stopping via EarlyStopping callback + val split
- data_preparator: accept custom preparator instance

31 tests passing.
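For readers unfamiliar with the `cosine_warmup` option, here is a minimal sketch of one common way such a schedule is defined. The exact formula used in this PR is not shown in the conversation, so the function name, the linear warmup shape, and the defaults below are assumptions; only the parameter names `warmup_ratio` and `min_lr_ratio` come from the commit message.

```python
import math


def cosine_warmup_factor(step, total_steps, warmup_ratio=0.1, min_lr_ratio=0.0):
    """Multiplier for the base learning rate at a 0-based `step` (assumed formula)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp: reaches 1.0 on the last warmup step.
        return (step + 1) / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1.0 down to 0.0
    # Rescale so the factor ends at min_lr_ratio instead of 0.
    return min_lr_ratio + (1.0 - min_lr_ratio) * cosine


factors = [cosine_warmup_factor(s, total_steps=100, min_lr_ratio=0.1) for s in (0, 9, 10, 55, 100)]
print(factors)
```

A per-step multiplier of this kind is the shape that `torch.optim.lr_scheduler.LambdaLR` expects, which is one way it could be wired into an optimizer.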
added 2 commits
April 24, 2026 22:17
2e923df to
d68834f
Compare
Pull request overview
Adds a new rectools.fast_transformers subpackage providing GPU-native preprocessing and standalone sequential transformer recommenders (FlatSASRec + UniSRec), plus ranking utilities, scripts, and comprehensive tests.
Changes:
- Introduces torch-native sequence building (`build_sequences`), embedding alignment, and lightweight dataset/dataloader helpers.
- Adds UniSRec (pretrained text embeddings + adaptor + SASRec encoder) with Lightning training wrapper and a standalone `UniSRecModel` API (fit/checkpoint/ONNX export).
- Adds `rank_topk()` for batched scoring with CSR filtering + whitelist, along with benchmark scripts and extensive test coverage.
Reviewed changes
Copilot reviewed 17 out of 19 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| rectools/fast_transformers/\_\_init\_\_.py | Exposes the new fast_transformers public API surface. |
| rectools/fast_transformers/gpu_data.py | Implements torch-native preprocessing utilities (sequence building, embedding alignment, dataloader helpers). |
| rectools/fast_transformers/net.py | Adds FlatSASRec network implementation. |
| rectools/fast_transformers/ranking.py | Adds rank_topk() batching + filtering + whitelist ranking utility. |
| rectools/fast_transformers/unisrec_lightning.py | Adds LightningModule wrapper (loss/optimizer/scheduler dispatch) for UniSRec training phases. |
| rectools/fast_transformers/unisrec_model.py | Adds standalone UniSRecModel (3-phase training, checkpointing, ONNX export, ID mapping). |
| rectools/fast_transformers/unisrec_net.py | Adds UniSRec network (adaptor + transformer encoder + helper methods). |
| tests/fast_transformers/\_\_init\_\_.py | Test package marker for fast_transformers. |
| tests/fast_transformers/test_gpu_data.py | Tests for sequence building, embedding alignment, dataset/dataloader, and hashing. |
| tests/fast_transformers/test_net.py | Tests for FlatSASRec forward paths and encoding helpers. |
| tests/fast_transformers/test_onnx_export.py | Tests ONNX export/roundtrip for UniSRec network and UniSRecModel export. |
| tests/fast_transformers/test_ranking.py | Tests top-k ranking, filtering, whitelist behavior, and edge cases. |
| tests/fast_transformers/test_unisrec_lightning.py | Tests UniSRecLightning configuration + loss/scheduler dispatch behavior. |
| tests/fast_transformers/test_unisrec_model.py | Tests UniSRecModel fit phases, losses/optimizers/schedulers, checkpointing, and mapping. |
| tests/fast_transformers/test_unisrec_net.py | Tests UniSRec network output shapes, adaptor variants, and freeze/unfreeze helpers. |
| scripts/compare_sasrec_unisrec.py | Benchmark script to compare RecTools SASRec vs UniSRec-ID and generate a report. |
| scripts/comparison_report.md | Adds a sample benchmark report output. |
| CHANGELOG.md | Documents the new module and features under Unreleased. |
| .gitignore | Ignores new dev artifacts, model weights, and data folders. |
Comment on lines +43 to +49

    unique_items = torch.unique(item_ids)
    n_unique = len(unique_items)

    if id_mapping == "dense":
        _, item_inv = torch.unique(item_ids, return_inverse=True)
        internal_items = item_inv + 1
    elif id_mapping == "hash":
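To make the "dense" branch concrete, the sketch below reproduces the same idea in plain Python: sort the distinct external IDs, then replace each ID with its position plus one so that 0 stays free for padding. The function name is hypothetical, and lists are used in place of tensors.

```python
def dense_item_mapping(item_ids):
    """Return (sorted unique external IDs, 1-based internal ID per interaction)."""
    unique_items = sorted(set(item_ids))
    index_of = {item: i for i, item in enumerate(unique_items)}
    # +1 keeps internal ID 0 reserved for padding.
    internal_items = [index_of[item] + 1 for item in item_ids]
    return unique_items, internal_items


unique_items, internal_items = dense_item_mapping([42, 7, 42, 1000, 7])
print(unique_items)    # [7, 42, 1000]
print(internal_items)  # [2, 1, 2, 3, 1]
```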
Comment on lines +276 to +307

    x, y, unique_items, unique_users = build_sequences(
        user_ids,
        item_ids,
        timestamps,
        max_len=self.session_max_len,
        min_interactions=self.train_min_user_interactions,
        id_mapping=self.id_mapping,
    )
    self._unique_items = unique_items.cpu()
    self._unique_users = unique_users.cpu()
    n_items = len(unique_items)

    aligned_emb = align_embeddings(self.pretrained_item_embeddings, unique_items, n_items, self.id_mapping)

    net = UniSRec(
        n_items=n_items,
        pretrained_embeddings=aligned_emb,
        n_factors=self.n_factors,
        projection_hidden=self.projection_hidden,
        n_blocks=self.n_blocks,
        n_heads=self.n_heads,
        session_max_len=self.session_max_len,
        dropout=self.dropout,
        adaptor_dropout=self.adaptor_dropout,
        adaptor_type=self.adaptor_type,
        use_adaptor_ffn=self.use_adaptor_ffn,
        ffn_type=self.ffn_type,
        ffn_expansion=self.ffn_expansion,
    )

    train_dl = make_dataloader(x, y, batch_size=self.batch_size, shuffle=True)
Comment on lines +448 to +450

    lookup = {int(v): i + 1 for i, v in enumerate(self._unique_items.tolist())}
    return torch.tensor([lookup.get(int(x), 0) for x in external_ids.tolist()], dtype=torch.long)
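A later commit in this PR mentions vectorizing `map_item_ids` with `torch.searchsorted`. The underlying idea is a binary search over the sorted unique IDs followed by an equality check for unknown items. Below is a plain-Python sketch of that idea using `bisect`; it is an illustration under that assumption, not the PR's implementation.

```python
from bisect import bisect_left


def map_item_ids(unique_items, external_ids):
    """Map external IDs to 1-based internal IDs; unknown IDs map to 0 (padding)."""
    internal = []
    for ext in external_ids:
        pos = bisect_left(unique_items, ext)  # insertion point in the sorted list
        found = pos < len(unique_items) and unique_items[pos] == ext
        internal.append(pos + 1 if found else 0)
    return internal


mapped = map_item_ids([7, 42, 1000], [42, 5, 1000, 2000, 7])
print(mapped)  # [2, 0, 3, 0, 1]
```

The equality check matters: a binary search always returns some position, so without it an unseen ID would silently be mapped to a neighbouring item.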
Comment on lines +59 to +61

    viewed_mask = torch.tensor(batch_csr.toarray(), dtype=torch.bool, device=device)
    scores[viewed_mask] = -float("inf")
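The two lines above implement "filter already-viewed items" by setting their scores to negative infinity before taking the top-k. A minimal single-user sketch of that pattern in plain Python follows; the function name and the use of a set of viewed indices (analogous to the column indices of one CSR row) are assumptions made for illustration.

```python
import heapq

NEG_INF = float("-inf")


def topk_unseen(scores, viewed, k):
    """Return up to k item indices with the highest scores, skipping viewed items."""
    masked = [NEG_INF if i in viewed else s for i, s in enumerate(scores)]
    top = heapq.nlargest(k, range(len(masked)), key=masked.__getitem__)
    # Drop masked items that slipped in because fewer than k unseen items exist.
    return [i for i in top if masked[i] != NEG_INF]


print(topk_unseen([0.1, 0.9, 0.4, 0.7], viewed={1}, k=2))  # [3, 2]
```

Working from the stored indices of each row, instead of a dense boolean matrix, keeps memory proportional to the number of interactions in the batch.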
Comment on lines +37 to +46

    def test_padding_invariance(self, net: FlatSASRec) -> None:
        """Different left-padding should produce same last-position embedding."""
        net.eval()
        # Same content should produce identical output
        x_a = torch.tensor([[0, 0, 0, 5, 10]])
        x_b = torch.tensor([[0, 0, 0, 5, 10]])
        with torch.no_grad():
            e_a = net.encode_last(x_a)
            e_b = net.encode_last(x_b)
        torch.testing.assert_close(e_a, e_b)
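In the quoted test, `x_a` and `x_b` are the same tensor, so the assertion checks determinism and not padding invariance. A test of the property named in the docstring needs two inputs that differ only in the amount of left padding. The sketch below shows that shape with a toy, hypothetical encoder in plain Python; whether the real network is expected to satisfy this for different sequence lengths depends on its positional encoding and is not something this sketch can establish.

```python
def toy_encode_last(sequence, pad_id=0):
    """Toy stand-in for an encoder: weights items by distance from the end, ignoring padding."""
    items = [item for item in sequence if item != pad_id]
    return sum(item * (offset + 1) for offset, item in enumerate(reversed(items)))


short_pad = [0, 0, 0, 5, 10]
long_pad = [0, 0, 0, 0, 0, 5, 10]  # same items, two extra padding positions

# The property under test: extra left padding must not change the output.
assert toy_encode_last(short_pad) == toy_encode_last(long_pad)

# A sanity check that the encoder is not constant: different content differs.
assert toy_encode_last([0, 0, 0, 10, 5]) != toy_encode_last(short_pad)
```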
Comment on lines +107 to +115

    class TestPaddingInvariance:
        def test_same_input_same_output(self, net: UniSRec) -> None:
            net.eval()
            x_a = torch.tensor([[0, 0, 0, 5, 10]])
            x_b = torch.tensor([[0, 0, 0, 5, 10]])
            with torch.no_grad():
                e_a = net.encode_last(x_a, use_id=False)
                e_b = net.encode_last(x_b, use_id=False)
            torch.testing.assert_close(e_a, e_b)
Comment on lines +306 to +311

    train_dl = make_dataloader(x, y, batch_size=self.batch_size, shuffle=True)

    val_dl = None
    if self.patience is not None:
        val_y_last = y[:, -1:]
        val_dl = make_dataloader(x, val_y_last, batch_size=self.batch_size, shuffle=False)
Comment on lines +276 to +283

    x, y, unique_items, unique_users = build_sequences(
        user_ids,
        item_ids,
        timestamps,
        max_len=self.session_max_len,
        min_interactions=self.train_min_user_interactions,
        id_mapping=self.id_mapping,
    )
feldlime
reviewed
Apr 30, 2026
- Add hash-based ID mapping (splitmix64) as alternative to dense torch.unique mapping in build_sequences and align_embeddings.
- Add UniSRecModel.export_to_onnx() for native ONNX export of encoder and item embeddings (project_all).
- Add UniSRecModel.map_item_ids() for external→internal ID conversion at inference time (works for both dense and hash modes).
- Remove FlatSASRecModel/FlatSASRecLightning (RecTools-coupled wrappers that duplicated UniSRecModel functionality).
- Add tests: hash mapping (including string-derived IDs), ONNX export roundtrip, map_item_ids for both modes.
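For context on the hash mode named in this commit, the sketch below shows the standard splitmix64 mixing function in plain Python and one hypothetical way to turn it into an internal item ID. The bucket scheme (`% n_buckets + 1`, with 0 reserved for padding) and the name `hash_item_id` are assumptions; the PR's actual bucket logic is not shown in the conversation.

```python
MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit wraparound


def splitmix64(x):
    """Standard splitmix64 mixing step applied to a 64-bit integer."""
    z = (x + 0x9E3779B97F4A7C15) & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


def hash_item_id(external_id, n_buckets):
    """Map an external ID to 1..n_buckets (hypothetical scheme; 0 stays free for padding)."""
    return splitmix64(external_id & MASK64) % n_buckets + 1


ids = [hash_item_id(ext, n_buckets=1000) for ext in (7, 42, 1000, -3)]
print(ids)
```

Unlike the dense mapping, a hash mapping needs no pass over the training data to assign IDs, at the cost of possible collisions between distinct items.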
- Remove ranking.py (duplicates TorchRanker)
- Remove hash ID mapping from build_sequences/align_embeddings
- Simplify UniSRecModel to single joint training phase (adaptor + transformer)
- Rename gpu_data.py -> sequence_data.py, GPUBatchDataset -> SequenceBatchDataset
- Vectorize map_item_ids with torch.searchsorted
- Fix device default (None -> auto-detect from input tensor)
- Fix double torch.unique call
- Add empty dataset validation in fit()
- Add **kwargs to make_dataloader
- Add dataloader_num_workers passthrough
- Move benchmark script to benchmark/ folder
- Add KION training demo with Qwen3-Embedding-0.6B results
- Update tests for simplified API
- Clean up CHANGELOG and .gitignore
- Remove item_emb, use_id, freeze/unfreeze, phase references from net/lightning
- Remove GPUBatchDataset alias and make_dataloader wrapper
- Reorganize into preprocessing/ and unisrec/ subpackages
- Add GPU-friendly HR@K, NDCG@K, MRR@K metrics (tested against RecTools)
- Update benchmark, demo, and all tests (102 passed + 28 metric tests)
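The three metrics named in this commit have simple closed forms when each user has exactly one held-out target item. The sketch below computes them for one user in plain Python under that assumption; the function names are hypothetical, and the PR's tensor-based implementation is not shown here.

```python
import math


def rank_of_target(scores, target):
    """1-based rank of `target` when items are ordered by descending score."""
    return 1 + sum(1 for s in scores if s > scores[target])


def hr_ndcg_mrr_at_k(scores, target, k):
    """HR@K, NDCG@K and MRR@K for one user with a single relevant item."""
    rank = rank_of_target(scores, target)
    if rank > k:
        return 0.0, 0.0, 0.0
    # With one relevant item the ideal DCG is 1, so NDCG reduces to 1 / log2(rank + 1).
    return 1.0, 1.0 / math.log2(rank + 1), 1.0 / rank


hr, ndcg, mrr = hr_ndcg_mrr_at_k([0.2, 0.9, 0.5, 0.1], target=2, k=3)
print(hr, ndcg, mrr)
```

Averaging these per-user values over a batch gives the dataset-level metric, which is what makes a vectorized version straightforward.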
d68834f to
45ed8ae
Compare
- Add negative sampling transform in fit() for BCE/gBCE/sampled_softmax losses
- Add e2e tests for all non-softmax losses via UniSRecModel.fit()
- Fix load_checkpoint() default device: auto-detect cuda/cpu instead of hardcoded "cuda"
- Fix map_item_ids() device mismatch when input is on CUDA
- Fix Python 3.9 compat: replace PEP 604 unions with Optional[] in tests
- Fix CHANGELOG: remove nonexistent FlatSASRecModel and make_dataloader()
- Update benchmark: auto-download ML-20M, fallback random embeddings, fix paths
…ings, n_negatives validation
- Run black/isort/flake8 on all fast_transformers files; all pass now
- Fix val dataloader missing negatives when patience + non-softmax loss
- Extract _NegativeSampler class: device-aware, resamples positive collisions
- Validate n_negatives is a positive integer for non-softmax losses
- Make align_embeddings() device-aware (supports CUDA pretrained embeddings)
- Remove unused imports (os in benchmark, pytest in test_sequence_data)
- Add CUDA guard in benchmark main()
- Add e2e tests: non-softmax losses with patience, n_negatives=0/-1/None
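"Resamples positive collisions" means that a drawn negative equal to the positive item at the same position is discarded and drawn again. A minimal plain-Python sketch of that behaviour, including the `n_negatives` validation, is given below. The function name and signature are hypothetical; the PR's `_NegativeSampler` operates on tensors and is not reproduced here.

```python
import random


def sample_negatives(positives, n_items, n_negatives, rng):
    """For each positive ID, draw n_negatives IDs from 1..n_items that differ from it."""
    if not isinstance(n_negatives, int) or n_negatives <= 0:
        raise ValueError("n_negatives must be a positive integer")
    if n_items < 2:
        raise ValueError("need at least two items to sample a negative")
    negatives = []
    for pos in positives:
        row = []
        while len(row) < n_negatives:
            candidate = rng.randint(1, n_items)  # inclusive on both ends; 0 is padding
            if candidate != pos:                 # collision with the positive: resample
                row.append(candidate)
        negatives.append(row)
    return negatives


rng = random.Random(0)
positives = [3, 1, 5]
negatives = sample_negatives(positives, n_items=5, n_negatives=4, rng=rng)
print(negatives)
```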
Keep only device-awareness (the actual review request). Preserving pretrained.dtype could cause precision issues with float16 inputs.
- Add `device` parameter to UniSRecModel.__init__ (default None = input device)
- Move x/y to CPU before DataLoader to avoid CUDA+multiprocessing issues
- Benchmark: pass device="cuda" explicitly to build_sequences and UniSRecModel
…pylint, bandit)
- Add type annotations across benchmark, tests, and source files (mypy 30→0 errors)
- Annotate frozen_emb buffer and Optional head in net.py
- Add assert guards for Optional item_id_mapping usage
- Type sasrec_kwargs and nested functions in benchmark
- Fix tensor index type in test_metrics
- isort: fix import ordering in __init__.py files and test_metrics.py
- black: auto-format all new files to project style
- flake8: add per-file-ignores in setup.cfg for new modules (D102, N806, N812, D401); fix D403 capitalization in test docstring
- mypy: fix arg-type for align_embeddings (add assert for Optional), fix slice index type in test_unisrec_model
- pylint: rename unused vars (B -> _B, unique_items -> _unique_items, y -> _y), move math import to top-level in metrics.py, add pylint-disable for too-many-* / protected-access / not-callable / redefined-outer-name, use dict literals instead of dict()
- codespell: already clean
- bandit: already clean
Add comprehensive tutorial covering model training, evaluation, inference, checkpointing, ONNX export, and comparison of different configurations (loss functions, adaptor types, optimizers).
Motivation
Adds a `rectools.fast_transformers` package for sequential recommendation models (UniSRec, FlatSASRec) that work directly with PyTorch tensors, without relying on the `Dataset`/pandas pipeline. The main motivation is to speed up the NN recommenders and make them easier to extend. It also aims to simplify production use of rectools models by reducing boilerplate and making the data flow inside the model more straightforward.

Description
- `rectools.fast_transformers` package with:
  - `metrics`: GPU-computed ranking metrics, much faster at scale than the pandas-based ones.
  - `net`: FlatSASRec implementation;
  - `preprocessing`: vectorized `build_sequences`, `align_embeddings`, and `SequenceBatchDataset`, also much faster at scale.
  - `unisrec`: model network, Lightning module, high-level model API, ONNX export helpers, and demo docs.
- `tests/fast_transformers` covering:
- `.gitignore`, `CHANGELOG.md`);
- `benchmark/...`.

Typing/Lint note
`# type: ignore[import-untyped]` for `requests` in `benchmark/compare_sasrec_unisrec.py` so `mypy` passes in the current lint environment without changing dependency policy.

Testing
Tests are in `tests/fast_transformers` and are part of the standard test suite structure.