Prepping for v3.0.0b2 #1185

Merged
jlarson4 merged 4 commits into dev-3.x from dev-3.x-canary
Feb 26, 2026

Conversation

@jlarson4
Collaborator

Description

Prepping the model registry, adding new architectures

  • Compatibility tests for Python 3.10, 3.11, and 3.12 now run sequentially instead of in parallel. This takes longer, but in v3.x HuggingFace rate-limits requests when all three run at once alongside the full coverage test
  • Updated the docs to have separate tables for HookedTransformer compatibility and TransformerBridge compatibility
    • TransformerBridge table can be filtered by model name, model status, organization, and architecture.
    • Filters are saved and filtered pages can be shared by URL
    • Model details are available by clicking the "details" link in each row
  • Added testing for Model Registry
  • Benchmark System
    • Improved floating-point precision and made sure requested dtypes are applied consistently across all benchmark functionality
    • Removed cross-model comparisons in favor of comparing against the HuggingFace model
    • HF Scraper discovered 4,908 models that should be supported by the TransformerLens system across all architectures
    • Registry data: supported_models.json (54K+ lines), architecture_gaps.json, verification_history.json
    • Added architecture_gaps.json identifying unsupported HF architectures
    • Added batch verification system verify_models for running supported models through the benchmark system
  • Verification System
    • Added handling for models of all sizes
    • Verified that 480+ models load within the TransformerLens system
    • Resolved issues with high-value models
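
The benchmark changes listed above (comparing against the HuggingFace model, with dtype-aware precision) could look roughly like the following. This is only a minimal sketch: the tolerance values, the `TOLERANCES` table, and the helper names are hypothetical and are not taken from this PR, and plain Python lists stand in for model outputs.

```python
# Hypothetical per-dtype tolerances; the PR does not state the real values.
TOLERANCES = {
    "float32": 1e-5,
    "float16": 1e-3,
    "bfloat16": 1e-2,
}


def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same length")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def matches_reference(reference, candidate, dtype):
    """True if the candidate stays within the tolerance chosen for this dtype."""
    return max_abs_diff(reference, candidate) <= TOLERANCES[dtype]


# Stand-in values, not real model outputs.
hf_logits = [0.1250, -1.5000, 2.0000]      # reference (HuggingFace) output
bridge_logits = [0.1254, -1.4990, 2.0030]  # output under test

print(matches_reference(hf_logits, bridge_logits, "bfloat16"))
print(matches_reference(hf_logits, bridge_logits, "float32"))
```

The same pair of outputs can pass under a loose bfloat16 tolerance and fail under a strict float32 one, which is why the tolerance has to follow the requested dtype.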

Details

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Screenshots

Screenshot 2026-02-25 at 9 09 27 AM

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

jlarson4 and others added 4 commits February 19, 2026 15:19
* Testing R1 Distills to confirm functional in TransformerLens

* Updating order to be alphabetical

* Setup StableLM architecture adapter

* Resolved weight and qk issues with StableLM. Added more models

* Added more models

* reformatted

* Created an ArchitectureAdapter for OpenELM, handled trusting remote code

* Fix formatting

* Removed test file, update benchmark

* Add mock model test

* More benchmark adjustments

* removed improperly listed supported models

* Updating to resolve existing weight diff issues

* Began working through issues with existing architecture benchmarks

* Resolve any existing weight folding issues we can possibly resolve

* Fixing test failures

* Clean up format and other changes

* Added text quality benchmark, updated to pass CI

* Cleaned up comment, tightened tolerances further for bfloat16 models

* Removed unnecessary testing file

* Cleanup of redundant code

* Resolve type issues and format issues
* created initial model registry tool

* fixed some formatting issues

* ran format

* Updated to resolve merge issues, added verification system with model compatibility reporting

- Added batch verification system (verify_models.py) with status codes (0=unverified, 1=verified, 2=skipped, 3=failed)
- HF scraper with min_downloads=500 threshold (4,846 models across 20 architectures)
- 58 models verified, HF token sanitization in verification notes
- Registry data stored in single supported_models.json and verification_history.json files
- API, validation, and benchmark tooling updated for new registry format

* Model registry updates, including new report generation and alias drift detection features.

* Initial pass test of all architecture adapters

* Updating the last adapters to ensure successful runs

* Type and format fixes before PR

---------

Co-authored-by: jlarson4 <jonahalarson@comcast.net>
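
The status codes and download threshold mentioned in the commit message above can be sketched as follows. The codes (0=unverified, 1=verified, 2=skipped, 3=failed) and the 500-download threshold come from the commit notes; everything else here, including the entry fields, the model names, the helper functions, and treating the threshold as inclusive, is an assumption for illustration and does not reflect the actual layout of supported_models.json or the verify_models.py interface.

```python
from collections import Counter
from enum import IntEnum


class VerificationStatus(IntEnum):
    """Status codes as listed in the commit notes."""
    UNVERIFIED = 0
    VERIFIED = 1
    SKIPPED = 2
    FAILED = 3


MIN_DOWNLOADS = 500  # threshold named in the commit notes

# Hypothetical scraped entries; field names and model IDs are made up.
scraped = [
    {"model_id": "example-org/model-a", "downloads": 12000, "status": 1},
    {"model_id": "example-org/model-b", "downloads": 750, "status": 0},
    {"model_id": "example-org/model-c", "downloads": 120, "status": 0},
    {"model_id": "example-org/model-d", "downloads": 5000, "status": 3},
]


def filter_by_downloads(entries, min_downloads=MIN_DOWNLOADS):
    """Keep entries at or above the download threshold (inclusive by assumption)."""
    return [e for e in entries if e["downloads"] >= min_downloads]


def summarize(entries):
    """Count entries per status, including statuses with zero entries."""
    counts = Counter(VerificationStatus(e["status"]) for e in entries)
    return {s.name.lower(): counts.get(s, 0) for s in VerificationStatus}


registry = filter_by_downloads(scraped)
print(summarize(registry))
```

Using an `IntEnum` keeps the stored values as plain integers while giving each code a readable name in reports.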
* created initial model registry tool

* fixed some formatting issues

* ran format

* Updated to resolve merge issues, added verification system with model compatibility reporting

- Added batch verification system (verify_models.py) with status codes (0=unverified, 1=verified, 2=skipped, 3=failed)
- HF scraper with min_downloads=500 threshold (4,846 models across 20 architectures)
- 58 models verified, HF token sanitization in verification notes
- Registry data stored in single supported_models.json and verification_history.json files
- API, validation, and benchmark tooling updated for new registry format

* Model registry updates, including new report generation and alias drift detection features.

* Initial pass test of all architecture adapters

* Updating the last adapters to ensure successful runs

* Type and format fixes before PR

* First verification test of the top 10 models for each language. Documenting verification changes via a new page in the docs site.

* Fixed initial issues with Gemma models, OLMo2, OpenELM, and Llama.

* Additional smaller fixes for other popular models

* Added single model run for verify models

* Resolving format issues

* Updated to include verification of the R1 distills

* fixed bug where cfg.d_mlp was sometimes dropped

* Additional batch of model verification and verification system improvements

* Gemma3 issue resolutions

* Updating docs to add a prefix filter

* Updating supported models to properly track text quality

* Additional Partial verification, added text quality to supported models, improve documentation

* Docstring test fix

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: Jonah Larson <jlarson@Jonahs-MacBook-Pro.local>
* created initial model registry tool

* fixed some formatting issues

* ran format

* Updated to resolve merge issues, added verification system with model compatibility reporting

- Added batch verification system (verify_models.py) with status codes (0=unverified, 1=verified, 2=skipped, 3=failed)
- HF scraper with min_downloads=500 threshold (4,846 models across 20 architectures)
- 58 models verified, HF token sanitization in verification notes
- Registry data stored in single supported_models.json and verification_history.json files
- API, validation, and benchmark tooling updated for new registry format

* Model registry updates, including new report generation and alias drift detection features.

* Initial pass test of all architecture adapters

* Updating the last adapters to ensure successful runs

* Type and format fixes before PR

* First verification test of the top 10 models for each language. Documenting verification changes via a new page in the docs site.

* Fixed initial issues with Gemma models, OLMo2, OpenELM, and Llama.

* Additional smaller fixes for other popular models

* Added single model run for verify models

* Resolving format issues

* Updated to include verification of the R1 distills

* fixed bug where cfg.d_mlp was sometimes dropped

* Additional batch of model verification and verification system improvements

* Gemma3 issue resolutions

* Updating docs to add a prefix filter

* Updating supported models to properly track text quality

* Additional Partial verification, added text quality to supported models, improve documentation

* Docstring test fix

* Fixed a small bug to ensure MoE weights are properly formatted. Improve memory estimate for verify_models. Add verification for 36 more high value models

* Updating docs copy

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: Jonah Larson <jlarson@Jonahs-MacBook-Pro.local>
@jlarson4 jlarson4 merged commit d5561da into dev-3.x Feb 26, 2026
44 of 45 checks passed

2 participants