Support transformers v5#481

Draft
jlamypoirier wants to merge 13 commits into main from jlp_transformers_v5

Conversation

@jlamypoirier
Collaborator

✨ Description

jlamypoirier and others added 12 commits April 22, 2026 20:34
- Widen transformers version constraint to >=4.57.3,<6.0.0
- Version-gate PretrainedConfig init (__init__ vs __post_init__) and dtype attribute (torch_dtype vs dtype) using dataclasses.is_dataclass detection
- Fall back to transformers.modeling_utils.no_init_weights for 4.x
- Support both rope_parameters (5.x) and rope_theta/rope_scaling (4.x) in Llama import/export config
- Handle both attribute paths for vision_tower in multimodal HF model test
- Fix mtp_llama LlamaRotaryEmbedding to handle both rope config formats
- Add _gdn_fla_available and _kda_fla_available flags to apriel2; use them to properly skip backup SSM tests when fla kernels are absent
- Update CLAUDE.md with redirect-to-file and external model test guidance

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
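A minimal sketch of the version gating described in the bullets above. The helper name is hypothetical; the PR itself detects 5.x via `dataclasses.is_dataclass(PretrainedConfig)`, and a plain version string stands in for that check here:

```python
# Hypothetical helper illustrating the dtype-attribute gate: transformers 4.x
# configs expose `torch_dtype`, while 5.x renames the attribute to `dtype`.
def dtype_attr_name(transformers_version: str) -> str:
    major = int(transformers_version.split(".", 1)[0])
    return "dtype" if major >= 5 else "torch_dtype"

print(dtype_attr_name("4.57.3"))  # torch_dtype
print(dtype_attr_name("5.0.0"))   # dtype
```

Resolving the attribute name once and using `getattr(config, name)` keeps the rest of the code version-agnostic.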
…compatibility

- apriel2/modeling_apriel2.py: add _TRANSFORMERS_V5 flag; fix _tied_weights_keys
  to dict format for 5.x (list for 4.x); add rope_parameters to PixtralRotaryEmbedding
  SimpleNamespace config
- mtp_llama/modeling_mtp_llama.py: add _TRANSFORMERS_V5 flag; fix _tied_weights_keys
- apriel2/conversion/llava/config.py: handle 5.x rope_parameters dict in text and
  vision configs alongside 4.x rope_theta
- apriel2/conversion/llava/plan.py: version-conditional source weight key prefixes
  (5.x LlavaForConditionalGeneration adds model. prefix to submodules)
- test_cache_contracts.py: update DynamicLayer.get_mask_sizes calls to pass int in 5.x
  (query_length) vs tensor in 4.x; update sdpa_mask signature for 5.x (q_length/q_offset)
- test_convert_from_llava.py: use version-conditional embed_tokens source key
- test_equivalence.py: fix get_image_features handling — 5.x returns BaseModelOutput
  with projected features in pooler_output (not last_hidden_state)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
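The rope-config handling in the commit above can be sketched as a small normalizer (the function name and the `"rope_theta"` key inside the 5.x dict are assumptions for illustration, not taken from the repo):

```python
from types import SimpleNamespace

# Hedged sketch: 5.x nests rope settings under a rope_parameters dict,
# 4.x exposes a flat rope_theta attribute on the config.
def get_rope_theta(config) -> float:
    rope_params = getattr(config, "rope_parameters", None)
    if rope_params is not None:          # transformers 5.x layout
        return rope_params["rope_theta"]
    return config.rope_theta             # transformers 4.x layout

v4_cfg = SimpleNamespace(rope_theta=10000.0)
v5_cfg = SimpleNamespace(rope_parameters={"rope_theta": 10000.0})
```

Normalizing first, then dispatching on rope type, is what lets the converter share one code path for both checkpoint formats.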
- Fix num_blocks off-by-one in import_config (was subtracting 1)
- Fix num_hidden_layers off-by-one in export_config (was adding 1)
- Fix mtp_heads index off-by-one in get_converters (was prediction_distance - 1)
- Fix hidden state collection order in MTPLlamaModel: add embedding before
  trunk loop and add trunk layer outputs inside the loop, consistent with
  standard transformers @capture_outputs behavior

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
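A toy version of the corrected collection order from the last bullet: the embedding output is recorded before the trunk loop, and each trunk layer's output inside it (all names here are illustrative, not the repo's):

```python
# Sketch of the hidden-state collection order, mirroring the standard
# transformers capture-outputs convention described above.
def collect_hidden_states(embedding_output, trunk_layers):
    hidden = embedding_output
    all_hidden_states = [hidden]          # embedding first, before the loop
    for layer in trunk_layers:
        hidden = layer(hidden)
        all_hidden_states.append(hidden)  # each trunk layer output in turn
    return all_hidden_states

states = collect_hidden_states(1, [lambda h: h + 1, lambda h: h * 10])
print(states)  # [1, 2, 20]
```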
Update TOKENIZER_NAME from "bigcode/santacoder" to "gpt2" and update all
hardcoded token values in data tests to match the gpt2 vocabulary.
Also fix deprecated huggingface_hub.HfFolder.get_token() → get_token().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ERS_V4

- Deduplicate rope-type dispatch in LlamaAttentionConverter.import_config by
  normalizing rope_params/rope_theta from either checkpoint format first
- Rename _TRANSFORMERS_V5 → _TRANSFORMERS_V4 (inverted flag) so v4 compat
  code is in `if _TRANSFORMERS_V4:` blocks — grep-and-delete to drop v4
- Flip all if/else so v5 code is the default path and v4 is the guarded branch
- Import _TRANSFORMERS_V4 from config.py in huggingface.py; replace try/except
  with explicit if/else
- Add comments for v5 changes that can't use the flag (TYPE_CHECKING guard,
  checkpoint format detection, model.model structure)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
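The inverted-flag pattern above can be illustrated roughly as follows; the version parsing is a stand-in for the repo's detection, and the identity dict for the 5.x `_tied_weights_keys` format is purely illustrative (the real mapping is per-model):

```python
# v5 code is the unguarded default path; every 4.x shim sits in an
# `if _TRANSFORMERS_V4:` block, so dropping 4.x support later is a
# grep-and-delete of those blocks.
def is_transformers_v4(version: str) -> bool:
    return int(version.split(".", 1)[0]) < 5

_TRANSFORMERS_V4 = is_transformers_v4("5.0.0")  # e.g. False on a 5.x install

def tied_weights_keys(keys):
    if _TRANSFORMERS_V4:
        return list(keys)           # 4.x format: a list (deletable shim)
    return {k: k for k in keys}     # 5.x format: a dict (default path)
```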
Use tuple prefixes unpacked into W(...) instead of the / operator,
keeping the _TRANSFORMERS_V4 branching for the path prefix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Keep llava_layer/apriel_layer intermediate variables (with / operator)
in loops; only the layer root W() calls use *prefix unpacking.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- apriel2: override tie_weights() in Apriel2PreTrainedModel to recompute
  MistralRotaryEmbedding.inv_freq after v5 meta-device loading zeroes it
  (non-persistent buffers not in checkpoint are materialized as zeros)
- apriel2: hardcode _attn_implementation="eager" in preprocess mask config
  so an explicit float mask is always built (v5 sdpa returns None otherwise)
- apriel2: use version-conditional kwarg name for create_causal_mask
  (inputs_embeds in v5, input_embeds in v4)
- apriel2 test: compare only non-padding positions in test_logits_match
- mtp_llama: update LlamaRotaryEmbedding and config for v5 compatibility
- gpt/huggingface: remove dead code line

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
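The version-conditional kwarg from the list above can be sketched like this (the helper name is hypothetical; only the two kwarg spellings come from the PR description):

```python
# create_causal_mask takes its embeddings argument as `input_embeds` in
# transformers 4.x and `inputs_embeds` in 5.x.
def causal_mask_kwargs(embeds, transformers_v4: bool) -> dict:
    key = "input_embeds" if transformers_v4 else "inputs_embeds"
    return {key: embeds}
```

The returned dict can then be splatted into the call (`create_causal_mask(**kwargs, ...)`) so only the spelling differs between versions.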
This reverts commit 39196c6.
- Replace removed Qwen3NextDynamicCache with DynamicCache; remove dropped
  cache_position kwarg from GDN forward; fix recurrent_states access path
- Fix rope_theta KeyError (moved to rope_parameters dict in v5)
- Fix attention_mask device mismatch in integration test
- Expand+contiguous attn mask before SDPA to satisfy CUDA kernel contiguity
- Use model._attn_implementation for causal mask creation so pure-causal
  inputs get None mask and SDPA uses is_causal=True (matching Qwen2 numerics)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>