`nemo_automodel.components.models.qwen3_5.state_dict_adapter`

State-dict adapter for Qwen3.5 dense (non-MoE) models.

Qwen3.5 dense uses HF’s GatedDeltaNet linear-attention layers. To keep these mixed-dtype layers (bf16 plus an fp32 `A_log`) compatible with FSDP, `patch_hf_model` in `cp_linear_attn` moves `A_log` from `mod._parameters` into a `_fp32_params` submodule and patches `__getattr__` so that reads of `mod.A_log` are redirected there. After patching, the model’s `state_dict` contains keys of the form `...linear_attn._fp32_params.A_log` instead of the original `...linear_attn.A_log`.

This adapter renames keys at the save and load boundaries so that on-disk checkpoints match the original HF Qwen3.5 layout (bare `A_log`) and can be loaded directly with `transformers.AutoModelForImageTextToText.from_pretrained`.
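The renaming described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the module's actual implementation: the regular expressions, the public-style names `strip_fp32_prefix` and `route_to_fp32_holder`, and the example keys are all hypothetical stand-ins, since the real `_FP32_PARAMS_TO_BARE` pattern is not shown in these docs.

```python
import re

# Hypothetical stand-ins for the module's private data (assumed, not the real values).
BARE_FP32_PARAM_NAMES = ("A_log",)
_NAMES = "|".join(re.escape(name) for name in BARE_FP32_PARAM_NAMES)

# Matches a trailing "._fp32_params.<name>" (in-memory, patched layout).
FP32_TO_BARE = re.compile(rf"\._fp32_params\.({_NAMES})$")
# Matches a trailing "linear_attn.<name>" (on-disk, original HF layout).
BARE_TO_FP32 = re.compile(rf"\b(linear_attn)\.({_NAMES})$")


def strip_fp32_prefix(key: str) -> str:
    """Save direction: drop the `_fp32_params` wrapper from a key."""
    return FP32_TO_BARE.sub(r".\1", key)


def route_to_fp32_holder(key: str) -> str:
    """Load direction: send a bare fp32 parameter key to the holder submodule."""
    return BARE_TO_FP32.sub(r"\1._fp32_params.\2", key)


# Illustrative keys only; values stand in for tensors.
patched = {
    "model.layers.0.linear_attn._fp32_params.A_log": "fp32 tensor",
    "model.layers.0.linear_attn.out_proj.weight": "bf16 tensor",
}

on_disk = {strip_fp32_prefix(k): v for k, v in patched.items()}
reloaded = {route_to_fp32_holder(k): v for k, v in on_disk.items()}
```

In this sketch both helpers leave unrelated keys untouched, and applying either one twice gives the same result as applying it once, so a save followed by a load restores the patched key layout.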

Module Contents

Classes

`Qwen3_5DenseStateDictAdapter`

Adapter that hides the `_fp32_params` wrapping in saved checkpoints.

Functions

`_strip_fp32_prefix`
`_route_to_fp32_holder`

Data

`_FP32_PARAMS_TO_BARE`
`_BARE_FP32_PARAM_NAMES`

API

nemo_automodel.components.models.qwen3_5.state_dict_adapter._FP32_PARAMS_TO_BARE: 'compile(…)'

nemo_automodel.components.models.qwen3_5.state_dict_adapter._BARE_FP32_PARAM_NAMES: ('A_log',)

nemo_automodel.components.models.qwen3_5.state_dict_adapter._strip_fp32_prefix(key: str) → str

nemo_automodel.components.models.qwen3_5.state_dict_adapter._route_to_fp32_holder(key: str) → str

class nemo_automodel.components.models.qwen3_5.state_dict_adapter.Qwen3_5DenseStateDictAdapter

Bases: nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter