nemo_automodel.components.models.qwen3_5.state_dict_adapter
State-dict adapter for Qwen3.5 dense (non-MoE) models.
Qwen3.5 dense uses HF’s GatedDeltaNet linear-attention layers. For FSDP
compatibility with mixed dtypes (bf16 weights alongside an fp32 A_log),
patch_hf_model in cp_linear_attn moves A_log from mod._parameters into a
_fp32_params submodule and patches __getattr__ so that reads of mod.A_log
are redirected to it. After patching, the model’s state_dict contains keys of
the form ...linear_attn._fp32_params.A_log instead of the original
...linear_attn.A_log.
This adapter renames keys at save/load boundaries so that on-disk checkpoints
match the original HF Qwen3.5 layout (bare A_log) and are directly
loadable via transformers.AutoModelForImageTextToText.from_pretrained.
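The save/load renaming described above can be sketched on a plain dict, with floats standing in for tensors. This is a minimal illustration, not the adapter's implementation: the function names save_view and load_view are hypothetical, and the sketch assumes that every key ending in .A_log belongs to a patched linear-attention module.

```python
from typing import Any

# Key suffixes taken from the module description; treated here as assumptions.
PATCHED = "._fp32_params.A_log"  # form produced by patch_hf_model
BARE = ".A_log"                  # original HF Qwen3.5 form


def save_view(state_dict: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical: rename patched keys to the bare HF form before saving."""
    return {
        (k[: -len(PATCHED)] + BARE if k.endswith(PATCHED) else k): v
        for k, v in state_dict.items()
    }


def load_view(hf_state_dict: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical: route bare A_log keys back under the _fp32_params holder."""
    return {
        (
            k[: -len(BARE)] + PATCHED
            if k.endswith(BARE) and not k.endswith(PATCHED)
            else k
        ): v
        for k, v in hf_state_dict.items()
    }


# Illustrative keys; only the A_log entry is renamed, everything else passes through.
patched = {
    "model.layers.0.linear_attn._fp32_params.A_log": 1.0,
    "model.layers.0.linear_attn.out_proj.weight": 2.0,
}
on_disk = save_view(patched)
assert load_view(on_disk) == patched  # the rename round-trips
```

Keeping the two directions as exact inverses is what lets a checkpoint written from the patched model load unchanged into an unpatched HF model, and vice versa.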
Module Contents
Classes
Qwen3_5DenseStateDictAdapter | Adapter that hides the _fp32_params wrapping in saved checkpoints.
Functions
_strip_fp32_prefix
_route_to_fp32_holder
Data
_FP32_PARAMS_TO_BARE
_BARE_FP32_PARAM_NAMES
API
- nemo_automodel.components.models.qwen3_5.state_dict_adapter._FP32_PARAMS_TO_BARE
'compile(…)'
- nemo_automodel.components.models.qwen3_5.state_dict_adapter._BARE_FP32_PARAM_NAMES
('A_log',)
- nemo_automodel.components.models.qwen3_5.state_dict_adapter._strip_fp32_prefix(key: str) → str
- nemo_automodel.components.models.qwen3_5.state_dict_adapter._route_to_fp32_holder(key: str) → str
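Judging from their names and the module description, the two private helpers appear to perform per-key renames in opposite directions. The sketch below is a guess at that behavior, not their source: the real regex behind _FP32_PARAMS_TO_BARE is elided in this page, so the pattern here is hypothetical, as are the unprefixed names strip_fp32_prefix and route_to_fp32_holder.

```python
import re

# Hypothetical stand-ins for the module constants; the real pattern is elided above.
BARE_FP32_PARAM_NAMES = ("A_log",)
FP32_PARAMS_TO_BARE = re.compile(
    r"\._fp32_params\.(" + "|".join(map(re.escape, BARE_FP32_PARAM_NAMES)) + r")$"
)


def strip_fp32_prefix(key: str) -> str:
    """'...linear_attn._fp32_params.A_log' -> '...linear_attn.A_log'; others unchanged."""
    return FP32_PARAMS_TO_BARE.sub(r".\1", key)


def route_to_fp32_holder(key: str) -> str:
    """'...linear_attn.A_log' -> '...linear_attn._fp32_params.A_log'; idempotent."""
    if FP32_PARAMS_TO_BARE.search(key):
        return key  # already routed under the holder
    prefix, dot, name = key.rpartition(".")
    if dot and name in BARE_FP32_PARAM_NAMES:
        return f"{prefix}._fp32_params.{name}"
    return key  # not an fp32-held parameter


key = "model.layers.3.linear_attn._fp32_params.A_log"
bare = strip_fp32_prefix(key)  # "model.layers.3.linear_attn.A_log"
assert route_to_fp32_holder(bare) == key
```

Making the routing idempotent means a key that already carries the holder segment is never wrapped twice, which keeps a load path safe even if it sees a mix of patched and bare keys.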
- class nemo_automodel.components.models.qwen3_5.state_dict_adapter.Qwen3_5DenseStateDictAdapter
Bases: nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter
Adapter that hides the _fp32_params wrapping in saved checkpoints.
- to_hf(state_dict: dict[str, Any], **kwargs: Any)
- from_hf(hf_state_dict: dict[str, Any], device_mesh: Optional[Any] = None, **kwargs: Any)
- convert_single_tensor_to_hf(fqn: str, tensor: Any, **kwargs: Any)