nemo_automodel.components.models.qwen3_5.state_dict_adapter#

State-dict adapter for Qwen3.5 dense (non-MoE) models.

Qwen3.5 dense uses HF’s GatedDeltaNet linear-attention layers. To keep these mixed-dtype layers (bf16 parameters plus an fp32 A_log) compatible with FSDP, patch_hf_model in cp_linear_attn moves A_log out of mod._parameters into a _fp32_params submodule and patches __getattr__ so that reads of mod.A_log are redirected to the new location. As a result, the patched model’s state_dict contains keys of the form ...linear_attn._fp32_params.A_log instead of the original ...linear_attn.A_log.

This adapter renames keys at save/load boundaries so that on-disk checkpoints match the original HF Qwen3.5 layout (bare A_log) and are directly loadable via transformers.AutoModelForImageTextToText.from_pretrained.
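The renaming in both directions can be sketched as follows. This is only an illustration of the idea. The regular expression is an assumption (the documented value of _FP32_PARAMS_TO_BARE is elided), and strip_fp32_prefix and route_to_fp32_holder are hypothetical stand-ins, not the module's private helpers.

```python
import re

# Assumed names for illustration; the real module-level values may differ.
FP32_HOLDER = "_fp32_params"
BARE_FP32_PARAM_NAMES = ("A_log",)

# Hypothetical pattern: matches a trailing "._fp32_params.A_log".
WRAPPED_KEY = re.compile(r"\._fp32_params\.(A_log)$")


def strip_fp32_prefix(key: str) -> str:
    """Save direction: drop the holder segment so the key matches the HF layout."""
    return WRAPPED_KEY.sub(r".\1", key)


def route_to_fp32_holder(key: str) -> str:
    """Load direction: insert the holder segment before a bare fp32 parameter name."""
    head, sep, leaf = key.rpartition(".")
    already_wrapped = head == FP32_HOLDER or head.endswith("." + FP32_HOLDER)
    if sep and leaf in BARE_FP32_PARAM_NAMES and not already_wrapped:
        return f"{head}.{FP32_HOLDER}.{leaf}"
    return key


wrapped = "model.layers.0.linear_attn._fp32_params.A_log"
bare = "model.layers.0.linear_attn.A_log"
print(strip_fp32_prefix(wrapped))
print(route_to_fp32_holder(bare))
```

Keys that do not end in one of the fp32 parameter names pass through both functions unchanged, so the rename affects only the wrapped entries.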

Module Contents#

Classes#

Qwen3_5DenseStateDictAdapter

Adapter that hides the _fp32_params wrapping in saved checkpoints.

Functions#

Data#

API#

nemo_automodel.components.models.qwen3_5.state_dict_adapter._FP32_PARAMS_TO_BARE#

‘compile(…)’

nemo_automodel.components.models.qwen3_5.state_dict_adapter._BARE_FP32_PARAM_NAMES#

(‘A_log’,)

nemo_automodel.components.models.qwen3_5.state_dict_adapter._strip_fp32_prefix(key: str) → str#

nemo_automodel.components.models.qwen3_5.state_dict_adapter._route_to_fp32_holder(key: str) → str#

class nemo_automodel.components.models.qwen3_5.state_dict_adapter.Qwen3_5DenseStateDictAdapter#

Bases: nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter

Adapter that hides the _fp32_params wrapping in saved checkpoints.

to_hf(
state_dict: dict[str, Any],
**kwargs: Any,
) → dict[str, Any]#

from_hf(
hf_state_dict: dict[str, Any],
device_mesh: Optional[Any] = None,
**kwargs: Any,
) → dict[str, Any]#

convert_single_tensor_to_hf(
fqn: str,
tensor: Any,
**kwargs: Any,
) → list[tuple[str, Any]]#
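
A minimal sketch of how the three methods might fit together, assuming to_hf is a per-key application of convert_single_tensor_to_hf and from_hf applies the inverse rename. ToyFp32Adapter is a hypothetical stand-in that does not subclass StateDictAdapter, and plain floats take the place of tensors. The real class may differ in behavior, for example in how it uses device_mesh.

```python
from typing import Any, Optional

# Assumed names for illustration only.
FP32_HOLDER = "_fp32_params"
BARE_NAMES = ("A_log",)


class ToyFp32Adapter:
    """Hypothetical adapter mirroring the documented method signatures."""

    def convert_single_tensor_to_hf(
        self, fqn: str, tensor: Any, **kwargs: Any
    ) -> list[tuple[str, Any]]:
        parts = fqn.split(".")
        # Drop the holder segment when it directly precedes an fp32 parameter name.
        if len(parts) >= 2 and parts[-2] == FP32_HOLDER and parts[-1] in BARE_NAMES:
            parts.pop(-2)
        return [(".".join(parts), tensor)]

    def to_hf(self, state_dict: dict[str, Any], **kwargs: Any) -> dict[str, Any]:
        out: dict[str, Any] = {}
        for fqn, tensor in state_dict.items():
            out.update(self.convert_single_tensor_to_hf(fqn, tensor, **kwargs))
        return out

    def from_hf(
        self,
        hf_state_dict: dict[str, Any],
        device_mesh: Optional[Any] = None,  # unused in this sketch
        **kwargs: Any,
    ) -> dict[str, Any]:
        out: dict[str, Any] = {}
        for key, tensor in hf_state_dict.items():
            parts = key.split(".")
            is_bare = parts[-1] in BARE_NAMES and (
                len(parts) < 2 or parts[-2] != FP32_HOLDER
            )
            if is_bare:
                # Re-insert the holder segment just before the parameter name.
                parts.insert(-1, FP32_HOLDER)
            out[".".join(parts)] = tensor
        return out


adapter = ToyFp32Adapter()
patched = {
    "model.layers.0.linear_attn._fp32_params.A_log": 0.5,
    "model.layers.0.linear_attn.in_proj.weight": 1.0,  # illustrative key name
}
hf_layout = adapter.to_hf(patched)
round_trip = adapter.from_hf(hf_layout)
```

In this sketch, from_hf(to_hf(sd)) returns the original patched keys, which is the property that lets a checkpoint saved in the HF layout be loaded back into the patched model.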