`nemo_automodel.components.models.step3p7.state_dict_adapter`#

Module Contents#

Classes#

Step3p7StateDictAdapter

Adapter for Step3.7 VLM checkpoints.

Functions#

_mtp_layer_range

API#

nemo_automodel.components.models.step3p7.state_dict_adapter._mtp_layer_range(config: Any) → tuple[int, int]#

class nemo_automodel.components.models.step3p7.state_dict_adapter.Step3p7StateDictAdapter(config: Any, moe_config: nemo_automodel.components.moe.config.MoEConfig, backend: nemo_automodel.components.models.common.BackendConfig, dtype: torch.dtype = torch.bfloat16)#

Bases: nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter

Adapter for Step3.7 VLM checkpoints.

The released checkpoint stores the Step3.5 language backbone at top-level keys such as model.layers.* and the vision weights under vision_model.* and vit_large_projector.*. The native AutoModel VLM instead keeps the language backbone under model.language_model, so that pipeline parallelism (PP) can split it as a nested text module, and it reuses the Step3p5 expert-weight adapter for expert-parallel (EP) sharding.
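The key relocation described above can be illustrated with a minimal sketch. This is not the adapter's actual code: the helper names `hf_to_native` and `native_to_hf` are hypothetical, the exact native prefix `model.language_model.` is assumed from the description, and vision keys are simply passed through unchanged because the page does not state their native names. Expert-weight handling for EP sharding is left out entirely.

```python
# Hypothetical sketch of the text-key relocation; not the real adapter logic.
HF_TEXT_PREFIX = "model."
NATIVE_TEXT_PREFIX = "model.language_model."  # assumed from the docstring
VISION_PREFIXES = ("vision_model.", "vit_large_projector.")


def hf_to_native(key: str) -> str:
    """Move a released-checkpoint text key under the nested language module."""
    if key.startswith(VISION_PREFIXES):
        # Assumption: vision keys are left as-is in this sketch.
        return key
    if key.startswith(HF_TEXT_PREFIX):
        return NATIVE_TEXT_PREFIX + key[len(HF_TEXT_PREFIX):]
    return key


def native_to_hf(key: str) -> str:
    """Inverse mapping: strip the nested language-module prefix."""
    if key.startswith(NATIVE_TEXT_PREFIX):
        return HF_TEXT_PREFIX + key[len(NATIVE_TEXT_PREFIX):]
    return key


hf_key = "model.layers.0.self_attn.q_proj.weight"
native_key = hf_to_native(hf_key)
print(native_key)
print(native_to_hf(native_key) == hf_key)
```

Keeping the two directions as pure string functions makes the round trip easy to check in isolation before any tensors are involved.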

Initialization

static _is_text_key(key: str) → bool#

static _to_text_hf_key(key: str) → str#

static _to_native_text_key(key: str) → str#

static _map_non_text_from_hf(key: str) → str | None#

static _map_non_text_to_hf(key: str) → str#

_map_mtp_from_hf(key: str) → str | None#

_map_mtp_to_hf(key: str) → str | None#

from_hf(hf_state_dict: dict[str, Any], device_mesh: torch.distributed.device_mesh.DeviceMesh | None = None, **kwargs: Any) → dict[str, Any]#

to_hf(state_dict: dict[str, Any], exclude_key_regex: str | None = None, quantization: bool = False, **kwargs: Any) → dict[str, Any]#

convert_single_tensor_to_hf(fqn: str, tensor: Any, **kwargs: Any) → list[tuple[str, Any]]#
