nemo_automodel.components.models.qwen2_5_omni.state_dict_adapter

Module Contents

Classes

Name | Description
Qwen2_5OmniStateDictAdapter | HF Qwen2.5-Omni checkpoint adapter (thinker-only path).

Data

_DROP_PREFIXES

_DROP_THINKER_KEY_SUBSTRINGS

_THINKER_PREFIX

logger

API

class nemo_automodel.components.models.qwen2_5_omni.state_dict_adapter.Qwen2_5OmniStateDictAdapter(
config: typing.Any,
backend: nemo_automodel.components.models.common.BackendConfig | None = None,
dtype: torch.dtype = torch.float32
)

Bases: StateDictAdapter

HF Qwen2.5-Omni checkpoint adapter (thinker-only path).

HF Qwen/Qwen2.5-Omni-* checkpoints store keys under three top-level prefixes: thinker.*, talker.*, token2wav.*. For ASR/text fine-tuning we only train the Thinker, so this adapter:

  • on from_hf: drops talker.* and token2wav.* keys and strips the thinker. prefix so keys align with our NeMo Thinker class.
  • on to_hf: re-adds the thinker. prefix so the saved checkpoint can be merged back with the original talker/token2wav shards.

Qwen2.5-Omni-3B is dense (no MoE), so no expert-grouping logic is needed; this is a thin key-renaming adapter.
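
The key handling described above can be sketched on plain dictionaries. This is a minimal illustration, not the adapter's actual implementation: the function names `sketch_from_hf` and `sketch_to_hf` and the example keys are hypothetical, and the handling of `_DROP_THINKER_KEY_SUBSTRINGS` (dropping any key that contains one of the listed substrings) is an assumption inferred from the constant's name. The constant values mirror the module-level data documented on this page.

```python
# Illustrative sketch only; the real adapter may differ in detail.
# Values copied from the module-level constants documented on this page.
THINKER_PREFIX = "thinker."
DROP_PREFIXES = ("talker.", "token2wav.")
DROP_THINKER_KEY_SUBSTRINGS = ("audio_tower.audio_bos_eos_token",)


def sketch_from_hf(hf_state_dict):
    """Hypothetical helper: keep Thinker weights and strip the 'thinker.' prefix."""
    out = {}
    for key, value in hf_state_dict.items():
        # Talker and token2wav weights are not trained, so skip them.
        if key.startswith(DROP_PREFIXES):
            continue
        # Assumption: keys containing these substrings are also dropped.
        if any(sub in key for sub in DROP_THINKER_KEY_SUBSTRINGS):
            continue
        if key.startswith(THINKER_PREFIX):
            key = key[len(THINKER_PREFIX):]
        out[key] = value
    return out


def sketch_to_hf(state_dict):
    """Hypothetical helper: re-add the 'thinker.' prefix to every key."""
    return {THINKER_PREFIX + key: value for key, value in state_dict.items()}


# Hypothetical key names; integers stand in for tensors.
hf_checkpoint = {
    "thinker.model.layers.0.weight": 1,
    "thinker.audio_tower.audio_bos_eos_token.weight": 2,
    "talker.model.embed": 3,
    "token2wav.decoder.weight": 4,
}
native = sketch_from_hf(hf_checkpoint)
restored = sketch_to_hf(native)
```

In this sketch the round trip is lossy by design: only the Thinker keys that survive `sketch_from_hf` come back with their prefix, which is why the saved shards need to be merged with the original talker/token2wav shards to rebuild a full checkpoint.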

backend = backend or BackendConfig()
nemo_automodel.components.models.qwen2_5_omni.state_dict_adapter.Qwen2_5OmniStateDictAdapter.convert_single_tensor_to_hf(
fqn: str,
tensor: typing.Any,
kwargs = {}
) -> list[tuple[str, typing.Any]]
nemo_automodel.components.models.qwen2_5_omni.state_dict_adapter.Qwen2_5OmniStateDictAdapter.from_hf(
hf_state_dict: dict[str, typing.Any],
device_mesh: typing.Optional[typing.Any] = None,
kwargs = {}
) -> dict[str, typing.Any]
nemo_automodel.components.models.qwen2_5_omni.state_dict_adapter.Qwen2_5OmniStateDictAdapter.to_hf(
state_dict: dict[str, typing.Any],
exclude_key_regex: typing.Optional[str] = None,
quantization: bool = False,
kwargs = {}
) -> dict[str, typing.Any]
nemo_automodel.components.models.qwen2_5_omni.state_dict_adapter._DROP_PREFIXES = ('talker.', 'token2wav.')
nemo_automodel.components.models.qwen2_5_omni.state_dict_adapter._DROP_THINKER_KEY_SUBSTRINGS = ('audio_tower.audio_bos_eos_token',)
nemo_automodel.components.models.qwen2_5_omni.state_dict_adapter._THINKER_PREFIX = 'thinker.'
nemo_automodel.components.models.qwen2_5_omni.state_dict_adapter.logger = logging.getLogger(__name__)