nemo_automodel.components.models.llava_onevision.state_dict_adapter

State dict adapter for LLaVA-OneVision-1.5.

Module Contents

Classes

LlavaOneVisionStateDictAdapter

Converts between HF LLaVA-OneVision checkpoints and NeMo format.

API

class nemo_automodel.components.models.llava_onevision.state_dict_adapter.LlavaOneVisionStateDictAdapter(config: Any, **kwargs)

Bases: nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter

Converts between HF LLaVA-OneVision checkpoints and NeMo format.

HF checkpoint key patterns:

    model.visual.{…} -> Rice ViT weights
    model.language_model.{…} -> Qwen3 LLM weights
    lm_head.{…} -> Language model head

NeMo model key patterns:

    model.vision_tower.{…} -> Rice ViT weights
    model.language_model.{…} -> Qwen3 LLM weights
    lm_head.{…} -> Language model head

The adapter primarily handles key renaming to bridge HF and NeMo formats.
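Judging from the two pattern lists above, only the vision prefix differs between the layouts. The following is a minimal sketch of such a prefix swap, not the adapter's actual code: the helper names and the example key are hypothetical, and it assumes the prefix swap is the only rename needed.

```python
# Prefixes taken from the key patterns listed above; everything else in this
# block (function names, example key) is hypothetical.
HF_VISION_PREFIX = "model.visual."
NEMO_VISION_PREFIX = "model.vision_tower."


def swap_prefix(key: str, old: str, new: str) -> str:
    """Return key with a leading `old` replaced by `new`, else key unchanged."""
    if key.startswith(old):
        return new + key[len(old):]
    return key


def hf_to_nemo_key(hf_key: str) -> str:
    """Map an HF-style key to the NeMo layout (vision prefix only)."""
    return swap_prefix(hf_key, HF_VISION_PREFIX, NEMO_VISION_PREFIX)


def nemo_to_hf_key(nemo_key: str) -> str:
    """Map a NeMo-style key back to the HF layout (vision prefix only)."""
    return swap_prefix(nemo_key, NEMO_VISION_PREFIX, HF_VISION_PREFIX)


example = "model.visual.blocks.0.attn.qkv.weight"  # made-up key name
assert hf_to_nemo_key(example) == "model.vision_tower.blocks.0.attn.qkv.weight"
assert nemo_to_hf_key(hf_to_nemo_key(example)) == example
# Language model and head keys are the same in both layouts.
assert hf_to_nemo_key("lm_head.weight") == "lm_head.weight"
```

Matching on a leading prefix rather than calling `str.replace` anywhere in the key avoids accidentally rewriting a substring that happens to appear deeper in a parameter path.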

Initialization

to_hf(
    state_dict: dict[str, Any],
    exclude_key_regex: Optional[str] = None,
    quantization: bool = False,
    **kwargs,
) -> dict[str, Any]

Rename NeMo keys to HF keys. Tensors are passed through unchanged.
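As a rough illustration of this direction, the sketch below renames keys in a plain dictionary and leaves the values untouched. It is a standalone stand-in, not the real method: the function name `to_hf_sketch` is hypothetical, the `quantization` flag is omitted because its effect is not described here, and the handling of `exclude_key_regex` (dropping any converted key that the pattern matches from its start) is an assumption.

```python
import re
from typing import Any, Optional


def to_hf_sketch(
    state_dict: dict[str, Any],
    exclude_key_regex: Optional[str] = None,
) -> dict[str, Any]:
    """Rename NeMo-style keys to HF-style keys without touching the values."""
    pattern = re.compile(exclude_key_regex) if exclude_key_regex else None
    nemo_prefix, hf_prefix = "model.vision_tower.", "model.visual."
    out: dict[str, Any] = {}
    for nemo_key, value in state_dict.items():
        if nemo_key.startswith(nemo_prefix):
            hf_key = hf_prefix + nemo_key[len(nemo_prefix):]
        else:
            hf_key = nemo_key
        # ASSUMPTION: keys matched by the pattern are skipped entirely.
        if pattern is not None and pattern.match(hf_key):
            continue
        out[hf_key] = value  # same object, no copy or reshape
    return out


# Plain objects stand in for tensors so the sketch has no dependencies.
vision_weight, head_weight = object(), object()
nemo_sd = {
    "model.vision_tower.patch_embed.weight": vision_weight,  # made-up key
    "lm_head.weight": head_weight,
}
hf_sd = to_hf_sketch(nemo_sd)
assert hf_sd["model.visual.patch_embed.weight"] is vision_weight
assert "lm_head.weight" not in to_hf_sketch(nemo_sd, exclude_key_regex=r"lm_head\.")
```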

from_hf(
    hf_state_dict: dict[str, Any],
    **kwargs,
) -> dict[str, Any]

Rename HF keys to NeMo keys.

This adapter only performs key renaming. Tensor shapes should match because the NeMo model uses the same architecture as the HF implementation.
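Since no tensor is reshaped during conversion, a quick comparison of key names and shapes can catch a mismatched checkpoint before loading. The helper below is a hypothetical sketch that is not part of this module; it works on plain `{key: shape}` mappings, which a caller might build from the converted state dict and from the target model's own state dict.

```python
from typing import Mapping, Tuple

Shape = Tuple[int, ...]


def find_mismatches(
    converted: Mapping[str, Shape],
    expected: Mapping[str, Shape],
) -> dict[str, list[str]]:
    """Report keys that are missing, unexpected, or differ in shape."""
    converted_keys, expected_keys = set(converted), set(expected)
    shared = converted_keys & expected_keys
    return {
        "missing": sorted(expected_keys - converted_keys),
        "unexpected": sorted(converted_keys - expected_keys),
        "shape_mismatch": sorted(k for k in shared if converted[k] != expected[k]),
    }


# Made-up keys and shapes, for illustration only.
converted = {
    "model.vision_tower.proj.weight": (8, 4),
    "lm_head.weight": (16, 8),
    "model.visual.stale.weight": (2, 2),  # a key that was never renamed
}
expected = {
    "model.vision_tower.proj.weight": (8, 4),
    "lm_head.weight": (32, 8),
    "model.language_model.norm.weight": (8,),
}
report = find_mismatches(converted, expected)
assert report["missing"] == ["model.language_model.norm.weight"]
assert report["unexpected"] == ["model.visual.stale.weight"]
assert report["shape_mismatch"] == ["lm_head.weight"]
```

An empty report in all three lists is consistent with a pure rename; a leftover `model.visual.` key under "unexpected" would suggest the conversion step was skipped.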

_hf_to_nemo_key(hf_key: str) -> str

Convert HF checkpoint key to NeMo key.

_nemo_to_hf_key(nemo_key: str) -> str

Convert NeMo key to HF checkpoint key.