nemo_automodel.components.models.llava_onevision.model
LLaVA-OneVision-1.5 model implementation.
The module tree matches the layout of lmms-lab/LLaVA-OneVision-1.5-*-{Base,Instruct}, so HF safetensors load into it via LlavaOneVisionStateDictAdapter using only regex key renames (no tensor transforms).
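To illustrate what "regex renames with no tensor transforms" means in practice, here is a minimal sketch. The rename rules and key names below are hypothetical placeholders chosen for illustration; they are not the rules used by LlavaOneVisionStateDictAdapter, and `rename_keys` is not part of its API.

```python
import re

# Hypothetical (pattern, replacement) rules. The real adapter's rules are
# not shown in this page; these exist only to demonstrate the idea.
RENAME_RULES = [
    (re.compile(r"^visual\."), "model.vision_tower."),
    (re.compile(r"^language_model\."), "model.language_model."),
]


def rename_keys(state_dict, rules=RENAME_RULES):
    """Return a new dict with renamed keys; values are passed through untouched."""
    renamed = {}
    for key, tensor in state_dict.items():
        new_key = key
        for pattern, replacement in rules:
            new_key = pattern.sub(replacement, new_key)
        if new_key in renamed:
            # Two source keys mapping to one target would silently drop a tensor.
            raise KeyError(f"rename collision on {new_key!r}")
        renamed[new_key] = tensor  # same object: no reshape, split, or transpose
    return renamed
```

Because only keys change, the mapping is cheap and easy to invert, as long as each rule has an unambiguous reverse pattern.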
Module Contents
Classes
Functions
Data
API
Bases: HFCheckpointingMixin, Module
LLaVA-OneVision-1.5 for conditional generation (Rice ViT + Qwen3 text).
Bases: Module
Combined vision + language backbone. Returns last_hidden_state.
Bases: PretrainedConfig
Top-level config for LLaVA-OneVision-1.5.
model_type matches the on-hub value exactly, so once this class is registered,
AutoConfig.from_pretrained resolves to it without trust_remote_code.
Bases: PretrainedConfig
Configuration for the Rice ViT vision tower.
Coerce a text_config dict from HF (or user) into a Qwen3Config.
LLaVA-OV-1.5’s text backbone is Qwen3 (q/k norm, GQA, standard SiLU MLP).
On-hub model_type is LLaVAOneVision1_5_text; we drop it so Qwen3Config
doesn’t reject the kwargs.
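The dict-level part of this coercion can be sketched as follows. The helper name `strip_model_type` is hypothetical, and the final step of passing the result to Qwen3Config is an assumption based on the description above, so it is left as a comment rather than executed.

```python
def strip_model_type(text_config):
    """Copy a text_config dict and drop the on-hub model_type entry.

    Hypothetical helper for illustration; not the function documented above.
    """
    kwargs = dict(text_config)  # shallow copy so the caller's dict is not mutated
    kwargs.pop("model_type", None)  # no error if the key is already absent
    return kwargs


# Assumed follow-up step (requires a transformers version that ships Qwen3):
#   from transformers import Qwen3Config
#   text_cfg = Qwen3Config(**strip_model_type(raw_text_config))
```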
Accept a raw HF remote-code text config and return a Qwen3Config.
The constructor path for NeMo custom models is cls(hf_config) where
hf_config may be the remote-code Llavaonevision1_5Config whose
text_config is a LLaVAOneVision1_5_TextConfig instance. Normalize
to Qwen3Config so the inner Qwen3Model gets fields it understands.
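A rough sketch of the normalization path, assuming the remote-code text config exposes a `to_dict()` method as PretrainedConfig subclasses do. `extract_text_fields` and `FakeRemoteTextConfig` are hypothetical stand-ins written for this example, not names from the module.

```python
def extract_text_fields(hf_config):
    """Pull plain text-config fields out of a top-level config object.

    Handles a text_config that is a dict, an object with to_dict(), or missing.
    """
    text_config = getattr(hf_config, "text_config", None)
    if text_config is None:
        return {}
    if isinstance(text_config, dict):
        fields = dict(text_config)
    elif callable(getattr(text_config, "to_dict", None)):
        fields = dict(text_config.to_dict())
    else:
        raise TypeError(f"unsupported text_config type: {type(text_config).__name__}")
    fields.pop("model_type", None)  # the on-hub value is dropped, as described above
    return fields


class FakeRemoteTextConfig:
    """Stand-in for a remote-code text config instance (illustration only)."""

    def __init__(self, **fields):
        self._fields = dict(fields)

    def to_dict(self):
        return dict(self._fields, model_type="LLaVAOneVision1_5_text")
```

The returned dict would then be handed to Qwen3Config, so the inner text model only sees fields it understands.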