bridge.models.mimo.llava_provider#
LLaVA-style Vision-Language Model provider.
Module Contents#
Classes#
- LlavaMimoProvider — LLaVA-style Vision-Language Model provider.
API#
- class bridge.models.mimo.llava_provider.LlavaMimoProvider#
Bases: megatron.bridge.models.mimo.mimo_provider.MimoModelProvider

LLaVA-style Vision-Language Model provider.
Preconfigures specs for:
- Vicuna-7B style language model (Llama-based)
- CLIP-style vision encoder
- 2-layer MLP projector
Example:

```python
>>> from my_encoders import HFCLIPEncoder
>>> provider = LlavaMimoProvider(
...     vision_encoder_module=HFCLIPEncoder,
...     mimo_parallelism_config=mimo_parallelism_config,
... )
>>> result = provider.provide()
```
- vision_encoder_module: Optional[Type]#
None
- vision_encoder_params: Dict#
`field(...)`
- vision_projector_input_size: int#
1024
- image_special_token_id: int#
32000
- vocab_size: int#
32256
- language_config: Optional[megatron.bridge.models.transformer_config.TransformerConfig]#
None
- language_model_spec: Optional[megatron.core.transformer.spec_utils.ModuleSpec]#
None
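Since the attributes above are dataclass fields with defaults, they can typically be overridden as keyword arguments at construction time. A minimal sketch, assuming the standard dataclass constructor; `MyVisionEncoder` is a hypothetical encoder class, not part of this module:

```python
from megatron.bridge.models.mimo.llava_provider import LlavaMimoProvider

# Override the preconfigured defaults; any field not passed keeps the
# default shown in the attribute list above.
provider = LlavaMimoProvider(
    vision_encoder_module=MyVisionEncoder,   # hypothetical custom encoder
    vision_projector_input_size=1024,        # must match the encoder's hidden size
    image_special_token_id=32000,            # token id marking image positions
    vocab_size=32256,
)
```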
- __post_init__()#
Build specs after initialization.
- _get_default_language_config() → megatron.bridge.models.transformer_config.TransformerConfig#
Create default Vicuna-7B language model config.
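For reference, a Vicuna-7B language model follows the Llama-7B architecture. A rough sketch of the corresponding config, assuming megatron-core's `TransformerConfig` field names; the exact values and extra options set by `_get_default_language_config` may differ:

```python
from megatron.core.transformer import TransformerConfig

# Llama-7B / Vicuna-7B shape: 32 layers, 4096 hidden, 32 attention heads.
config = TransformerConfig(
    num_layers=32,
    hidden_size=4096,
    num_attention_heads=32,
    ffn_hidden_size=11008,  # Llama-style gated-MLP intermediate size
)
```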
- _build_vision_submodule_spec() → megatron.core.transformer.spec_utils.ModuleSpec#
Build vision modality specification.