bridge.models.mimo.llava_provider#

LLaVA-style Vision-Language Model provider.

Module Contents#

Classes#

LlavaMimoProvider

LLaVA-style Vision-Language Model provider.

API#

class bridge.models.mimo.llava_provider.LlavaMimoProvider#

Bases: megatron.bridge.models.mimo.mimo_provider.MimoModelProvider

LLaVA-style Vision-Language Model provider.

Preconfigures specs for:

  • Vicuna-7B style language model (Llama-based)

  • CLIP-style vision encoder

  • 2-layer MLP projector
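
The 2-layer MLP projector named above can be illustrated with a minimal NumPy sketch (this is an assumption-laden illustration, not the actual Megatron implementation): it maps vision-encoder features of size `vision_projector_input_size` (default 1024) into the language model's hidden size, with a GELU between the two linear layers as in LLaVA-style projectors. The hidden size of 4096 is assumed for illustration.

```python
import numpy as np

def mlp_projector(image_features, w1, b1, w2, b2):
    """Two linear layers with a GELU in between (LLaVA-style projector sketch)."""
    h = image_features @ w1 + b1
    # tanh-approximation GELU
    h = 0.5 * h * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h**3)))
    return h @ w2 + b2

rng = np.random.default_rng(0)
vision_dim, hidden_dim = 1024, 4096  # assumed sizes for illustration
w1 = rng.standard_normal((vision_dim, hidden_dim)) * 0.02
b1 = np.zeros(hidden_dim)
w2 = rng.standard_normal((hidden_dim, hidden_dim)) * 0.02
b2 = np.zeros(hidden_dim)

patch_tokens = rng.standard_normal((16, vision_dim))  # 16 image patch tokens
projected = mlp_projector(patch_tokens, w1, b1, w2, b2)
assert projected.shape == (16, hidden_dim)
```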

Example

>>> from my_encoders import HFCLIPEncoder
>>> provider = LlavaMimoProvider(
...     vision_encoder_module=HFCLIPEncoder,
...     mimo_parallelism_config=mimo_parallelism_config,
... )
>>> result = provider.provide()

vision_encoder_module: Optional[Type]#

None

vision_encoder_params: Dict#

field(…)

vision_projector_input_size: int#

1024

image_special_token_id: int#

32000

vocab_size: int#

32256

language_config: Optional[megatron.bridge.models.transformer_config.TransformerConfig]#

None

language_model_spec: Optional[megatron.core.transformer.spec_utils.ModuleSpec]#

None
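
The attribute defaults listed above can be summarized in a standalone dataclass sketch (hypothetical; the real class subclasses `MimoModelProvider` and builds full language/vision specs in `__post_init__`). Note that the image special token id sits inside the padded vocabulary:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Type

# Hypothetical mirror of the provider's default fields, for illustration only.
@dataclass
class LlavaMimoProviderDefaults:
    vision_encoder_module: Optional[Type] = None
    vision_encoder_params: Dict = field(default_factory=dict)
    vision_projector_input_size: int = 1024
    image_special_token_id: int = 32000
    vocab_size: int = 32256

defaults = LlavaMimoProviderDefaults()
# The image placeholder token must be a valid id in the vocabulary.
assert defaults.image_special_token_id < defaults.vocab_size
```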

__post_init__()#

Build specs after initialization.

_get_default_language_config() → megatron.bridge.models.transformer_config.TransformerConfig#

Create default Vicuna-7B language model config.

_build_vision_submodule_spec() → megatron.core.transformer.spec_utils.ModuleSpec#

Build vision modality specification.