bridge.models.nemotron_vl.modeling_nemotron_vl#

Module Contents#

Classes#

NemotronVLModel

A stub Megatron implementation of a Nemotron Vision-Language model.

API#

class bridge.models.nemotron_vl.modeling_nemotron_vl.NemotronVLModel(
config: Optional[megatron.bridge.models.nemotron_vl.nemotron_vl_provider.NemotronNano12Bv2VLModelProvider] = None,
*,
llava_model: Optional[megatron.core.models.multimodal.llava_model.LLaVAModel] = None,
pre_process: bool | None = True,
post_process: bool | None = True,
vp_stage: Optional[int] = None,
)#

Bases: megatron.core.transformer.module.MegatronModule

A stub Megatron implementation of a Nemotron Vision-Language model.

At the moment the class supports language-only forward passes; vision inputs raise NotImplementedError until a reference vision encoder is open-sourced.

Initialization

Create a wrapper that exposes an existing LLaVAModel via the Bridge API.

Parameters:

- llava_model: A fully-assembled instance of megatron.core.models.multimodal.llava_model.LLaVAModel.
- config: (Optional) The provider used to generate the model. If omitted, the wrapper falls back to llava_model.config.
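
A minimal construction sketch under the fallback behavior described above; build_llava_model() is a hypothetical placeholder for whatever code assembles the underlying megatron.core LLaVAModel in your setup:

```python
from megatron.bridge.models.nemotron_vl.modeling_nemotron_vl import NemotronVLModel

# Hypothetical helper standing in for your own LLaVAModel assembly code.
llava = build_llava_model()

# config is omitted, so the wrapper falls back to llava.config.
model = NemotronVLModel(llava_model=llava)
```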

set_input_tensor(input_tensor)#
forward(*args, **kwargs)#

Delegate the forward pass to the wrapped LLaVAModel.
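
Because forward simply passes its arguments through, the wrapper is called exactly like the wrapped model. Below is a language-only sketch; the keyword names mirror the usual Megatron language-model interface and are assumptions to be checked against LLaVAModel.forward in your Megatron Core version:

```python
import torch

# Toy batch with the usual [batch, seq_len] layout.
input_ids = torch.randint(0, 32000, (1, 128), device="cuda")
position_ids = torch.arange(128, device="cuda").unsqueeze(0)

# Keyword names are assumptions; they pass straight through to LLaVAModel.forward.
# Vision inputs are not supplied, since they currently raise NotImplementedError.
output = model(
    input_ids=input_ids,
    position_ids=position_ids,
    attention_mask=None,
)
```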

freeze(
*,
freeze_language_model: bool = False,
freeze_vision_model: bool = False,
freeze_vision_projection: bool = False,
) → None#

Freeze selected sub-modules by turning off requires_grad.
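
For example, a projector-only tuning setup might freeze both towers while leaving the projection trainable (a usage sketch, not a prescribed recipe):

```python
# Turn off requires_grad for the language model and vision encoder;
# keep the vision projection trainable.
model.freeze(
    freeze_language_model=True,
    freeze_vision_model=True,
    freeze_vision_projection=False,
)
```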