core.models.vision.multimodal_projector#

Module Contents#

Classes#

MultimodalProjector

MultimodalProjector takes the encoded input with an input_size hidden dimension and projects it into the hidden size of the language model for multimodal training. When the projector type is affine, only the linear_fc1 layer from submodules is used.

API#

class core.models.vision.multimodal_projector.MultimodalProjector(
config: megatron.core.transformer.transformer_config.TransformerConfig,
submodules: megatron.core.transformer.mlp.MLPSubmodules,
projector_type: str,
input_size: int,
tp_group: Optional[torch.distributed.ProcessGroup] = None,
)#

Bases: megatron.core.transformer.module.MegatronModule

MultimodalProjector takes the encoded input with an input_size hidden dimension and projects it into the hidden size of the language model for multimodal training. When the projector type is affine, only the linear_fc1 layer from submodules is used.

Parameters:
  • config (TransformerConfig) – Transformer config

  • submodules (MLPSubmodules) – Specifies the MLP submodules for an mlp-type projector

  • projector_type (str) – Projector type

  • input_size (int) – Input size from feature encoder

  • tp_group (torch.distributed.ProcessGroup) – Tensor parallel group

Initialization

forward(hidden_states)#

Run multimodal projector.

Parameters:

hidden_states (torch.Tensor) – Input.

Returns:

The projected output.

Return type:

torch.Tensor
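The behavior described above can be sketched with plain PyTorch layers. This is a hypothetical, simplified stand-in for illustration only: the real class builds its layers from `MLPSubmodules` and a `TransformerConfig` (and supports tensor parallelism via `tp_group`), whereas here `torch.nn` modules are assumed, along with made-up sizes.

```python
import torch
import torch.nn as nn


class SimpleMultimodalProjector(nn.Module):
    """Illustrative sketch of the projection MultimodalProjector performs."""

    def __init__(self, projector_type: str, input_size: int,
                 hidden_size: int, ffn_hidden_size: int):
        super().__init__()
        if projector_type == "mlp":
            # Two-layer MLP: lift encoder features to the FFN width, apply a
            # nonlinearity, then project into the language model's hidden size.
            self.proj = nn.Sequential(
                nn.Linear(input_size, ffn_hidden_size),
                nn.GELU(),
                nn.Linear(ffn_hidden_size, hidden_size),
            )
        elif projector_type == "affine":
            # Affine projector: a single linear layer (the role linear_fc1
            # plays in the real module).
            self.proj = nn.Linear(input_size, hidden_size)
        else:
            raise ValueError(f"Unsupported projector type: {projector_type}")

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Run multimodal projector: map encoder hidden states into the
        # language model's hidden size.
        return self.proj(hidden_states)


# Project 16 image-encoder tokens of size 1024 into a 4096-dim LM hidden space.
projector = SimpleMultimodalProjector("mlp", input_size=1024,
                                      hidden_size=4096, ffn_hidden_size=4096)
out = projector(torch.randn(16, 1024))
print(tuple(out.shape))  # (16, 4096)
```

Note how only the last dimension changes: the projector is applied token-wise, so any leading batch or sequence dimensions pass through unchanged.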