core.models.vision.multimodal_projector#
Module Contents#
Classes#
MultimodalProjector projects the encoded input, whose hidden state has size input_size, into the hidden size of the language model for multimodal training. When the projector type is affine, only linear_fc1 from submodules is used.
API#
- class core.models.vision.multimodal_projector.MultimodalProjector(
- config: megatron.core.transformer.transformer_config.TransformerConfig,
- submodules: megatron.core.transformer.mlp.MLPSubmodules,
- projector_type: str,
- input_size: int,
- tp_group: Optional[torch.distributed.ProcessGroup] = None,
- )#
Bases:
megatron.core.transformer.module.MegatronModule

MultimodalProjector projects the encoded input, whose hidden state has size input_size, into the hidden size of the language model for multimodal training. When the projector type is affine, only linear_fc1 from submodules is used.
- Parameters:
config (TransformerConfig) – Transformer config
submodules (MLPSubmodules) – Specifies the MLP submodules for the mlp-type projector
projector_type (str) – Projector type
input_size (int) – Input size from feature encoder
tp_group (torch.distributed.ProcessGroup) – Tensor parallel group
Initialization
- forward(hidden_states)#
Run multimodal projector.
- Parameters:
hidden_states (torch.Tensor) – Input hidden states from the feature encoder.
- Returns:
The projected output.
- Return type:
torch.Tensor
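Conceptually, the forward pass is a learned linear map from the encoder's input_size to the language model's hidden size (with an extra nonlinear layer when the projector type is mlp). Below is a minimal plain-Python sketch of the affine case; it is illustrative only, not Megatron code, and all names and sizes are hypothetical toy values.

```python
# Illustrative sketch (plain Python, not Megatron code): an affine projector
# is a single linear layer mapping encoder features of width input_size to
# the language model's hidden_size. Names and sizes here are hypothetical.

def affine_project(hidden_states, weight, bias):
    """hidden_states: list of feature vectors, each of length input_size;
    weight: hidden_size x input_size matrix; bias: vector of length hidden_size."""
    return [
        [sum(w_row[i] * x[i] for i in range(len(x))) + b
         for w_row, b in zip(weight, bias)]
        for x in hidden_states
    ]

input_size, hidden_size = 4, 6                     # toy sizes; real models use e.g. 1024 -> 4096
tokens = [[1.0] * input_size for _ in range(3)]    # 3 encoded vision tokens
weight = [[0.5] * input_size for _ in range(hidden_size)]
bias = [0.0] * hidden_size

projected = affine_project(tokens, weight, bias)
print(len(projected), len(projected[0]))  # 3 tokens, each now hidden_size wide
```

In the real module, this linear map is a tensor-parallel layer sharded across the group given by tp_group, which is why the projector takes a TransformerConfig rather than bare weight matrices.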