nemo_automodel.components.distributed.pipelining.hf_utils#
Module Contents#
Functions#
Return the nested text/LLM module if present, else the model itself. |
|
Create a pipeline-compatible forward method for HuggingFace inner models. |
|
Create a pipeline-compatible forward method for causal LM wrappers. |
|
Pipeline-compatible forward for the Gemma4 text decoder backbone. |
|
Pipeline-compatible forward for Gemma4ForConditionalGeneration (VLM top-level). |
|
Pipeline-compatible forward for Mistral3ForConditionalGeneration (VLM top-level). |
|
Return True for Mistral3ForConditionalGeneration (Pixtral + Ministral3). |
|
Return True only for Gemma4 VLM variants. |
|
Return True when model opts out of pipeline-aware forward patching. |
|
Patch a HF model/module to produce pipeline-compatible forward. |
|
Initialize HuggingFace model buffers needed before pipeline execution. |
|
Best-effort check for whether |
|
Validate if a model is compatible with torch.distributed.pipelining. |
Data#
API#
- nemo_automodel.components.distributed.pipelining.hf_utils.logger#
‘getLogger(…)’
- nemo_automodel.components.distributed.pipelining.hf_utils.TEXT_MODULE_ATTRS#
(‘language_model’, ‘text_model’, ‘text_decoder’)
- nemo_automodel.components.distributed.pipelining.hf_utils.MULTIMODAL_SUFFIXES#
(‘vision_tower’, ‘visual’, ‘vision_model’, ‘image_encoder’, ‘vision_encoder’, ‘embed_vision’, ‘audio…
- nemo_automodel.components.distributed.pipelining.hf_utils.get_text_module(model: torch.nn.Module) torch.nn.Module#
Return the nested text/LLM module if present, else the model itself.
- nemo_automodel.components.distributed.pipelining.hf_utils.create_pipeline_forward_inner(
- model_class_name: str = 'AutoModel',
Create a pipeline-compatible forward method for HuggingFace inner models.
- nemo_automodel.components.distributed.pipelining.hf_utils.create_pipeline_forward_causal_lm() Callable#
Create a pipeline-compatible forward method for causal LM wrappers.
- nemo_automodel.components.distributed.pipelining.hf_utils.create_pipeline_forward_gemma4_text() Callable#
Pipeline-compatible forward for the Gemma4 text decoder backbone.
Works for both HF Gemma4TextModel (dense path) and Gemma4MoETextModelBackend (MoE path). Handles:
Optional embed_tokens (None on non-first PP stages; hidden states arrive in input_ids slot)
Both full_attention and sliding_attention causal masks (Gemma4 uses mixed layer types)
Per-layer-type position embeddings: Gemma4RotaryEmbedding.forward(x, pos_ids, layer_type)
- nemo_automodel.components.distributed.pipelining.hf_utils.create_pipeline_forward_gemma4_vlm() Callable#
Pipeline-compatible forward for Gemma4ForConditionalGeneration (VLM top-level).
Stage 0: embeds text tokens, merges image features from vision tower (if pixel_values provided or stored in _vlm_pixel_values_chunks), then calls the patched language model. Non-first stages: passes hidden states straight to the patched language model. Last stage: applies lm_head and final-logit softcapping.
- nemo_automodel.components.distributed.pipelining.hf_utils.create_pipeline_forward_mistral3_vlm() Callable#
Pipeline-compatible forward for Mistral3ForConditionalGeneration (VLM top-level).
Stage 0: embeds text tokens, runs vision_tower + multi_modal_projector for image tokens, merges image features into inputs_embeds via
get_placeholder_mask/masked_scatter, then calls the patched language model. Non-first stages: passes hidden states straight through the patched language model. Last stage: applies lm_head.Mirrors the generic CausalLM PP forward but adds the Mistral3 vision path so
pixel_values/image_sizesreachget_image_featureson stage 0. Without this, the generic CausalLM path never touches vision_tower and image tokens are embedded as garbage text tokens.
- nemo_automodel.components.distributed.pipelining.hf_utils._is_mistral3_vlm(model: torch.nn.Module) bool#
Return True for Mistral3ForConditionalGeneration (Pixtral + Ministral3).
- nemo_automodel.components.distributed.pipelining.hf_utils._is_gemma4_vlm(model: torch.nn.Module) bool#
Return True only for Gemma4 VLM variants.
model.model.language_modelalone is not enough to identify Gemma4 — Kimi VL, Mistral4, Qwen3 VL MoE, Llava OneVision and others share that structure. Gate the Gemma4-specific PP forward on the HFmodel_typeso unrelated VLMs fall through to the generic CausalLM path instead of receiving Gemma4’s sliding/full-attention and softcapping logic.
- nemo_automodel.components.distributed.pipelining.hf_utils.model_keeps_self_forward(model: torch.nn.Module) bool#
Return True when model opts out of pipeline-aware forward patching.
Used by the pipeline split call site to skip
patch_hf_model_for_ppentirely for models whose ownforwardis already PP-aware (typically because it pulls pixel_values out ofself._vlm_pixel_values_chunksset by the training loop). Currently set on Qwen3-VL-MoE, Qwen3.5-MoE, KimiVL, and Kimi-K2.5-VL.
- nemo_automodel.components.distributed.pipelining.hf_utils.patch_hf_model_for_pp(
- model,
- patch_inner_model: bool = True,
- patch_causal_lm_model: bool = True,
Patch a HF model/module to produce pipeline-compatible forward.
The caller is responsible for skipping this function when the model opts out via
model_keeps_self_forward(model). This function itself only branches on the patch flavor:Gemma4 VLM (
config.model_type == 'gemma4'with a nested text backbone atmodel.model.language_model): patch the text backbone and VLM outer with Gemma4-specific VLM-aware forwards.Mistral3 VLM: patch the text backbone with the generic inner forward and the outer with the Mistral3-specific VLM forward.
Other models with
model.model(e.g., LlamaForCausalLM and other LLMs): patch inner and outer with the generic CausalLM forwards.Else: patch the module itself with the generic inner forward.
- nemo_automodel.components.distributed.pipelining.hf_utils.init_hf_model_buffers(
- model: torch.nn.Module,
- device: torch.device,
Initialize HuggingFace model buffers needed before pipeline execution.
- nemo_automodel.components.distributed.pipelining.hf_utils._PP_VLM_MODEL_TYPES_WITH_DEDICATED_FORWARD: tuple[str, ...]#
(‘gemma4’, ‘mistral3’)
- nemo_automodel.components.distributed.pipelining.hf_utils._is_vlm(model: torch.nn.Module) bool#
Best-effort check for whether
modelis a vision-language model.Looks at the standard VLM markers used elsewhere in the codebase: a nested
text_config, avision_towerattribute on the outer model, or avisualattribute on the inner model (Qwen-VL convention).
- nemo_automodel.components.distributed.pipelining.hf_utils.validate_hf_model_for_pipeline_support(model: torch.nn.Module) None#
Validate if a model is compatible with torch.distributed.pipelining.