bridge.models.nemotronh.nemotron_h_bridge#

Module Contents#

Classes#

_MTPFlatteningMapping

Pattern mapping that flattens Megatron’s two-level MTP indices:

_MTPFlatteningQKVMapping

Resolve-time wrapper that flattens MTP indices then delegates to QKVMapping.

NemotronHBridge

Megatron Bridge for Nemotron-H Causal LM.

Functions#

_replace_wildcards

Replace ** then * sequentially with captures.

Data#

API#

bridge.models.nemotronh.nemotron_h_bridge.logger#

'getLogger(…)'

bridge.models.nemotronh.nemotron_h_bridge._replace_wildcards(
pattern: str,
captures: Tuple[str, ...],
) → str#

Replace ** then * sequentially with captures.
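The replacement order can be sketched as follows. This is a minimal illustration of the documented behavior (each capture fills the next available wildcard, with `**` taking precedence over `*`), not the actual implementation:

```python
from typing import Tuple

def replace_wildcards(pattern: str, captures: Tuple[str, ...]) -> str:
    """Sketch: substitute captures into the pattern, filling a '**'
    wildcard before a '*' wildcard, one capture per wildcard."""
    result = pattern
    for cap in captures:
        if "**" in result:
            result = result.replace("**", cap, 1)
        elif "*" in result:
            result = result.replace("*", cap, 1)
    return result

# e.g. replace_wildcards("mtp.layers.**.attn.*", ("3", "q_proj"))
# yields "mtp.layers.3.attn.q_proj"
```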

class bridge.models.nemotronh.nemotron_h_bridge._MTPFlatteningMapping(
megatron_param: str,
hf_param: str,
*,
mtp_layers_per_block: int,
inner_override: Optional[int] = None,
)#

Bases: megatron.bridge.models.conversion.param_mapping.MegatronParamMapping[torch.Tensor]

Pattern mapping that flattens Megatron’s two-level MTP indices:

megatron: mtp.layers.{outer}.mtp_model_layer.layers.{inner}.<…>
hf: mtp.layers.{outer * L + inner}.<…>

Also supports an optional inner_override for parameters that live outside mtp_model_layer.layers.* in Megatron but should attach to a specific inner layer in HF (e.g., eh_proj on inner=0, final_layernorm on inner=L-1).
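The flattening arithmetic described above can be sketched as follows, where `L` stands for `mtp_layers_per_block`. This is an illustration of the index math only, not the mapping class itself:

```python
def flatten_mtp_index(outer: int, inner: int, mtp_layers_per_block: int) -> int:
    # hf_index = outer * L + inner: each Megatron MTP block of L inner
    # layers occupies L consecutive slots in HF's flat layer numbering.
    return outer * mtp_layers_per_block + inner

# With L = 4: block 0 covers HF layers 0-3, block 1 covers layers 4-7.
# An inner_override pins a parameter to one inner slot, e.g. eh_proj at
# inner=0 and final_layernorm at inner=L-1 within its block.
```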

Initialization

resolve(
captures: Tuple[str, ...],
) → megatron.bridge.models.conversion.param_mapping.MegatronParamMapping#
abstractmethod hf_to_megatron(
hf_weights: torch.Tensor,
megatron_module: torch.nn.Module,
) → torch.Tensor#
abstractmethod megatron_to_hf(
megatron_weights: Optional[torch.Tensor],
megatron_module: Optional[torch.nn.Module],
) → Dict[str, torch.Tensor]#
class bridge.models.nemotronh.nemotron_h_bridge._MTPFlatteningQKVMapping(
megatron_param: str,
*,
q: str,
k: str,
v: str,
mtp_layers_per_block: int,
)#

Bases: megatron.bridge.models.conversion.param_mapping.MegatronParamMapping[typing.Dict[str, torch.Tensor]]

Resolve-time wrapper that flattens MTP indices then delegates to QKVMapping.

Initialization

resolve(
captures: Tuple[str, ...],
) → megatron.bridge.models.conversion.param_mapping.MegatronParamMapping#
abstractmethod hf_to_megatron(
hf_weights: Dict[str, torch.Tensor],
megatron_module: torch.nn.Module,
) → torch.Tensor#
abstractmethod megatron_to_hf(
megatron_weights: Optional[torch.Tensor],
megatron_module: Optional[torch.nn.Module],
) → Dict[str, torch.Tensor]#
class bridge.models.nemotronh.nemotron_h_bridge.NemotronHBridge#

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for Nemotron-H Causal LM.

This bridge handles the conversion between HuggingFace NemotronHForCausalLM and Megatron-Core MambaModel formats, including weight mappings and configuration translation.

Example

from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("nvidia/Nemotron-H-8B-Base-8K", trust_remote_code=True)
provider = bridge.to_megatron_provider()

Initialization

CONFIG_MAPPING#

None

ADDITIONAL_FILE_PATTERNS#

['*reasoning_parser.py']

build_conversion_tasks(
hf_pretrained: megatron.bridge.models.hf_pretrained.causal_lm.PreTrainedCausalLM,
megatron_model,
)#
provider_bridge(
hf_pretrained: megatron.bridge.models.hf_pretrained.causal_lm.PreTrainedCausalLM,
) → megatron.bridge.models.mamba.mamba_provider.MambaModelProvider#

Convert HuggingFace Nemotron-H config to MambaModelProvider.

classmethod get_hf_tokenizer_kwargs() → dict#

Return HuggingFace tokenizer kwargs for Nemotron-H models.

Nemotron-H models only provide a fast tokenizer (tokenizer.json), so use_fast=True is required.
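A minimal sketch of what this classmethod is documented to return; any keys beyond use_fast are unspecified here and this is not the actual implementation:

```python
def get_hf_tokenizer_kwargs() -> dict:
    # Nemotron-H ships only a fast tokenizer (tokenizer.json),
    # so use_fast=True must be passed through to the HF tokenizer loader.
    return {"use_fast": True}
```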

classmethod megatron_to_hf_config(provider) → dict#
mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#