bridge.models.nemotronh.nemotron_h_bridge#
Module Contents#
Classes#
| Class | Description |
| --- | --- |
| _MTPFlatteningMapping | Pattern mapping that flattens Megatron's two-level MTP indices. |
| _MTPFlatteningQKVMapping | Resolve-time wrapper that flattens MTP indices then delegates to QKVMapping. |
| NemotronHBridge | Megatron Bridge for Nemotron-H Causal LM. |
Functions#
| Function | Description |
| --- | --- |
| _replace_wildcards | Replace ** then * sequentially with captures. |
Data#
API#
- bridge.models.nemotronh.nemotron_h_bridge.logger#
'getLogger(…)'
- bridge.models.nemotronh.nemotron_h_bridge._replace_wildcards(pattern: str, captures: Tuple[str, ...])#
Replace ** then * sequentially with captures.
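A minimal sketch of this behavior (not the library's implementation): each capture fills one wildcard, with `**` consumed before `*`, left to right.

```python
from typing import Tuple


def replace_wildcards(pattern: str, captures: Tuple[str, ...]) -> str:
    """Illustrative re-implementation: substitute '**' wildcards first,
    then '*', consuming one capture per wildcard, left to right."""
    result = pattern
    for capture in captures:
        if "**" in result:
            result = result.replace("**", capture, 1)
        else:
            result = result.replace("*", capture, 1)
    return result
```

For example, `replace_wildcards("mtp.layers.*.mtp_model_layer.layers.*.weight", ("0", "1"))` fills the two `*` wildcards in order, yielding `"mtp.layers.0.mtp_model_layer.layers.1.weight"`.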
- class bridge.models.nemotronh.nemotron_h_bridge._MTPFlatteningMapping(megatron_param: str, hf_param: str, *, mtp_layers_per_block: int, inner_override: Optional[int] = None)#
Bases: megatron.bridge.models.conversion.param_mapping.MegatronParamMapping[torch.Tensor]
Pattern mapping that flattens Megatron's two-level MTP indices:
megatron: mtp.layers.{outer}.mtp_model_layer.layers.{inner}.<…>
hf: mtp.layers.{outer * L + inner}.<…>
Also supports an optional inner_override for parameters that live outside mtp_model_layer.layers.* in Megatron but should attach to a specific inner layer in HF (e.g., eh_proj on inner=0, final_layernorm on inner=L-1).
Initialization
- resolve(captures: Tuple[str, ...])#
- abstractmethod hf_to_megatron(hf_weights: torch.Tensor, megatron_module: torch.nn.Module)#
- abstractmethod megatron_to_hf(megatron_weights: Optional[torch.Tensor], megatron_module: Optional[torch.nn.Module])#
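The index flattening described in the class docstring can be sketched as follows. This is an illustrative helper, not the library's code; the parameter-name suffix (`mixer.in_proj.weight`) is made up for the example, and `L` is `mtp_layers_per_block`.

```python
import re


def flatten_mtp_param_name(megatron_name: str, mtp_layers_per_block: int) -> str:
    """Sketch of the docstring's mapping: rewrite
    mtp.layers.{outer}.mtp_model_layer.layers.{inner}.<rest>
    into the flattened HF form mtp.layers.{outer * L + inner}.<rest>.
    Names that don't match the two-level pattern pass through unchanged."""
    m = re.match(r"mtp\.layers\.(\d+)\.mtp_model_layer\.layers\.(\d+)\.(.*)", megatron_name)
    if m is None:
        return megatron_name  # not a two-level MTP parameter
    outer, inner, rest = int(m.group(1)), int(m.group(2)), m.group(3)
    flat = outer * mtp_layers_per_block + inner
    return f"mtp.layers.{flat}.{rest}"
```

With `mtp_layers_per_block=4`, outer block 1 and inner layer 2 flatten to HF layer index `1 * 4 + 2 = 6`.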
- class bridge.models.nemotronh.nemotron_h_bridge._MTPFlatteningQKVMapping(megatron_param: str, *, q: str, k: str, v: str, mtp_layers_per_block: int)#
Bases: megatron.bridge.models.conversion.param_mapping.MegatronParamMapping[typing.Dict[str, torch.Tensor]]
Resolve-time wrapper that flattens MTP indices then delegates to QKVMapping.
Initialization
- resolve(captures: Tuple[str, ...])#
- abstractmethod hf_to_megatron(hf_weights: Dict[str, torch.Tensor], megatron_module: torch.nn.Module)#
- abstractmethod megatron_to_hf(megatron_weights: Optional[torch.Tensor], megatron_module: Optional[torch.nn.Module])#
- class bridge.models.nemotronh.nemotron_h_bridge.NemotronHBridge#
Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge
Megatron Bridge for Nemotron-H Causal LM.
This bridge handles the conversion between HuggingFace NemotronHForCausalLM and Megatron-Core MambaModel formats, including weight mappings and configuration translation.
Example:

```python
from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("nvidia/Nemotron-H-8B-Base-8K", trust_remote_code=True)
provider = bridge.to_megatron_provider()
```

Initialization
- CONFIG_MAPPING#
None
- ADDITIONAL_FILE_PATTERNS#
['*reasoning_parser.py']
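ADDITIONAL_FILE_PATTERNS is a list of glob patterns for extra repository files to carry along beyond the standard weights and config. A quick way to see what `'*reasoning_parser.py'` matches (the file names below are hypothetical, for illustration only):

```python
import fnmatch

# Hypothetical HF repo listing; only files matching the
# ADDITIONAL_FILE_PATTERNS glob '*reasoning_parser.py' are selected.
repo_files = ["config.json", "nemotron_reasoning_parser.py", "model-00001.safetensors"]
extra_files = [f for f in repo_files if fnmatch.fnmatch(f, "*reasoning_parser.py")]
```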
- build_conversion_tasks(hf_pretrained: megatron.bridge.models.hf_pretrained.causal_lm.PreTrainedCausalLM, megatron_model)#
- provider_bridge(hf_pretrained: megatron.bridge.models.hf_pretrained.causal_lm.PreTrainedCausalLM)#
Convert HuggingFace Nemotron-H config to MambaModelProvider.
- classmethod get_hf_tokenizer_kwargs() dict#
Return HuggingFace tokenizer kwargs for Nemotron-H models.
Nemotron-H models only provide a fast tokenizer (tokenizer.json), so use_fast=True is required.
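A minimal sketch of how the returned kwargs would be used, assuming the standard `transformers.AutoTokenizer` API; the dict value simply mirrors the documented requirement above:

```python
# Per the docstring above, Nemotron-H ships only a fast tokenizer
# (tokenizer.json), so the kwargs must include use_fast=True.
tokenizer_kwargs = {"use_fast": True}  # illustrative value mirroring the documented requirement

# Hypothetical usage with transformers (left commented to avoid a network call):
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("nvidia/Nemotron-H-8B-Base-8K", **tokenizer_kwargs)
```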
- classmethod megatron_to_hf_config(provider) dict#
- mapping_registry() megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#