bridge.recipes.nemotronh.nemotron_nano_v2#

Module Contents#

Classes#

NemotronNanoV2CommonKwargs

Typed options accepted by Nemotron Nano v2 recipe helper functions.

NemotronNanoV2FinetuneKwargs

Typed options accepted by Nemotron Nano v2 finetuning recipe helper functions.

Functions#

nemotron_nano_9b_v2_pretrain_config

Return a pre-training config for Nemotron Nano 9B v2.

nemotron_nano_12b_v2_pretrain_config

Return a pre-training config for Nemotron Nano 12B v2.

nemotron_nano_9b_v2_finetune_config

Return a finetuning config for Nemotron Nano 9B v2.

nemotron_nano_12b_v2_finetune_config

Return a finetuning config for Nemotron Nano 12B v2.

Data#

API#

class bridge.recipes.nemotronh.nemotron_nano_v2.NemotronNanoV2CommonKwargs#

Bases: typing_extensions.TypedDict

Typed options accepted by Nemotron Nano v2 recipe helper functions.

model_provider: megatron.bridge.models.nemotronh.NemotronNanoModelProvider9Bv2 | megatron.bridge.models.nemotronh.NemotronNanoModelProvider12Bv2#

None

tokenizer_model: str | None#

None

dir: str | None#

None

name: str#

None

data_paths: list[str] | None#

None

data_args_path: str | None#

None

train_data_path: list[str] | None#

None

valid_data_path: list[str] | None#

None

test_data_path: list[str] | None#

None

per_split_data_args_path: str | None#

None

mock: bool#

None

tensor_model_parallel_size: int#

None

pipeline_model_parallel_size: int#

None

pipeline_dtype: torch.dtype | None#

None

virtual_pipeline_model_parallel_size: int | None#

None

context_parallel_size: int#

None

sequence_parallel: bool#

None

train_iters: int#

None

global_batch_size: int#

None

micro_batch_size: int#

None

seq_length: int#

None

lr: float#

None

min_lr: float#

None

lr_warmup_iters: int#

None

lr_decay_iters: int | None#

None

use_null_tokenizer: bool#

None

precision_config: megatron.bridge.training.mixed_precision.MixedPrecisionConfig | str | None#

None

comm_overlap_config: megatron.bridge.training.comm_overlap.CommOverlapConfig | None#

None

enable_default_comm_overlap: bool#

None
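
The sketch below illustrates one way these typed keys can be used: build a partial overrides mapping, let a static type checker validate it against NemotronNanoV2CommonKwargs, and unpack it into a pretrain helper. The import path, the assumption that the keys are optional (total=False), and every value are illustrative, not defaults of this module.

```python
# Sketch only: import path and all values are illustrative assumptions.
from megatron.bridge.recipes.nemotronh.nemotron_nano_v2 import (
    NemotronNanoV2CommonKwargs,
    nemotron_nano_9b_v2_pretrain_config,
)

# A type checker rejects keys that are not declared on NemotronNanoV2CommonKwargs.
overrides: NemotronNanoV2CommonKwargs = {
    "name": "nano_9b_v2_pretrain_smoke",
    "mock": True,                      # synthetic data; set data_paths for a real run
    "tensor_model_parallel_size": 2,
    "micro_batch_size": 1,
    "global_batch_size": 32,
    "seq_length": 8192,
    "train_iters": 100,
}

config = nemotron_nano_9b_v2_pretrain_config(**overrides)
```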

class bridge.recipes.nemotronh.nemotron_nano_v2.NemotronNanoV2FinetuneKwargs#

Bases: bridge.recipes.nemotronh.nemotron_nano_v2.NemotronNanoV2CommonKwargs

Typed options accepted by Nemotron Nano v2 finetuning recipe helper functions.

pretrained_checkpoint: str | None#

None

peft: str | megatron.bridge.peft.base.PEFT | None#

None

packed_sequence: bool#

None

finetune_lr: float#

None

wandb_project: str | None#

None

wandb_entity: str | None#

None

wandb_exp_name: str | None#

None
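
Because this TypedDict extends NemotronNanoV2CommonKwargs, inherited keys such as seq_length can be mixed with finetune-only keys such as peft in one overrides mapping. A minimal sketch, assuming the import path shown, a hypothetical checkpoint path, and "lora" as a valid string PEFT spec:

```python
# Sketch only: import path, checkpoint path, and the "lora" string spec are assumptions.
from megatron.bridge.recipes.nemotronh.nemotron_nano_v2 import (
    NemotronNanoV2FinetuneKwargs,
    nemotron_nano_9b_v2_finetune_config,
)

overrides: NemotronNanoV2FinetuneKwargs = {
    "seq_length": 4096,                                           # inherited common key
    "pretrained_checkpoint": "/checkpoints/nemotron-nano-9b-v2",  # hypothetical path
    "peft": "lora",                                               # str spec or a PEFT object
    "finetune_lr": 1e-4,
    "wandb_project": "nemotron-nano-finetune",                    # optional experiment tracking
}

config = nemotron_nano_9b_v2_finetune_config(**overrides)
```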

bridge.recipes.nemotronh.nemotron_nano_v2.nemotron_nano_9b_v2_pretrain_config(
**user_kwargs: typing_extensions.Unpack[bridge.recipes.nemotronh.nemotron_nano_v2.NemotronNanoV2CommonKwargs],
) → megatron.bridge.training.config.ConfigContainer#

Return a pre-training config for Nemotron Nano 9B v2.

This recipe targets single-node training. Default parallelism: TP=2, PP=1, SP=True.

See _nemotronh_common for the full list of parameters.
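
A minimal sketch of calling this helper with data and learning-rate overrides; the import path, file paths, and values are hypothetical placeholders rather than recipe defaults.

```python
# Sketch only: import path, file paths, and values are hypothetical placeholders.
from megatron.bridge.recipes.nemotronh.nemotron_nano_v2 import (
    nemotron_nano_9b_v2_pretrain_config,
)

config = nemotron_nano_9b_v2_pretrain_config(
    dir="/results/nano_9b_v2",                              # hypothetical output directory
    data_paths=["/data/my_corpus_text_document"],           # hypothetical preprocessed data prefix
    tokenizer_model="/tokenizers/nemotron_nano_v2.model",   # hypothetical tokenizer
    train_iters=1_000_000,
    lr=3e-4,
    min_lr=3e-5,
    lr_warmup_iters=2000,
)
```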

bridge.recipes.nemotronh.nemotron_nano_v2.nemotron_nano_12b_v2_pretrain_config(
**user_kwargs: typing_extensions.Unpack[bridge.recipes.nemotronh.nemotron_nano_v2.NemotronNanoV2CommonKwargs],
) → megatron.bridge.training.config.ConfigContainer#

Return a pre-training config for Nemotron Nano 12B v2.

This recipe targets single-node training. Default parallelism: TP=4, PP=1, SP=True.

Note: Uses FP8 precision by default. Communication overlap is disabled by default.

See _nemotronh_common for the full list of parameters.
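
Because this recipe defaults to FP8 with communication overlap disabled, the sketch below overrides both. The import path and the "bf16_mixed" string alias are assumptions; a MixedPrecisionConfig instance can be passed to precision_config instead.

```python
# Sketch only: the "bf16_mixed" alias is an assumption about accepted string names;
# a MixedPrecisionConfig instance may also be passed to precision_config.
from megatron.bridge.recipes.nemotronh.nemotron_nano_v2 import (
    nemotron_nano_12b_v2_pretrain_config,
)

config = nemotron_nano_12b_v2_pretrain_config(
    mock=True,                         # synthetic data for a smoke test
    precision_config="bf16_mixed",     # override the FP8 default
    enable_default_comm_overlap=True,  # overlap is disabled by default for this recipe
)
```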

bridge.recipes.nemotronh.nemotron_nano_v2.nemotron_nano_9b_v2_finetune_config(
**user_kwargs: typing_extensions.Unpack[bridge.recipes.nemotronh.nemotron_nano_v2.NemotronNanoV2FinetuneKwargs],
) → megatron.bridge.training.config.ConfigContainer#

Return a finetuning config for Nemotron Nano 9B v2.

Default configuration: 8 nodes, 64 GPUs

  • LoRA/DoRA: TP=2, PP=1, LR=1e-4

  • Full SFT: TP=2, PP=1, LR=5e-6
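
A minimal sketch of a LoRA run using the learning rate documented above; the import path, checkpoint path, and the "lora" string spec are assumptions.

```python
# Sketch only: checkpoint path and the "lora" string spec are hypothetical.
from megatron.bridge.recipes.nemotronh.nemotron_nano_v2 import (
    nemotron_nano_9b_v2_finetune_config,
)

config = nemotron_nano_9b_v2_finetune_config(
    pretrained_checkpoint="/checkpoints/nemotron-nano-9b-v2",  # hypothetical path
    peft="lora",            # str spec or a PEFT object; None means full SFT
    finetune_lr=1e-4,       # documented LoRA/DoRA default learning rate
    packed_sequence=True,   # pack short samples into full-length sequences
)
```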

bridge.recipes.nemotronh.nemotron_nano_v2.nemotron_nano_12b_v2_finetune_config(
**user_kwargs: typing_extensions.Unpack[bridge.recipes.nemotronh.nemotron_nano_v2.NemotronNanoV2FinetuneKwargs],
) → megatron.bridge.training.config.ConfigContainer#

Return a finetuning config for Nemotron Nano 12B v2.

Default configuration: 8 nodes, 64 GPUs

  • LoRA/DoRA: TP=4, PP=1, LR=1e-4

  • Full SFT: TP=4, PP=1, LR=5e-6

Note: Uses FP8 precision by default. Communication overlap is disabled by default.
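
For full-parameter SFT on the 12B model, a minimal sketch using the learning rate documented above; the import path and checkpoint path are assumptions.

```python
# Sketch only: import path and checkpoint path are hypothetical.
from megatron.bridge.recipes.nemotronh.nemotron_nano_v2 import (
    nemotron_nano_12b_v2_finetune_config,
)

config = nemotron_nano_12b_v2_finetune_config(
    pretrained_checkpoint="/checkpoints/nemotron-nano-12b-v2",  # hypothetical path
    peft=None,              # full-parameter SFT rather than LoRA/DoRA
    finetune_lr=5e-6,       # documented full SFT default learning rate
)
```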

bridge.recipes.nemotronh.nemotron_nano_v2.__all__#

['nemotron_nano_9b_v2_pretrain_config', 'nemotron_nano_12b_v2_pretrain_config', 'nemotron_nano_9b_v2…