# Nemotron 3 Nano
This guide explains how to post-train the Nemotron 3 Nano model using NeMo RL.
## Download and prepare the data
```bash
# Download the RL data blend
uvx --from huggingface-hub hf download nvidia/Nemotron-3-Nano-RL-Training-Blend --repo-type dataset --local-dir=data

# Fill in the placeholders in the dataset
chmod +x data/create_nanov3_jsonl.py
./data/create_nanov3_jsonl.py --input data/train.jsonl --output data/train-full.jsonl

# Hold out the last 1000 rows for validation
head -n -1000 data/train-full.jsonl > data/train-split.jsonl
tail -n 1000 data/train-full.jsonl > data/val-split.jsonl
```
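The `head`/`tail` split above can be sanity-checked before a long training run. A minimal sketch using a synthetic file (the `/tmp/demo-*` filenames are hypothetical stand-ins for the real splits; `head -n -1000` requires GNU coreutils):

```shell
# Build a synthetic JSONL file with 5000 rows (stand-in for train-full.jsonl)
seq 1 5000 | sed 's/.*/{"row": &}/' > /tmp/demo-full.jsonl

# Same split as above: everything except the last 1000 rows goes to training,
# the last 1000 rows go to validation
head -n -1000 /tmp/demo-full.jsonl > /tmp/demo-train.jsonl
tail -n 1000 /tmp/demo-full.jsonl > /tmp/demo-val.jsonl

# The two splits are disjoint and together cover the full file
echo "train: $(wc -l < /tmp/demo-train.jsonl) val: $(wc -l < /tmp/demo-val.jsonl)"
```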
## Prepare the code
Note that training Nemotron 3 Nano currently requires the `nano-v3` branch of NeMo RL.
```bash
# Check out NeMo RL on the nano-v3 branch
git clone -b nano-v3 https://github.com/NVIDIA-NeMo/RL.git
cd RL

# Initialize the submodules
git submodule update --init --recursive
```
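The `-b` flag makes the fresh clone check out the named branch directly, so no separate `git checkout` is needed. A self-contained sketch of that behavior using a throwaway local repository (the `/tmp/demo-*` paths are hypothetical; no network access required, git >= 2.28 assumed for `init -b`):

```shell
# Create a source repo with a main branch and a nano-v3 branch
rm -rf /tmp/demo-src /tmp/demo-clone
git init -q -b main /tmp/demo-src
git -C /tmp/demo-src -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m init
git -C /tmp/demo-src branch nano-v3

# clone -b checks out the named branch in the new working tree
git clone -q -b nano-v3 /tmp/demo-src /tmp/demo-clone
git -C /tmp/demo-clone rev-parse --abbrev-ref HEAD
```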
## Create a launch script
Create a file named `launch.sh` with the following contents. Be sure to fill in `DATA_DIR`, `MODEL_CHECKPOINT`, `WANDB_API_KEY`, `SLURM_ACCOUNT`, `SLURM_PARTITION`, and `MOUNTS`. Note that the default recipe (`examples/nemo_gym/grpo_nanov3.yaml`) uses 32 nodes.
```bash
CODE_DIR=$PWD
SLURM_JOB_NAME=nano-v3-rl-training

# Fill these in
DATA_DIR=...
MODEL_CHECKPOINT=...
WANDB_API_KEY=...
SLURM_ACCOUNT=...
SLURM_PARTITION=...
MOUNTS=...  # SRC:DST[,SRC:DST...] e.g., MOUNTS="/lustre:/lustre,/data:/data"

CONTAINER="nvcr.io/nvidia/nemo-rl:v0.4.0.nemotron_3_nano"
COMMAND="uv run examples/nemo_gym/run_grpo_nemo_gym.py --config examples/nemo_gym/grpo_nanov3.yaml data.train_jsonl_fpath=$DATA_DIR/train-split.jsonl data.validation_jsonl_fpath=$DATA_DIR/val-split.jsonl policy.model_name=$MODEL_CHECKPOINT logger.wandb_enabled=True"

COMMAND="${COMMAND}" \
CONTAINER="${CONTAINER}" \
MOUNTS="${MOUNTS}" \
WANDB_API_KEY="${WANDB_API_KEY}" \
sbatch \
    --nodes=32 \
    --account="${SLURM_ACCOUNT}" \
    --job-name="${SLURM_JOB_NAME}" \
    --partition="${SLURM_PARTITION}" \
    --time=4:0:0 \
    --gres=gpu:8 \
    ray.sub
```
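A malformed `MOUNTS` string typically surfaces only after the Slurm job has been queued and the container fails to start, so it can be worth validating up front. A minimal sketch of such a check (the `check_mounts` helper is hypothetical, not part of NeMo RL; it only verifies the `SRC:DST[,SRC:DST...]` shape with absolute paths on both sides):

```shell
# Hypothetical helper: reject a MOUNTS string unless every comma-separated
# entry has the form SRC:DST with absolute paths on both sides
check_mounts() {
    old_ifs=$IFS
    IFS=','
    for entry in $1; do
        case "$entry" in
            /*:/*) ;;  # absolute SRC:DST -> ok
            *) IFS=$old_ifs; echo "bad mount entry: $entry" >&2; return 1 ;;
        esac
    done
    IFS=$old_ifs
}

check_mounts "/lustre:/lustre,/data:/data" && echo "mounts ok"
check_mounts "relative:/data" || echo "mounts rejected"
```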
## Launch training
```bash
bash launch.sh
```