Single Node Training
With your environment set up and data prepared, you’re ready to run training. But before committing to a multi-hour, multi-node job, it’s important to verify everything works correctly on a single node first.
Goal: Run a single-node GRPO training session to validate your environment.
Time: ~45 minutes
In this section, you will:
- Download the Nemotron Nano 9B v2 model
- Configure the model’s chat template
- Clean up existing processes
- Run a test training session with 3 steps
Prerequisites
Make sure you have:
- ✅ Completed the Setup instructions
- ✅ Access to a running container session with GPUs
- ✅ (Optional) Weights & Biases API key for experiment tracking
0. Return to NeMo RL directory
Since we are running RL training, the following steps are all run from the NeMo RL root directory, rather than the NeMo Gym directory.
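For example, assuming NeMo RL was cloned into your home directory during setup (adjust the path to wherever your checkout lives):

```shell
# Path is an assumption from setup; change it to your NeMo RL checkout.
cd ~/NeMo-RL

# Verify the GRPO example files for this tutorial are present.
ls examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml \
   examples/nemo_gym/run_grpo_nemo_gym.py
```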
✅ Success Check: You should see a yaml file at examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml and a Python file at examples/nemo_gym/run_grpo_nemo_gym.py.
1. Download the Model
Estimated time: ~5-10 minutes
Download NVIDIA Nemotron Nano 9B v2:
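One way to do this is with the Hugging Face CLI. Setting `HF_HOME=.cache` is an assumption chosen to match the cache path shown in the Success Check below:

```shell
# Download the model weights into the local cache; HF_HOME=.cache points
# the Hugging Face cache at the .cache/hub/... directory this tutorial uses.
HF_HOME=.cache huggingface-cli download nvidia/NVIDIA-Nemotron-Nano-9B-v2
```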
✅ Success Check: Model files are downloaded to .cache/hub/models--nvidia--NVIDIA-Nemotron-Nano-9B-v2/.
2. Configure the Chat Template
Estimated time: ~1 minute
The Nemotron Nano 9B v2 model ships with a custom chat template that must be adjusted for RL training. This step edits the cached copy of the template:
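As a sketch of the pattern only: the patterns below are placeholders, not the real edits, and the template's location within the snapshot directory may differ (on some snapshots the template lives inside `tokenizer_config.json` instead):

```shell
# Placeholder patterns -- substitute the actual template edits for your setup.
sed -i 's/OLD_PATTERN/NEW_PATTERN/' \
  .cache/hub/models--nvidia--NVIDIA-Nemotron-Nano-9B-v2/snapshots/*/chat_template.jinja
```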
✅ Success Check: The sed commands complete without errors.
3. Run Training
Estimated time: ~15-30 minutes
By default, this runs only 3 training steps (`grpo.max_num_steps=3`) as a small test run in preparation for multi-node training. If you plan to do the full training run on a single node, remove this override; the full run will take several hours.
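A sketch of the launch command, assuming NeMo RL's Hydra-style command-line overrides (`EXP_NAME` is a name you choose, and the `--config` flag is an assumption; use the invocation from your setup if it differs):

```shell
EXP_NAME=grpo_nano9b_single_node_test
mkdir -p results/$EXP_NAME

# grpo.max_num_steps=3 keeps this a short validation run; drop it for a
# full single-node training run.
uv run examples/nemo_gym/run_grpo_nemo_gym.py \
    --config examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml \
    grpo.max_num_steps=3 \
    &> results/$EXP_NAME/output.log &
```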
Single GPU Training: If you only have 1 GPU available, use these modifications:
Key differences:
- Added `CUDA_VISIBLE_DEVICES=0` to use only GPU 0
- Set `cluster.gpus_per_node=1`
- Set `policy.megatron_cfg.tensor_model_parallel_size=1`
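Combining the overrides above, a single-GPU launch might look like the following (same caveats as the full command: `EXP_NAME` is a name you choose and the `--config` flag is an assumption):

```shell
EXP_NAME=grpo_nano9b_single_gpu_test
mkdir -p results/$EXP_NAME

# Restrict the run to GPU 0 and disable tensor parallelism.
CUDA_VISIBLE_DEVICES=0 uv run examples/nemo_gym/run_grpo_nemo_gym.py \
    --config examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml \
    grpo.max_num_steps=3 \
    cluster.gpus_per_node=1 \
    policy.megatron_cfg.tensor_model_parallel_size=1 \
    &> results/$EXP_NAME/output.log &
```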
The end of the command above does the following:
- `&> results/$EXP_NAME/output.log`: Pipes the terminal output into a file at `results/$EXP_NAME/output.log` that you can view.
- `&`: This final ampersand runs the job in the background, which frees up your terminal to do other things. You can view all the background jobs using the `jobs` command. If you need to quit the training run, use the `fg` command to bring the job from the background into the foreground, then Ctrl+C like normal.
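While the job runs in the background, you can follow the log file (using the `EXP_NAME` you set when launching):

```shell
# Stream the training log as it is written; Ctrl+C here stops tail,
# not the training job itself.
tail -f results/$EXP_NAME/output.log

# List background jobs; `fg` brings the training job back to the
# foreground so a Ctrl+C can stop it.
jobs
```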
✅ Success Check: Training completes all 3 steps on a single node without issues. Check the logs for errors and verify that the training steps are progressing.
Next Steps
Your single-node run validated the environment. Scale to multiple nodes for production training:
Continue to Multi-Node Training →