
# Single Node Training

With your environment set up and data prepared, you're ready to run training. But before committing to a multi-hour, multi-node job, it's important to verify everything works correctly on a single node first.

<Info>
  **Goal**: Run a single-node GRPO training session to validate your environment.

  **Time**: \~45 minutes

  **In this section, you will**:

  1. Download the Nemotron Nano 9B v2 model
  2. Configure the model's chat template
  3. Clean up existing processes
  4. Run a test training session with 3 steps
</Info>

<NavButton href="/v0.2/training-tutorials/nemo-rl-grpo/setup" label="Previous: Setup" direction="prev" />

***

## Prerequisites

Make sure you have:

* ✅ Completed the [Setup](/v0.2/training-tutorials/nemo-rl-grpo/setup) instructions
* ✅ Access to a running container session with GPUs
* ✅ (Optional) Weights & Biases API key for experiment tracking

***

## 0. Return to NeMo RL directory

Since we are running RL training, all of the following steps are executed from the NeMo RL root directory rather than the NeMo Gym directory. Change into your NeMo RL checkout before continuing.

**✅ Success Check**: You should see a YAML config at `examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml` and a Python script at `examples/nemo_gym/run_grpo_nemo_gym.py`.
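This step can be sketched as follows. The `NEMO_RL_ROOT` variable and the default path are assumptions for illustration; substitute the location of your own NeMo RL clone:

```shell
# NEMO_RL_ROOT is a hypothetical variable; point it at your NeMo RL clone
NEMO_RL_ROOT="${NEMO_RL_ROOT:-$HOME/nemo-rl}"

if cd "$NEMO_RL_ROOT" 2>/dev/null; then
    # Confirm the tutorial files are where the success check expects them
    ls examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml \
       examples/nemo_gym/run_grpo_nemo_gym.py
else
    echo "NeMo RL not found at $NEMO_RL_ROOT; set NEMO_RL_ROOT first"
fi
```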

## 1. Download the Model

**Estimated time**: \~5-10 minutes

Download NVIDIA [Nemotron Nano 9B v2](https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2):

```bash
HF_HOME=$PWD/.cache/ \
HF_TOKEN={your HF token} \
    hf download nvidia/NVIDIA-Nemotron-Nano-9B-v2
```

**✅ Success Check**: Model files are downloaded to `.cache/hub/models--nvidia--NVIDIA-Nemotron-Nano-9B-v2/`.

***

## 2. Configure the Chat Template

**Estimated time**: \~1 minute

The Nemotron Nano 9B v2 model ships with a custom chat template that needs two changes for RL training: thinking mode must be disabled, and the template logic that strips the final assistant turn from the message list must be removed. The commands below edit the cached copy of `tokenizer_config.json` in place:

```bash
tokenizer_config_path=$(find $PWD/.cache/hub/models--nvidia--NVIDIA-Nemotron-Nano-9B-v2 -name tokenizer_config.json)
sed -i 's/enable_thinking=true/enable_thinking=false/g' $tokenizer_config_path
sed -i 's/{%- if messages\[-1\]\['\''role'\''\] == '\''assistant'\'' -%}{%- set ns.last_turn_assistant_content = messages\[-1\]\['\''content'\''\].strip() -%}{%- set messages = messages\[:-1\] -%}{%- endif -%}//g' $tokenizer_config_path
```
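If you want to see the effect of the first substitution before editing the cached file in place, you can try it on a scratch copy (the file below is a made-up stand-in, not the real tokenizer config):

```shell
# Scratch file standing in for tokenizer_config.json (illustrative only)
printf '{"chat_template": "... enable_thinking=true ..."}\n' > /tmp/tokenizer_config_demo.json

# Same substitution as above, but without -i, so the original file is untouched
sed 's/enable_thinking=true/enable_thinking=false/g' /tmp/tokenizer_config_demo.json
```

Once the output looks right, the in-place (`-i`) form above applies the same change to the cached file.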

**✅ Success Check**: The `sed` commands complete without errors.

***

## 3. Run Training

**Estimated time**: \~15-30 minutes

By default, this runs only 3 training steps (`++grpo.max_num_steps=3`) as a small test in preparation for multi-node training. If you plan to do the full training run on a single node, remove that override; the full run takes several hours.

```bash
# Set experiment name with timestamp
EXP_NAME="$(date +%Y%m%d)/nemo_gym_grpo/nemotron_nano_v2_9b/workplace_assistant_001"
mkdir -p results/$EXP_NAME

# Configuration file path
CONFIG_PATH=examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml

# Launch training
# Replace the placeholders before running:
#   {your W&B API key}: your Weights & Biases API key for logging
#   ${Your Username}: your username, used in the W&B project name
TORCH_CUDA_ARCH_LIST="9.0 10.0" \
HF_HOME=$PWD/.cache/ \
WANDB_API_KEY={your W&B API key} \
uv run python examples/nemo_gym/run_grpo_nemo_gym.py \
    --config=$CONFIG_PATH \
    ++logger.wandb.project="${Your Username}-nemo-gym-rl-integration" \
    ++logger.wandb.name=$EXP_NAME \
    ++logger.log_dir=results/$EXP_NAME \
    ++policy.generation.vllm_cfg.tool_parser_plugin=$(find $PWD/.cache -name nemotron_toolcall_parser_no_streaming.py) \
    ++grpo.max_num_steps=3 \
    ++checkpointing.checkpoint_dir=results/$EXP_NAME &> results/$EXP_NAME/output.log &

# Watch the logs
tail -f results/$EXP_NAME/output.log
```

<Note>
  **Single GPU Training**: If you only have 1 GPU available, use these modifications:

  ```bash
  CUDA_VISIBLE_DEVICES=0 \
  TORCH_CUDA_ARCH_LIST="9.0 10.0" \
  HF_HOME=$PWD/.cache/ \
  WANDB_API_KEY={your W&B API key} \
  uv run python examples/nemo_gym/run_grpo_nemo_gym.py \
      --config=$CONFIG_PATH \
      ++logger.wandb.project="${Your Username}-nemo-gym-rl-integration" \
      ++logger.wandb.name=$EXP_NAME \
      ++logger.log_dir=results/$EXP_NAME \
      ++policy.generation.vllm_cfg.tool_parser_plugin=$(find $PWD/.cache -name nemotron_toolcall_parser_no_streaming.py) \
      ++grpo.max_num_steps=3 \
      ++checkpointing.checkpoint_dir=results/$EXP_NAME \
      cluster.gpus_per_node=1 \
      policy.megatron_cfg.tensor_model_parallel_size=1 \
      &> results/$EXP_NAME/output.log &
  ```

  **Key differences:**

  * Added `CUDA_VISIBLE_DEVICES=0` to use only GPU 0
  * Set `cluster.gpus_per_node=1`
  * Set `policy.megatron_cfg.tensor_model_parallel_size=1`
</Note>

<Tip>
  The end of the command above does the following:

  ```bash
  &> results/$EXP_NAME/output.log &
  ```

  1. `&> results/$EXP_NAME/output.log`: Redirects both stdout and stderr into a file at `results/$EXP_NAME/output.log` that you can view.
  2. `&`: The final ampersand runs the job in the background, freeing your terminal for other work. You can list background jobs with the `jobs` command. If you need to stop the training run, use `fg` to bring the job into the foreground, then press Ctrl+C as usual.
</Tip>
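The same pattern can be tried with a harmless stand-in command, which makes the redirect-and-background behavior easy to see (paths here are illustrative):

```shell
mkdir -p /tmp/demo_results

# Stand-in for the training command: both stdout and stderr go to the log,
# and the trailing & puts the job in the background
( echo "step 1"; echo "step 2"; echo "oops" >&2 ) &> /tmp/demo_results/output.log &

# Block until the background job finishes (real training takes much longer)
wait
cat /tmp/demo_results/output.log
```

`wait` is used here only to make the demo deterministic; with a real training job you would leave it running and follow the log with `tail -f` instead.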

**✅ Success Check**: Training completes all 3 steps on a single node without errors. Check `results/$EXP_NAME/output.log` for error messages and verify that training steps are progressing.
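One way to automate this check is to grep the log. The log path and line patterns below are assumptions for illustration; with a real run, point `LOG` at `results/$EXP_NAME/output.log` and adjust the patterns to what your run actually prints:

```shell
# Hypothetical log; with a real run, use LOG=results/$EXP_NAME/output.log
LOG=/tmp/output_demo.log
printf 'step: 1\nstep: 2\nstep: 3\n' > "$LOG"   # fabricated lines for illustration

# Count step lines and look for obvious failures
grep -c 'step' "$LOG"
grep -iE 'error|traceback' "$LOG" || echo "no errors found"
```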

***

## Next Steps

Your single-node run validated the environment. Scale to multiple nodes for production training:

<NavButton href="/v0.2/training-tutorials/nemo-rl-grpo/multi-node-training" label="Continue to Multi-Node Training" direction="next" />