Nemotron 3 Nano Omni
This guide explains how to post-train the Nemotron 3 Nano Omni vision-language model with GRPO using NeMo RL on the AutoModel backend.
It covers two recipes:
CLEVR-CoGenT — runs on a single 8-GPU node (interactive container).
MMPR-Tiny — runs on 4 nodes via Slurm.
Both share the same checkpoint, model code, and reward pipeline; they differ only in the dataset, reward functions, and node count.
Recipe 1: CLEVR-CoGenT (single-node)
The CLEVR-CoGenT recipe uses examples/configs/recipes/vlm/vlm_grpo-nemotron-omni-30ba3b-clevr-1n8g-automodel-ep8.v1.yaml. It expects 8 GPUs on a single node, with expert parallelism EP=8 across the experts and tensor parallelism TP=8 in vLLM.
Key knobs in the config:
| Field | Value |
|---|---|
| | path to the Nemotron-Omni HF checkpoint |
| | 8 |
| | 8 |
| | 8192 |
The clevr-cogent response dataset downloads CLEVR from Hugging Face automatically on first run; no manual preparation is required.
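The GPU layout this recipe expects can be sanity-checked before launching. The following is a minimal sketch, not part of NeMo RL: `check_layout` is a hypothetical helper, and it assumes that the expert-parallel size must evenly divide the total GPU count.

```shell
# Hypothetical pre-launch check (not a NeMo RL utility).
# Assumption: the EP size must evenly divide num_nodes * gpus_per_node.
check_layout() {
  num_nodes=$1; gpus_per_node=$2; ep_size=$3
  world_size=$(( num_nodes * gpus_per_node ))
  if [ $(( world_size % ep_size )) -ne 0 ]; then
    echo "EP=${ep_size} does not divide ${world_size} GPUs" >&2
    return 1
  fi
  echo "world_size=${world_size} ep_groups=$(( world_size / ep_size ))"
}

# CLEVR recipe: 1 node x 8 GPUs with EP=8
check_layout 1 8 8   # prints: world_size=8 ep_groups=1
```

Calling it with the values you plan to pass as `cluster.num_nodes` and `cluster.gpus_per_node` catches a mismatched layout before any job starts.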
Launch (interactive container)
From inside the container on an 8-GPU node:
export NRL_MAMBA_PREFILL_DECODE_SYNC="${NRL_MAMBA_PREFILL_DECODE_SYNC:-1}"
uv run examples/run_vlm_grpo.py --config examples/configs/recipes/vlm/vlm_grpo-nemotron-omni-30ba3b-clevr-1n8g-automodel-ep8.v1.yaml \
cluster.gpus_per_node=8 \
cluster.num_nodes=1
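The `export` line above uses the shell's `${VAR:-default}` expansion, so a value already set in your environment is kept and `1` is used only as a fallback. The sketch below shows just that shell behavior; what each value means to NeMo RL is not covered here, and the `0` is an arbitrary illustrative value.

```shell
# Case 1: variable unset, so the fallback "1" is used.
unset NRL_MAMBA_PREFILL_DECODE_SYNC
export NRL_MAMBA_PREFILL_DECODE_SYNC="${NRL_MAMBA_PREFILL_DECODE_SYNC:-1}"
echo "unset case:  ${NRL_MAMBA_PREFILL_DECODE_SYNC}"   # prints: unset case:  1

# Case 2: variable already set (illustrative value), so it is left as-is.
NRL_MAMBA_PREFILL_DECODE_SYNC=0
export NRL_MAMBA_PREFILL_DECODE_SYNC="${NRL_MAMBA_PREFILL_DECODE_SYNC:-1}"
echo "preset case: ${NRL_MAMBA_PREFILL_DECODE_SYNC}"   # prints: preset case: 0
```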
To override the model path or any other YAML field, append Hydra-style overrides:
uv run examples/run_vlm_grpo.py --config examples/configs/recipes/vlm/vlm_grpo-nemotron-omni-30ba3b-clevr-1n8g-automodel-ep8.v1.yaml \
policy.model_name=/path/to/your/checkpoint \
cluster.gpus_per_node=8 cluster.num_nodes=1
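When stacking several overrides, it can help to print the assembled command before running it. This is a small sketch under that assumption; `print_launch` is a hypothetical helper, and it only echoes the command rather than invoking `uv`.

```shell
# Hypothetical dry-run helper: first argument is the config path,
# every remaining argument is passed through as one override.
print_launch() {
  config=$1; shift
  echo uv run examples/run_vlm_grpo.py --config "$config" "$@"
}

print_launch \
  examples/configs/recipes/vlm/vlm_grpo-nemotron-omni-30ba3b-clevr-1n8g-automodel-ep8.v1.yaml \
  policy.model_name=/path/to/your/checkpoint \
  cluster.gpus_per_node=8 cluster.num_nodes=1
```

Once the printed line looks right, run it directly or remove the `echo` from the helper.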
Recipe 2: MMPR-Tiny (4-node Slurm)
The MMPR-Tiny recipe uses examples/configs/recipes/vlm/vlm_grpo-nemotron-omni-30ba3b-mmpr-4n8g-automodel-ep8.v1.yaml. Differences vs. the CLEVR recipe:
| Field | Value |
|---|---|
| | local cache dir for MMPR-Tiny (loader auto-downloads from HF) |
| | 8192 |
Launch (4-node Slurm)
Submit with ray.sub. From the repo root on a Slurm login node:
# --- Cluster config ---
export SBATCH_ACCOUNT=your_slurm_account
export SBATCH_PARTITION=batch
export SBATCH_TIME=4:00:00
export CONTAINER=/path/to/containers/nemo-rl-nano-v3-vl-<tag>.sqsh
export MOUNTS=/lustre:/lustre
export HF_HOME=/path/to/cache/huggingface
export TMPDIR=/tmp/nrl-${USER}
export NCCL_DEBUG=WARN
export NRL_IGNORE_VERSION_MISMATCH=1
# --- Run config ---
NUM_NODES=4
GPUS_PER_NODE=8
JOB_NAME=grpo-nemotron-omni-mmpr
RESULTS_DIR=$PWD/results/${JOB_NAME}
CONFIG_PATH=examples/configs/recipes/vlm/vlm_grpo-nemotron-omni-30ba3b-mmpr-4n8g-automodel-ep8.v1.yaml
# --- Build the training command (run inside the container on every node) ---
export COMMAND="\
export PYTHONPATH=\${PYTHONPATH:-}:/path/to/automodel-omni && \
export CUDA_LAUNCH_BLOCKING=0 && \
export TORCH_USE_CUDA_DSA=0 && \
export NRL_MAMBA_PREFILL_DECODE_SYNC=1 && \
mkdir -p ${HF_HOME} ${TMPDIR} ${RESULTS_DIR} && \
uv run examples/run_vlm_grpo.py --config ${CONFIG_PATH} \
cluster.num_nodes=${NUM_NODES} \
cluster.gpus_per_node=${GPUS_PER_NODE} \
checkpointing.checkpoint_dir='${RESULTS_DIR}' \
logger.wandb.name='${JOB_NAME}'"
# --- Submit ---
sbatch \
--nodes=${NUM_NODES} \
--account=${SBATCH_ACCOUNT} \
--job-name=nemo-rl-${JOB_NAME} \
--partition=${SBATCH_PARTITION} \
--time=${SBATCH_TIME} \
--dependency=singleton \
--gres=gpu:${GPUS_PER_NODE} \
ray.sub
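The quoting in `COMMAND` decides when each variable is expanded: plain `${...}` inside the double quotes is filled in at submission time on the login node, while the escaped `\${PYTHONPATH:-}` stays literal and is expanded later by whichever shell runs the command string. The following sketch shows this with a shortened command and a placeholder `RESULTS_DIR`; it only prints the string.

```shell
RESULTS_DIR=/tmp/results/demo   # placeholder value for illustration only

# ${RESULTS_DIR} expands now; \${PYTHONPATH:-} is kept literal for later.
# The single quotes are ordinary characters inside the double-quoted string.
COMMAND="export PYTHONPATH=\${PYTHONPATH:-}:/path/to/automodel-omni && \
echo checkpoint_dir='${RESULTS_DIR}'"

printf '%s\n' "$COMMAND"
# prints: export PYTHONPATH=${PYTHONPATH:-}:/path/to/automodel-omni && echo checkpoint_dir='/tmp/results/demo'
```

Printing `COMMAND` this way before calling `sbatch` is a quick check that paths such as `RESULTS_DIR` were resolved as intended.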
To run on a different node count, change NUM_NODES; both the sbatch --nodes flag and the cluster.num_nodes override read it.
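Because the submission script depends on several exported variables, a short guard can fail early when one is missing. This is a sketch, not part of ray.sub: `require_vars` is a hypothetical helper that checks the variable names you pass to it.

```shell
# Hypothetical helper: return non-zero if any named variable is unset or empty.
require_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"   # indirect lookup of the variable called $name
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      return 1
    fi
  done
}

# Example use before the sbatch call (left commented so the sketch has no side effects):
# require_vars SBATCH_ACCOUNT SBATCH_PARTITION SBATCH_TIME CONTAINER MOUNTS || exit 1
```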