Training Tutorials

These hands-on tutorials cover each supported training framework and show how to train with NeMo Gym environments. If you’re interested in integrating another training framework, see the Training Framework Integration Guide.

See Training for a refresher on when to use GRPO, SFT, or DPO.

RL (GRPO)

NeMo RL

Tutorial series: GRPO training to improve multi-step tool calling on the Workplace Assistant environment, scaling from single-node to multi-node training.

nemo rl, grpo, 3-5 hours

Unsloth

Example GRPO training on instruction following and reasoning environments.

unsloth, single-gpu, 30 min

VeRL

Example DAPO training on math and agentic environments using VeRL, with single and multi-environment support.

verl, dapo, multi-node, 1 hour

Multi-Environment Training

Run multiple training environments simultaneously for rollout collection.

multi-environment, multi-verifier

SFT & DPO

Offline Training with Rollouts

Transform rollouts into training data for supervised fine-tuning (SFT) and direct preference optimization (DPO).

sft, dpo
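
As a rough illustration of what transforming rollouts into SFT and DPO data can look like, here is a minimal sketch. It is not the NeMo Gym implementation: the rollout schema (`prompt`, `response`, `reward`), the function names, and the `min_reward` threshold are all assumptions made for this example, and the linked tutorial describes the actual format. The idea is that SFT keeps only high-reward responses, while DPO pairs a better response with a worse one for the same prompt.

```python
from collections import defaultdict

# Hypothetical rollout schema for illustration only:
#   {"prompt": str, "response": str, "reward": float}
# The actual NeMo Gym rollout format may differ.


def to_sft_examples(rollouts, min_reward=1.0):
    """Keep only rollouts whose reward meets the (assumed) threshold."""
    return [
        {"prompt": r["prompt"], "completion": r["response"]}
        for r in rollouts
        if r["reward"] >= min_reward
    ]


def to_dpo_pairs(rollouts):
    """For each prompt, pair the highest-reward response with the lowest."""
    by_prompt = defaultdict(list)
    for r in rollouts:
        by_prompt[r["prompt"]].append(r)

    pairs = []
    for prompt, group in by_prompt.items():
        best = max(group, key=lambda r: r["reward"])
        worst = min(group, key=lambda r: r["reward"])
        # Skip prompts where no response is strictly better than another.
        if best["reward"] > worst["reward"]:
            pairs.append(
                {
                    "prompt": prompt,
                    "chosen": best["response"],
                    "rejected": worst["response"],
                }
            )
    return pairs
```

Prompts with a single rollout, or with identical rewards across rollouts, yield no preference pair in this sketch, because there is nothing to prefer.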