# Training Tutorials

These hands-on tutorials show how to train with NeMo Gym environments using each supported training framework. To integrate another training framework, see the [Training Framework Integration Guide](/latest/contribute/rl-framework-integration).

<Tip>
  See [Training](/latest/about/concepts/training) for a refresher on when to use GRPO, SFT, or DPO.
</Tip>

## RL (GRPO)

<Cards>
  <Card title="NeMo RL" href="/latest/training-tutorials/nemo-rl-grpo">
    Tutorial series: GRPO training to improve multi-step tool calling on the Workplace Assistant environment, scaling from single-node to multi-node training.

    <Badge minimal outlined>
      nemo rl
    </Badge>

    <Badge minimal outlined>
      grpo
    </Badge>

    <Badge minimal outlined>
      3-5 hours
    </Badge>
  </Card>

  <Card title="OpenRLHF" href="https://github.com/OpenRLHF/OpenRLHF/blob/main/examples/python/agent_func_nemogym_executor.py">
    Review the example agent executor for running NeMo Gym environments with OpenRLHF.

    <Badge minimal outlined>
      openrlhf
    </Badge>
  </Card>

  <Card title="Unsloth" href="/latest/training-tutorials/unsloth">
    Example GRPO training on instruction following and reasoning environments.

    <Badge minimal outlined>
      unsloth
    </Badge>

    <Badge minimal outlined>
      single-gpu
    </Badge>

    <Badge minimal outlined>
      30 min
    </Badge>
  </Card>

  <Card title="VeRL" href="/latest/training-tutorials/verl">
    Example DAPO training on math and agentic environments using VeRL, with single- and multi-environment support.

    <Badge minimal outlined>
      verl
    </Badge>

    <Badge minimal outlined>
      dapo
    </Badge>

    <Badge minimal outlined>
      multi-node
    </Badge>

    <Badge minimal outlined>
      1 hour
    </Badge>
  </Card>
</Cards>

### Multi-Environment Training

<Cards>
  <Card title="Multi-Environment Training" href="/latest/training-tutorials/multi-environment-training">
    Run multiple training environments simultaneously for rollout collection.

    <Badge minimal outlined>
      multi-environment
    </Badge>

    <Badge minimal outlined>
      multi-verifier
    </Badge>
  </Card>
</Cards>

## SFT & DPO

<Cards>
  <Card title="Offline Training with Rollouts" href="/latest/training-tutorials/offline-training-w-rollouts">
    Transform rollouts into training data for supervised fine-tuning (SFT) and direct preference optimization (DPO).

    <Badge minimal outlined>
      sft
    </Badge>

    <Badge minimal outlined>
      dpo
    </Badge>
  </Card>
</Cards>
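
As a rough illustration of what turning rollouts into offline training data can involve, the sketch below groups scored rollouts by prompt, keeps the highest-reward response as an SFT example, and pairs the highest- and lowest-reward responses as a DPO preference pair. The record fields (`prompt`, `response`, `reward`), the `min_reward` threshold, and the function names are hypothetical and are not the NeMo Gym rollout schema; the tutorial linked above describes the actual format.

```python
from collections import defaultdict


def group_by_prompt(rollouts):
    """Group rollout records by prompt (hypothetical record layout)."""
    groups = defaultdict(list)
    for rollout in rollouts:
        groups[rollout["prompt"]].append(rollout)
    return groups


def to_sft(rollouts, min_reward=1.0):
    """Keep the best response per prompt if it meets the reward threshold."""
    examples = []
    for prompt, group in group_by_prompt(rollouts).items():
        best = max(group, key=lambda r: r["reward"])
        if best["reward"] >= min_reward:
            examples.append({"prompt": prompt, "completion": best["response"]})
    return examples


def to_dpo(rollouts):
    """Pair the best and worst response per prompt when their rewards differ."""
    pairs = []
    for prompt, group in group_by_prompt(rollouts).items():
        best = max(group, key=lambda r: r["reward"])
        worst = min(group, key=lambda r: r["reward"])
        if best["reward"] > worst["reward"]:
            pairs.append(
                {
                    "prompt": prompt,
                    "chosen": best["response"],
                    "rejected": worst["response"],
                }
            )
    return pairs


# Made-up rollouts for illustration only.
rollouts = [
    {"prompt": "2+2?", "response": "4", "reward": 1.0},
    {"prompt": "2+2?", "response": "5", "reward": 0.0},
    {"prompt": "Capital of France?", "response": "Paris", "reward": 1.0},
]

sft_examples = to_sft(rollouts)
dpo_pairs = to_dpo(rollouts)
```

Prompts with only one rollout, or with tied rewards, yield no DPO pair in this sketch because there is no preference signal to learn from.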