NeMo-RL


nemo_rl.data.datasets.raw_dataset

Module Contents

Classes

RawDataset

API

class nemo_rl.data.datasets.raw_dataset.RawDataset

    data_config: nemo_rl.data.ResponseDatasetConfig | nemo_rl.data.PreferenceDatasetConfig = None

    dataset: datasets.Dataset = None

    val_dataset: datasets.Dataset | None = None

    processor: nemo_rl.data.interfaces.TaskDataProcessFnCallable = None

    task_spec: nemo_rl.data.interfaces.TaskDataSpec = None

    preprocessor: nemo_rl.data.interfaces.TaskDataPreProcessFnCallable | None = None

    split_train_validation(test_size: float, seed: int)

    set_processor()

    set_task_spec(
        data_config: nemo_rl.data.ResponseDatasetConfig | nemo_rl.data.PreferenceDatasetConfig,
    )
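
The page lists only signatures, so the behavior of split_train_validation(test_size, seed) is not documented here. The following is a minimal sketch of what a method with that signature plausibly does, under the assumption that it holds out a test_size fraction of dataset as val_dataset using a seeded shuffle. ToyRawDataset is a hypothetical stand-in that uses plain lists instead of datasets.Dataset; it is not the NeMo RL implementation. The real class may instead delegate to the Hugging Face datasets.Dataset.train_test_split method, which accepts test_size and seed arguments, but the source does not say.

```python
import random
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToyRawDataset:
    """Hypothetical stand-in for RawDataset; lists replace datasets.Dataset."""

    dataset: list[dict[str, Any]]
    val_dataset: Optional[list[dict[str, Any]]] = None

    def split_train_validation(self, test_size: float, seed: int) -> None:
        # Assumed behavior: move a `test_size` fraction of the rows into
        # `val_dataset`, choosing them with a shuffle seeded by `seed`.
        if not 0.0 < test_size < 1.0:
            raise ValueError("test_size must be strictly between 0 and 1")
        indices = list(range(len(self.dataset)))
        random.Random(seed).shuffle(indices)
        n_val = max(1, round(len(indices) * test_size))
        val_idx = sorted(indices[:n_val])
        train_idx = sorted(indices[n_val:])
        # Build the validation rows before overwriting the training rows.
        self.val_dataset = [self.dataset[i] for i in val_idx]
        self.dataset = [self.dataset[i] for i in train_idx]


raw = ToyRawDataset(dataset=[{"id": i} for i in range(10)])
raw.split_train_validation(test_size=0.2, seed=42)
print(len(raw.dataset), len(raw.val_dataset))  # prints "8 2"
```

Passing the same seed reproduces the same partition, which is the usual reason for exposing seed alongside test_size.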


Copyright © 2026, NVIDIA Corporation.