Setup
Now that you understand the configuration parameters for GRPO training, it's time to set up your environment. This involves launching containers, installing dependencies, and preparing your training data: the foundation for everything that follows.
Goal: Set up your environment for GRPO training with NeMo RL and NeMo Gym.
Time: ~30 minutes
In this section, you will:
- Launch an interactive GPU session
- Clone and install NeMo RL and NeMo Gym
- Run sanity tests to validate the setup
- Prepare the Workplace Assistant dataset
Prerequisites
Make sure you have:
- ✅ Access to a Slurm cluster with GPU nodes
- ✅ A shared filesystem accessible from all nodes
- ✅ HuggingFace token for downloading models
1. Enter a GPU Node
Estimated time: ~5 minutes
Launch an interactive Slurm session to run training commands. Refer to the NeMo RL Cluster Setup documentation for more details.
If this is your first time downloading this Docker image, the srun command below will take 5-10 minutes.
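The exact flags depend on your cluster; the sketch below assumes a pyxis-enabled Slurm cluster, and the account, partition, GPU count, mount path, and image path are all placeholders to replace with your own values:

```shell
# Placeholder values: substitute your cluster's account, partition, and image path.
ACCOUNT=my_account
PARTITION=gpu
GPUS_PER_NODE=8
CONTAINER_IMAGE_PATH=/shared/images/nemo-rl.sqsh

# Request one GPU node interactively. --container-image mounts the image via
# pyxis; first-time image pulls can take 5-10 minutes.
srun --account="$ACCOUNT" \
     --partition="$PARTITION" \
     --gres=gpu:"$GPUS_PER_NODE" \
     --container-image="$CONTAINER_IMAGE_PATH" \
     --container-mounts=/shared:/shared \
     --pty bash
```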
If you use enroot as your container runtime, you can pull the container image after defining $CONTAINER_IMAGE_PATH:
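A minimal sketch of the enroot pull; the registry path shown is a placeholder, not the actual NeMo RL image reference:

```shell
# Placeholder output path and image reference; substitute the container you intend to use.
CONTAINER_IMAGE_PATH=/shared/images/nemo-rl.sqsh

# Import the Docker image into an enroot squashfs file at $CONTAINER_IMAGE_PATH.
# enroot uses '#' to separate the registry host from the image name.
enroot import -o "$CONTAINER_IMAGE_PATH" docker://nvcr.io#nvidia/nemo-rl:latest
```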
✅ Success Check: You should be inside the container with a bash prompt.
2. Clone and Setup NeMo RL + NeMo Gym
Estimated time: ~5-10 minutes
For the first setup on your local filesystem:
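A sketch of the first-time setup, assuming the repository URL and a shared-filesystem checkout location; verify both against the official NeMo RL README:

```shell
# Assumed checkout location on the shared filesystem.
NEMO_RL_DIR=/shared/nemo-rl

# Clone NeMo RL with submodules (which bring in NeMo Gym).
git clone --recursive https://github.com/NVIDIA-NeMo/RL.git "$NEMO_RL_DIR"
cd "$NEMO_RL_DIR"

# uv reads pyproject.toml and creates/updates the project's virtual environment.
uv sync
```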
✅ Success Check: No errors during installation and uv sync completes successfully.
3. Run Sanity Tests
Estimated time: ~5-10 minutes
Download the model used in the following tests:
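The exact model ID comes from the sanity-test configuration; the one below is only a placeholder. A sketch using the Hugging Face CLI:

```shell
# Placeholder model ID: substitute the model the sanity tests reference.
MODEL_ID="Qwen/Qwen2.5-1.5B-Instruct"

# Download into the local Hugging Face cache. Requires the huggingface_hub CLI
# and, for gated models, a valid HF token.
huggingface-cli download "$MODEL_ID"
```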
Validate your setup before training:
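The script path below is hypothetical; run whichever sanity/functional test script ships with your NeMo RL checkout (see its tests/ directory):

```shell
# Placeholder script path: substitute the actual sanity-test script in your checkout.
SANITY_SCRIPT=tests/functional/sanity.sh

# Run the tests inside the project's uv-managed environment.
uv run bash "$SANITY_SCRIPT"
```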
The script runs a targeted set of tests that verify the full stack required for training with NeMo RL and NeMo Gym:
- vLLM generation — Confirms that the vLLM backend can generate text and serve an OpenAI-compatible HTTP endpoint, which NeMo Gym uses for model inference.
- Token retokenization — Tests edge cases in converting between OpenAI schema (text) and token IDs.
- Environment step — Runs a basic NeMo RL environment step to validate that the environment interface works independently of NeMo Gym.
- NeMo Gym integration — Verifies that NeMo Gym correctly integrates into NeMo RL as an Environment.
- End-to-end rollout — Exercises the rollout loop that NeMo Gym uses inside grpo_train, confirming that rollout collection works end to end.
✅ Success Check: All tests pass without errors.
You can clean up any existing or leftover Ray/vLLM processes using the following commands:
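For example, the following stops Ray and kills stray vLLM workers; the pkill pattern is a blunt match on the process command line, so check it won't hit unrelated processes:

```shell
# Stop any Ray processes started by a previous run.
ray stop --force

# Kill leftover vLLM worker processes by matching their command line;
# '|| true' keeps the command from failing when nothing matches.
pkill -f vllm || true

CLEANUP_DONE=1
```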
4. Prepare NeMo Gym Data
Estimated time: ~5 minutes
The Workplace Assistant dataset must be downloaded from HuggingFace and prepared for training. This step runs ng_prepare_data, which downloads and validates the dataset and adds an agent_ref property to each example telling NeMo Gym which agent server should handle it.
Clone and setup the Gym Python environment:
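A sketch of entering the Gym environment; the path follows the layout implied by this section's success check, and the activation step assumes a uv-managed virtual environment:

```shell
# Path assumed from this guide's success check; adjust if your checkout differs.
GYM_DIR=3rdparty/Gym-workspace/Gym
cd "$GYM_DIR"

# Create/sync Gym's own Python environment and activate it.
uv sync
source .venv/bin/activate
```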
Add your HuggingFace token so the Gym datasets can be downloaded. The command stores your HF token in a file that is excluded from Git, so it will never be committed or pushed:
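The exact helper command is specific to NeMo Gym; as a generic sketch of the same idea, the token is written to a local env file that the repository's .gitignore excludes (file name here is an assumption):

```shell
# Hypothetical git-ignored env file; replace <your-token> with your actual HF token.
echo "HF_TOKEN=<your-token>" > .env.local
```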
Prepare the data:
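ng_prepare_data is named earlier in this section, but the flags below are assumptions; check ng_prepare_data --help for the real interface. The dataset name and output path follow this section's success check:

```shell
DATASET=workplace_assistant

# Download, validate, and annotate the dataset with agent_ref properties.
ng_prepare_data \
  --dataset "$DATASET" \
  --output-dir data/"$DATASET"
```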
Return to the NeMo RL Python environment and directory:
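A sketch, assuming the Gym checkout sits at 3rdparty/Gym-workspace/Gym inside the NeMo RL checkout and both use uv-managed virtual environments:

```shell
NEMO_RL_VENV=.venv

# Leave Gym's virtual environment (no-op if none is active).
deactivate 2>/dev/null || true

# Climb back to the NeMo RL checkout root and re-activate its environment.
cd ../../..
source "$NEMO_RL_VENV"/bin/activate
```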
✅ Success Check: Dataset files are created in 3rdparty/Gym-workspace/Gym/data/workplace_assistant/.
Next Steps
With your environment set up and data prepared, run your first training session:
Continue to Single Node Training →