This page provides quick answers to commonly asked questions. Over time, these topics will be integrated into the structured product documentation (tutorials, guides, and reference sections) as we expand coverage. We’ve documented them here to provide immediate help while more comprehensive documentation is in progress.
The Hugging Face client requires your credentials to be in env.yaml, along with other details it needs to upload to the target location.
Security: The env.yaml file contains sensitive credentials. Ensure it is listed in your .gitignore to prevent accidental commits. Never commit API tokens to version control.
The naming convention for Hugging Face datasets is as follows:
{hf_organization}/{hf_dataset_prefix}-{domain}-{resources_server OR dataset_name}
E.g.:
nvidia/Nemotron-RL-math-OpenMathReasoning
When uploading, you only need to supply the {dataset_name} portion yourself, through the dataset_name flag of the upload command (see the command below). Everything before it is populated automatically from your config prior to upload. The dataset_name flag is optional; if you set it, it replaces resources_server in the name.
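To make the convention concrete, here is a minimal sketch of how the pieces combine into a repo ID. The function build_hf_repo_id and the resources server name library_judge_math are hypothetical and for illustration only. NeMo Gym builds the name internally from your config.

```python
from typing import Optional


def build_hf_repo_id(
    hf_organization: str,
    hf_dataset_prefix: str,
    domain: str,
    resources_server: str,
    dataset_name: Optional[str] = None,
) -> str:
    """Hypothetical helper: assemble a dataset repo ID following the naming convention."""
    # dataset_name is optional; when provided, it replaces the resources server name.
    suffix = dataset_name if dataset_name else resources_server
    return f"{hf_organization}/{hf_dataset_prefix}-{domain}-{suffix}"


if __name__ == "__main__":
    print(build_hf_repo_id("nvidia", "Nemotron-RL", "math", "library_judge_math", "OpenMathReasoning"))
```

With dataset_name set, the call above reproduces the nvidia/Nemotron-RL-math-OpenMathReasoning example. Without it, the resources server name is used as the suffix.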
To upload to Huggingface, use the below command:
Because dataset names must follow this convention, you must provide the resources server config path when uploading. The domain value from that config is used to build the dataset name on Hugging Face.
By default, the split parameter for uploading is set to train, which runs a check for the required fields {"responses_create_params"}. Specifying validation or test skips this check.
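Conceptually, the train-split check verifies that every row carries the required field. The following is a minimal sketch under that assumption. missing_required_fields is a hypothetical helper, not NeMo Gym's implementation, and the real check may validate more than key presence.

```python
import json
from typing import Iterable, List, Tuple

# Assumption: mirrors the required field named in the text above.
REQUIRED_TRAIN_FIELDS = {"responses_create_params"}


def missing_required_fields(
    jsonl_lines: Iterable[str], split: str = "train"
) -> List[Tuple[int, List[str]]]:
    """Return (line_number, missing_fields) for each row that fails the check."""
    if split != "train":
        # validation and test splits skip the check entirely.
        return []
    problems = []
    for lineno, line in enumerate(jsonl_lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        row = json.loads(line)
        missing = REQUIRED_TRAIN_FIELDS - set(row)
        if missing:
            problems.append((lineno, sorted(missing)))
    return problems
```

An empty result means every row passed. Each reported tuple points at the offending line in the JSONL file.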
When uploading to an organization repository where you don’t have direct write access (e.g., nvidia/), use the +create_pr=true flag to create a Pull Request instead of pushing directly. You can also customize the commit message and description.
To specify the revision (branch name), add the +revision={your branch name} flag. Without create_pr (or with +create_pr=false), the command assumes you are committing to an existing branch. With +create_pr=true, it assumes the branch is brand new.
The command will output a link to the created Pull Request:
The commit_message and commit_description parameters work for both direct pushes and Pull Requests. If not provided, HuggingFace auto-generates a commit message based on the filename.
You can optionally pass the +delete_from_gitlab=true flag to the command above to delete the model and all of its artifacts from GitLab. This flag defaults to false.
There will be a confirmation dialog to confirm the deletion:
Alternatively, the following command does the same thing without needing the +delete_from_gitlab flag:
If you’ve already uploaded to Huggingface and just want to do a standalone delete from Gitlab:
GitLab model names are case-sensitive, so models named ‘My_Model’ and ‘my_model’ can coexist in the registry. When uploading to Hugging Face with the intention of deleting the GitLab artifacts, make sure the casing of your Hugging Face dataset name exactly matches the GitLab model name.
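A case-only difference is easy to miss, so you might run a pre-check like the following before deleting. check_gitlab_name is a hypothetical helper that takes a list of registry names you have fetched yourself. It is not part of NeMo Gym.

```python
from typing import List


def check_gitlab_name(hf_name: str, gitlab_names: List[str]) -> str:
    """Return hf_name if it exactly matches a GitLab model name; otherwise raise."""
    if hf_name in gitlab_names:
        return hf_name
    # Names that differ only in casing are the likely cause of a failed delete.
    case_variants = [n for n in gitlab_names if n.casefold() == hf_name.casefold()]
    if case_variants:
        raise ValueError(
            f"No exact match for {hf_name!r}; case-only variants exist: {case_variants}"
        )
    raise ValueError(f"No GitLab model named {hf_name!r}")
```

An exact match is required because, with both ‘My_Model’ and ‘my_model’ present, guessing by case-insensitive comparison could target the wrong model.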
Downloading a dataset from Huggingface is straightforward:
For structured datasets (with train/validation/test splits):
The split parameter is optional. If omitted, all available splits will be downloaded as separate JSONL files.
For raw file repositories (with specific JSONL files):
Use artifact_fpath when the HuggingFace repo contains raw/arbitrary JSONL files rather than structured dataset splits. You cannot specify both split and artifact_fpath.
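The download rules above reduce to three cases: all splits, a single split, or a raw file, but never both split and artifact_fpath. This hypothetical sketch spells out those rules. It is not the download command's actual argument handling.

```python
from typing import Optional


def resolve_download_mode(
    repo_id: str, split: Optional[str] = None, artifact_fpath: Optional[str] = None
) -> str:
    """Hypothetical helper: classify a download request per the documented rules."""
    if split is not None and artifact_fpath is not None:
        raise ValueError("Specify either split or artifact_fpath, not both.")
    if artifact_fpath is not None:
        return "raw_file"  # a specific JSONL file in a raw file repository
    if split is None:
        return "all_splits"  # every available split, one JSONL file each
    return "single_split"
```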
When you use ng_init_resources_server +entrypoint=resources_servers/example_multi_step to initialize a resources server, you get a config.yaml like the one in the code block below. The dataset information for training, validation, and example lives under your agent config (for example, under simple_agent) as a list of dataset objects.
A dataset object consists of:
repo_id (required) and, for raw file repos, an optional artifact_fpath. If artifact_fpath is omitted, the datasets library infers the split from the dataset type.

Each config.yaml in the resources server requires at least one agent with one example dataset. The example dataset consists of the first 5 rows of your train dataset. It is used to sanity check both the overall dataset format and the format of each individual example, and it helps others quickly understand your data.
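Because the example dataset is just the first 5 rows of the train data, you can produce it with a few lines of code. This is a minimal sketch. The function write_example_dataset and the file names in the usage note are hypothetical, and your resources server may expect a different path.

```python
import itertools
import json


def write_example_dataset(train_path: str, example_path: str, num_rows: int = 5) -> int:
    """Copy the first `num_rows` non-empty JSONL rows of the train file into the example file."""
    with open(train_path, encoding="utf-8") as src:
        # Skip blank lines and normalize each kept row to end with a newline.
        rows = (line.rstrip("\n") + "\n" for line in src if line.strip())
        head = list(itertools.islice(rows, num_rows))
    # Fail early if any selected row is not valid JSON.
    for row in head:
        json.loads(row)
    with open(example_path, "w", encoding="utf-8") as dst:
        dst.writelines(head)
    return len(head)
```

For example, write_example_dataset("data/train.jsonl", "data/example.jsonl") returns the number of rows written. That number is fewer than 5 only if the train file is shorter.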
For every PR that contributes data, we require common dataset statistics and sanity checks on the data itself. This process is also helpful to catch any simple issues before you ever train with NeMo RL. NeMo Gym provides a helper command ng_prepare_data to do so.
To download missing datasets automatically, add +should_download=true. By default, datasets are downloaded from HuggingFace:
For NVIDIA internal users, you can download from GitLab instead:
Run NeMo Gym servers the exact same way with the same configs!
The ng_prepare_data command will:
The ng_prepare_data command has two modes: one that prepares the actual train and validation sets, and one that validates the example dataset to sanity check your data format. You would typically run +mode=example_validation when first contributing a resources server, and then run with +mode=train_preparation when you actually go to train.
For large-scale verifier training, it’s critical that your resources server is as efficient as possible, since it may receive 16k or more concurrent requests. NeMo Gym provides tools to profile your servers and understand their efficiency.
In one terminal, start your agent, model, and resources servers, with profiling enabled.
profiling_enabled (bool): whether profiling is enabled. It is disabled by default because it adds slight overhead that is undesirable at runtime.
profiling_results_dirpath (str): the directory where all server profiling results are saved. Earlier results in the same directory are overwritten.

In another terminal, run a large number of rollouts against your servers. Use the limit and num_repeats flags to adjust the number of samples you run.
After ng_collect_rollouts finishes, press Ctrl+C to stop your servers. You should see output in the terminal like the following:
The log file content for a server will look something like the following:
ncall: number of calls (how many times the function or subroutine was invoked). For example, LibraryJudgeMathResourcesServer.verify was invoked 1024 times.
tsub: time spent inside the subroutine itself, excluding calls to other functions (sometimes called “self time”). For example, LibraryJudgeMathResourcesServer.verify itself accounted for only 0.009755s.
ttot: total time spent in the subroutine, including all the functions it called. For example, LibraryJudgeMathResourcesServer.verify and all the functions it called, including _verify_answer, accounted for a total of 17.98387s.
tavg: average time per call (often ttot / ncall). For example, LibraryJudgeMathResourcesServer.verify took 0.017562s per call on average.

NeMo Gym automatically sets up Ray for distributed computing for CPU-intensive tasks.
Ray is initialized when you start NeMo Gym servers:
The initialization happens in two places:
cli.py: Ray is initialized in the main process when RunHelper.start() is called.
server_utils.py: Each server invokes initialize_ray() during its startup and connects to the same Ray cluster initialized by the main process.

You can also specify a custom Ray cluster address in your config:
Training frameworks such as NeMo RL configure the Ray head node address, allowing remote tasks to run across all nodes in the cluster.
If not specified, NeMo Gym will start a local Ray cluster and store the address in the global config for child processes to connect to.
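The connect-or-start behavior described above can be sketched in plain Python. The config key ray_head_node_address and the start_local_cluster callback are hypothetical placeholders, not NeMo Gym's real names. Check your config for the actual key.

```python
from typing import Any, Callable, Dict


def resolve_ray_address(
    global_config: Dict[str, Any],
    start_local_cluster: Callable[[], str],
    key: str = "ray_head_node_address",  # hypothetical key name
) -> str:
    """Return the Ray address to use, starting a local cluster only if none is configured."""
    address = global_config.get(key)
    if address:
        # A training framework (e.g. NeMo RL) already provided a head node; reuse it.
        return address
    # Otherwise start a local cluster and record its address so child processes can connect.
    address = start_local_cluster()
    global_config[key] = address
    return address
```

Storing the address back into the shared config is what lets each server process join the same cluster instead of starting its own.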
Here’s how to parallelize CPU-intensive functions using Ray’s @ray.remote decorator. Refer to the Ray documentation for more options.
Reading time: 5 mins Date: Fri Aug 15, 2025
SFT (supervised fine-tuning) and RL (reinforcement learning) are two different ways of optimizing your model for different tasks, and each has its own use cases.
Let’s say you wanted to train your model to be really good at math.
One way I like to think about these things is:
Tying this back to NeMo Gym: it can be used to create synthetic data for SFT by running strong teacher models on its environments. Critically, it also serves as the source of data during RL training.
If you see an error like the following when running ng_run:
What’s happening:
Ray (NeMo Gym’s distributed computing dependency) uses psutil to enumerate and monitor processes, which on macOS requires system calls such as sysctl(). In sandboxed execution environments, these system calls are blocked for security reasons.
Who is affected:
This is an edge case that only affects users in restricted environments:
Normal users running in their own terminal will NOT encounter this, because they have full system permissions by default.
Solution (if you’re affected):
If you’re running NeMo Gym in a sandboxed environment and hit this error, you’ll need to either:
For most development and production use cases, simply running ng_run in your regular terminal will work without any issues.
Specific workaround for Cursor AI:
If you’re using Cursor’s AI assistant to run NeMo Gym commands and encounter this error, the assistant needs to run commands with elevated permissions. You don’t configure this yourself: the AI assistant automatically requests additional permissions (specifically required_permissions: ["all"]) when it detects this error. If the error persists, ask the assistant to restart the servers, or run the command in your own terminal instead.
Why NeMo Gym can’t fix this:
This is a fundamental incompatibility between:
There’s no practical workaround that NeMo Gym can implement; the solution is to run with appropriate permissions for your environment.
If the build-docs CI check fails, preview the Fern docs locally by running fern docs dev from the repository root. Common causes include broken markdown links, invalid MDX syntax, or missing frontmatter in new pages.
Monotonicity means the token sequence in a multi-step rollout only grows, so previous tokens are never modified or dropped between turns. NeMo Gym and NeMo RL currently require this property for training.
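In concrete terms, each turn's token sequence must begin with the previous turn's full sequence. A chat template that strips earlier reasoning tokens breaks this property. The helper below is a hypothetical sketch of that prefix check over lists of token IDs. It is not NeMo RL's actual assertion code.

```python
from typing import List, Sequence


def is_monotonic(turn_token_ids: Sequence[List[int]]) -> bool:
    """Return True if every turn's token IDs extend the previous turn's, unchanged."""
    for prev, curr in zip(turn_token_ids, turn_token_ids[1:]):
        # The new sequence must be at least as long and keep the old one as an exact prefix.
        if len(curr) < len(prev) or curr[: len(prev)] != prev:
            return False
    return True
```

For example, [[1, 2], [1, 2, 3, 4]] is monotonic. [[1, 2, 3], [1, 3, 4]] is not, because token 2 was dropped between turns.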
NeMo RL enforces monotonicity in two places:
Examples:
For models with a chat template that drops previous reasoning traces: modify the chat template to retain all thinking, or use the non-thinking model.
For agents with non-monotonic trajectories, the asserts may need to be disabled. This is not currently supported, but you can experiment with it.
inference.nvidia.com uses LiteLLM caching by default, which removes diversity from model responses (pass@1 ends up similar to pass@5). Set flags like the following to enable diverse responses: