SWE-RL (Stages 2.1–2.2)#
End-to-end RL for software engineering tasks. SWE-RL is handled as a separate stage because its rollouts take substantially longer and require much longer context lengths, making it a throughput bottleneck when trained alongside the other RLVR environments.
SWE Container#
Both SWE stages require pre-fetched Python virtual environments that are not included in the base nemo-rl:v0.5.0.nemotron_3_super image. Build the SWE container once (from within the NeMo-RL repo):
docker buildx build \
-t your-registry/nemo-rl:v0.5.0.nemotron_3_super_swe \
--push \
-f- . <<'EOF'
FROM nvcr.io/nvidia/nemo-rl:v0.5.0.nemotron_3_super
RUN <<'RUNEOF'
set -euxo pipefail
UV_TORCH_BACKEND=$(uv run python -c "import tomllib,pathlib; \
indexes=tomllib.loads(pathlib.Path('pyproject.toml').read_text())['tool']['uv']['index']; \
print(next(i['name'].removeprefix('pytorch-') for i in indexes if i['name'].startswith('pytorch-')))") \
UV_LINK_MODE=hardlink uv run python examples/nemo_gym/prefetch_venvs.py \
examples/configs/super/stage2_swe1.yaml \
examples/configs/super/stage2_swe2.yaml
RUNEOF
EOF
Set the container image in your config or via override:
uv run nemotron super3 rl swe1 --run YOUR-CLUSTER \
run.env.container=your-registry/nemo-rl:v0.5.0.nemotron_3_super_swe
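The run.env.container=... argument above overrides a single nested config entry. As a rough illustration of what a dotted key=value override means (this is not the nemotron CLI's implementation, which is not shown here), the sketch below applies one override to a nested dictionary. The helper name apply_override and the starting config value are assumptions made for the example.

```python
from typing import Any


def apply_override(config: dict[str, Any], override: str) -> dict[str, Any]:
    """Apply one 'a.b.c=value' override to a nested dict, in place."""
    dotted_key, sep, value = override.partition("=")
    if not sep or not dotted_key:
        raise ValueError(f"expected key=value, got {override!r}")
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Create intermediate sections that do not exist yet.
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


# Assumed starting value, for illustration only.
config = {"run": {"env": {"container": "nvcr.io/nvidia/nemo-rl:v0.5.0.nemotron_3_super"}}}
apply_override(
    config,
    "run.env.container=your-registry/nemo-rl:v0.5.0.nemotron_3_super_swe",
)
print(config["run"]["env"]["container"])
```

Only the key is split on dots; the value is taken verbatim after the first "=", so image tags containing dots and colons pass through unchanged.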
Stage 2.1: SWE 1 (64 nodes)#
SWE-pivot training using a single-step tool-use comparison approach. The model receives a code problem and must produce a solution, which is evaluated against ground truth.
Configuration#
| Parameter | Value |
|---|---|
| Nodes | 64 (512 GPUs) |
| Generation nodes | 32 (colocated=false) |
| Prompts/step | 64 |
| Generations/prompt | 16 |
| Batch size | 1,024 |
| Max sequence length | 131,072 |
| TP / CP | 8 / 8 |
| Learning rate | 1e-6 |
| KL penalty | 0 |
| Overlong filtering | true |
| Prefix caching | enabled |
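Several of the Stage 2.1 values are tied together by simple arithmetic: the batch size equals prompts per step times generations per prompt, and the GPU count implies 8 GPUs per node. A quick check follows; the variable names are illustrative and are not config keys.

```python
# Values copied from the Stage 2.1 configuration table.
nodes = 64
total_gpus = 512
prompts_per_step = 64
generations_per_prompt = 16
max_sequence_length = 131_072

gpus_per_node = total_gpus // nodes
rollouts_per_step = prompts_per_step * generations_per_prompt

# Upper bound only: every rollout would have to reach the maximum length.
max_tokens_per_step = rollouts_per_step * max_sequence_length

print(gpus_per_node)        # 8
print(rollouts_per_step)    # 1024, matches the listed batch size
print(max_tokens_per_step)  # 134217728
```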
Config Files#
- stage2_swe1/config/default.yaml: Full-scale 64-node config
- stage2_swe1/config/small.yaml: Reduced 8-node variant for testing
Using nemotron CLI#
uv run nemotron super3 rl swe1 --run YOUR-CLUSTER
--run YOUR-CLUSTER refers to a profile defined in your env.toml file. See the env.toml setup guide for details.
SWE stages require the SWE container with pre-fetched venvs.
Using super_launch.sh#
EXP_NAME=stage2.1-swe1 \
CONFIG_PATH=examples/configs/super/stage2_swe1.yaml \
MODEL_PATH=/path/to/rlvr3_checkpoint \
TRAIN_PATH=$DATA_DIR/swe1/train-split.jsonl \
VAL_PATH=$DATA_DIR/swe1/val-split.jsonl \
CONTAINER=$SWE_CONTAINER \
SANDBOX_CONTAINER=$SANDBOX_CONTAINER \
PERSISTENT_CACHE=$PERSISTENT_CACHE \
EXTRA_MOUNTS=$EXTRA_MOUNTS \
SLURM_PARTITION=$SLURM_PARTITION \
SLURM_ACCOUNT=$SLURM_ACCOUNT \
bash super_launch.sh
Stage 2.2: SWE 2 (64 nodes)#
Full SWE-bench training with container-isolated sandboxes. Each rollout launches an isolated container with the target repository, runs the OpenHands agent loop to produce a code patch, and runs ground-truth tests to compute a binary reward.
Configuration#
| Parameter | Value |
|---|---|
| Nodes | 64 (512 GPUs) |
| Generation nodes | 32 (colocated=false) |
| Prompts/step | 16 |
| Generations/prompt | 32 |
| Batch size | 512 |
| Max sequence length | 196,608 |
| TP / CP | 8 / 8 |
| Learning rate | 1e-6 |
| KL penalty | 0 |
| Overlong filtering | true |
| Agent max turns | 200 |
| Agent concurrency | 768 |
| Agent timeout | 3,600s |
| Thinking mode | enabled |
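Agent concurrency and agent timeout bound how many rollouts run at once and how long each one may take. The sketch below shows one common way to enforce such limits with asyncio. It illustrates the general pattern only and is not the recipe's code: run_agent is a hypothetical stand-in for a full agent rollout, treating a timeout as zero reward is an assumption, and the limits are scaled down so the example finishes quickly.

```python
import asyncio

AGENT_CONCURRENCY = 4   # 768 in the Stage 2.2 config
AGENT_TIMEOUT_S = 0.2   # 3,600 s in the Stage 2.2 config


async def run_agent(task_id: int) -> float:
    """Hypothetical rollout: odd task ids are slow and will time out."""
    await asyncio.sleep(5.0 if task_id % 2 else 0)
    return 1.0


async def bounded_rollout(task_id: int, slots: asyncio.Semaphore) -> float:
    async with slots:  # at most AGENT_CONCURRENCY rollouts run at once
        try:
            return await asyncio.wait_for(run_agent(task_id), AGENT_TIMEOUT_S)
        except asyncio.TimeoutError:
            return 0.0  # assumption: a timed-out rollout earns no reward


async def main(num_tasks: int) -> list[float]:
    slots = asyncio.Semaphore(AGENT_CONCURRENCY)
    return await asyncio.gather(*(bounded_rollout(i, slots) for i in range(num_tasks)))


rewards = asyncio.run(main(8))
print(rewards)
```

The timeout clock starts only after a rollout has acquired a slot, so time spent waiting in the queue does not count against the per-agent budget in this sketch.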
Config Files#
- stage2_swe2/config/default.yaml: Full-scale 64-node config
Infrastructure#
Container Isolation#
Each SWE task instance needs an isolated environment with its own filesystem to execute code and run tests. The upstream training uses Apptainer (formerly Singularity) because SLURM HPC clusters typically lack root access, which rules out Docker. Apptainer runs pre-built .sif images with a writable tmpfs overlay while sharing the host kernel.
Other environments: If you have root access or are running outside SLURM, Docker or Podman can provide the same container isolation with stronger process and memory boundaries (cgroup isolation). The SWE-bench environment images are available as standard Docker images from R2E-Gym, SWE-Gym, and SWE-Bench Verified on HuggingFace; they only need to be converted to .sif format for Apptainer.
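Converting a Docker image to .sif is done with apptainer build and a docker:// source URI. The sketch below only constructs that command. The image reference and the file-naming scheme are hypothetical placeholders, and the repository's download_swe_images.py script may handle the conversion differently.

```python
import shlex
from pathlib import PurePosixPath


def sif_build_command(docker_ref: str, sif_dir: PurePosixPath) -> list[str]:
    """Return the argv for: apptainer build <out.sif> docker://<ref>."""
    # Derive a filesystem-safe file name from the image reference
    # (naming scheme is an assumption made for this example).
    safe_name = docker_ref.replace("/", "_").replace(":", "_")
    sif_path = sif_dir / f"{safe_name}.sif"
    return ["apptainer", "build", str(sif_path), f"docker://{docker_ref}"]


# Hypothetical image reference, for illustration only.
cmd = sif_build_command(
    "example-org/example-swe-image:latest",
    PurePosixPath("/path/to/sif"),
)
print(shlex.join(cmd))
# To actually run the conversion: subprocess.run(cmd, check=True)
```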
Because Apptainer shares the host kernel and memory space, additional safeguards are needed:
| Component | Description |
|---|---|
| Memory Watchdog | Monitors aggregate RSS of tmux process trees and proactively kills runaway processes, since Apptainer containers share host memory (unlike Docker's cgroup isolation). |
| Command Blocklist | Regex-based blocklist intercepts dangerous commands ( |
These safeguards are less critical when using Docker or Podman, which provide cgroup-based process and memory isolation by default.
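Both safeguards can be pictured with a small sketch. Everything here is an assumption made for illustration: the regex patterns are hypothetical (the actual blocklist is not reproduced in this guide), and the process table is a plain dictionary, whereas a real watchdog would read live data from the operating system.

```python
import re
from collections import defaultdict

# Hypothetical patterns; not the project's actual blocklist.
BLOCKED_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/(\s|$)"),  # recursive delete of the root filesystem
    re.compile(r"\bkillall\b"),           # mass process termination
]


def is_blocked(command: str) -> bool:
    """True if the command matches any blocklist pattern."""
    return any(p.search(command) for p in BLOCKED_PATTERNS)


def tree_rss(root_pid: int, procs: dict[int, tuple[int, int]]) -> int:
    """Sum RSS over root_pid and all of its descendants.

    procs maps pid -> (parent pid, RSS in MiB).
    """
    children = defaultdict(list)
    for pid, (ppid, _rss) in procs.items():
        children[ppid].append(pid)
    total, stack = 0, [root_pid]
    while stack:
        pid = stack.pop()
        total += procs[pid][1]
        stack.extend(children[pid])
    return total


def over_limit(root_pids: list[int], procs: dict[int, tuple[int, int]], limit_mib: int) -> list[int]:
    """Return the roots whose whole process tree exceeds the memory limit."""
    return [pid for pid in root_pids if tree_rss(pid, procs) > limit_mib]


# Toy process table: tree 100 -> 101 -> 102, plus an unrelated process 200.
procs = {100: (1, 50), 101: (100, 200), 102: (101, 300), 200: (1, 10)}
print(is_blocked("rm -rf /"), is_blocked("rm -rf /tmp/build"))
print(over_limit([100, 200], procs, limit_mib=500))
```

Summing over the whole tree matters because a runaway child may hold most of the memory while its parent shell stays small.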
Agent Loop#
| Component | Description |
|---|---|
| OpenHands Agent Loop | Manages the full lifecycle: initializing runtime, presenting problems, running agent steps (up to 200 turns), extracting git patches, and running tests for binary reward. |
| Harness Diversity | OpenCode and Codex agent classes within OpenHands match external harness tool formats (Claude Code, Codex CLI) for training diversity. |
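The lifecycle described for the agent loop can be summarized as a turn-bounded loop followed by a pass/fail test run. The sketch below is schematic: Runtime, agent_step, run_tests, and the toy agent are hypothetical stand-ins and do not reflect the OpenHands API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

MAX_TURNS = 200  # agent max turns from the Stage 2.2 config


@dataclass
class Runtime:
    """Hypothetical stand-in for an isolated sandbox holding a repository."""
    edits: list[str] = field(default_factory=list)

    def extract_patch(self) -> str:
        return "\n".join(self.edits)


def rollout(
    problem: str,
    agent_step: Callable[[str, Runtime, int], Optional[str]],
    run_tests: Callable[[str], bool],
    max_turns: int = MAX_TURNS,
) -> float:
    """Run one episode and return a binary reward (1.0 pass, 0.0 fail)."""
    runtime = Runtime()                      # 1. initialize the runtime
    for turn in range(max_turns):            # 2. agent steps, bounded by max_turns
        action = agent_step(problem, runtime, turn)
        if action is None:                   # the agent signals it is finished
            break
        runtime.edits.append(action)
    patch = runtime.extract_patch()          # 3. extract the patch
    return 1.0 if run_tests(patch) else 0.0  # 4. ground-truth tests -> reward


# Toy agent and test runner, for illustration only.
def toy_agent(problem: str, runtime: Runtime, turn: int) -> Optional[str]:
    return "fix: handle empty input" if turn == 0 else None


reward = rollout("crash on empty input", toy_agent, lambda patch: "fix" in patch)
print(reward)  # 1.0
```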
Prerequisites#
For SLURM clusters without root access, install Apptainer and download SIF images:
# Install Apptainer (Ubuntu/Debian)
wget https://github.com/apptainer/apptainer/releases/download/v1.3.1/apptainer_1.3.1_amd64.deb
sudo apt install -y ./apptainer_1.3.1_amd64.deb
# Download and convert Docker images to .sif files
./examples/nemo_gym/download_swe_images.py --sif-dir /path/to/sif --concurrency 16
Using nemotron CLI#
uv run nemotron super3 rl swe2 \
--run YOUR-CLUSTER \
run.env.sif_dir=/path/to/sif
--run YOUR-CLUSTER refers to a profile defined in your env.toml file. See the env.toml setup guide for details.
SWE stages require the SWE container with pre-fetched venvs.
Using super_launch.sh#
EXP_NAME=stage2.2-swe2 \
CONFIG_PATH=examples/configs/super/stage2_swe2.yaml \
MODEL_PATH=/path/to/swe1_checkpoint \
TRAIN_PATH=$DATA_DIR/swe2/train-split.jsonl \
VAL_PATH=$DATA_DIR/swe2/val-split.jsonl \
CONTAINER=$SWE_CONTAINER \
SANDBOX_CONTAINER=$SANDBOX_CONTAINER \
PERSISTENT_CACHE=$PERSISTENT_CACHE \
EXTRA_MOUNTS=$EXTRA_MOUNTS \
SLURM_PARTITION=$SLURM_PARTITION \
SLURM_ACCOUNT=$SLURM_ACCOUNT \
SIF_DIR=/path/to/sif \
bash super_launch.sh
See the upstream training guide for full details on environment variables.
Recipe Source: src/nemotron/recipes/super3/stage2_rl/stage2_swe1/ and src/nemotron/recipes/super3/stage2_rl/stage2_swe2/