SWE-RL (Stages 2.1–2.2)#

End-to-end RL for software engineering tasks. SWE-RL is handled as a separate stage because its rollouts take substantially longer and require much longer context lengths, making it a throughput bottleneck when trained alongside the other RLVR environments.


SWE Container#

Both SWE stages require pre-fetched Python virtual environments that are not included in the base nemo-rl:v0.5.0.nemotron_3_super image. Build the SWE container once (from within the NeMo-RL repo):

docker buildx build \
  -t your-registry/nemo-rl:v0.5.0.nemotron_3_super_swe \
  --push \
  -f- . <<'EOF'
FROM nvcr.io/nvidia/nemo-rl:v0.5.0.nemotron_3_super

# Pre-fetch the Python venvs needed by both SWE stage configs into the image.
RUN <<'RUNEOF'
set -euxo pipefail
# Derive the torch backend from the first uv index named "pytorch-<backend>" in pyproject.toml.
UV_TORCH_BACKEND=$(uv run python -c "import tomllib,pathlib; \
  indexes=tomllib.loads(pathlib.Path('pyproject.toml').read_text())['tool']['uv']['index']; \
  print(next(i['name'].removeprefix('pytorch-') for i in indexes if i['name'].startswith('pytorch-')))") \
UV_LINK_MODE=hardlink uv run python examples/nemo_gym/prefetch_venvs.py \
    examples/configs/super/stage2_swe1.yaml \
    examples/configs/super/stage2_swe2.yaml
RUNEOF
EOF

Set the container image in your config or via override:

uv run nemotron super3 rl swe1 --run YOUR-CLUSTER \
    run.env.container=your-registry/nemo-rl:v0.5.0.nemotron_3_super_swe
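An override such as `run.env.container=...` addresses a nested config key by its dotted path. How the nemotron CLI parses overrides is not described here, so the following is only a minimal sketch of the general idea; the `apply_override` helper and the starting config dict are hypothetical:

```python
from typing import Any


def apply_override(config: dict[str, Any], override: str) -> dict[str, Any]:
    """Set a nested key from a 'dotted.path=value' string (values kept as str)."""
    path, sep, value = override.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {override!r}")
    *parents, leaf = path.split(".")
    node = config
    for key in parents:
        # Create intermediate sections that do not exist yet.
        node = node.setdefault(key, {})
    node[leaf] = value
    return config


# Illustrative starting value; the recipe's real default may differ.
cfg = {"run": {"env": {"container": "nvcr.io/nvidia/nemo-rl:v0.5.0.nemotron_3_super"}}}
apply_override(cfg, "run.env.container=your-registry/nemo-rl:v0.5.0.nemotron_3_super_swe")
print(cfg["run"]["env"]["container"])
```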

Stage 2.1: SWE 1 (64 nodes)#

SWE-pivot training using a single-step, tool-use comparison approach: the model receives a coding problem and must produce a solution, which is evaluated against the ground truth.

Configuration#

Parameter             Value
Nodes                 64 (512 GPUs)
Generation nodes      32 (colocated=false)
Prompts/step          64
Generations/prompt    16
Batch size            1,024
Max sequence length   131,072 tokens
TP / CP               8 / 8
Learning rate         1e-6
KL penalty            0
Overlong filtering    true
Prefix caching        enabled
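Several rows in this table are linked: the batch size is prompts per step times generations per prompt, and the GPU count implies 8 GPUs per node (64 nodes = 512 GPUs). A quick consistency check of those figures; the training-node count additionally assumes that the 32 non-colocated generation nodes are carved out of the same 64-node allocation:

```python
from dataclasses import dataclass

GPUS_PER_NODE = 8  # inferred from "64 (512 GPUs)"


@dataclass(frozen=True)
class StageShape:
    nodes: int
    generation_nodes: int
    prompts_per_step: int
    generations_per_prompt: int

    @property
    def gpus(self) -> int:
        return self.nodes * GPUS_PER_NODE

    @property
    def batch_size(self) -> int:
        # Every prompt is sampled generations_per_prompt times per step.
        return self.prompts_per_step * self.generations_per_prompt

    @property
    def training_nodes(self) -> int:
        # Assumption: generation nodes come out of the same node allocation.
        return self.nodes - self.generation_nodes


swe1 = StageShape(nodes=64, generation_nodes=32, prompts_per_step=64, generations_per_prompt=16)
print(swe1.gpus, swe1.batch_size, swe1.training_nodes)  # 512 1024 32
```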

Config Files#

  • stage2_swe1/config/default.yaml: Full-scale 64-node config

  • stage2_swe1/config/small.yaml: Reduced 8-node variant for testing

Using nemotron CLI#

uv run nemotron super3 rl swe1 --run YOUR-CLUSTER

--run YOUR-CLUSTER refers to a profile defined in your env.toml file. See the env.toml setup guide for details.

SWE stages require the SWE container with pre-fetched venvs.

Using super_launch.sh#

EXP_NAME=stage2.1-swe1 \
CONFIG_PATH=examples/configs/super/stage2_swe1.yaml \
MODEL_PATH=/path/to/rlvr3_checkpoint \
TRAIN_PATH=$DATA_DIR/swe1/train-split.jsonl \
VAL_PATH=$DATA_DIR/swe1/val-split.jsonl \
CONTAINER=$SWE_CONTAINER \
SANDBOX_CONTAINER=$SANDBOX_CONTAINER \
PERSISTENT_CACHE=$PERSISTENT_CACHE \
EXTRA_MOUNTS=$EXTRA_MOUNTS \
SLURM_PARTITION=$SLURM_PARTITION \
SLURM_ACCOUNT=$SLURM_ACCOUNT \
bash super_launch.sh
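The launch command expands several shell variables ($DATA_DIR, $SWE_CONTAINER, and so on) that must already be set. A hypothetical pre-flight check, not part of the recipe, that reports unset variables and missing data splits before a 64-node job is submitted. It treats every variable as required, which may be stricter than super_launch.sh itself (EXTRA_MOUNTS, for example, may legitimately be empty), and the variables must be exported for a child process to see them:

```python
import os
from collections.abc import Mapping

# Shell variables expanded by the super_launch.sh example.
SHELL_VARS = (
    "DATA_DIR",
    "SWE_CONTAINER",
    "SANDBOX_CONTAINER",
    "PERSISTENT_CACHE",
    "EXTRA_MOUNTS",
    "SLURM_PARTITION",
    "SLURM_ACCOUNT",
)


def missing_vars(env: Mapping[str, str]) -> list[str]:
    """Names from SHELL_VARS that are unset or empty in `env`."""
    return [name for name in SHELL_VARS if not env.get(name)]


def missing_splits(data_dir: str, stage: str = "swe1") -> list[str]:
    """Expected <data_dir>/<stage>/{train,val}-split.jsonl files that do not exist."""
    expected = [
        os.path.join(data_dir, stage, f"{split}-split.jsonl") for split in ("train", "val")
    ]
    return [path for path in expected if not os.path.isfile(path)]


def preflight(env: Mapping[str, str], stage: str = "swe1") -> list[str]:
    """All detected problems; an empty list means the check passed."""
    problems = missing_vars(env)
    if "DATA_DIR" not in problems:
        problems += missing_splits(env["DATA_DIR"], stage)
    return problems


if __name__ == "__main__":
    print(preflight(os.environ) or "pre-flight check passed")
```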

Stage 2.2: SWE 2 (64 nodes)#

Full SWE-bench training with container-isolated sandboxes. Each rollout launches an isolated container with the target repository, runs the OpenHands agent loop to produce a code patch, and runs ground-truth tests to compute a binary reward.

Configuration#

Parameter             Value
Nodes                 64 (512 GPUs)
Generation nodes      32 (colocated=false)
Prompts/step          16
Generations/prompt    32
Batch size            512
Max sequence length   196,608 tokens
TP / CP               8 / 8
Learning rate         1e-6
KL penalty            0
Overlong filtering    true
Agent max turns       200
Agent concurrency     768
Agent timeout         3,600 s (1 hour)
Thinking mode         enabled
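With 16 prompts and 32 generations per prompt, each step produces 512 agent rollouts. Assuming "agent concurrency" caps how many rollouts run at once and "agent timeout" bounds each one, a rough back-of-envelope for a step's rollout phase (an estimate under those assumptions, not a measured figure):

```python
import math


def rollouts_per_step(prompts_per_step: int, generations_per_prompt: int) -> int:
    return prompts_per_step * generations_per_prompt


def rollout_waves(rollouts: int, concurrency: int) -> int:
    """Batches needed if at most `concurrency` agent rollouts can run at once."""
    return math.ceil(rollouts / concurrency)


def worst_case_rollout_seconds(waves: int, timeout_s: int) -> int:
    # Simplified bound: waves run back to back and each may last until the agent timeout.
    return waves * timeout_s


n = rollouts_per_step(16, 32)
waves = rollout_waves(n, concurrency=768)
print(n, waves, worst_case_rollout_seconds(waves, timeout_s=3600))  # 512 1 3600
```

Under these assumptions a concurrency of 768 fits a whole step's 512 rollouts in a single wave, so the slowest rollout (at most one timeout) bounds the rollout phase.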

Config Files#

  • stage2_swe2/config/default.yaml: Full-scale 64-node config

Infrastructure#

Container Isolation#

Each SWE task instance needs an isolated environment with its own filesystem in which to execute code and run tests. The upstream training uses Apptainer (formerly Singularity) because SLURM HPC clusters typically do not grant users root access, which rules out Docker. Apptainer runs pre-built .sif images with a writable tmpfs overlay while sharing the host kernel.

Other environments: If you have root access or are running outside SLURM, Docker or Podman provide the same filesystem isolation plus stronger process and memory boundaries through cgroups. The SWE-bench environment images are available as standard Docker images from R2E-Gym, SWE-Gym, and SWE-bench Verified on Hugging Face; they only need to be converted to .sif format for Apptainer.

Because Apptainer shares the host kernel and memory space, additional safeguards are needed:

  • Memory Watchdog: Monitors the aggregate RSS of tmux process trees and proactively kills runaway processes, since Apptainer containers share host memory (unlike Docker’s cgroup isolation).

  • Command Blocklist: A regex-based blocklist intercepts dangerous commands (killall, pkill) that could terminate training processes or vLLM servers on the same node.
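The memory watchdog's core calculation is a sum of RSS over each process tree. The sketch below works on a plain snapshot of `pid -> (ppid, rss_bytes)` so it stays self-contained; a real watchdog would read these values from /proc (or a library such as psutil) and then kill the offending tree. The function names, root selection, and limit are hypothetical, and the sketch only reports candidates:

```python
from collections import defaultdict
from collections.abc import Iterable, Mapping

ProcTable = Mapping[int, tuple[int, int]]  # pid -> (ppid, rss_bytes)


def tree_rss(procs: ProcTable, root: int) -> int:
    """Sum RSS for `root` and all of its descendants."""
    children: dict[int, list[int]] = defaultdict(list)
    for pid, (ppid, _) in procs.items():
        children[ppid].append(pid)
    total, stack, seen = 0, [root], set()
    while stack:
        pid = stack.pop()
        if pid in seen or pid not in procs:
            continue
        seen.add(pid)
        total += procs[pid][1]
        stack.extend(children[pid])
    return total


def trees_over_limit(procs: ProcTable, roots: Iterable[int], limit_bytes: int) -> list[int]:
    """Roots (e.g. one per tmux session) whose whole tree exceeds the limit."""
    return [r for r in roots if tree_rss(procs, r) > limit_bytes]


GiB = 1024**3
snapshot = {
    100: (1, 1 * GiB),     # tmux session A
    101: (100, 2 * GiB),
    200: (1, 1 * GiB),     # tmux session B
    201: (200, 30 * GiB),  # runaway test process
    202: (201, 10 * GiB),
}
print(trees_over_limit(snapshot, roots=[100, 200], limit_bytes=16 * GiB))  # [200]
```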

These safeguards are less critical when using Docker or Podman, which provide cgroup-based process and memory isolation by default.
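The command blocklist from the safeguards list above can be approximated with a word-boundary regex checked before a command reaches the sandbox shell. The pattern below covers only the two commands named there (killall, pkill); the upstream blocklist's actual patterns are not shown, so treat this as a sketch:

```python
import re

# Hypothetical pattern: `killall` or `pkill` as whole words anywhere in the
# command line, including after `&&`, `;`, `|`, or a path such as /usr/bin/.
BLOCKED = re.compile(r"\b(?:killall|pkill)\b")


def is_blocked(command: str) -> bool:
    """True if the command line mentions a blocklisted process-killing command."""
    return BLOCKED.search(command) is not None


def guard(command: str) -> str:
    """Pass the command through unchanged, or refuse it."""
    if is_blocked(command):
        raise PermissionError(f"blocked command: {command!r}")
    return command


print(is_blocked("cd repo && pkill -f vllm"), is_blocked("python -m pytest tests/"))
```

Matching anywhere in the line errs toward false positives (e.g. an `echo` that merely mentions pkill), and a regex check is easy to evade through indirection such as a script file, which is why it complements rather than replaces the memory watchdog.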

Agent Loop#

  • OpenHands Agent Loop: Manages the full lifecycle: initializing the runtime, presenting the problem, running agent steps (up to 200 turns), extracting the git patch, and running tests to compute the binary reward.

  • Harness Diversity: OpenCode and Codex agent classes within OpenHands mirror the tool formats of external harnesses (Claude Code, Codex CLI), so the model trains on a diverse set of tool interfaces.
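The agent-loop lifecycle can be summarized as a turn-bounded loop. This is not the OpenHands API: `Agent`, `ToyAgent`, `run_rollout`, and the execute/patch/score hooks are hypothetical stand-ins that only show the control flow (stop when the agent finishes or the 200-turn cap is reached, then extract a patch and score it):

```python
from collections.abc import Callable
from dataclasses import dataclass, field
from typing import Protocol

MAX_TURNS = 200  # "Agent max turns" in the SWE 2 configuration


class Agent(Protocol):
    def step(self, observation: str) -> tuple[str, bool]:
        """Return (action, done) for the latest observation."""
        ...


@dataclass
class RolloutResult:
    turns: int
    patch: str
    reward: float
    actions: list[str] = field(default_factory=list)


def run_rollout(
    agent: Agent,
    problem: str,
    execute: Callable[[str], str],
    extract_patch: Callable[[], str],
    score_patch: Callable[[str], float],
    max_turns: int = MAX_TURNS,
) -> RolloutResult:
    """Present the problem, step the agent until done or out of turns, then score."""
    observation, actions, turns = problem, [], 0
    while turns < max_turns:
        turns += 1
        action, done = agent.step(observation)
        actions.append(action)
        if done:
            break
        observation = execute(action)  # tool call runs inside the task sandbox
    patch = extract_patch()  # e.g. a `git diff` of the repository
    return RolloutResult(turns, patch, score_patch(patch), actions)


class ToyAgent:
    """Hypothetical agent that declares itself done after `finish_after` steps."""

    def __init__(self, finish_after: int) -> None:
        self.finish_after = finish_after
        self.calls = 0

    def step(self, observation: str) -> tuple[str, bool]:
        self.calls += 1
        return f"cmd-{self.calls}", self.calls >= self.finish_after


result = run_rollout(
    ToyAgent(finish_after=3),
    "Fix the failing test in utils.py",
    execute=lambda action: f"output of {action}",
    extract_patch=lambda: "diff --git a/utils.py b/utils.py",
    score_patch=lambda patch: 1.0 if patch else 0.0,
)
print(result.turns, result.reward)  # 3 1.0
```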

Prerequisites#

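On SLURM clusters without root access, the SWE 2 sandboxes run under Apptainer. Install Apptainer and download the SIF images. Installing the .deb package below uses sudo, so on a shared cluster an administrator may need to perform that step: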

# Install Apptainer (Ubuntu/Debian)
wget https://github.com/apptainer/apptainer/releases/download/v1.3.1/apptainer_1.3.1_amd64.deb
sudo apt install -y ./apptainer_1.3.1_amd64.deb

# Download and convert Docker images to .sif files
./examples/nemo_gym/download_swe_images.py --sif-dir /path/to/sif --concurrency 16
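The download script's internals are not described here. Conceptually, conversion is a bounded-concurrency fan-out of `apptainer pull <out.sif> docker://<image>` calls, which is what a `--concurrency` setting would limit. A sketch with a dry-run mode; the image names, the .sif naming scheme, and the skip-if-present behavior are hypothetical:

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def sif_name(image: str) -> str:
    """Flatten 'registry/repo:tag' into a filesystem-safe .sif name (hypothetical scheme)."""
    return image.replace("/", "_").replace(":", "_") + ".sif"


def pull_command(image: str, sif_dir: str) -> list[str]:
    return ["apptainer", "pull", os.path.join(sif_dir, sif_name(image)), f"docker://{image}"]


def convert_all(
    images: list[str], sif_dir: str, concurrency: int = 16, dry_run: bool = True
) -> list[list[str]]:
    """Run (or, in dry-run mode, only return) one pull per image not yet converted."""
    commands = [
        pull_command(img, sif_dir)
        for img in images
        if not os.path.exists(os.path.join(sif_dir, sif_name(img)))
    ]
    if not dry_run:
        os.makedirs(sif_dir, exist_ok=True)
        # At most `concurrency` pulls in flight; list() re-raises the first failure.
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))
    return commands
```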

Using nemotron CLI#

uv run nemotron super3 rl swe2 \
    --run YOUR-CLUSTER \
    run.env.sif_dir=/path/to/sif

--run YOUR-CLUSTER refers to a profile defined in your env.toml file. See the env.toml setup guide for details.

SWE stages require the SWE container with pre-fetched venvs.

Using super_launch.sh#

EXP_NAME=stage2.2-swe2 \
CONFIG_PATH=examples/configs/super/stage2_swe2.yaml \
MODEL_PATH=/path/to/swe1_checkpoint \
TRAIN_PATH=$DATA_DIR/swe2/train-split.jsonl \
VAL_PATH=$DATA_DIR/swe2/val-split.jsonl \
CONTAINER=$SWE_CONTAINER \
SANDBOX_CONTAINER=$SANDBOX_CONTAINER \
PERSISTENT_CACHE=$PERSISTENT_CACHE \
EXTRA_MOUNTS=$EXTRA_MOUNTS \
SLURM_PARTITION=$SLURM_PARTITION \
SLURM_ACCOUNT=$SLURM_ACCOUNT \
SIF_DIR=/path/to/sif \
bash super_launch.sh

See the upstream training guide for full details on environment variables.


Recipe Source: src/nemotron/recipes/super3/stage2_rl/stage2_swe1/ and src/nemotron/recipes/super3/stage2_rl/stage2_swe2/