Sandbox API

The nemo_gym.sandbox module is the provider-neutral interface for creating isolated execution environments, running commands, and moving files in or out of those environments. Agents and resources servers call the same API while provider pages document backend-specific setup, configuration, and isolation properties.

Import caller-facing APIs from the public package boundary:

from nemo_gym.sandbox import AsyncSandbox, Sandbox, SandboxResources, SandboxSpec

Treat nemo_gym.sandbox as the stable caller-facing API. Provider modules under nemo_gym.sandbox.providers are implementation details unless you are adding or configuring a provider.

Install Provider Dependencies

The public API is part of nemo-gym. Runtime backends can have optional dependencies, so install the extras or system packages required by the provider you configure in your agent or resources server.

Core Types

| API | Purpose |
| --- | --- |
| AsyncSandbox | Async facade for FastAPI servers, async agents, and rollout code. Use this inside async code. |
| Sandbox | Sync facade for synchronous harnesses. It owns a private event loop and rejects calls from an already-running async loop. |
| SandboxSpec | Provider-neutral sandbox creation request. Includes image, TTL, working directory, files, metadata, resources, entrypoint, and provider options. |
| SandboxResources | Typed resource request with CPU, memory, disk, and GPU fields. |
| SandboxExecResult | Command result with stdout, stderr, return_code, and optional error_type. |
| SandboxStatus | Provider-neutral lifecycle status: starting, running, stopped, error, or unknown. |
| SandboxCreateError | Base create-time failure for provider allocation and readiness errors. |
| SandboxCreateVerificationError | Create-time failure raised when a new sandbox cannot pass provider readiness checks. |

First-Run Example

Once your provider is available, create a small local script, for example first_sandbox.py. Replace the provider name and settings with the backend configured for your environment.

import asyncio

from nemo_gym.sandbox import AsyncSandbox, SandboxResources, SandboxSpec


provider_config = {"apptainer": {}}

spec = SandboxSpec(
    image="docker://python:3.12-slim",
    ttl_s=1800,
    ready_timeout_s=300,
    workdir="/sandbox",
    files={
        "/sandbox/hello.py": "print('hello from sandbox')\n",
    },
    resources=SandboxResources(cpu=1, memory_mib=1024, disk_gib=5),
    metadata={"example": "first-run"},
)


async def main() -> None:
    async with AsyncSandbox(provider_config, spec) as sandbox:
        await sandbox.start()
        result = await sandbox.exec("python /sandbox/hello.py", timeout_s=60)
        print(result.stdout or result.stderr)
        raise SystemExit(result.return_code)


asyncio.run(main())

Run it from your local checkout or application environment:

$ python first_sandbox.py

exec() returns a SandboxExecResult. Nonzero process exits are reported in return_code; providers should reserve exceptions for sandbox runtime failures such as allocation, transport, or lifecycle errors.
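
Because nonzero exits arrive in return_code rather than as exceptions, callers typically branch on the result themselves. The following is a minimal sketch of that branching. It uses a stand-in dataclass so it runs without a provider; `FakeExecResult` and `describe_result` are hypothetical names, only the four field names come from this page, and treating any non-None error_type as a failure category is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeExecResult:
    """Stand-in carrying the documented SandboxExecResult fields (not the real class)."""

    stdout: str = ""
    stderr: str = ""
    return_code: int = 0
    error_type: Optional[str] = None


def describe_result(result) -> str:
    """Summarize a command result without raising on nonzero exits."""
    # Assumption: a non-None error_type marks a failure category set by the provider.
    if result.error_type is not None:
        return f"error ({result.error_type})"
    if result.return_code != 0:
        return f"failed with exit code {result.return_code}"
    return "ok"
```

In real code the same function could be passed the object returned by exec(), since it only reads attributes.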

Provider Config Blocks

Agents can refer to a named provider block instead of embedding provider credentials or backend settings in the agent config. The sandbox_provider field names a top-level config block, and that block contains exactly one provider key plus optional reserved keys:

sandbox:
  default_metadata:
    sandbox-api: opensandbox-sdk
  opensandbox:
    connection:
      domain: ${oc.env:OPENSANDBOX_DOMAIN}
      api_key: ${oc.env:OPENSANDBOX_API_KEY}
      protocol: http

The agent then references that block by name:

mini_swe_agent_2:
  responses_api_agents:
    mini_swe_agent_2:
      sandbox_provider: sandbox
      sandbox_spec:
        resources:
          cpu: 2
          memory_mib: 8192
        provider_options:
          platform:
            os: linux
            arch: amd64

Every provider config can bind the same block name, such as sandbox, so swapping backends only requires changing the provider config path passed to gym env start. If one run needs multiple sandboxes, give each block a distinct name and reference each one separately.

default_metadata is merged into SandboxSpec.metadata before create. The agent’s own sandbox_spec.metadata wins on key conflicts.
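
The precedence rule above can be pictured as an ordinary dictionary overlay. This is a sketch of the documented behavior, not the library's code; `merge_metadata` and the `team` key are hypothetical names used only for illustration.

```python
def merge_metadata(default_metadata: dict, spec_metadata: dict) -> dict:
    """Return provider defaults overlaid with the agent's own metadata."""
    merged = dict(default_metadata)  # start from the provider-level defaults
    merged.update(spec_metadata)     # agent-supplied keys win on conflict
    return merged


defaults = {"sandbox-api": "opensandbox-sdk", "team": "infra"}
agent_metadata = {"team": "eval", "task_id": "task-001"}
merged = merge_metadata(defaults, agent_metadata)
```

Here `merged["team"]` ends up as the agent's value, while `sandbox-api` is carried over from the defaults untouched.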

The helpers used by agents are public:

from nemo_gym.sandbox import resolve_provider_config, resolve_provider_metadata


provider_config = resolve_provider_config("sandbox", global_config_dict)
provider_metadata = resolve_provider_metadata("sandbox", global_config_dict)

Inline single-key mappings are also accepted when a caller does not use Hydra config:

provider_config = {"opensandbox": {"connection": {"domain": "sandbox.example"}}}
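
To make the "exactly one provider key plus optional reserved keys" shape concrete, here is a small sketch of how a block could be checked and split. It is illustrative only and does not reproduce resolve_provider_config; `split_provider_block` and `RESERVED_KEYS` are hypothetical names, and the reserved set is assumed to contain only default_metadata because that is the only reserved key this page names.

```python
RESERVED_KEYS = {"default_metadata"}  # assumption: the only reserved key named on this page


def split_provider_block(block: dict) -> tuple:
    """Return (provider_name, provider_settings, default_metadata) for one block."""
    provider_keys = [key for key in block if key not in RESERVED_KEYS]
    if len(provider_keys) != 1:
        raise ValueError(f"expected exactly one provider key, found {provider_keys}")
    name = provider_keys[0]
    settings = dict(block[name] or {})                      # tolerate an empty provider body
    default_metadata = dict(block.get("default_metadata") or {})
    return name, settings, default_metadata
```

An inline single-key mapping passes the same check, because it is simply a block with no reserved keys.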

Lifecycle

AsyncSandbox and Sandbox are lifecycle objects. Construct one with a provider config and optional SandboxSpec, call start(), run commands or transfer files, then call stop(). Context managers close the provider on exit, but they do not start the sandbox automatically.

from nemo_gym.sandbox import AsyncSandbox, SandboxResources, SandboxSpec


spec = SandboxSpec(
    image="ghcr.io/example/eval-image:py312",
    ttl_s=18000,
    ready_timeout_s=1200,
    workdir="/workspace",
    resources=SandboxResources(cpu=2, memory_mib=8192, disk_gib=20),
    metadata={"benchmark": "my-benchmark", "task_id": "task-001"},
)

async with AsyncSandbox(provider_config, spec) as sandbox:
    await sandbox.start()
    result = await sandbox.exec(
        "python -m pytest -q",
        timeout_s=600,
        user="root",
    )
    passed = result.return_code == 0

Sync vs. Async

Use AsyncSandbox inside FastAPI handlers, async resources servers, async agents, and rollout collection code.

Use Sandbox only in synchronous code, such as a third-party harness adapter that does not expose async hooks.

from nemo_gym.sandbox import Sandbox, SandboxSpec


with Sandbox(provider_config, SandboxSpec(image="ghcr.io/example/eval-image:py312")) as sandbox:
    sandbox.start()
    result = sandbox.exec("python --version", timeout_s=30)
    output = "\n".join(part for part in (result.stdout, result.stderr) if part)

Do not call Sandbox from FastAPI handlers, async resources servers, or async agents. It blocks the caller by design. Use AsyncSandbox in async code.
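
If shared helper code may be called from either context, it can check for a running event loop before choosing a facade. The sketch below uses only the standard library; `in_running_loop` is a hypothetical helper, and how Sandbox performs its own check is not documented here.

```python
import asyncio


def in_running_loop() -> bool:
    """Return True when called from code running inside an asyncio event loop."""
    try:
        asyncio.get_running_loop()  # raises RuntimeError when no loop is running
    except RuntimeError:
        return False
    return True


async def check_inside_loop() -> bool:
    """Report what the helper sees from within a coroutine."""
    return in_running_loop()
```

Called from plain synchronous code the helper returns False, which is the situation where the sync facade is appropriate; inside a coroutine it returns True, which signals that AsyncSandbox should be used instead.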

SandboxSpec Fields

SandboxSpec is intentionally provider-neutral. Providers map these fields onto their own runtime primitives.

| Field | Description |
| --- | --- |
| image | Container image to create. |
| ttl_s | Sandbox lifetime in seconds, when supported by the provider. |
| ready_timeout_s | Time to wait for sandbox readiness. |
| workdir | Default working directory for exec() calls. |
| env | Environment variables injected into the sandbox. Forward only values required by the task. |
| files | Text files to upload at startup, keyed by remote target path. |
| metadata | String metadata for tracing, debugging, and backend labels. Providers may normalize values for their runtime. |
| resources | SandboxResources or a mapping with cpu, memory_mib, disk_gib, gpu, and gpu_type. |
| entrypoint | Optional container entrypoint override. |
| provider_options | Provider-specific options that do not fit the common schema. |
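
One way to follow the "forward only values required by the task" guidance for env is to build the mapping from an explicit allowlist instead of copying the whole host environment. This is a sketch under that assumption; `pick_env` is a hypothetical helper and `HF_TOKEN` is only an example variable name.

```python
import os
from typing import Iterable, Mapping, Optional


def pick_env(names: Iterable[str], source: Optional[Mapping[str, str]] = None) -> dict:
    """Copy only the named variables that are actually set in the source mapping."""
    source = os.environ if source is None else source
    # Names missing from the source are skipped rather than forwarded as empty strings.
    return {name: source[name] for name in names if name in source}


host_env = {"HF_TOKEN": "secret", "HOME": "/root", "PATH": "/bin"}
sandbox_env = pick_env(["HF_TOKEN", "MISSING_VAR"], host_env)
```

The resulting dictionary could then be passed as the env field, so unrelated host variables never reach the sandbox.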

You can pass resources as either a SandboxResources instance or a mapping:

spec = SandboxSpec(
    image="ghcr.io/example/eval-image:py312",
    resources={
        "cpu": 2,
        "memory_mib": 8192,
        "disk_gib": 20,
    },
)

Unknown resource keys raise a ValueError, which catches config drift early.
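
The kind of drift this catches is usually a misspelled or renamed key. The sketch below mimics the check against the five documented resource keys; it is not the library's implementation, `check_resource_keys` is a hypothetical name, and the real error message may differ.

```python
KNOWN_RESOURCE_KEYS = {"cpu", "memory_mib", "disk_gib", "gpu", "gpu_type"}


def check_resource_keys(resources: dict) -> dict:
    """Raise ValueError if the mapping contains keys outside the documented set."""
    unknown = sorted(set(resources) - KNOWN_RESOURCE_KEYS)
    if unknown:
        raise ValueError(f"unknown resource keys: {unknown}")
    return resources


valid = check_resource_keys({"cpu": 2, "memory_mib": 8192})

try:
    check_resource_keys({"cpu": 2, "memory_mb": 8192})  # typo: memory_mb instead of memory_mib
except ValueError as error:
    message = str(error)
```

Without such a check, the misspelled key would be silently ignored and the sandbox would start with a default memory limit.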

Startup Files and File Transfer

Use files for small text files that should exist before the first command runs:

spec = SandboxSpec(
    image="ghcr.io/example/eval-image:py312",
    workdir="/workspace",
    files={
        "/workspace/input.txt": "hello\n",
    },
)
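
When the startup files already exist in a local directory, the files mapping can be built programmatically. The following sketch assumes every file under the directory is UTF-8 text and small enough to inline; `collect_text_files` is a hypothetical helper, not part of nemo_gym.sandbox.

```python
import posixpath
from pathlib import Path


def collect_text_files(local_dir: str, remote_dir: str) -> dict:
    """Map remote target paths to the text content of files under local_dir."""
    root = Path(local_dir)
    files = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Remote paths always use forward slashes, whatever the local OS.
            relative = path.relative_to(root).as_posix()
            files[posixpath.join(remote_dir, relative)] = path.read_text(encoding="utf-8")
    return files
```

The returned dictionary has the same shape as the files argument above, with keys such as /workspace/input.txt.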

Use upload() and download() for local files:

await sandbox.upload(local_path, "/workspace/archive.tar.gz")
await sandbox.download("/workspace/log.txt", output_path)

upload() and download() operate on files. If you need structured values, serialize them locally before uploading and parse the downloaded file locally after the sandbox command completes.
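
A minimal sketch of that serialize-then-transfer pattern, using JSON as the interchange format. `write_payload` and `read_payload` are hypothetical helpers, the choice of JSON is an assumption, and the sandbox calls appear only as comments because they need a live provider.

```python
import json
from pathlib import Path


def write_payload(payload: dict, local_path: Path) -> Path:
    """Serialize a structured value to a local file that can then be uploaded."""
    local_path.write_text(json.dumps(payload, sort_keys=True), encoding="utf-8")
    return local_path


def read_payload(local_path: Path) -> dict:
    """Parse a downloaded result file back into a structured value."""
    return json.loads(local_path.read_text(encoding="utf-8"))


# Typical flow inside async code (remote paths are examples only):
#   await sandbox.upload(write_payload(task, local_in), "/workspace/task.json")
#   ... run a command that writes /workspace/result.json ...
#   await sandbox.download("/workspace/result.json", local_out)
#   result = read_payload(local_out)
```

Keeping serialization on the caller's side means the sandbox command only has to read and write plain files.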

Status and Cleanup

Call status() when a runner needs to distinguish a stopped sandbox from a provider error:

status = await sandbox.status()
if status.value == "error":
    ...

Always stop sandboxes in cleanup paths. stop() is idempotent on the public facade and closes provider-scoped resources after ending the sandbox lifecycle.

sandbox = AsyncSandbox(provider_config, spec)
try:
    await sandbox.start()
    result = await sandbox.exec("pytest -q", timeout_s=600)
finally:
    await sandbox.stop()

Image Rewrites

Use rewrite_image() when a benchmark’s upstream image needs to run through an internal registry mirror.

from nemo_gym.sandbox import rewrite_image


image = rewrite_image(
    "docker.io/library/python:3.12-slim",
    [{"from": "docker.io/", "to": "mirror.example.com/dockerhub/"}],
)

Rewrites are ordered. The first matching from prefix wins.
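
Because order matters, more specific prefixes should be listed before broader ones. The sketch below models the first-match rule with a stand-in function; `rewrite_image_sketch` is hypothetical, and it assumes the matched prefix is replaced by the `to` value and that unmatched images are returned unchanged.

```python
def rewrite_image_sketch(image: str, rules: list) -> str:
    """Apply the first rule whose "from" prefix matches the image reference."""
    for rule in rules:
        prefix = rule["from"]
        if image.startswith(prefix):
            # First match wins: later rules are never consulted.
            return rule["to"] + image[len(prefix):]
    return image  # no rule matched


rules = [
    {"from": "docker.io/library/", "to": "mirror.example.com/official/"},
    {"from": "docker.io/", "to": "mirror.example.com/dockerhub/"},
]
official = rewrite_image_sketch("docker.io/library/python:3.12-slim", rules)
other = rewrite_image_sketch("docker.io/example/tool:1.0", rules)
untouched = rewrite_image_sketch("ghcr.io/example/eval-image:py312", rules)
```

If the two rules were listed in the opposite order, the broad docker.io/ rule would match first and the more specific rule would never apply.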

Error Handling

Sandbox create failures use provider-neutral exception classes:

  • SandboxCreateError for sandbox allocation or readiness failures.
  • SandboxCreateVerificationError when a created sandbox fails Gym readiness verification.

In resources servers and agents, catch these errors close to the sandbox operation and return a meaningful verifier or rollout error. Do not let one bad sandbox allocation crash a long-running server.

from nemo_gym.sandbox import AsyncSandbox, SandboxCreateError


sandbox = AsyncSandbox(provider_config, spec)
try:
    await sandbox.start()
except SandboxCreateError as error:
    return {"reward": 0.0, "error": f"sandbox_create_failed: {error}"}
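
When a server wants to report readiness failures separately from allocation failures, the more specific exception can be caught first. The sketch below uses stand-in exception classes so it runs without a provider. `StandInCreateError`, `StandInVerificationError`, `start_or_report`, and the `sandbox_verification_failed` label are all hypothetical, and the subclass relationship is an assumption drawn from the word "Base" in the Core Types table.

```python
import asyncio


class StandInCreateError(Exception):
    """Stand-in for the documented base create-time failure."""


class StandInVerificationError(StandInCreateError):
    """Stand-in for the readiness failure; subclassing the base is an assumption."""


async def start_or_report(start) -> dict:
    """Await start() and turn create-time failures into a rollout-style error dict."""
    try:
        await start()
    except StandInVerificationError as error:
        # Most specific class first, so readiness failures get their own label.
        return {"reward": 0.0, "error": f"sandbox_verification_failed: {error}"}
    except StandInCreateError as error:
        return {"reward": 0.0, "error": f"sandbox_create_failed: {error}"}
    return {"error": None}


async def failing_start() -> None:
    raise StandInVerificationError("readiness probe timed out")


async def working_start() -> None:
    return None
```

With the real classes, the same structure would wrap `sandbox.start` and keep a single bad allocation from propagating out of a request handler.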