nemo_gym.sandbox.providers.apptainer.provider


Apptainer provider implementation.

Module Contents

Classes

| Name | Description |
| --- | --- |
| ApptainerCreateConfig | Settings for creating an Apptainer sandbox instance. |
| ApptainerCreateError | Raised when Apptainer cannot create a sandbox. |
| ApptainerCreateVerificationError | Raised when a newly-created sandbox cannot execute a probe command. |
| ApptainerExecConfig | Settings for running commands inside an Apptainer sandbox. |
| ApptainerProbeConfig | Post-create probe settings: a test command confirming the sandbox is usable. |
| ApptainerProvider | Sandbox provider backed by the local Apptainer CLI. |
| _ApptainerInstance | Provider-private state stashed on SandboxHandle.raw. |

Functions

| Name | Description |
| --- | --- |
| _coerce_binds | Normalize spec.provider_options['binds'] into a list of bind strings. |
| _coerce_config | Accept either a config dataclass instance or a plain mapping (Hydra YAML). |
| _is_missing_instance | - |
| _is_runtime_failure | Best-effort: did apptainer itself fail to run the command (vs the command failing)? |
| _path_under_mount | If path is inside the mount, return its path relative to the mount; else None. |
| _require_apptainer | Return the apptainer binary path or hard-error if it is not installed. |
| _resolve_image | - |
| _resource_flags | Translate neutral resources into apptainer CLI flags. |
| _resource_limit_flags | - |
| _resource_passthrough_flags | - |
| _to_sandbox_status | Map an apptainer-reported state string to the neutral status enum. |

Data

APPTAINER_MISSING_INSTANCE_MARKERS

APPTAINER_RUNTIME_ERROR_MARKERS

DEFAULT_MOUNT_POINT

INSTANCE_NAME_PREFIX

LOGGER

READY_PROBE_COMMAND

READY_PROBE_EXPECTED

SANDBOX_RUNTIME_RETURN_CODE

API

class nemo_gym.sandbox.providers.apptainer.provider.ApptainerCreateConfig(
mount_point: str = DEFAULT_MOUNT_POINT,
start_timeout_s: float | None = 600,
extra_start_args: list[str] = list(),
apply_resource_limits: bool = True
)
Dataclass

Settings for creating an Apptainer sandbox instance.

apply_resource_limits
bool = True
extra_start_args
list[str] = field(default_factory=list)
mount_point
str = DEFAULT_MOUNT_POINT
start_timeout_s
float | None = 600
nemo_gym.sandbox.providers.apptainer.provider.ApptainerCreateConfig.__post_init__() -> None
class nemo_gym.sandbox.providers.apptainer.provider.ApptainerCreateError()

Bases: SandboxCreateError

Raised when Apptainer cannot create a sandbox.

class nemo_gym.sandbox.providers.apptainer.provider.ApptainerCreateVerificationError()

Bases: SandboxCreateVerificationError

Raised when a newly-created sandbox cannot execute a probe command.

class nemo_gym.sandbox.providers.apptainer.provider.ApptainerExecConfig(
default_timeout_s: float | None = 180,
fakeroot_for_root: bool = True,
default_binds: list[str] = list(),
extra_exec_args: list[str] = list(),
concurrency: int = 32
)
Dataclass

Settings for running commands inside an Apptainer sandbox.

concurrency
int = 32
default_binds
list[str] = field(default_factory=list)
default_timeout_s
float | None = 180
extra_exec_args
list[str] = field(default_factory=list)
fakeroot_for_root
bool = True
nemo_gym.sandbox.providers.apptainer.provider.ApptainerExecConfig.__post_init__() -> None
class nemo_gym.sandbox.providers.apptainer.provider.ApptainerProbeConfig(
command: str | None = READY_PROBE_COMMAND,
expected_stdout: str | None = READY_PROBE_EXPECTED,
timeout_s: int = 30,
deadline_s: float | None = None,
stable_count: int = 1,
stable_delay_s: float = 0.0
)
Dataclass

Post-create probe settings: a test command confirming the sandbox is usable.

command
str | None = READY_PROBE_COMMAND
deadline_s
float | None = None
expected_stdout
str | None = READY_PROBE_EXPECTED
stable_count
int = 1
stable_delay_s
float = 0.0
timeout_s
int = 30
nemo_gym.sandbox.providers.apptainer.provider.ApptainerProbeConfig.__post_init__() -> None
class nemo_gym.sandbox.providers.apptainer.provider.ApptainerProvider(
exec: nemo_gym.sandbox.providers.apptainer.provider.ApptainerExecConfig | collections.abc.Mapping[str, typing.Any] | None = None,
create: nemo_gym.sandbox.providers.apptainer.provider.ApptainerCreateConfig | collections.abc.Mapping[str, typing.Any] | None = None,
probe: nemo_gym.sandbox.providers.apptainer.provider.ApptainerProbeConfig | collections.abc.Mapping[str, typing.Any] | None = None
)

Sandbox provider backed by the local Apptainer CLI.

_binary
= _require_apptainer()
_create_config
= _coerce_config(create, ApptainerCreateConfig)
_exec_config
= _coerce_config(exec, ApptainerExecConfig)
_probe
= _coerce_config(probe, ApptainerProbeConfig)
_semaphore
= asyncio.Semaphore(self._exec_config.concurrency)
name
= 'apptainer'
nemo_gym.sandbox.providers.apptainer.provider.ApptainerProvider._cleanup_failed_create_handle(
handle: nemo_gym.sandbox.providers.base.SandboxHandle
) -> None
async

Best-effort teardown of a sandbox that failed verification.

nemo_gym.sandbox.providers.apptainer.provider.ApptainerProvider._run(
argv: list[str],
timeout_s: float | None,
stdin: bytes | None = None,
daemonize: bool = False
) -> tuple[int, str, str]
async

Run an apptainer CLI command. Returns (return_code, stdout, stderr).

Enforces timeout via asyncio.wait_for and kills the whole process group on timeout so child processes do not linger. Bounds concurrency with a shared semaphore. Decodes output with errors="replace".

Set daemonize=True for commands that fork a long-lived background process (apptainer instance start). Such commands hand the started instance a copy of the child’s stdout/stderr, so reading those pipes to EOF (communicate()) blocks until the instance exits — i.e. the call appears to hang until timeout_s even though the foreground process finished in under a second. In that mode we capture output to temp files (which the instance may inherit harmlessly) and only wait for the foreground process to exit.
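The timeout and process-group handling described above can be sketched roughly as follows. This is a minimal illustration, not the provider's code: run_cli is a hypothetical name, the semaphore is passed in rather than stored on the provider, and re-raising the timeout after cleanup is an assumption (the source does not say what _run returns or raises on timeout). It relies on POSIX process groups.

```python
import asyncio
import os
import signal
import sys


async def run_cli(semaphore, argv, timeout_s, stdin=None):
    """Hypothetical stand-in for _run: returns (return_code, stdout, stderr)."""
    async with semaphore:  # bound the number of concurrent CLI calls
        proc = await asyncio.create_subprocess_exec(
            *argv,
            stdin=asyncio.subprocess.PIPE if stdin is not None else asyncio.subprocess.DEVNULL,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
            start_new_session=True,  # POSIX: the child leads its own process group
        )
        try:
            out, err = await asyncio.wait_for(proc.communicate(stdin), timeout=timeout_s)
        except asyncio.TimeoutError:
            try:
                os.killpg(proc.pid, signal.SIGKILL)  # kill the child and its descendants
            except ProcessLookupError:
                pass  # the group already exited
            await proc.wait()
            raise  # assumption: surface the timeout to the caller
        return proc.returncode, out.decode(errors="replace"), err.decode(errors="replace")


async def demo():
    semaphore = asyncio.Semaphore(32)  # mirrors the documented default concurrency
    return await run_cli(semaphore, [sys.executable, "-c", "print('ok')"], timeout_s=30)
```

Because start_new_session=True makes the child a process-group leader, its pid doubles as the group id, which is why os.killpg(proc.pid, ...) reaches any grandchildren as well.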

nemo_gym.sandbox.providers.apptainer.provider.ApptainerProvider._run_daemonizing(
argv: list[str],
timeout_s: float | None
) -> tuple[int, str, str]
async

Run a command that daemonizes a child (e.g. apptainer instance start).

Captures stdout/stderr to temp files instead of pipes so the long-lived instance inheriting those descriptors cannot wedge the read, then waits only for the foreground process to exit.
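A rough sketch of the temp-file capture described above, under stated assumptions: run_daemonizing and FOREGROUND are hypothetical names, and the timeout handling (kill, then re-raise) is a guess at reasonable behavior rather than the provider's documented one. The foreground script stands in for apptainer instance start: it prints, spawns a long-lived child that inherits its stdout, and exits.

```python
import asyncio
import sys
import tempfile


async def run_daemonizing(argv, timeout_s):
    """Hypothetical stand-in for _run_daemonizing: returns (return_code, stdout, stderr)."""
    with tempfile.TemporaryFile() as out_file, tempfile.TemporaryFile() as err_file:
        proc = await asyncio.create_subprocess_exec(
            *argv,
            stdin=asyncio.subprocess.DEVNULL,
            stdout=out_file,  # a regular file: a lingering child cannot block the read
            stderr=err_file,
        )
        try:
            # Wait only for the foreground process, not for EOF on its output.
            await asyncio.wait_for(proc.wait(), timeout=timeout_s)
        except asyncio.TimeoutError:
            proc.kill()
            await proc.wait()
            raise
        out_file.seek(0)
        err_file.seek(0)
        stdout = out_file.read().decode(errors="replace")
        stderr = err_file.read().decode(errors="replace")
        return proc.returncode, stdout, stderr


# A foreground process that prints, spawns a long-lived child, and exits at once.
FOREGROUND = (
    "import subprocess, sys\n"
    "print('started', flush=True)\n"
    "subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(5)'])\n"
)
```

Calling asyncio.run(run_daemonizing([sys.executable, "-c", FOREGROUND], timeout_s=30)) returns as soon as the foreground script exits, even though the spawned child still holds the output descriptors.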

nemo_gym.sandbox.providers.apptainer.provider.ApptainerProvider._verify_created_handle(
handle: nemo_gym.sandbox.providers.base.SandboxHandle
) -> None
async

Run the readiness probe until the sandbox responds, or raise.

  • probe.command is None -> skip (no verification).
  • probe.deadline_s is None -> single attempt; a failure raises immediately.
  • probe.deadline_s is set -> poll until the sandbox passes the probe stable_count consecutive times, or the deadline elapses.
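The three cases above could be sketched as a small polling loop. This is illustrative only: wait_until_ready and the probe callable are hypothetical, RuntimeError stands in for ApptainerCreateVerificationError, and treating stable_delay_s as the pause between consecutive attempts is an assumption.

```python
import asyncio
import time


async def wait_until_ready(probe, deadline_s, stable_count=1, stable_delay_s=0.0):
    """probe is an async callable returning True when the sandbox answered as expected.

    Passing probe=None models probe.command being None (verification skipped).
    """
    if probe is None:
        return
    if deadline_s is None:
        # Single attempt: a failure raises immediately.
        if not await probe():
            raise RuntimeError("readiness probe failed")
        return
    end = time.monotonic() + deadline_s
    streak = 0  # number of consecutive passing probes
    while True:
        if await probe():
            streak += 1
            if streak >= stable_count:
                return
        else:
            streak = 0  # a failure resets the streak
        if time.monotonic() >= end:
            raise RuntimeError("readiness probe did not pass before the deadline")
        await asyncio.sleep(stable_delay_s)
```

Requiring stable_count consecutive passes guards against a sandbox that answers once and then becomes unavailable again during startup.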
nemo_gym.sandbox.providers.apptainer.provider.ApptainerProvider.aclose() -> None
async

No provider-wide resources to close.

nemo_gym.sandbox.providers.apptainer.provider.ApptainerProvider.close(
handle: nemo_gym.sandbox.providers.base.SandboxHandle
) -> None
async

Stop the instance and clean up the host staging dir.

  • Runs apptainer instance stop <name>.
  • If there is no such instance -> success.
  • Removes the host staging dir afterward.

nemo_gym.sandbox.providers.apptainer.provider.ApptainerProvider.create(
spec: nemo_gym.sandbox.providers.base.SandboxSpec
) -> nemo_gym.sandbox.providers.base.SandboxHandle
async

Start an apptainer instance and return a ready handle.

Steps:

  1. Warn once if spec.ttl_s is set (unsupported by apptainer).
  2. Resolve the image source (local .sif path or remote docker://, oras://, library:// URI) from spec.image; the value is used as is.
  3. Make a host staging dir (tempfile.mkdtemp), pick mount_point = self._create_config.mount_point, generate a unique name = INSTANCE_NAME_PREFIX + uuid4().hex.
  4. Build argv: [binary, "instance", "start", <--bind staging:mount_point>, <config default_binds>, <spec.provider_options["binds"]>, <--env …>, _resource_flags(spec.resources), <extra_start_args>, image, name].
  5. await self._run(argv, timeout_s=self._create_config.start_timeout_s); on non-zero return, clean up the staging dir and raise ApptainerCreateError(stderr).
  6. Build the handle: SandboxHandle(sandbox_id=name, provider_name=self.name, raw=_ApptainerInstance(name, staging_dir, mount_point, image)).
  7. Verify readiness via self._verify_created_handle(handle); on failure clean up and raise ApptainerCreateVerificationError.
  8. Return the handle.
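Steps 3 and 4 above might look roughly like the sketch below. build_start_argv is a hypothetical helper, and the exact rendering of each bind and environment variable (one --bind or --env flag per entry, KEY=VALUE for env) is an assumption about how the placeholders in the argv expand.

```python
import uuid

INSTANCE_NAME_PREFIX = "nemo-gym-"  # value taken from the module's Data section


def build_start_argv(binary, image, staging_dir, mount_point,
                     binds=(), env=None, resource_flags=(), extra_start_args=()):
    """Return (instance_name, argv) for an `apptainer instance start` call."""
    name = INSTANCE_NAME_PREFIX + uuid.uuid4().hex  # unique per sandbox
    argv = [binary, "instance", "start", "--bind", f"{staging_dir}:{mount_point}"]
    for bind in binds:  # config default_binds followed by per-sandbox binds
        argv += ["--bind", bind]
    for key, value in (env or {}).items():
        argv += ["--env", f"{key}={value}"]
    argv += [*resource_flags, *extra_start_args, image, name]
    return name, argv


name, argv = build_start_argv(
    "apptainer", "docker://python:3.12", "/tmp/staging", "/sandbox",
    binds=["/data:/data:ro"], env={"LANG": "C.UTF-8"},
)
```

The image and instance name come last because apptainer instance start takes them as positional arguments after all options.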
nemo_gym.sandbox.providers.apptainer.provider.ApptainerProvider.download_file(
handle: nemo_gym.sandbox.providers.base.SandboxHandle,
source_path: str,
target_path: pathlib.Path
) -> None
async

Download one sandbox file to the host.

Fast path (source under the bind mount): read directly from the host side of the shared folder. Fallback (arbitrary path): cp inside the container into the shared folder, then read the host side.

nemo_gym.sandbox.providers.apptainer.provider.ApptainerProvider.exec(
handle: nemo_gym.sandbox.providers.base.SandboxHandle,
command: str,
cwd: str | None = None,
env: dict[str, str] | None = None,
timeout_s: int | float | None = None,
user: str | int | None = None,
stdin: bytes | None = None
) -> nemo_gym.sandbox.providers.base.SandboxExecResult
async

Run a command inside the instance.

Maps the neutral user parameter onto apptainer:

  • None -> run as the default (launching) user.
  • "root" / 0 -> add --fakeroot (root inside the container).
  • other user/uid -> --fakeroot + wrap in su to switch to that user.

stdin, when given, is piped to the command’s standard input. This is an apptainer-provider extension to the base protocol, useful for feeding large inputs (e.g. prompts) that would exceed the kernel’s argv length limit.
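As a rough illustration of the user mapping above, the following sketch builds an exec argv. exec_argv is a hypothetical helper; running the command through sh -c, the exact su invocation, and passing the user straight to su (which normally expects a user name, so a numeric uid would need a lookup first) are all assumptions, since the source does not spell them out.

```python
import shlex


def exec_argv(binary, instance_name, command, user=None):
    """Build an `apptainer exec` argv for a shell command (illustrative only)."""
    argv = [binary, "exec"]
    inner = command
    if user is not None:
        argv.append("--fakeroot")  # root inside the container
        if user not in ("root", 0):
            # Assumed form: drop from fakeroot to the requested user via su.
            inner = f"su {shlex.quote(str(user))} -c {shlex.quote(command)}"
    argv += [f"instance://{instance_name}", "sh", "-c", inner]
    return argv


print(exec_argv("apptainer", "nemo-gym-abc", "id", user="root"))
# ['apptainer', 'exec', '--fakeroot', 'instance://nemo-gym-abc', 'sh', '-c', 'id']
```

Large inputs are better sent through stdin than embedded in command, because everything in argv counts against the kernel's argument-length limit.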

nemo_gym.sandbox.providers.apptainer.provider.ApptainerProvider.status(
handle: nemo_gym.sandbox.providers.base.SandboxHandle
) -> nemo_gym.sandbox.providers.base.SandboxStatus
async

Return the instance’s lifecycle status by querying apptainer instance list.

  • Runs apptainer instance list --json.
  • On timeout, non-zero exit, or unparseable JSON -> UNKNOWN.
  • Looks for this sandbox’s instance name: found -> RUNNING; gone -> STOPPED.
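The decision logic for status could be sketched as below. Status is a hypothetical stand-in for SandboxStatus, and the JSON shape (a top-level "instances" list whose entries carry an "instance" name) is an assumption about the CLI output that should be checked against the installed apptainer version.

```python
import json
from enum import Enum


class Status(Enum):  # hypothetical stand-in for SandboxStatus
    RUNNING = "running"
    STOPPED = "stopped"
    UNKNOWN = "unknown"


def status_from_instance_list(return_code, stdout, instance_name):
    """Interpret the result of `apptainer instance list --json`."""
    if return_code != 0:
        return Status.UNKNOWN
    try:
        payload = json.loads(stdout)
        names = {entry.get("instance") for entry in payload.get("instances", [])}
    except (ValueError, AttributeError, TypeError):
        return Status.UNKNOWN  # unparseable or unexpectedly shaped output
    return Status.RUNNING if instance_name in names else Status.STOPPED


sample = '{"instances": [{"instance": "nemo-gym-abc", "pid": 4242, "img": "/tmp/x.sif"}]}'
print(status_from_instance_list(0, sample, "nemo-gym-abc"))  # Status.RUNNING
```

A timeout would be handled by the caller before this function runs and mapped to UNKNOWN in the same way as a non-zero exit.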

nemo_gym.sandbox.providers.apptainer.provider.ApptainerProvider.upload_file(
handle: nemo_gym.sandbox.providers.base.SandboxHandle,
source_path: pathlib.Path,
target_path: str
) -> None
async

Upload one host file into the sandbox.

Fast path (target under the bind mount): write directly to the host side of the shared folder. Fallback (arbitrary path): stage into the shared folder, then cp inside the container.
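The two upload paths can be sketched as a helper that does the host-side copy and reports whether an in-container cp is still needed. stage_upload, the .upload-<hex> staging name, and the plain cp command are hypothetical; the real provider may stage and copy differently.

```python
from __future__ import annotations

import shlex
import shutil
import uuid
from pathlib import Path, PurePosixPath


def stage_upload(staging_dir: Path, mount_point: str,
                 source_path: Path, target_path: str) -> str | None:
    """Copy source_path into the shared folder.

    Returns None on the fast path, or the shell command that must still run
    inside the container on the fallback path.
    """
    try:
        relative = PurePosixPath(target_path).relative_to(mount_point)
    except ValueError:
        relative = None  # target lies outside the bind mount

    if relative is not None:
        # Fast path: the host side of the mount is the destination itself.
        destination = staging_dir.joinpath(*relative.parts)
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source_path, destination)
        return None

    # Fallback: stage under a unique name, then copy from inside the container.
    staged_name = f".upload-{uuid.uuid4().hex}"
    shutil.copyfile(source_path, staging_dir / staged_name)
    staged_in_container = f"{mount_point}/{staged_name}"
    return f"cp {shlex.quote(staged_in_container)} {shlex.quote(target_path)}"
```

On the fallback path the returned command would be run through the provider's exec, and the staged copy could be removed afterwards.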

class nemo_gym.sandbox.providers.apptainer.provider._ApptainerInstance(
name: str,
staging_dir: pathlib.Path,
mount_point: str,
image: str,
env: dict[str, str] = dict()
)
Dataclass

Provider-private state stashed on SandboxHandle.raw.

env
dict[str, str] = field(default_factory=dict)
image
str
mount_point
str
name
str
staging_dir
Path
nemo_gym.sandbox.providers.apptainer.provider._coerce_binds(
value: typing.Any
) -> list[str]

Normalize spec.provider_options['binds'] into a list of bind strings.

Accepts a single "src:dst[:opts]" string or a list of them. These are extra per-sandbox bind mounts, added on top of the staging mount and the provider-level exec.default_binds.
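A minimal sketch of this normalization, assuming that a missing value yields an empty list and that anything other than a string or a list of strings is rejected (the source does not state how invalid input is handled); coerce_binds is a hypothetical name.

```python
def coerce_binds(value):
    """Normalize a bind option into a list of "src:dst[:opts]" strings."""
    if value is None:
        return []  # assumption: no per-sandbox binds configured
    if isinstance(value, str):
        return [value]  # a single bind string
    if isinstance(value, (list, tuple)) and all(isinstance(item, str) for item in value):
        return list(value)
    raise TypeError(f"binds must be a string or a list of strings, got {type(value).__name__}")


print(coerce_binds("/data:/data:ro"))        # ['/data:/data:ro']
print(coerce_binds(["/a:/a", "/b:/b:ro"]))   # ['/a:/a', '/b:/b:ro']
```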

nemo_gym.sandbox.providers.apptainer.provider._coerce_config(
value: typing.Any,
config_cls: type[typing.Any]
) -> typing.Any

Accept either a config dataclass instance or a plain mapping (Hydra YAML).
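One way such a coercion could work is sketched below. ExecConfigSketch is a cut-down, hypothetical stand-in for ApptainerExecConfig, and treating None as "use the defaults" is inferred from the provider signature, where each config argument defaults to None.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass, field


@dataclass
class ExecConfigSketch:  # hypothetical, reduced copy of ApptainerExecConfig
    default_timeout_s: float | None = 180
    concurrency: int = 32
    default_binds: list[str] = field(default_factory=list)


def coerce_config(value, config_cls):
    """Return a config_cls instance from None, an instance, or a plain mapping."""
    if value is None:
        return config_cls()  # all defaults
    if isinstance(value, config_cls):
        return value  # already the right type
    if isinstance(value, Mapping):
        return config_cls(**value)  # unknown keys raise TypeError
    raise TypeError(f"expected {config_cls.__name__} or a mapping, got {type(value).__name__}")


config = coerce_config({"concurrency": 8}, ExecConfigSketch)
print(config.concurrency, config.default_timeout_s)  # 8 180
```

Accepting a mapping lets the same constructor be driven from YAML-loaded settings without the caller importing the dataclass.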

nemo_gym.sandbox.providers.apptainer.provider._is_missing_instance(
stderr: str
) -> bool
nemo_gym.sandbox.providers.apptainer.provider._is_runtime_failure(
stderr: str
) -> bool

Best-effort: did apptainer itself fail to run the command (vs the command failing)?
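A plausible shape for this check, using the marker tuple listed in the Data section. Lowercasing stderr before matching is an assumption, suggested by the lowercase markers; is_runtime_failure is a hypothetical name.

```python
APPTAINER_RUNTIME_ERROR_MARKERS = (
    "fatal:",
    "no instance found",
    "instance not found",
    "does not exist",
)


def is_runtime_failure(stderr):
    """Best-effort: True if stderr looks like apptainer itself failed."""
    lowered = stderr.lower()  # assumption: matching is case-insensitive
    return any(marker in lowered for marker in APPTAINER_RUNTIME_ERROR_MARKERS)


print(is_runtime_failure("FATAL:   no instance found with name nemo-gym-abc"))  # True
print(is_runtime_failure("ls: cannot access '/nope': No such file or directory"))  # False
```

The check is heuristic: a command that happens to print one of these phrases to stderr would be misclassified, which is why the source calls it best-effort.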

nemo_gym.sandbox.providers.apptainer.provider._path_under_mount(
mount_point: str,
path: str
) -> str | None

If path is inside the mount, return its path relative to the mount; else None.
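A small sketch of the containment check, assuming POSIX-style container paths. path_under_mount is a hypothetical name, and normalizing ".." segments before comparing is an added precaution that the original may or may not take.

```python
from __future__ import annotations

import posixpath
from pathlib import PurePosixPath


def path_under_mount(mount_point: str, path: str) -> str | None:
    """Return path relative to mount_point, or None if it lies outside."""
    normalized = posixpath.normpath(path)  # collapse "." and ".." segments
    try:
        relative = PurePosixPath(normalized).relative_to(posixpath.normpath(mount_point))
    except ValueError:
        return None
    return str(relative)


print(path_under_mount("/sandbox", "/sandbox/out/result.json"))  # out/result.json
print(path_under_mount("/sandbox", "/sandboxes/other.txt"))      # None
print(path_under_mount("/sandbox", "/sandbox/../etc/passwd"))    # None
```

Comparing path components rather than string prefixes is what keeps /sandboxes from being mistaken for a child of /sandbox.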

nemo_gym.sandbox.providers.apptainer.provider._require_apptainer() -> str

Return the apptainer binary path or hard-error if it is not installed.

nemo_gym.sandbox.providers.apptainer.provider._resolve_image(
image: str
) -> str
nemo_gym.sandbox.providers.apptainer.provider._resource_flags(
resources: nemo_gym.sandbox.providers.base.SandboxResources
) -> list[str]

Translate neutral resources into apptainer CLI flags.

nemo_gym.sandbox.providers.apptainer.provider._resource_limit_flags(
resources: nemo_gym.sandbox.providers.base.SandboxResources
) -> list[str]
nemo_gym.sandbox.providers.apptainer.provider._resource_passthrough_flags(
resources: nemo_gym.sandbox.providers.base.SandboxResources
) -> list[str]
nemo_gym.sandbox.providers.apptainer.provider._to_sandbox_status(
state: str | None
) -> nemo_gym.sandbox.providers.base.SandboxStatus

Map an apptainer-reported state string to the neutral status enum.

nemo_gym.sandbox.providers.apptainer.provider.APPTAINER_MISSING_INSTANCE_MARKERS = ('no instance found', 'instance not found', 'does not exist')
nemo_gym.sandbox.providers.apptainer.provider.APPTAINER_RUNTIME_ERROR_MARKERS = ('fatal:', 'no instance found', 'instance not found', 'does not exist')
nemo_gym.sandbox.providers.apptainer.provider.DEFAULT_MOUNT_POINT = '/sandbox'
nemo_gym.sandbox.providers.apptainer.provider.INSTANCE_NAME_PREFIX = 'nemo-gym-'
nemo_gym.sandbox.providers.apptainer.provider.LOGGER = logging.getLogger(__name__)
nemo_gym.sandbox.providers.apptainer.provider.READY_PROBE_COMMAND = 'printf apptainer-sandbox-ready'
nemo_gym.sandbox.providers.apptainer.provider.READY_PROBE_EXPECTED = 'apptainer-sandbox-ready'
nemo_gym.sandbox.providers.apptainer.provider.SANDBOX_RUNTIME_RETURN_CODE = 125