nemo_gym.sandbox.providers.apptainer.provider
Apptainer provider implementation.
Module Contents
Classes
Functions
Data
APPTAINER_MISSING_INSTANCE_MARKERS
APPTAINER_RUNTIME_ERROR_MARKERS
API
Settings for creating an Apptainer sandbox instance.
Bases: SandboxCreateError
Raised when Apptainer cannot create a sandbox.
Bases: SandboxCreateVerificationError
Raised when a newly-created sandbox cannot execute a probe command.
Settings for running commands inside an Apptainer sandbox.
Post-create probe settings: a test command confirming the sandbox is usable.
Sandbox provider backed by the local Apptainer CLI.
Best-effort teardown of a sandbox that failed verification.
Run an apptainer CLI command. Returns (return_code, stdout, stderr).
Enforces timeout via asyncio.wait_for and kills the whole process group on timeout so child processes do not linger. Bounds concurrency with a shared semaphore. Decodes output with errors="replace".
Set daemonize=True for commands that fork a long-lived background
process (apptainer instance start). Such commands hand the started
instance a copy of the child’s stdout/stderr, so reading those pipes to
EOF (communicate()) blocks until the instance exits — i.e. the call
appears to hang until timeout_s even though the foreground process
finished in under a second. In that mode we capture output to temp files
(which the instance may inherit harmlessly) and only wait for the
foreground process to exit.
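The pipe-based (non-daemonizing) path described above could look roughly like the following. This is a minimal sketch for a POSIX host, not the provider's actual code: run_cli, demo, and the semaphore size are hypothetical names, and re-raising on timeout is an assumption, since the text does not say what the real method does once the process group has been killed.

```python
import asyncio
import os
import signal
import sys


async def run_cli(argv, timeout_s, semaphore):
    """Run argv and return (return_code, stdout, stderr) as text."""
    async with semaphore:  # bound the number of concurrent CLI calls
        proc = await asyncio.create_subprocess_exec(
            *argv,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
            start_new_session=True,  # child becomes leader of its own process group
        )
        try:
            out, err = await asyncio.wait_for(proc.communicate(), timeout=timeout_s)
        except asyncio.TimeoutError:
            try:
                # Kill the whole group so grandchildren do not linger.
                os.killpg(proc.pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # the process already exited
            await proc.wait()
            raise  # assumption: surface the timeout to the caller
        return (
            proc.returncode,
            out.decode(errors="replace"),
            err.decode(errors="replace"),
        )


async def demo():
    semaphore = asyncio.Semaphore(4)  # hypothetical concurrency limit
    return await run_cli([sys.executable, "-c", "print('ok')"], 10.0, semaphore)
```

Because start_new_session=True makes the child's process group id equal to its pid, os.killpg(proc.pid, ...) reaches every process the command spawned in that group.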
Run a command that daemonizes a child (e.g. apptainer instance start).
Captures stdout/stderr to temp files instead of pipes so the long-lived instance inheriting those descriptors cannot wedge the read, then waits only for the foreground process to exit.
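A minimal sketch of the temp-file capture idea, under the assumption that waiting on the foreground process alone is sufficient. run_daemonizing is a hypothetical name, and the timeout handling shown here (kill, then re-raise) is an assumption rather than the provider's documented behaviour.

```python
import asyncio
import sys
import tempfile


async def run_daemonizing(argv, timeout_s):
    """Run argv with output captured to temp files; return (rc, stdout, stderr)."""
    with tempfile.TemporaryFile() as out_f, tempfile.TemporaryFile() as err_f:
        proc = await asyncio.create_subprocess_exec(
            *argv,
            stdout=out_f,  # a regular file: nobody has to read it to EOF
            stderr=err_f,
            start_new_session=True,
        )
        try:
            # Wait only for the foreground process, not for inherited descriptors.
            await asyncio.wait_for(proc.wait(), timeout=timeout_s)
        except asyncio.TimeoutError:
            proc.kill()
            await proc.wait()
            raise  # assumption: surface the timeout to the caller
        out_f.seek(0)
        err_f.seek(0)
        return (
            proc.returncode,
            out_f.read().decode(errors="replace"),
            err_f.read().decode(errors="replace"),
        )


async def demo_daemonizing():
    return await run_daemonizing([sys.executable, "-c", "print('started')"], 10.0)
```

With pipes, communicate() returns only when every holder of the write end has closed it. A file has no such end-of-stream condition, so a background process that keeps the descriptor open cannot delay the read.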
Run the readiness probe until the sandbox responds, or raise.
- probe.command is None -> skip (no verification).
- probe.deadline_s is None -> single attempt; a failure raises immediately.
- probe.deadline_s is set -> poll until the sandbox passes the probe stable_count consecutive times, or the deadline elapses.
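The three cases above can be sketched as a small polling loop. Everything named here is hypothetical: attempt stands in for "run probe.command once and report success", ProbeFailed stands in for the provider's verification error, and interval_s is an assumed polling interval. Resetting the streak on a failed attempt is how this sketch reads "consecutive".

```python
import asyncio
import time


class ProbeFailed(RuntimeError):
    """Hypothetical stand-in for ApptainerCreateVerificationError."""


async def wait_until_ready(attempt, deadline_s, stable_count=1, interval_s=0.5):
    """Return the number of probe attempts made (0 when the probe is skipped)."""
    if attempt is None:  # corresponds to probe.command is None
        return 0
    if deadline_s is None:  # single attempt, fail immediately
        if not await attempt():
            raise ProbeFailed("probe failed on its only attempt")
        return 1
    end = time.monotonic() + deadline_s
    attempts = 0
    streak = 0
    while True:
        attempts += 1
        if await attempt():
            streak += 1
            if streak >= stable_count:
                return attempts
        else:
            streak = 0  # a failure breaks the run of consecutive passes
        if time.monotonic() >= end:
            raise ProbeFailed(f"sandbox not ready after {deadline_s}s")
        await asyncio.sleep(interval_s)
```

Requiring several consecutive passes guards against a sandbox that answers once while still starting up and then briefly stops responding.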
No provider-wide resources to close.
Stop the instance and clean up the host staging dir.
- Runs apptainer instance stop <name>.
- A missing instance is treated as SUCCESS.
- Removes the host staging dir afterward.
Start an apptainer instance and return a ready handle.
Steps:
- Warn once if spec.ttl_s is set (unsupported by apptainer).
- Resolve the image source from spec.image (local .sif path or remote docker://, oras://, library:// URI); the value is used as is.
- Make a host staging dir (tempfile.mkdtemp), pick mount_point = self._create_config.mount_point, generate a unique name = INSTANCE_NAME_PREFIX + uuid4().hex.
- Build argv: [binary, "instance", "start", <--bind staging:mount_point>, <config default_binds>, <spec.provider_options["binds"]>, <--env …>, _resource_flags(spec.resources), <extra_start_args>, image, name].
- await self._run(argv, timeout_s=self._create_config.start_timeout_s); on non-zero return, clean up the staging dir and raise ApptainerCreateError(stderr).
- Build the handle: SandboxHandle(sandbox_id=name, provider_name=self.name, raw=_ApptainerInstance(name, staging_dir, mount_point, image)).
- Verify readiness via self._verify_created_handle(handle); on failure clean up and raise ApptainerCreateVerificationError.
- Return the handle.
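The argv-building step can be illustrated with a pure function. This is a sketch under assumptions, not the provider's code: build_start_argv and its parameters are hypothetical, the value given to INSTANCE_NAME_PREFIX is invented for the example, and rendering each bind as its own --bind flag and each variable as --env KEY=VALUE is an assumed encoding of the placeholders in the step list.

```python
import uuid

# Hypothetical value; the real prefix is not shown in this reference.
INSTANCE_NAME_PREFIX = "sandbox-"


def build_start_argv(
    binary,
    image,
    staging_dir,
    mount_point,
    *,
    default_binds=(),
    extra_binds=(),
    env=None,
    resource_flags=(),
    extra_start_args=(),
):
    """Return (argv, name) for an 'instance start' call."""
    name = INSTANCE_NAME_PREFIX + uuid.uuid4().hex  # unique per sandbox
    argv = [binary, "instance", "start", "--bind", f"{staging_dir}:{mount_point}"]
    # Provider-level binds first, then per-sandbox binds.
    for bind in (*default_binds, *extra_binds):
        argv += ["--bind", bind]
    for key, value in (env or {}).items():
        argv += ["--env", f"{key}={value}"]
    # Image and instance name are the two trailing positional arguments.
    argv += [*resource_flags, *extra_start_args, image, name]
    return argv, name
```

Keeping argv construction separate from process execution makes the ordering of flags easy to unit-test without Apptainer installed.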
Download one sandbox file to the host.
Fast path (source under the bind mount): read directly from the host side of the shared folder. Fallback (arbitrary path): cp inside the container into the shared folder, then read the host side.
Run a command inside the instance.
Maps the neutral user parameter onto apptainer:
- None -> run as the default (launching) user.
- "root" / 0 -> add --fakeroot (root inside the container).
- other user/uid -> --fakeroot + wrap in su to switch to that user.
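The user mapping above can be sketched as follows. exec_argv is a hypothetical helper, and the exact su invocation (su <user> -c "<command>") is an assumption made for illustration; the reference only says the command is wrapped in su.

```python
import shlex


def exec_argv(binary, instance_name, command, user=None):
    """Build an 'apptainer exec' argv for a command run inside an instance."""
    target = f"instance://{instance_name}"
    if user is None:
        # Default: run as the launching user, no extra flags.
        return [binary, "exec", target, *command]
    if user in ("root", 0):
        # Root inside the container via --fakeroot.
        return [binary, "exec", "--fakeroot", target, *command]
    # Assumption: become fake root first, then drop to the requested user via su.
    wrapped = ["su", str(user), "-c", shlex.join(command)]
    return [binary, "exec", "--fakeroot", target, *wrapped]
```

shlex.join quotes each argument so the original argv survives being passed to su as a single shell string.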
stdin, when given, is piped to the command’s standard input. This is an
apptainer-provider extension to the base protocol, useful for feeding large
inputs (e.g. prompts) that would exceed the kernel’s argv length limit.
Return the instance’s lifecycle status by querying apptainer instance list.
Runs apptainer instance list --json.
- Timeout, non-zero exit, or unparseable JSON -> UNKNOWN.
- Instance name found in the listing -> RUNNING.
- Instance name absent from the listing -> STOPPED.
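The status decision can be sketched as a pure function over the CLI result. Status and status_from_listing are hypothetical names, and the JSON shape (a top-level "instances" list whose entries carry an "instance" name field) is an assumption about the listing format; timeout handling is left to the caller.

```python
import enum
import json


class Status(enum.Enum):
    """Hypothetical stand-in for the neutral status enum."""

    RUNNING = "running"
    STOPPED = "stopped"
    UNKNOWN = "unknown"


def status_from_listing(return_code, stdout, name):
    """Map the output of an instance listing to a lifecycle status."""
    if return_code != 0:
        return Status.UNKNOWN
    try:
        payload = json.loads(stdout)
        # Assumed shape: {"instances": [{"instance": "<name>", ...}, ...]}
        names = {entry["instance"] for entry in payload["instances"]}
    except (ValueError, KeyError, TypeError):
        # Unparseable or unexpectedly shaped output: do not guess.
        return Status.UNKNOWN
    return Status.RUNNING if name in names else Status.STOPPED
```

Returning UNKNOWN rather than STOPPED on malformed output avoids reporting a live sandbox as gone when only the query failed.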
Upload one host file into the sandbox.
Fast path (target under the bind mount): write directly to the host side of the shared folder. Fallback (arbitrary path): stage into the shared folder, then cp inside the container.
Provider-private state stashed on SandboxHandle.raw.
Normalize spec.provider_options['binds'] into a list of bind strings.
Accepts a single "src:dst[:opts]" string or a list of them. These are
extra per-sandbox bind mounts, added on top of the staging mount and the
provider-level exec.default_binds.
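A minimal sketch of this normalization, assuming provider_options is a plain dict. normalize_binds is a hypothetical name, and rejecting anything other than strings is an assumption about error handling.

```python
def normalize_binds(provider_options):
    """Return provider_options['binds'] as a list of 'src:dst[:opts]' strings."""
    raw = (provider_options or {}).get("binds")
    if raw is None:
        return []
    if isinstance(raw, str):
        return [raw]  # a single bind string becomes a one-element list
    binds = list(raw)
    for bind in binds:
        if not isinstance(bind, str):
            # Assumption: non-string entries are rejected.
            raise TypeError(f"bind must be a string, got {type(bind).__name__}")
    return binds
```

Checking for str before iterating matters: a bare string is itself iterable, and treating it as a list would split it into single characters.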
Accept either a config dataclass instance or a plain mapping (Hydra YAML).
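One way such a coercion helper might look is shown below. ExecConfig and its fields are invented for the example, and coerce_config is a hypothetical name; the real config classes and their defaults are not shown in this reference.

```python
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass
class ExecConfig:
    """Hypothetical config dataclass used only for this example."""

    timeout_s: float = 60.0
    default_binds: tuple = ()


def coerce_config(cls, value):
    """Return value as an instance of cls, building it from a mapping if needed."""
    if value is None:
        return cls()  # fall back to the dataclass defaults
    if isinstance(value, cls):
        return value
    if isinstance(value, Mapping):
        # Unknown keys raise TypeError from the dataclass constructor.
        return cls(**dict(value))
    raise TypeError(f"expected {cls.__name__} or a mapping, got {type(value).__name__}")
```

Accepting both forms lets the same constructor serve code that builds the dataclass directly and configuration loaded from YAML into a mapping.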
Best-effort: did apptainer itself fail to run the command (vs the command failing)?
If path is inside the mount, return its path relative to the mount; else None.
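A small sketch of this check using PurePosixPath, since paths inside the container are POSIX paths. relative_to_mount is a hypothetical name, and rejecting paths that contain ".." segments is an added safeguard assumed here, not something the reference states.

```python
from pathlib import PurePosixPath


def relative_to_mount(path, mount_point):
    """Return path relative to mount_point, or None if it lies outside the mount."""
    try:
        rel = PurePosixPath(path).relative_to(PurePosixPath(mount_point))
    except ValueError:
        return None  # not under the mount point
    if ".." in rel.parts:
        # Assumption: refuse paths that could climb back out of the mount.
        return None
    return str(rel)
```

Comparing path components rather than string prefixes means /sandboxx/a is not mistaken for a path under /sandbox.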
Return the apptainer binary path or hard-error if it is not installed.
Translate neutral resources into apptainer CLI flags.
Map an apptainer-reported state string to the neutral status enum.