Agent Skills
Skills are reusable units of operational knowledge an agent can load at runtime, following the open Agent Skills standard used by Claude Code and Codex CLI. A skill is a directory containing a SKILL.md file (YAML frontmatter + markdown body) plus optional supporting files.
NeMo Gym treats skills as an environment-level variable you can sweep, the same way it treats prompts. You point a run at a directory of skills, the agent loads them with its own native mechanism, and each rollout result is tagged so you can compare variants. NeMo Gym does not interpret or activate skills — it sets up the skill environment and measures the outcome.
Skills are a run-level knob, not a dataset field
Skills are specified on gym eval run and applied to a fixed, skill-agnostic dataset at rollout time. The dataset stays a description of tasks, so the same dataset is reusable across skill variants and across agents.
Usage
A single skills.path field points at a directory of skill directories (here --no-serve collects against servers already started with gym env start):
The directory follows the standard layout — one subdirectory per skill, each with a SKILL.md:
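As a rough illustration of that layout, the Python sketch below builds a throwaway skills directory containing two skills and then lists the subdirectories that hold a SKILL.md. The skill names, descriptions, and bodies are hypothetical; the name and description frontmatter keys follow the Agent Skills convention.

```python
import tempfile
from pathlib import Path

# Frontmatter keys follow the Agent Skills convention (name + description).
SKILL_TEMPLATE = """\
---
name: {name}
description: {description}
---

{body}
"""


def write_skill(root: Path, name: str, description: str, body: str) -> Path:
    """Create <root>/<name>/SKILL.md and return the path to the file."""
    skill_dir = root / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_md = skill_dir / "SKILL.md"
    skill_md.write_text(
        SKILL_TEMPLATE.format(name=name, description=description, body=body),
        encoding="utf-8",
    )
    return skill_md


def list_skills(root: Path) -> list[str]:
    """Return the names of subdirectories that contain a SKILL.md."""
    return sorted(p.name for p in root.iterdir() if (p / "SKILL.md").is_file())


# Hypothetical skills, used only to illustrate the layout.
skills_root = Path(tempfile.mkdtemp()) / "skills"
write_skill(skills_root, "csv-cleanup", "Normalize messy CSV files.", "Steps go here.")
write_skill(skills_root, "unit-tests", "Write focused unit tests.", "Steps go here.")
print(list_skills(skills_root))
```

The resulting skills_root is the kind of directory that skills.path would point at.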
To compare variants, run again with a different skills.path over the same dataset. The skills path is resolved like input_jsonl_fpath (relative paths check the working directory, then the Gym root). For distributed runs the directory must be on storage accessible to the agent process.
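The lookup order described above (working directory first, then the Gym root) can be sketched as follows. This is not NeMo Gym's resolver: the function name resolve_skills_path, the gym_root parameter, the handling of absolute paths, and the error raised are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def resolve_skills_path(raw: str, cwd: Path, gym_root: Path) -> Path:
    """Hypothetical resolver: absolute paths as-is, else cwd first, then gym_root."""
    path = Path(raw).expanduser()
    if path.is_absolute():
        candidates = [path]
    else:
        # Relative paths: the working directory wins over the Gym root.
        candidates = [cwd / path, gym_root / path]
    for candidate in candidates:
        if candidate.is_dir():
            return candidate.resolve()
    tried = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"skills directory not found; tried: {tried}")


# Demo: the directory exists only under the (stand-in) Gym root.
base = Path(tempfile.mkdtemp())
cwd, gym_root = base / "work", base / "gym"
cwd.mkdir()
(gym_root / "skills" / "v1").mkdir(parents=True)
resolved = resolve_skills_path("skills/v1", cwd=cwd, gym_root=gym_root)
print(resolved)
```

If the same relative path also existed under the working directory, that copy would be returned instead.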
Output: skills_ref
Each rollout result is stamped with a skills_ref for provenance and grouping during reward profiling:
hash is a short content digest of the skill directory. It exists so that optimizer loops (e.g. ACE, GEPA, EvoSkill) that mutate a skill in place at the same path still produce distinguishable variants — identity is derived from bytes on disk, requiring no cooperation from the optimizer. Identical content yields an identical hash, so re-testing a prior variant automatically pools with its earlier evaluation.
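To make the "identity from bytes on disk" idea concrete, here is a minimal sketch of a content digest over a skill directory. The choice of SHA-256, the 12-character truncation, and the function name skills_digest are assumptions; the algorithm NeMo Gym actually uses is not specified here.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def skills_digest(skills_dir: Path, length: int = 12) -> str:
    """Hash relative paths and file bytes in a stable order (illustrative only)."""
    hasher = hashlib.sha256()
    files = [p for p in skills_dir.rglob("*") if p.is_file()]
    for file_path in sorted(files, key=lambda p: p.relative_to(skills_dir).as_posix()):
        rel = file_path.relative_to(skills_dir).as_posix().encode("utf-8")
        data = file_path.read_bytes()
        # Length-prefix each field so path/content boundaries are unambiguous.
        hasher.update(len(rel).to_bytes(8, "big"))
        hasher.update(rel)
        hasher.update(len(data).to_bytes(8, "big"))
        hasher.update(data)
    return hasher.hexdigest()[:length]


# Two directories with identical content, at different paths.
root = Path(tempfile.mkdtemp())
variant_a = root / "a"
(variant_a / "demo-skill").mkdir(parents=True)
skill_text = "---\nname: demo-skill\ndescription: Demo.\n---\nv1\n"
(variant_a / "demo-skill" / "SKILL.md").write_text(skill_text, encoding="utf-8")
variant_b = root / "b"
shutil.copytree(variant_a, variant_b)

hash_a = skills_digest(variant_a)
hash_b = skills_digest(variant_b)

# Mutate variant_b in place, as an optimizer loop might.
(variant_b / "demo-skill" / "SKILL.md").write_text(
    skill_text.replace("v1", "v2"), encoding="utf-8"
)
hash_b_mutated = skills_digest(variant_b)
```

Because only relative paths and file contents feed the hash, hash_a and hash_b match even though the directories live at different locations, while the in-place edit yields a different value at the same path.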
For concurrent candidate evaluation (population search), write each candidate to its own directory (skills/cand-0/, skills/cand-1/, …) to avoid a path-reuse read/write race.
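A minimal sketch of that per-candidate layout, assuming a hypothetical optimizer that produces several SKILL.md bodies at once. The read_candidate function is only a stand-in for launching an evaluation with skills.path set to the candidate directory.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

SKILL_MD = "---\nname: demo-skill\ndescription: Demo.\n---\n{text}\n"


def materialize_candidate(base: Path, index: int, skill_name: str, text: str) -> Path:
    """Write one candidate into its own directory: <base>/cand-<index>/<skill_name>/."""
    cand_dir = base / f"cand-{index}"
    skill_dir = cand_dir / skill_name
    # exist_ok=False: fail loudly rather than silently reuse a path.
    skill_dir.mkdir(parents=True, exist_ok=False)
    (skill_dir / "SKILL.md").write_text(SKILL_MD.format(text=text), encoding="utf-8")
    return cand_dir


def read_candidate(cand_dir: Path) -> str:
    """Stand-in for an evaluation run pointed at this candidate directory."""
    return (cand_dir / "demo-skill" / "SKILL.md").read_text(encoding="utf-8")


candidate_texts = ["variant zero", "variant one", "variant two"]
base = Path(tempfile.mkdtemp()) / "skills"
cand_dirs = [
    materialize_candidate(base, i, "demo-skill", text)
    for i, text in enumerate(candidate_texts)
]

# Each worker reads only its own directory, so no writer can race a reader.
with ThreadPoolExecutor(max_workers=3) as pool:
    seen = list(pool.map(read_candidate, cand_dirs))
```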
How skills reach the agent
NeMo Gym handles everything agent-agnostic: it resolves skills.path, computes the skills_ref, stamps it onto each request, and propagates it to results. The agent’s job is only to make the resolved directory discoverable by its native runtime — because where a runtime discovers skills is intrinsically agent-specific.
This splits cleanly into a shared core and a thin per-agent adapter:
Adding skills support to an agent
To support skills in a new agent, implement the three-step adapter. NeMo Gym provides the shared utilities in nemo_gym.skills.
- Read the skills path from the request. NeMo Gym stamps skills_ref (with path) onto the /run request body. Read skills_ref["path"] in your agent’s run().
- Stage the directory where your runtime discovers skills, using nemo_gym.skills.stage_skills(path, dest_dir). The destination is agent-specific. Prefer staging into a per-request ephemeral location so concurrent requests with different skills do not contaminate one another, and so nothing leaks between rollouts.
- Enable native discovery. If your runtime disables discovery by default (Claude Code runs --bare), turn it on when skills are present. Then the agent’s native loader handles selection and activation; NeMo Gym adds no skill-interpretation logic. Note that enabling discovery is usually all-or-nothing: with Claude Code, dropping --bare so skills load also re-enables all other auto-discovery (hooks, plugins, MCP servers, memory, and CLAUDE.md), not just skills. If you are measuring a skill’s isolated impact, be aware that a baseline (--bare, no skills) versus a skills run may differ by more than the skill content alone.
The reference implementation is the Claude Code agent. It stages skills into a fresh per-request CLAUDE_CONFIG_DIR/skills/ and drops --bare when skills are active.
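The three adapter steps can be sketched for an imaginary agent runtime as follows. Everything runtime-specific here is hypothetical: the my-agent command, its --no-discovery flag, and the AGENT_CONFIG_DIR variable are placeholders, and stage_skills_sketch is a plain copy used in place of nemo_gym.skills.stage_skills, whose internals are not shown in this document.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Any


def stage_skills_sketch(path: str, dest_dir: Path) -> Path:
    """Stand-in for nemo_gym.skills.stage_skills: copy the skills tree to dest_dir."""
    shutil.copytree(path, dest_dir, dirs_exist_ok=True)
    return dest_dir


def plan_launch(body: dict[str, Any]) -> dict[str, Any]:
    """Build a launch plan for one /run request (hypothetical runtime)."""
    # Fresh per-request config dir, so concurrent requests cannot contaminate each other.
    config_dir = Path(tempfile.mkdtemp(prefix="agent-config-"))
    args = ["my-agent", "--no-discovery"]  # hypothetical CLI with discovery off

    skills_ref = body.get("skills_ref") or {}
    skills_path = skills_ref.get("path")  # Step 1: read the path from the request.
    if skills_path:
        # Step 2: stage where this (imaginary) runtime discovers skills.
        stage_skills_sketch(skills_path, config_dir / "skills")
        # Step 3: enable native discovery only when skills are present.
        args.remove("--no-discovery")

    return {"args": args, "env": {"AGENT_CONFIG_DIR": str(config_dir)}}


# Demo with a throwaway skills directory.
src = Path(tempfile.mkdtemp()) / "skills"
(src / "demo-skill").mkdir(parents=True)
(src / "demo-skill" / "SKILL.md").write_text(
    "---\nname: demo-skill\ndescription: Demo.\n---\nBody.\n", encoding="utf-8"
)

with_skills = plan_launch({"skills_ref": {"path": str(src), "hash": "abc123"}})
without_skills = plan_launch({})
```

A real adapter would also remove the per-request directory once the rollout finishes, so nothing leaks into later requests.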
What NeMo Gym does not do
- It does not parse, select, inject, or activate skills — the agent’s native runtime does all of that.
- It does not add skill-loading capabilities to agents that lack them. Injecting skill content into the prompt for such agents is a possible future enhancement, not part of this design.