Agent Skills | NeMo Gym

Skills are reusable units of operational knowledge an agent can load at runtime, following the open Agent Skills standard used by Claude Code and Codex CLI. A skill is a directory containing a SKILL.md file (YAML frontmatter + markdown body) plus optional supporting files.
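
To make the file shape concrete, the sketch below splits a SKILL.md into its frontmatter fields and markdown body. It is a minimal illustration, not NeMo Gym's loader: `split_skill_md` is a hypothetical helper, it handles only flat `key: value` lines (real frontmatter is YAML and deserves a YAML parser), and the example field values are made up.

```python
from typing import Dict, Tuple


def split_skill_md(text: str) -> Tuple[Dict[str, str], str]:
    """Split SKILL.md text into (frontmatter fields, markdown body).

    Hypothetical helper: handles only flat `key: value` lines, not full YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block present

    fields: Dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":  # closing delimiter ends the frontmatter
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return fields, body
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}, text  # unterminated frontmatter: treat the whole file as body


# Illustrative content only; these field values are invented for the example.
EXAMPLE = """---
name: cot_enhanced
description: Step-by-step reasoning guidance
---
# Chain-of-thought guidance

Work through the problem before answering.
"""

fields, body = split_skill_md(EXAMPLE)
print(fields["name"], "|", body.splitlines()[0])
```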

NeMo Gym treats skills as an environment-level variable you can sweep, the same way it treats prompts. You point a run at a directory of skills, the agent loads them with its own native mechanism, and each rollout result is tagged so you can compare variants. NeMo Gym does not interpret or activate skills — it sets up the skill environment and measures the outcome.

Skills are a run-level knob, not a dataset field

Skills are specified on gym eval run and applied to a fixed, skill-agnostic dataset at rollout time. The dataset stays a description of tasks, so the same dataset is reusable across skill variants and across agents.

Usage

A single skills.path field points at a directory that holds one subdirectory per skill. In this example, --no-serve collects against servers already started with gym env start:

$ gym eval run --no-serve +agent_name=my_agent \
>   +input_jsonl_fpath=data/tasks.jsonl \
>   +output_jsonl_fpath=results/rollouts_variant_a.jsonl \
>   +skills.path=skills/variant_a/

The directory follows the standard layout — one subdirectory per skill, each with a SKILL.md:

skills/variant_a/
├── cot_enhanced/
│   └── SKILL.md
├── tool_focused/
│   ├── SKILL.md
│   └── references/
│       └── api_spec.md
└── baseline/
    └── SKILL.md
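
The layout above can be checked with a few lines of standard-library code. This is a sketch under the assumption that a skill is any immediate subdirectory containing a SKILL.md; `list_skill_dirs` is a hypothetical helper and does not reflect how NeMo Gym validates the directory.

```python
import tempfile
from pathlib import Path
from typing import List


def list_skill_dirs(skills_root: Path) -> List[str]:
    """Return names of immediate subdirectories that contain a SKILL.md."""
    return sorted(
        child.name
        for child in skills_root.iterdir()
        if child.is_dir() and (child / "SKILL.md").is_file()
    )


# Build a throwaway copy of the layout shown above and scan it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "variant_a"
    for name in ("cot_enhanced", "tool_focused", "baseline"):
        (root / name).mkdir(parents=True)
        (root / name / "SKILL.md").write_text("---\nname: " + name + "\n---\n")
    (root / "tool_focused" / "references").mkdir()
    (root / "tool_focused" / "references" / "api_spec.md").write_text("spec\n")
    (root / "notes").mkdir()  # has no SKILL.md, so it is not counted as a skill

    found = list_skill_dirs(root)
    print(found)
```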

To compare variants, run again with a different skills.path over the same dataset. The skills path is resolved like input_jsonl_fpath (relative paths check the working directory, then the Gym root). For distributed runs the directory must be on storage accessible to the agent process.
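
A sweep over variants amounts to repeating the command with a different skills.path and output file. The sketch below only builds the argument lists, using the flags from the example above; `eval_command` and the variant names are hypothetical, and the commands are printed rather than executed.

```python
import shlex
from typing import List


def eval_command(agent_name: str, input_jsonl: str, variant_dir: str, output_jsonl: str) -> List[str]:
    """Build the argv for one `gym eval run` over a single skills variant."""
    return [
        "gym", "eval", "run", "--no-serve",
        f"+agent_name={agent_name}",
        f"+input_jsonl_fpath={input_jsonl}",
        f"+output_jsonl_fpath={output_jsonl}",
        f"+skills.path={variant_dir}",
    ]


variants = ["variant_a", "variant_b"]  # hypothetical variant directory names
commands = [
    eval_command("my_agent", "data/tasks.jsonl", f"skills/{v}/", f"results/rollouts_{v}.jsonl")
    for v in variants
]
for cmd in commands:
    # To execute instead of print: subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```

Keeping the dataset path fixed and varying only skills.path and the output path is what makes the resulting files directly comparable.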

Output: `skills_ref`

Each rollout result is stamped with a skills_ref for provenance and grouping during reward profiling:

{
  "reward": 1.0,
  "skills_ref": {
    "path": "skills/variant_a/",
    "hash": "a1b2c3…",
    "skills": [{"name": "cot_enhanced", "description": "..."}]
  }
}
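
Because every result carries skills_ref, a results file can be summarized per variant. The sketch below is one way to do that, assuming the output JSONL holds one result object per line shaped like the example above and using `skills_ref.hash` as the grouping key. `mean_reward_by_hash` and the `"no-skills"` fallback label are hypothetical, and the sample rows are invented.

```python
import json
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List


def mean_reward_by_hash(jsonl_lines: Iterable[str]) -> Dict[str, float]:
    """Average `reward` over results that share the same `skills_ref.hash`."""
    rewards: Dict[str, List[float]] = defaultdict(list)
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        ref = row.get("skills_ref") or {}
        # "no-skills" is a made-up bucket for rows without a skills_ref.
        rewards[ref.get("hash", "no-skills")].append(float(row["reward"]))
    return {h: mean(vals) for h, vals in rewards.items()}


# Made-up rows for illustration; real hashes come from the results file.
rows = [
    '{"reward": 1.0, "skills_ref": {"path": "skills/variant_a/", "hash": "aaa"}}',
    '{"reward": 0.0, "skills_ref": {"path": "skills/variant_a/", "hash": "aaa"}}',
    '{"reward": 1.0, "skills_ref": {"path": "skills/variant_a/", "hash": "bbb"}}',
]
summary = mean_reward_by_hash(rows)
print(summary)
```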

hash is a short content digest of the skill directory. It exists so that optimizer loops (e.g. ACE, GEPA, EvoSkill) that mutate a skill in place at the same path still produce distinguishable variants — identity is derived from bytes on disk, requiring no cooperation from the optimizer. Identical content yields an identical hash, so re-testing a prior variant automatically pools with its earlier evaluation.
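
The idea of deriving identity from bytes on disk can be sketched with hashlib. This is an illustration of the concept only: the source does not specify NeMo Gym's digest algorithm, so the use of SHA-256, the 12-character truncation, and the `directory_digest` helper are all assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def directory_digest(root: Path, length: int = 12) -> str:
    """Short hex digest over relative file paths and file bytes under `root`."""
    h = hashlib.sha256()
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    for path in files:
        rel = path.relative_to(root).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length-prefix each part so path/content boundaries are unambiguous.
        h.update(len(rel).to_bytes(8, "big") + rel)
        h.update(len(data).to_bytes(8, "big") + data)
    return h.hexdigest()[:length]


with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "cot_enhanced"
    skill.mkdir()
    md = skill / "SKILL.md"

    md.write_text("---\nname: cot_enhanced\n---\nversion one\n")
    first = directory_digest(Path(tmp))

    md.write_text("---\nname: cot_enhanced\n---\nversion two\n")  # mutate in place
    second = directory_digest(Path(tmp))

    md.write_text("---\nname: cot_enhanced\n---\nversion one\n")  # restore earlier content
    third = directory_digest(Path(tmp))

print(first != second, first == third)
```

The same path yields a new digest after an in-place edit, and restoring the earlier content restores the earlier digest, which is the pooling behavior described above.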

For concurrent candidate evaluation (population search), write each candidate to its own directory (skills/cand-0/, skills/cand-1/, …) to avoid a path-reuse read/write race.
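
A population-search loop can follow this advice by giving each candidate a fresh directory before launching any evaluation. A minimal sketch, assuming each candidate is the full text of a single SKILL.md; `write_candidates` is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def write_candidates(base: Path, skill_name: str, candidates: List[str]) -> Dict[int, Path]:
    """Write each candidate SKILL.md under its own base/cand-<i>/<skill_name>/."""
    dirs: Dict[int, Path] = {}
    for i, skill_md in enumerate(candidates):
        cand_dir = base / f"cand-{i}"
        skill_dir = cand_dir / skill_name
        skill_dir.mkdir(parents=True, exist_ok=False)  # fail rather than reuse a path
        (skill_dir / "SKILL.md").write_text(skill_md)
        dirs[i] = cand_dir  # this is the directory to pass as skills.path
    return dirs


with tempfile.TemporaryDirectory() as tmp:
    texts = [
        "---\nname: cot_enhanced\n---\ncandidate zero\n",
        "---\nname: cot_enhanced\n---\ncandidate one\n",
    ]
    cand_dirs = write_candidates(Path(tmp) / "skills", "cot_enhanced", texts)
    names = [cand_dirs[i].name for i in sorted(cand_dirs)]
    all_written = all((d / "cot_enhanced" / "SKILL.md").is_file() for d in cand_dirs.values())

print(names, all_written)
```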

How skills reach the agent

NeMo Gym handles everything agent-agnostic: it resolves skills.path, computes the skills_ref, stamps it onto each request, and propagates it to results. The agent’s job is only to make the resolved directory discoverable by its native runtime — because where a runtime discovers skills is intrinsically agent-specific.

This splits cleanly into shared core and a thin per-agent adapter:

Responsibility	Owner
`skills.path` config, load/validate, content hash, `skills_ref` stamping + propagation	NeMo Gym core (shared)
Read `skills_ref.path` from the request	Per-agent (identical shape)
Stage the directory into the runtime’s discovery location	Per-agent (location differs)
Enable native discovery; load/activate skills	The agent’s native runtime

Adding skills support to an agent

To support skills in a new agent, implement the three-step adapter. NeMo Gym provides the shared utilities in nemo_gym.skills.

1. Read the skills path from the request. NeMo Gym stamps skills_ref (with path) onto the /run request body. Read skills_ref["path"] in your agent’s run().
2. Stage the directory where your runtime discovers skills, using nemo_gym.skills.stage_skills(path, dest_dir). The destination is agent-specific:

Agent	Discovery location
Claude Code	`CLAUDE_CONFIG_DIR/skills/`
Codex CLI	the Codex config home (e.g. `$CODEX_HOME`)
MCP-backed agent	configure the MCP skill server to serve from the directory

   Prefer staging into a per-request ephemeral location so concurrent requests with different skills do not contaminate one another, and so nothing leaks between rollouts.
3. Enable native discovery. If your runtime disables discovery by default (Claude Code runs --bare), turn it on when skills are present. The agent’s native loader then handles selection and activation; NeMo Gym adds no skill-interpretation logic.
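
The three steps can be sketched for a Claude Code style agent as follows. This is an illustration under stated assumptions, not the reference implementation: `stage_skills_stand_in` is a plain recursive copy standing in for nemo_gym.skills.stage_skills (whose actual behavior is not described here), and `prepare_claude_run`, the request shape beyond skills_ref["path"], and the base argument list are hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def stage_skills_stand_in(path: str, dest_dir: str) -> None:
    """Stand-in for nemo_gym.skills.stage_skills: a plain recursive copy."""
    shutil.copytree(path, dest_dir, dirs_exist_ok=True)


def prepare_claude_run(request: Dict, base_args: List[str], config_dir: Path) -> Tuple[List[str], Dict[str, str]]:
    """Return (args, env) for one request, staging skills if the request has any."""
    env = dict(os.environ)
    skills_ref: Optional[Dict] = request.get("skills_ref")
    if not skills_ref:
        return list(base_args), env  # no skills: leave discovery disabled

    # Step 1: read the path stamped onto the request.
    src = skills_ref["path"]
    # Step 2: stage into this request's own config directory.
    stage_skills_stand_in(src, str(config_dir / "skills"))
    env["CLAUDE_CONFIG_DIR"] = str(config_dir)
    # Step 3: enable native discovery by dropping --bare.
    args = [a for a in base_args if a != "--bare"]
    return args, env


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "variant_a"
    (src / "baseline").mkdir(parents=True)
    (src / "baseline" / "SKILL.md").write_text("---\nname: baseline\n---\n")

    config_dir = Path(tmp) / "request-0001"  # hypothetical per-request directory
    request = {"skills_ref": {"path": str(src)}}
    args, env = prepare_claude_run(request, ["claude", "--bare"], config_dir)
    staged = (config_dir / "skills" / "baseline" / "SKILL.md").is_file()
    config_env = env["CLAUDE_CONFIG_DIR"] == str(config_dir)

    bare_args, _ = prepare_claude_run({}, ["claude", "--bare"], config_dir)

print(args, staged, bare_args)
```

Using a separate config directory per request keeps concurrent requests with different skills from seeing each other's staged files.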

Note that enabling discovery is usually all-or-nothing: with Claude Code, dropping --bare so skills load also re-enables all other auto-discovery — hooks, plugins, MCP servers, memory, and CLAUDE.md — not just skills. If you are measuring a skill’s isolated impact, be aware that a baseline (--bare, no skills) versus a skills run may differ by more than the skill content alone.

The reference implementation is the Claude Code agent. It stages skills into a fresh per-request CLAUDE_CONFIG_DIR/skills/ and drops --bare when skills are active.

What NeMo Gym does not do

It does not parse, select, inject, or activate skills — the agent’s native runtime does all of that.
It does not add skill-loading capabilities to agents that lack them. Injecting skill content into the prompt for such agents is a possible future enhancement, not part of this design.