# Agent Skills

> Evaluate agent skills as a run-level variable, decoupled from the dataset

Skills are reusable units of operational knowledge an agent can load at runtime, following the open [Agent Skills standard](https://agentskills.io/specification) used by Claude Code and Codex CLI. A skill is a **directory** containing a `SKILL.md` file (YAML frontmatter + markdown body) plus optional supporting files.

NeMo Gym treats skills as an **environment-level variable you can sweep**, the same way it treats prompts. You point a run at a directory of skills, the agent loads them with its own native mechanism, and each rollout result is tagged so you can compare variants. NeMo Gym does not interpret or activate skills — it sets up the skill environment and measures the outcome.

## Skills are a run-level knob, not a dataset field

Skills are specified on `gym eval run` and applied to a **fixed, skill-agnostic dataset** at rollout time. The dataset stays a description of tasks, so the same dataset is reusable across skill variants and across agents.

## Usage

A single `skills.path` field points at a directory of skill directories. In the example below, `--no-serve` collects rollouts against servers already started with `gym env start`:

```bash
gym eval run --no-serve +agent_name=my_agent \
  +input_jsonl_fpath=data/tasks.jsonl \
  +output_jsonl_fpath=results/rollouts_variant_a.jsonl \
  +skills.path=skills/variant_a/
```

The directory follows the standard layout — one subdirectory per skill, each with a `SKILL.md`:

```text
skills/variant_a/
├── cot_enhanced/
│   └── SKILL.md
├── tool_focused/
│   ├── SKILL.md
│   └── references/
│       └── api_spec.md
└── baseline/
    └── SKILL.md
```
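
Before launching a run, it can help to confirm that a directory matches this layout. The following is a minimal sketch, not part of NeMo Gym: `find_skill_dirs` is a hypothetical helper that only checks each subdirectory for a `SKILL.md` file and does not parse the frontmatter. NeMo Gym's own load/validate step may apply different rules.

```python
from pathlib import Path


def find_skill_dirs(skills_root: str) -> tuple[list[str], list[str]]:
    """Split subdirectories into (skills with SKILL.md, directories missing it).

    Hypothetical helper for illustration; NeMo Gym's validation may differ.
    """
    valid, missing = [], []
    for child in sorted(Path(skills_root).iterdir()):
        if not child.is_dir():
            continue  # stray files at the top level are ignored
        if (child / "SKILL.md").is_file():
            valid.append(child.name)
        else:
            missing.append(child.name)
    return valid, missing
```

Calling `find_skill_dirs("skills/variant_a/")` on the tree above would return the three skill names in the first list and an empty second list.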

To compare variants, run again with a different `skills.path` over the same dataset. The skills path is resolved like `input_jsonl_fpath`: relative paths are checked against the working directory first, then the Gym root. For distributed runs, the directory must be on storage accessible to the agent process.
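
A sweep over several variants can be scripted by rebuilding the same command with a different `skills.path` and output file. This is a sketch under stated assumptions: `build_eval_command` is a hypothetical helper, `variant_b` is a made-up second variant, and only the flags shown in the command above are used.

```python
import shlex


def build_eval_command(variant: str, agent_name: str = "my_agent") -> list[str]:
    """Build the argv for one skill variant, mirroring the command shown above."""
    return [
        "gym", "eval", "run", "--no-serve",
        f"+agent_name={agent_name}",
        "+input_jsonl_fpath=data/tasks.jsonl",  # same dataset for every variant
        f"+output_jsonl_fpath=results/rollouts_{variant}.jsonl",
        f"+skills.path=skills/{variant}/",
    ]


if __name__ == "__main__":
    # variant_b is a hypothetical second variant directory.
    for variant in ["variant_a", "variant_b"]:
        print(shlex.join(build_eval_command(variant)))
        # To launch instead of print:
        #   subprocess.run(build_eval_command(variant), check=True)
```

Writing each variant to its own output file keeps the runs separate on disk, while the `skills_ref` stamped on each rollout still allows grouping after the files are merged.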

## Output: `skills_ref`

Each rollout result is stamped with a `skills_ref` for provenance and grouping during reward profiling:

```json
{
  "reward": 1.0,
  "skills_ref": {
    "path": "skills/variant_a/",
    "hash": "a1b2c3…",
    "skills": [{"name": "cot_enhanced", "description": "..."}]
  }
}
```
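
As an illustration of grouping, the sketch below averages rewards per `skills_ref.hash`. It assumes each line of the output JSONL is an object shaped like the example above. The function name is hypothetical, and records without a `skills_ref` are grouped under `None`, which is an assumption about how runs without skills appear.

```python
import json
from collections import defaultdict
from statistics import mean


def mean_reward_by_skills_hash(jsonl_lines):
    """Group rollout records by skills_ref.hash and average their rewards.

    Assumes each non-empty line is a JSON object with a "reward" field and
    an optional "skills_ref" object, as in the example above.
    """
    rewards = defaultdict(list)
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        skills_ref = record.get("skills_ref") or {}
        rewards[skills_ref.get("hash")].append(record["reward"])
    return {key: mean(values) for key, values in rewards.items()}
```

Because the key is the content hash rather than the path, rollouts from different output files pool together whenever the skill content was identical.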

`hash` is a short content digest of the skill directory. It exists so that optimizer loops (e.g. ACE, GEPA, EvoSkill) that mutate a skill **in place** at the same path still produce distinguishable variants — identity is derived from bytes on disk, requiring no cooperation from the optimizer. Identical content yields an identical hash, so re-testing a prior variant automatically pools with its earlier evaluation.
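
To make the idea of a content-derived identity concrete, here is one way such a digest could be computed. This is an assumption-laden sketch: the algorithm (SHA-256), the 12-character length, and the name `directory_digest` are all illustrative, and the digest NeMo Gym actually computes may differ.

```python
import hashlib
from pathlib import Path


def directory_digest(root: str, length: int = 12) -> str:
    """Return a short hex digest over every file's relative path and bytes.

    Illustrative only: NeMo Gym's algorithm and digest length are not
    specified here and may differ.
    """
    hasher = hashlib.sha256()
    base = Path(root)
    # Sort so the digest does not depend on filesystem iteration order.
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        rel = path.relative_to(base).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length-prefix each field so boundaries between files are unambiguous.
        hasher.update(len(rel).to_bytes(8, "big") + rel)
        hasher.update(len(data).to_bytes(8, "big") + data)
    return hasher.hexdigest()[:length]
```

With a scheme like this, editing a `SKILL.md` in place changes the digest, and restoring the earlier bytes restores the earlier digest, which is the pooling behavior described above.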

For concurrent candidate evaluation (population search), write each candidate to its own directory (`skills/cand-0/`, `skills/cand-1/`, …) so that no candidate is overwritten while rollouts are still reading from the same path.

## How skills reach the agent

NeMo Gym handles everything agent-agnostic: it resolves `skills.path`, computes the `skills_ref`, stamps it onto each request, and propagates it to results. The agent's job is only to make the resolved directory discoverable by its native runtime — because *where* a runtime discovers skills is intrinsically agent-specific.

This splits cleanly into a **shared core** and a **thin per-agent adapter**:

| Responsibility                                                                         | Owner                        |
| -------------------------------------------------------------------------------------- | ---------------------------- |
| `skills.path` config, load/validate, content hash, `skills_ref` stamping + propagation | NeMo Gym core (shared)       |
| Read `skills_ref.path` from the request                                                | Per-agent (identical shape)  |
| Stage the directory into the runtime's discovery location                              | Per-agent (location differs) |
| Enable native discovery; load/activate skills                                          | The agent's native runtime   |

## Adding skills support to an agent

To support skills in a new agent, implement the three-step adapter. NeMo Gym provides the shared utilities in `nemo_gym.skills`.

1. **Read the skills path from the request.** NeMo Gym stamps `skills_ref` (with `path`) onto the `/run` request body. Read `skills_ref["path"]` in your agent's `run()`.

2. **Stage the directory where your runtime discovers skills**, using `nemo_gym.skills.stage_skills(path, dest_dir)`. The destination is agent-specific:

   | Agent            | Discovery location                                         |
   | ---------------- | ---------------------------------------------------------- |
   | Claude Code      | `CLAUDE_CONFIG_DIR/skills/`                                |
   | Codex CLI        | the Codex config home (e.g. `$CODEX_HOME`)                 |
   | MCP-backed agent | configure the MCP skill server to serve from the directory |

   Prefer staging into a **per-request** ephemeral location so concurrent requests with different skills do not contaminate one another, and so nothing leaks between rollouts.

3. **Enable native discovery.** If your runtime disables discovery by default (for example, the Claude Code agent runs with `--bare`), turn it on when skills are present. The agent's native loader then handles selection and activation; NeMo Gym adds no skill-interpretation logic.

   Note that enabling discovery is usually all-or-nothing: with Claude Code, dropping `--bare` so skills load also re-enables *all* other auto-discovery — hooks, plugins, MCP servers, memory, and `CLAUDE.md` — not just skills. If you are measuring a skill's isolated impact, be aware that a baseline (`--bare`, no skills) versus a skills run may differ by more than the skill content alone.

The reference implementation is the [Claude Code agent](https://github.com/NVIDIA-NeMo/Gym/tree/main/responses_api_agents/claude_code_agent). It stages skills into a fresh per-request `CLAUDE_CONFIG_DIR/skills/` and drops `--bare` when skills are active.
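
The three adapter steps can be sketched as a small context manager. This is not the reference implementation: `staged_skills` is a hypothetical name, and `stage_skills_stand_in` uses `shutil.copytree` in place of `nemo_gym.skills.stage_skills(path, dest_dir)`, on the assumption that staging amounts to copying the directory. Enabling discovery (step 3) is left to the caller because it is runtime-specific.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


def stage_skills_stand_in(path: str, dest_dir: str) -> None:
    """Stand-in for nemo_gym.skills.stage_skills(path, dest_dir).

    Assumption: staging amounts to copying the skill directories into dest_dir.
    """
    shutil.copytree(path, dest_dir, dirs_exist_ok=True)


@contextmanager
def staged_skills(request_body: dict):
    """Yield a per-request config dir with skills staged, or None if no skills."""
    skills_ref = request_body.get("skills_ref")
    if not skills_ref:
        yield None  # no skills: the runtime can keep discovery disabled
        return
    # Step 1: read the resolved path stamped onto the request.
    skills_path = skills_ref["path"]
    # Step 2: stage into a fresh directory that is deleted after the rollout.
    with tempfile.TemporaryDirectory(prefix="agent-config-") as config_dir:
        stage_skills_stand_in(skills_path, str(Path(config_dir) / "skills"))
        # Step 3 happens in the caller: point the runtime at config_dir
        # and enable its native discovery.
        yield config_dir
```

An agent's `run()` would wrap the rollout in `with staged_skills(body) as config_dir:` and, when `config_dir` is not `None`, pass it to the runtime as its configuration directory. The temporary directory keeps concurrent requests isolated and leaves nothing behind between rollouts.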

## What NeMo Gym does not do

* It does not parse, select, inject, or activate skills — the agent's native runtime does all of that.
* It does not add skill-loading capabilities to agents that lack them. Injecting skill content into the prompt for such agents is a possible future enhancement, not part of this design.