Getting Started with NVIDIA TAO Toolkit

TAO Toolkit is now agent-based. The product surface — every model, every data transform, every platform, every workflow — is published as the TAO Skill Bank, a plugin you install into your coding agent (Claude Code, Codex, …). Once installed, you talk to your agent in plain English and it discovers, plans, and executes TAO workloads on your behalf using the TAO Execution SDK.

You do not learn a CLI. You do not write spec YAML by hand. You ask, the agent proposes, you confirm, and the job runs on your chosen compute backend.

Prerequisites

Before you drive the agent, it helps to know a little about what you are asking it to do. None of this is strictly mandatory — the agent will tell you what is missing if you jump in cold — but you will get to a first successful run faster if you have a sense of:

  • The model categories relevant to your problem: image classification, object detection, instance or semantic segmentation, change detection, pose estimation, OCR, and so on.

  • The basics of hyperparameter search and AutoML, if you plan to use the tao-run-automl workflow.

  • The dataset format the model you pick expects (the agent reads the model’s SKILL.md and will confirm format requirements, but knowing roughly what it wants, such as COCO JSON, KITTI, or image pairs with a CSV manifest, short-circuits a lot of back-and-forth).

  • The compute environment you have access to: local Docker, a remote SSH-reachable Docker host, SLURM, Kubernetes, or NVIDIA Brev.

Hardware requirements:

  • An NVIDIA GPU with Compute Capability 7.0 or newer (Volta or later); 16 GB VRAM minimum, 24 GB or more recommended for foundation-model fine-tuning.

  • NVIDIA driver from the 580 branch or newer; CUDA Toolkit 13.0; Docker 24+; NVIDIA Container Toolkit 1.19.0 or newer.

  • A Linux host (Ubuntu 22.04 or 24.04) or a remote Docker daemon reachable via DOCKER_HOST. macOS hosts can run the agent itself but cannot run GPU containers locally — use a remote Docker host or one of the managed backends (Brev, SLURM, Kubernetes).

If anything in this list is unfamiliar, do not stall on it — install the plugin and ask the agent. It will surface missing prerequisites as it plans the job.

1. Install the TAO Skill Bank Plugin

The TAO Skill Bank ships as a marketplace plugin. Pick the runtime you use.

Claude Code

In a Claude Code session, run these two slash commands at the prompt:

/plugin marketplace add https://github.com/NVIDIA-TAO/tao-skills-bank.git#7.0.1
/plugin install tao-skills@tao-skill-bank

The tao-skills plugin bundles every published TAO skill — models, data, platform, and application workflows including the DEFT AOI improvement loop. There is no longer a separate DEFT plugin.

Note

You type slash commands yourself at the Claude Code prompt; the agent cannot type them for you. This is the only installation step you must perform by hand.

Codex CLI

Codex is a terminal-based coding agent from OpenAI. If you have not installed it yet, refer to the Codex CLI installation instructions. With Codex installed, add the TAO skill bank:

codex plugin marketplace add https://github.com/NVIDIA-TAO/tao-skills-bank.git --ref 7.0.1
codex                    # opens the TUI
/plugins                 # select TAO Skill Bank → Install plugin

Codex caches the plugin at ~/.codex/plugins/cache/tao-skill-bank/tao-skill-bank/<version>/.

Note

Installing from a downloaded archive. If you have the skill bank as a local .zip rather than a Git ref, extract it and point the marketplace at the absolute path:

unzip ~/Downloads/tao-skills-bank-7.0.1.zip -d ~/tao-skills-bank
codex plugin marketplace add ~/tao-skills-bank
codex
/plugins                 # select TAO Skill Bank → Install plugin

2. Activate the Plugin in Your Current Session

In Claude Code, the plugin is registered but not yet active after /plugin install. Run:

/reload-plugins

This makes the skills, slash commands, and hooks visible to the agent without exiting the session. /reload-plugins prints counts of what it loaded — confirm the tao-skills entry shows skills loaded.

Note

Codex auto-activates the plugin the next time you launch codex; there is no Codex equivalent of /reload-plugins to run.

3. Export the Environment Variables You Need

The TAO Execution SDK reads credentials and platform identifiers from the shell environment of the process that calls it — i.e. the agent process. This means the variables must be exported before you start (or restart) your agent, in the same shell.
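The reason for the ordering is that a process receives a snapshot of its parent's environment at launch. The following is a minimal sketch of that behaviour using only the Python standard library; it is not TAO code, `child_sees` is a hypothetical helper, and the child Python process merely stands in for the agent.

```python
import os
import subprocess
import sys


def child_sees(name, env):
    """Start a child Python process with `env` and report whether it sees `name`."""
    probe = (
        "import os, sys; "
        "sys.stdout.write('yes' if os.environ.get(sys.argv[1]) else 'no')"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe, name],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout == "yes"


# Environment as it looked when the stand-in "agent" was first launched: no key yet.
launch_env = {k: v for k, v in os.environ.items() if k != "NGC_KEY"}
# Environment after exporting the key and relaunching from the same shell.
relaunch_env = dict(launch_env, NGC_KEY="<ngc-api-key>")

print(child_sees("NGC_KEY", launch_env))    # prints False
print(child_sees("NGC_KEY", relaunch_env))  # prints True
```

A variable exported after the agent has started never reaches the first process, which is why the agent has to be restarted.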

Variables by Category

Always required:

| Variable | What it is |
| --- | --- |
| NGC_KEY | NVIDIA NGC API key from ngc.nvidia.com. Used by every backend to pull TAO container images from nvcr.io. |
| HF_TOKEN | HuggingFace user access token from huggingface.co/settings/tokens. Required for gated models (Cosmos-RL, Cosmos-Embed, Visual ChangeNet, VILA, HuggingFace fine-tune). |

One set per backend — export only the rows for the backend you intend to use:

| Backend | Variables |
| --- | --- |
| NVIDIA Brev | BREV_API_TOKEN |
| SLURM cluster | SLURM_USER, SLURM_HOSTNAME (comma-separated permitted for failover), SSH_KEY_PATH (optional, defaults to ~/.ssh/id_ed25519) |
| Kubernetes | Configured via your kubeconfig; no extra env vars |
| Remote Docker | DOCKER_HOST, NGC_KEY; optionally DOCKER_TLS_VERIFY + DOCKER_CERT_PATH for TLS-protected daemons |
| Local Docker | None (DockerSDK uses the local daemon directly) |

Optional, by feature:

| Variable | When you need it |
| --- | --- |
| ACCESS_KEY, SECRET_KEY | Any workflow with s3:// dataset or result URIs |
| S3_ENDPOINT_URL | Non-AWS S3 (MinIO, Cloudflare R2, …) |
| S3_BUCKET_NAME, CLOUD_REGION | Default output destination + AWS-style region |
| WANDB_API_KEY, WANDB_PROJECT | AutoML tracking, HuggingFace fine-tune |
| NVIDIA_API_KEY | AutoML algorithms llm, hybrid, autoresearch |
| GEMINI_API_KEY or OPENAI_API_KEY (+ OPENAI_BASE_URL) | Data skills that call a VLM: image-grounding, referring-expression, video-reasoning-annotation |

Note

For the Remote Docker backend, the agent runs docker --host "$DOCKER_HOST" info as a preflight check to verify the remote daemon is reachable before dispatching any job.
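If you want to run the same reachability check yourself before starting the agent, a small wrapper around that command might look like the sketch below. It only shells out to the Docker CLI command quoted above; `preflight_command` and `daemon_reachable` are hypothetical helper names, and this is not how the agent itself is implemented.

```python
import subprocess


def preflight_command(docker_host, docker_bin="docker"):
    """Build the argv for the reachability check quoted above."""
    return [docker_bin, "--host", docker_host, "info"]


def daemon_reachable(docker_host, docker_bin="docker", timeout=15):
    """Return True only if `docker info` against docker_host exits with status 0."""
    try:
        result = subprocess.run(
            preflight_command(docker_host, docker_bin),
            capture_output=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Docker CLI missing or not executable, or the daemon did not answer in time.
        return False
    return result.returncode == 0
```

Calling `daemon_reachable(os.environ["DOCKER_HOST"])` after the export step gives a quick yes/no answer; the timeout value is an arbitrary choice for this sketch.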

Note

Each platform skill (platform/<backend>/SKILL.md) has a Preflight section that lists the exact variables it expects. The agent reads this when you pick a backend and tells you what’s missing — you don’t need to memorise the table.
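For a quick self-check outside the agent, the required-variable tables above can be transcribed into a small lookup. This is a sketch under assumptions: the backend labels (`"brev"`, `"slurm"`, and so on) and the function names are invented for illustration, and the authoritative list remains the Preflight section of each platform skill.

```python
import os

# Transcribed from the tables above; backend labels are invented for this sketch.
ALWAYS_REQUIRED = ("NGC_KEY", "HF_TOKEN")
BACKEND_REQUIRED = {
    "brev": ("BREV_API_TOKEN",),
    "slurm": ("SLURM_USER", "SLURM_HOSTNAME"),  # SSH_KEY_PATH is optional
    "kubernetes": (),                           # configured via kubeconfig
    "remote-docker": ("DOCKER_HOST",),
    "local-docker": (),
}


def missing_variables(backend, env=None):
    """List required variables that are unset or empty for the given backend."""
    env = os.environ if env is None else env
    required = ALWAYS_REQUIRED + BACKEND_REQUIRED[backend]
    return [name for name in required if not env.get(name)]


def slurm_login_hosts(env=None):
    """Split the comma-separated SLURM_HOSTNAME value into individual hosts."""
    env = os.environ if env is None else env
    raw = env.get("SLURM_HOSTNAME", "")
    return [host.strip() for host in raw.split(",") if host.strip()]


example_env = {"NGC_KEY": "x", "SLURM_USER": "me", "SLURM_HOSTNAME": "login1, login2"}
print(missing_variables("slurm", example_env))  # prints ['HF_TOKEN']
print(slurm_login_hosts(example_env))           # prints ['login1', 'login2']
```

Like the agent's own check, this only tests that a variable is non-empty; it says nothing about whether the value is valid.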

How to Export Them

The procedure is always the same three steps:

  1. Exit your agent. From a running Claude Code session: /exit or Ctrl-D.

  2. Export the variables in the same shell. Replace each <...> placeholder with your actual value:

    # Always
    export NGC_KEY="<ngc-api-key>"
    export HF_TOKEN="<hf-token>"
    
    # Pick the block for your backend
    export BREV_API_TOKEN="<brev-api-token>"
    # — or —
    export SLURM_USER="<cluster-username>"
    export SLURM_HOSTNAME="<login-host>[,<failover-login-host>]"
    export SSH_KEY_PATH="$HOME/.ssh/id_ed25519"
    # — or — Remote Docker host (SSH-tunnelled or TCP)
    export DOCKER_HOST="ssh://user@remote-gpu-host"
    # — or —
    export DOCKER_HOST="tcp://remote-gpu-host:2376"
    export DOCKER_TLS_VERIFY=1
    export DOCKER_CERT_PATH="$HOME/.docker/certs/remote-gpu-host"
    # — or — Local Docker (no env var needed; the default)
    
    # Optional, by feature
    export ACCESS_KEY="<s3-access-key>"
    export SECRET_KEY="<s3-secret-key>"
    export WANDB_API_KEY="<wandb-api-key>"
    
  3. Restart the agent in the same shell. For Claude Code, resume your previous conversation in place:

    claude --resume
    

Do not write secrets to a file: not .env, not your shell rc file, not a notebook cell. Files leak (backups, sync clients, version control, screen-shares, log scrapers). The shell environment goes away when the shell exits, which is what you want. New shells do not inherit exports from other shells, so if you open a fresh terminal later, re-run the export commands there before starting another agent.

Note

The skill bank’s bundled README and AGENTS.md describe a .env convention because some users come from containerised-CI workflows where a .env is the natural input. The guidance above is the recommended user-facing path: export variables in the shell that launches the agent. If you do choose a .env for convenience, ensure it is gitignored and never committed.

4. Verify Everything Is Wired Up

Inside the restarted agent session, ask:

“Confirm the TAO Skill Bank is installed and tell me which platform SDKs you can dispatch to right now.”

The agent should respond with the version of the tao-skills plugin and the list of backends whose environment variables it can see (e.g. “SLURM: not configured; Brev: not configured; Local Docker: available; HF_TOKEN: present; NGC_KEY: present”). If any required variable is missing for the backend you intend to use, the agent will tell you what’s missing — fix it via the exit → export → restart cycle above.

Note

This verification is an echo-style presence check, not an inference or API call. The agent confirms that a variable is set in its inherited shell environment; it does not call HuggingFace, NGC, Brev, or any cloud endpoint to validate the value. HF_TOKEN: present only means the variable is non-empty — it does not guarantee the token has access to a specific gated repository. Token validity is exercised the first time a model skill pulls the gated checkpoint, which is when an invalid or under-permissioned token surfaces as a download failure.

You can also ask the agent which blockers it can resolve for you before you submit a real job:

“What blockers can the skill bank handle for me?”

The agent enumerates the readiness checks each skill performs (NGC docker login, GPU visibility, kubeconfig resolution, S3 endpoint reachability, dataset URI existence, container image pull, and similar preflight gates), and reports which of them it can fix automatically vs. which require a manual action from you.

Lightweight Exploration Prompts

These prompts do not submit any job — they only ask the agent to enumerate or describe what the skill bank can do. Useful for getting your bearings:

  • “What networks are supported for object detection?”

  • “What networks are supported for semantic segmentation?”

  • “List every TAO model, data skill, and application workflow you can run.”

  • “What can the tao-train-visual-changenet model skill do, and what does its training input look like?”

  • “Which platform skills do I have credentials for right now?”

If an answer mentions a skill, action, or kwarg you want to inspect yourself, ask the agent to cat the matching SKILL.md file out of the plugin cache.
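If you would rather browse the files yourself, a short script can list every SKILL.md under the plugin cache. This is a sketch using only `pathlib`; `find_skill_files` is a hypothetical helper, and the demo below builds a throwaway directory instead of assuming where your cache lives or what it contains.

```python
import tempfile
from pathlib import Path


def find_skill_files(cache_root):
    """Return the SKILL.md paths under cache_root, relative to it and sorted."""
    root = Path(cache_root).expanduser()
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("SKILL.md"))


# Demo against a temporary directory that mimics the layout named in this guide.
with tempfile.TemporaryDirectory() as tmp:
    for skill_dir in ("models/tao-train-visual-changenet", "applications/tao-run-automl"):
        target = Path(tmp) / skill_dir
        target.mkdir(parents=True)
        (target / "SKILL.md").write_text("# placeholder\n")
    found = find_skill_files(tmp)

for path in found:
    print(path)
```

Point `find_skill_files` at the versioned cache directory of your runtime to get the real list.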

5. Initial Prompts to Get Started

You drive the agent in natural language. Be concrete about the dataset URI, the metric you care about, and the compute backend you want to use — the agent will ask if you leave any of those open, but volunteering them up front shortens the back-and-forth.

Try One of the Smaller Workflows First: Visual ChangeNet Inference

If you want to see the agent in action without committing to a full training run, the Visual ChangeNet model skill is the smallest end-to-end loop in the bank. It is also the model that drives the DEFT AOI improvement loop example below, so exercising it standalone is a good warmup.

What Visual ChangeNet does. It compares an input image against a golden reference image in one of two modes:

  • Classify: emits a binary PASS / NO_PASS label per pair (siamese-style architecture with a shared C-RADIO ViT backbone and a learnable difference module). Use for AOI defect detection on PCB-style inputs.

  • Segment: emits a binary change mask over the pair (ViT-Large NVDINOv2 backbone). Use when you need pixel-level localisation of the defect, not just an image-level verdict.

Inputs. The model ingests image pairs through a 4-column CSV with one row per pair: input_path,golden_path,label,object_name. The input_path / golden_path columns are paths relative to your images_dir root; label is the ground-truth class (used only at training and evaluation time); object_name groups related parts. For inference-only runs, you can omit label.
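A manifest in this shape can be produced with the standard `csv` module. The sketch below is illustrative only: the header row, the `PASS` / `NO_PASS` label strings, and the file and object names are assumptions, so confirm the exact conventions against the model's SKILL.md before relying on them.

```python
import csv
import io

COLUMNS = ["input_path", "golden_path", "label", "object_name"]


def write_pair_manifest(rows, stream):
    """Write one CSV row per image pair, in the four-column order described above."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()  # assumption: the loader expects a header row
    for row in rows:
        writer.writerow(row)


# Hypothetical pairs; paths are relative to the images_dir root.
pairs = [
    {"input_path": "board01/u12.png", "golden_path": "golden/u12.png",
     "label": "PASS", "object_name": "u12"},
    {"input_path": "board02/u12.png", "golden_path": "golden/u12.png",
     "label": "NO_PASS", "object_name": "u12"},
]

buffer = io.StringIO()
write_pair_manifest(pairs, buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the example side-effect free; pass an open file handle instead to create the manifest on disk.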

Outputs. Classify writes per-pair predictions and confidences; Segment writes binary change masks alongside the inputs. Both write a metrics summary to the run’s results_dir.

Sample inference prompts:

“Run tao-train-visual-changenet classify inference on the pairs listed in /tmp/aoi/eval.csv. Images are under /tmp/aoi/images/. Write results to /tmp/vcn-out/. Run on the local Docker daemon.”

“Run tao-train-visual-changenet segment inference on the image pair /tmp/sample/input.png and /tmp/sample/golden.png. Write the change mask to /tmp/vcn-out/.”

This exercises the same dispatch path as a full training job on your local Docker daemon (tao-run-on-local-docker), with no managed compute required. Refer to models/tao-train-visual-changenet/SKILL.md in the plugin cache for the full set of spec keys, the classify/segment dataset-format details, and additional example prompts.

See also: Visual ChangeNet documentation.

Run a DEFT AOI Improvement Loop (PCB inspection)

“Run the DEFT loop on my workspace at ~/workspace/ (replace with your own PCB inspection dataset) with the KPI and train dataset there. KPI target: FAR < 10% at Recall = 100%. Generate 20 SDG images per iteration, train for 20 epochs, and cap iterations at 1 for the first smoke test.”

[Figure: Example prompt for running the DEFT loop and the agent's initial response]

The agent loads the tao-run-deft-aoi skill, confirms auto mode, and begins pre-flight checks to resolve workspace inputs.

[Figure: DEFT loop pre-flight stage output showing dataset, augmentation, and tool resolution]

During the pre-flight stage, the agent resolves the training dataset, KPI CSV, augmentation configuration, container image, and Python virtual environment before submitting any jobs.

What the agent does:

  1. Reads the tao-run-deft-aoi application skill to plan the Evaluate → RCA → SDG → Mine → Retrain → Gate loop.

  2. Reads the tao-train-visual-changenet model skill for spec defaults and the tao-run-on-local-docker platform skill for dispatch conventions.

  3. Asks you for anything still missing (e.g. eval dataset URI, node group name, max iterations, image override if any).

  4. Shows you the planned sdk.create_job(...) for the first iteration and waits for your confirmation before submitting.

  5. Iterates until your KPI gate is met or the iteration cap is reached.
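To make the KPI gate in the sample prompt concrete, here is one way "FAR at Recall = 100%" and the iterate-until-gate logic could be expressed. It is a sketch under stated assumptions, not the skill's implementation: FAR is taken to be the fraction of good pairs flagged as defective, higher scores are taken to mean "more likely defective", and `evaluate_iteration` is a hypothetical stand-in for one full loop iteration.

```python
def far_at_full_recall(scores, is_defect):
    """False-alarm rate at the loosest threshold that still catches every defect."""
    defect_scores = [s for s, d in zip(scores, is_defect) if d]
    good_scores = [s for s, d in zip(scores, is_defect) if not d]
    if not defect_scores or not good_scores:
        raise ValueError("need at least one defective and one good sample")
    # Flagging everything at or above the lowest defect score gives 100% recall.
    threshold = min(defect_scores)
    false_alarms = sum(1 for s in good_scores if s >= threshold)
    return false_alarms / len(good_scores)


def run_until_gate(evaluate_iteration, far_target=0.10, max_iterations=1):
    """Call evaluate_iteration(i) until FAR < far_target or the cap is reached."""
    history = []
    for i in range(1, max_iterations + 1):
        far = evaluate_iteration(i)
        history.append(far)
        if far < far_target:
            return True, history
    return False, history


scores = [0.9, 0.4, 0.5, 0.2, 0.1, 0.3]
is_defect = [True, True, False, False, False, False]
print(far_at_full_recall(scores, is_defect))  # prints 0.25
```

With `max_iterations=1`, as in the smoke-test prompt, the loop stops after a single pass whether or not the gate is met.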

For deeper background, refer to Visual ChangeNet documentation.

Train a Model on Your Own Dataset

“Fine-tune DINO object detection on the detection-datasets/coco_2017_val Hugging Face dataset (val split is fine for a smoke test, no auth required). Use 4 GPUs on the local Docker daemon. Cap training at 1 epoch so I can see end-to-end results within ~30 minutes.”

What the agent does:

  1. Reads applications/tao-train-single-step/SKILL.md for the single-step train → eval → (optional export) flow.

  2. Reads models/tao-train-dino/SKILL.md for DINO-specific spec keys (number of classes, monitoring metric, default checkpoint).

  3. Asks you for num_classes and any per-network spec keys it can’t infer.

  4. Shows you the resolved container image and the planned sdk.create_job(...) call, including the inputs={...} mapping from container paths to your dataset URI.

  5. Submits the training job on confirmation; monitors status and surfaces logs and failure analysis on demand.

See also: DINO model documentation.

Hyperparameter Optimization with AutoML

“Run TAO AutoML on visual-changenet using the bayesian algorithm with 3 trials and 1 epoch per trial (smoke test, ~15 minutes). Dataset is detection-datasets/coco_2017_val. Track in WandB under project pcb-aoi-hpo. Run on local Docker.”

What the agent does:

  1. Reads applications/tao-run-automl/SKILL.md and confirms AutoMLRunner is the right entry point.

  2. Asks for the metric to optimize, the search space (or accepts a defaults-derived one), and the algorithm-specific kwargs.

  3. Verifies WANDB_API_KEY is exported (and NVIDIA_API_KEY too, if you pick the llm, hybrid, or autoresearch algorithm).

  4. Launches the AutoMLRunner pointing at your chosen platform SDK; reports trial-by-trial progress.
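The credential check in step 3 follows a simple rule that can be sketched as below. This is an illustration only: the function name, the `track_in_wandb` flag, and the grouping constant are invented here, and nothing in it calls AutoMLRunner or any TAO API.

```python
# Algorithms that this guide says also need NVIDIA_API_KEY.
ALGORITHMS_NEEDING_NVIDIA_KEY = {"llm", "hybrid", "autoresearch"}


def automl_missing_keys(algorithm, env, track_in_wandb=True):
    """List the API keys an AutoML run would need that are unset or empty in env."""
    required = []
    if track_in_wandb:
        required.append("WANDB_API_KEY")
    if algorithm in ALGORITHMS_NEEDING_NVIDIA_KEY:
        required.append("NVIDIA_API_KEY")
    return [name for name in required if not env.get(name)]


print(automl_missing_keys("bayesian", {"WANDB_API_KEY": "k"}))  # prints []
print(automl_missing_keys("hybrid", {"WANDB_API_KEY": "k"}))    # prints ['NVIDIA_API_KEY']
```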

For deeper background, refer to AutoML documentation.

Where to Go from Here

  • TAO Skill Bank source of truth: the applications/, models/, data/, platform/, and skills/ directories of the skill bank repository (installed by the plugin under ~/.claude/plugins/cache/tao-skill-bank/tao-skill-bank/<version>/). Every SKILL.md is a contract the agent reads when planning a workload — feel free to open one yourself if you want to know exactly what the agent is consulting.

  • Plugin maintenance: to pick up new skill versions, run /plugin marketplace update tao-skill-bank then /reload-plugins.

  • List what’s available: ask the agent “List every TAO model, data skill, and application workflow you can run.” It will enumerate the skill bank for you.

If the agent ever proposes a command that references an API, kwarg, or endpoint you can’t find in any SKILL.md, stop and ask it to cite the file it read. The skill bank is the source of truth; treat anything the agent cannot trace back to it as a probable hallucination.