Migrating to TAO 7.0
From the TAO CLI to Agent Prompts
Migration guide for the API-less, skill-bank-driven workflow.
Plugin: tao-skills (models/, data/, platform/, applications/).
1. What Changed
Previous TAO releases shipped a hosted Fine-Tuning Microservice (FTMS) plus
the nvidia-tao-client package, which provided a TAO CLI and a Python SDK.
Both talked to a REST API that hosted workspaces, datasets, jobs, and
inference microservices as server-side state.
This release removes the API surface. There is no FTMS server, no central
database, and no tao login to authenticate against. Instead, you load the
tao-skills plugin in an agent (Claude Code or any compatible coding agent)
and ask the agent in plain English. The agent reads the relevant skill
SKILL.md files, builds the right command, and runs it on your local
Docker daemon — or on SLURM, Kubernetes, or Brev if you ask for a
remote platform.
Concretely, three things go away:
- The REST API. There is no https://<host>/api/v2/... to point clients at. Every CLI verb is now an agent prompt that invokes a skill directly.
- Server-side state (workspaces, datasets, jobs as DB rows). You manage your own cloud paths and your own local artifact directories. The agent helps you keep track inside a session, but nothing is persisted on a server.
- UUIDs. WORKSPACE_ID, DATASET_ID, and JOB_ID exported as shell variables are replaced by natural references (“the training run I just kicked off”, “the checkpoint at ./runs/dino/”).
What stays the same: the model containers (DINO, CLIP, Visual ChangeNet, etc.), the AutoML algorithms, the dataset formats. Only the orchestration layer changed — from a REST service to a skill-bank-driven agent.
2. Quick Start
Replace your CLI install with the agent plus the skill plugin.
Before
pip install nvidia-tao-client
tao login --ngc-key $NGC_KEY --ngc-org-name $NGC_ORG
tao --help
After
# In a Claude Code session:
/plugin marketplace add https://github.com/NVIDIA-TAO/tao-skills-bank.git#7.0.1
/plugin install tao-skills@tao-skill-bank
# Export credentials in your shell BEFORE launching the agent.
# Use your shell's secret-loading mechanism of choice — for example a
# password manager that prints to stdout, your OS keychain, or a CI-injected
# environment. Do NOT write these values to a file on disk.
export NGC_KEY="$(security find-generic-password -s NGC_KEY -w)"
export NGC_ORG="my-org"
export ACCESS_KEY="$(op read 'op://Private/AWS/access_key')"
export SECRET_KEY="$(op read 'op://Private/AWS/secret_key')"
export S3_BUCKET_NAME="my-bucket"
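# Only needed for non-AWS S3-compatible storage (MinIO, Wasabi, ...);
# leave it unset for vanilla AWS:
# export S3_ENDPOINT_URL="https://<your-provider-endpoint>"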
# Now launch the agent so it inherits the exported env.
claude
# Verify your GPU host is ready (only needed for the local-docker platform):
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Note
Security policy. Set every credential as an environment variable in
the shell session BEFORE launching the agent. The agent never reads
credential values directly; the model containers pick them up from the
inherited environment at docker run time. DO NOT write secrets to a
file on disk — files are leaky (they end up in shell history, backups,
Spotlight indexes, and accidental commits). Use your password manager,
OS keychain, or a CI secret injector to materialise the value into the
shell env at launch time, and let it disappear when the session ends.
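The following is a minimal bash sketch of the "materialise at launch time" step. The helper runs a secret-manager command and exports its output under a given variable name. It fails loudly if the command errors or prints nothing. The name load_secret is hypothetical and not part of tao-skills. The example commands are the same `security` and `op` calls used in the quick start.

```shell
# Hypothetical helper (not part of tao-skills): run a secret-manager command
# and export its stdout as the named variable. Nothing is written to disk and
# the value is never echoed.
load_secret() {
  local var_name="$1"
  shift
  local value
  # Capture the command's stdout; $(...) also strips the trailing newline.
  if ! value="$("$@")"; then
    echo "load_secret: command failed for ${var_name}" >&2
    return 1
  fi
  if [ -z "$value" ]; then
    echo "load_secret: empty value for ${var_name}" >&2
    return 1
  fi
  # Export so the agent (and the containers it starts) inherit the value.
  export "${var_name}=${value}"
}

# Examples, run in the shell you will launch the agent from:
#   load_secret NGC_KEY security find-generic-password -s NGC_KEY -w
#   load_secret ACCESS_KEY op read 'op://Private/AWS/access_key'
```

export receives NAME=value as a single argument, so the secret is never re-parsed as shell code. The value exists only in this shell and the processes it spawns.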
3. Required Environment Variables
All TAO credentials and tunables are read from the shell environment that launched the agent. The agent never reads a credentials file from disk, and neither do the containers it spawns — every value below must be exported in your shell BEFORE you start the agent, and the agent must be launched from that same shell so the values are inherited.
Note
Why env vars, not files. Files are leaky: shell history, periodic
backups, Spotlight / Windows Search indexes, and accidental git add .
commits all silently capture them. Keep secrets in the shell process
only, populated at launch time from your password manager, your OS
keychain, or a CI secret injector. When the shell exits, the secret is
gone.
3.1 The Complete Variable List
NGC (required for every TAO container pull from nvcr.io)
| Variable | Purpose | Where to get it | Required? |
|---|---|---|---|
| … | NGC personal API key, used as the docker login password for … | ngc.nvidia.com → Setup → Generate API Key. | Yes |
| … | NGC organization slug. Containers pull from … | Your NGC org name (e.g. …). | Yes |
| … | Optional team scope for model publishing. | Your NGC team slug. | Only for publish-model |
S3 / S3-compatible object storage (required when datasets live on AWS or similar)
| Variable | Purpose | Where to get it | Required? |
|---|---|---|---|
| … | S3 access key id. Plumbed into containers as … | AWS console → IAM, or your S3-compatible provider. | Yes if using … |
| … | S3 secret key. Plumbed into containers as … | Same provider as … | Yes if using … |
| … | Default bucket the agent assumes for … | Your bucket name. | Recommended |
| … | S3 endpoint URL. Required for non-AWS S3-compatible storage (MinIO, Wasabi, NVCF storage, …); leave unset for vanilla AWS. | Your provider’s endpoint. | If non-AWS S3 |
| … | Region for AWS S3 operations (e.g. …). | Your bucket region. | Optional |
Azure Blob Storage (only if you use azure:// URIs)
| Variable | Purpose | Where to get it | Required? |
|---|---|---|---|
| … | Azure storage account name. | Azure portal → Storage account. | Yes if using … |
| … | Storage account access key. Prefer Azure CLI auth or a SAS token where possible; key-based auth is the simplest fallback. | Azure portal → Storage account → Access keys. | Yes if using … |
HuggingFace (only if pulling private models / gated checkpoints)
| Variable | Purpose | Where to get it | Required? |
|---|---|---|---|
| … | HuggingFace access token. Aliased as … | huggingface.co → Settings → Access Tokens. | Only for HF models |
Remote platform: Kubernetes (only if --platform=kubernetes)
| Variable | Purpose | Where to get it | Required? |
|---|---|---|---|
| … | Path to your kubeconfig (not a secret itself; the file it points at is). The selected context must target a cluster with the NVIDIA GPU Operator installed. | Provided by your cluster admin. | Yes for k8s |
Remote platform: SLURM (only if --platform=slurm)
| Variable | Purpose | Where to get it | Required? |
|---|---|---|---|
| (SSH agent) | SLURM uses … | Your existing SSH key. | Yes for slurm |
AutoML LLM brain (only for algorithm in {llm, hybrid, autoresearch})
| Variable | Purpose | Where to get it | Required? |
|---|---|---|---|
| … | OpenAI-compatible endpoint URL. Default … | Your NIM / OpenAI / vLLM endpoint. | Yes for LLM AutoML |
| … | LLM model name passed to the endpoint. | Endpoint’s model registry (e.g. …). | Yes for LLM AutoML |
| … | Bearer token for the LLM endpoint. | Your provider: NVIDIA NIM key, OpenAI key, etc. | Yes for LLM AutoML |
| … | Fallback when … | build.nvidia.com → Get API Key. | Optional fallback |
Observability (optional but recommended for sweeps)
| Variable | Purpose | Where to get it | Required? |
|---|---|---|---|
| … | Weights & Biases API key. Without it, WandB tracking is silently disabled. | wandb.ai → Settings. | Only for WandB |
Configuration (not secrets, but required for some flows)
| Variable | Purpose | Where to get it | Required? |
|---|---|---|---|
| … | Filesystem path to the skill bank. The session-start hook sets this; export it yourself only if you script your own runner outside the agent. | Usually … | Auto-set |
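Section 16 notes that the agent checks env-var presence before launching. The sketch below lets you run the same kind of check in your own shell first. The name tao_check_env is hypothetical; the variable names come from this guide. The check uses printenv, which sees only exported variables. A value that was set but not exported would never reach the agent, so it is reported as MISSING. Values themselves are never printed.

```shell
# Hypothetical preflight helper (not part of tao-skills): report which of the
# named variables are exported and non-empty, without printing any values.
tao_check_env() {
  local name missing=0
  for name in "$@"; do
    # printenv only sees exported variables, i.e. what the agent and the
    # containers it launches would actually inherit.
    if [ -n "$(printenv "$name")" ]; then
      printf '%-20s set\n' "$name"
    else
      printf '%-20s MISSING\n' "$name"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run tao_check_env NGC_KEY NGC_ORG before any container pull. Add ACCESS_KEY SECRET_KEY S3_BUCKET_NAME when your datasets live on S3. The function returns non-zero if anything is missing, so it can also gate a launch script.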
4. The New Mental Model
Every CLI verb maps to one or more skills under ~/tao-skills-bank/.
Knowing which skill the agent reaches for makes the prompts much easier to
write:
| Skill layer | What it owns |
|---|---|
| … | Container image, per-action command, accepted dataset format, required dataset URIs, spec template, AutoML notes. One per network (dino, segformer, clip, …). |
| … | Standard fine-tune workflow: train → eval → export, composing the model and platform skills. |
| … | Hyperparameter optimization (bayesian, hyperband, ASHA, LLM-guided, autoresearch). Drives the model skill repeatedly. |
| … | Default backend: actual … |
| … | Remote backends: same container, different launcher. Switch by asking the agent for a different platform. |
| … | Data preparation skills: kNN mining, embedding generation, captioning, AOI mining, anomaly generation, KPI analysis. Cover what … |
5. How to Read Each Entry
Sections 6–13 each contain a three-column table. The left column is the CLI command you used to type. The middle column is the agent prompt that does the same thing; say it in your own words, since this is just a concrete starting point. The right column lists the skills the agent will reach for, so you know where to look if something needs tweaking.
Note
Legend. Rows tagged (removed) document CLI verbs that have no
equivalent in the new workflow — they managed FTMS-only state that no
longer exists. Section 15 collects all of them in one place.
6. Authentication, Workspaces, and Datasets
Without a server, there is no login, no workspace, no dataset registry. Authentication to NGC and to your cloud bucket lives entirely in your shell environment — every credential is exported in the shell that launches the agent and inherited by every container the agent spawns. “Workspace” collapses to your cloud bucket URI prefix; “dataset” collapses to a path inside that prefix.
Note
Reminder. Secrets must never be written to a file. Export
NGC_KEY, NGC_ORG, ACCESS_KEY, SECRET_KEY,
S3_BUCKET_NAME, S3_ENDPOINT_URL, HF_TOKEN, WANDB_API_KEY,
AZURE_*, etc. in the shell before launching the
agent. If a row below tells you to “set X”, that always means
export X=... in your shell, not editing a file.
| CLI command | Agent prompt | Skills used |
|---|---|---|
| … | “Are my NGC credentials exported in this shell?” Agent runs … | (shell env) |
| … | “Unset my TAO credentials in this shell.” Agent emits … | (shell env) |
| … | “Which TAO credentials does the agent see?” Reports … | (shell env) |
| … | “What version of the skill bank is loaded?” Reports skill-bank git SHA and the TAO container image versions pinned in … | (…) |
| … | (removed) No server-side workspaces exist. Export … | (none) |
| … | (removed) Same story: export … | (none) |
| … | (removed) No server-side state to list. | (none) |
| … | (removed) No server-side state. | (none) |
| … | (removed) No server-side state to update. | (none) |
| … | (removed) No server-side state to delete. | (none) |
| … | (removed) There is no workspace DB. Back up your local artifact dir and cloud bucket the usual way. | (none) |
| … | (removed) Datasets are no longer registered server-side. Place data at … | (none) |
| … | (removed) Use … | (none) |
| … | (removed) Use the cloud CLI: … | (none) |
| … | (removed) Same: cloud-side ops via … | (none) |
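In the unset row, the agent emits the command rather than running it. That limitation is inherent: a child process cannot modify its parent's environment. The agent can therefore only hand you a line to run in your own shell. Below is a minimal sketch of such a helper, covering the credential names used in this guide. The name tao_unset_creds is hypothetical; append whatever AZURE_* names you exported.

```shell
# Hypothetical helper: clear the TAO credential variables named in this guide
# from the *current* shell. Define and call it in your own shell; running it
# in a subshell or from the agent would not affect your session.
tao_unset_creds() {
  unset NGC_KEY NGC_ORG ACCESS_KEY SECRET_KEY S3_BUCKET_NAME \
    S3_ENDPOINT_URL HF_TOKEN WANDB_API_KEY AUTOML_LLM_API_KEY
  # Add your AZURE_* variable names here if you exported any.
}
```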
7. Inspecting Models, Schemas, and AutoML Defaults
The CLI hit a REST endpoint to learn what each network supports. The agent
reads the same information directly from the model skill’s SKILL.md and
references/skill_info.yaml — no network call.
| CLI command | Agent prompt | Skills used |
|---|---|---|
| … | “Which pretrained checkpoints (PTMs) does $N support?” Agent reads … | … |
| … | “Show me $N’s default training spec.” Reads … | … |
| … | “What AutoML hyperparameters does $N expose by default?” Combines the model SKILL’s “AutoML / HPO Notes” with the param generator described in … | … |
| … | “Explain the AutoML search ranges for epochs and batch_size on $N.” | … |
| … | “What GPUs are visible to my platform?” For local-docker: … | … |
8. Training and the Experiment Action Chain
tao create-job --kind experiment was the single biggest CLI surface. The
agent replaces it with applications/tao-train-single-step for one-off runs and
applications/tao-run-automl for sweeps, both composing the model and platform
skills. Action chaining via --parent-job-id becomes “now do the next step
on the artifacts you just produced.”
| CLI command | Agent prompt | Skills used |
|---|---|---|
| … | “Train $N on s3://bucket/train, eval against s3://bucket/val, starting from the default PTM, overrides: epochs=10, batch_size=4, num_classes=5.” Agent fetches the schema from the model SKILL, applies your overrides, runs the container, streams logs to your session. | … |
| … | “Now evaluate the checkpoint we just trained against the val set.” The agent finds the most recent train artifact dir in this session and points … | … |
| … | “Prune the trained $N model to 50% of channels.” Same pattern: the parent checkpoint is resolved from session context. | … |
| … | “Retrain the pruned model.” | … |
| … | “Distill the trained model into a smaller backbone.” | … |
| … | “Quantize the trained model.” | … |
| … | “Export the trained model to ONNX.” | … |
| … | “Build a TensorRT engine from the ONNX we just exported.” | … |
| … | “Run TensorRT inference on the test set.” | … |
| … | “Run MAL auto-labeling on the unlabeled dataset using the trained model.” | … |
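Action chaining now hinges on resolving "the checkpoint we just trained" to a directory. This guide does not say exactly how the agent does that. One plausible approach assumes runs land in ./runs/<name>-<timestamp>/, as in the end-to-end example, and picks the most recently modified matching directory. latest_run is a hypothetical helper, not the agent's actual logic.

```shell
# Hypothetical helper: print the most recently modified run directory whose
# name starts with PREFIX, e.g. ./runs/dino-<timestamp>. Assumes the
# ./runs/<name>-<timestamp>/ layout described in this guide.
latest_run() {
  local prefix="$1" runs_root="${2:-./runs}" newest
  # -t sorts newest first by modification time; -d lists the directories
  # themselves instead of their contents; the trailing / matches dirs only.
  newest="$(ls -td "$runs_root"/"$prefix"*/ 2>/dev/null | head -n 1)"
  if [ -z "$newest" ]; then
    echo "latest_run: no ${prefix}* directories under ${runs_root}" >&2
    return 1
  fi
  printf '%s\n' "${newest%/}"
}
```

For example, latest_run dino- prints the newest ./runs/dino-*/ directory, which you can then name explicitly in your next prompt. Sorting by modification time rather than parsing the name keeps the sketch independent of the timestamp format, which this guide does not specify.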
9. AutoML
tao create-job --automl-settings @automl.json routed through the FTMS
server. The agent now drives the same AutoMLRunner directly through
applications/tao-run-automl.
| CLI command | Agent prompt | Skills used |
|---|---|---|
| … | “Show me the AutoML defaults for $N and save them.” | … |
| … | “Run AutoML on $N with the bayesian algorithm for 20 trials, optimizing val_mAP50.” Algorithm options: bayesian, hyperband, asha, bohb, dehb, pbt, hyperband_es, llm, hybrid, autoresearch. Ask the agent which one fits your budget. | … |
| (implicit in CLI; was hidden behind …) | “Use the LLM-guided AutoML algorithm with NVIDIA NIM as the brain.” LLM algorithms read … | … |
| (implicit; set via …) | “Track this AutoML sweep in Weights & Biases under project ‘tao-hpo’.” | … |
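Only the llm, hybrid, and autoresearch algorithms need the AUTOML_LLM_* variables; for the other algorithms the agent does not ask for them. The bash sketch below checks the three variables before you request an LLM-guided sweep. automl_llm_preflight is a hypothetical name; the variable names are the ones given in Section 16.

```shell
# Hypothetical preflight (not part of tao-skills): for LLM-driven AutoML
# algorithms, confirm the three AUTOML_LLM_* variables are exported.
automl_llm_preflight() {
  local algorithm="$1" name rc=0
  case "$algorithm" in
    llm|hybrid|autoresearch) ;;  # these algorithms need an LLM "brain"
    *) return 0 ;;               # bayesian, hyperband, asha, ... do not
  esac
  for name in AUTOML_LLM_ENDPOINT AUTOML_LLM_MODEL AUTOML_LLM_API_KEY; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing: ${name} (needed for algorithm=${algorithm})" >&2
      rc=1
    fi
  done
  return "$rc"
}
```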
10. Dataset Preparation Jobs
tao create-job --kind dataset (and the data_services notebook
commands) handled format conversion, image validation, augmentation,
captioning, and similar pre-training prep. These split across two places
now: the model skill’s dataset_convert action (when it’s
network-specific) and the data/* skill family.
CLI command |
Agent prompt |
Skills used |
|---|---|---|
|
“Convert my raw KITTI data into TFRecords for $N.” The model SKILL’s |
|
|
“Augment my training images (rotate, brightness, blur).” Use the data-services-style flow if no model-specific augment action exists. |
|
|
“Validate my training images and remove the corrupted ones.” |
|
|
“Convert my KITTI annotations to COCO.” |
|
|
“Analyze the class distribution and image stats of my dataset.” |
agent-scripted (no published skill) |
|
“Auto-label this unlabeled image folder using MAL.” |
|
(no direct CLI verb) |
“Mine the nearest neighbors of these query images in my unlabeled pool.” |
|
11. Monitoring Runs and Downloading Artifacts
Without an API there is no job DB to query. Instead, the agent inspects whatever the platform skill manages: containers on the local Docker daemon, jobs in your SLURM queue, or pods on your Kubernetes cluster.
CLI command |
Agent prompt |
Skills used |
|---|---|---|
|
“What TAO jobs are currently running?” Local-docker: |
|
|
“Is my training run still going?” The agent recognises the run by name or by the latest container if you don’t name it. |
|
|
“Show me the full details of the training run.” Combines |
|
|
“Tail the logs of my training run.” Local: |
|
|
“Cancel the training run.” Local: |
|
|
“Pause my training run … now resume it.” Practical only on local-docker ( |
|
|
“What files did the training run produce?” Agent lists the run’s artifact directory (usually |
(filesystem) |
|
“Download all artifacts from my training run to ./out.” Local-docker: just copy the artifact dir. Remote: agent calls the platform skill’s sync verb ( |
(cloud CLI or filesystem) |
|
“Download only the best checkpoint and the spec from my training run.” |
(cloud CLI or filesystem) |
|
(removed) No central job DB to update. Use file-system tags or your own bookkeeping. |
— |
|
“Delete the training run.” No central job DB; the agent invokes the platform skill’s native cleanup. Local Docker: |
|
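Every row above reduces to a platform-native command. As a reference, the sketch below prints a common native command for each verb on the three platforms this guide names; it does not run anything. native_cmd is hypothetical. The third argument is assumed to be a container name on local Docker, a job ID on SLURM, or a pod name on Kubernetes. The platform skills may use different flags or resource types, for example deleting a Kubernetes Job rather than a pod.

```shell
# Hypothetical dry-run helper: print (not run) a common platform-native
# command for a monitoring verb. RUN is a container name on local Docker,
# a job ID on SLURM, or a pod name on Kubernetes (all assumptions).
native_cmd() {
  local verb="$1" platform="$2" run="${3:-}"
  case "${platform}/${verb}" in
    local-docker/list)   echo "docker ps" ;;
    local-docker/logs)   echo "docker logs -f ${run}" ;;
    local-docker/pause)  echo "docker pause ${run}" ;;
    local-docker/resume) echo "docker unpause ${run}" ;;
    local-docker/cancel) echo "docker stop ${run}" ;;
    local-docker/delete) echo "docker rm ${run}" ;;
    slurm/list)          echo "squeue -u \$USER" ;;
    slurm/logs)          echo "tail -f slurm-${run}.out" ;;
    slurm/cancel)        echo "scancel ${run}" ;;
    kubernetes/list)     echo "kubectl get pods" ;;
    kubernetes/logs)     echo "kubectl logs -f ${run}" ;;
    kubernetes/cancel)   echo "kubectl delete pod ${run}" ;;
    *)
      echo "native_cmd: no mapping for ${platform} ${verb}" >&2
      return 1
      ;;
  esac
}
```

Pause and resume are mapped only for local Docker, matching the table's note that pausing is practical only there.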
12. Inference Serving (Microservices)
The FTMS inference-microservice surface is replaced by the model skill’s
inference action running in serving mode on the chosen platform.
CLI command |
Agent prompt |
Skills used |
|---|---|---|
|
“Serve the $N model from the last training run on port 8080.” Agent reads the model SKILL’s inference action, mounts the trained checkpoint, runs the container in detached mode with a port-forward. |
|
|
“Send this prompt to the $N inference server: ‘a cat in a hat’.” Agent constructs the LLM / VLM / diffusion request body that matches the container’s sidecar API and calls it via |
(direct HTTP to the container) |
|
“Is the $N inference server still up?” Local: |
|
|
“Stop the $N inference server.” Local: |
|
13. Publishing Models
Publishing was the FTMS endpoint that pushed a trained checkpoint to NGC.
With no API in the loop you push via the NGC CLI or docker push directly.
The agent can help construct the commands but does not call them itself.
| CLI command | Agent prompt | Skills used |
|---|---|---|
| … | “Help me publish my trained $N model to NGC team ‘myteam’ as ‘DINO v1’.” Agent emits the … | (ngc CLI / …) |
| … | “Help me unpublish $JOB from team ‘myteam’.” | (ngc CLI) |
14. End-to-End Example
Compare a typical object-detection pipeline (train → eval → export → TRT → inference) in both forms.
Before (CLI + REST API)
tao login --ngc-key $NGC_KEY --ngc-org-name $NGC_ORG
WS=$(tao dino create-workspace-aws --name WS --cloud-region us-west-1 \
--cloud-bucket-name mybucket --access-key $AK --secret-key $SK \
--output json | jq -r .id)
TR=$(tao dino create-dataset --dataset-type object_detection \
--dataset-format coco --workspace-id $WS --cloud-file-path /data/train \
--use-for training --output json | jq -r .id)
EV=$(tao dino create-dataset --dataset-type object_detection \
--dataset-format coco --workspace-id $WS --cloud-file-path /data/val \
--use-for evaluation --output json | jq -r .id)
# wait for pull_complete on both...
PTM=$(tao dino list-base-experiments --filter-param network_arch=dino \
--output json | jq -r '.[0].id')
tao dino get-job-schema --action train --base-experiment-id $PTM \
--output @train.yaml
# hand-edit train.yaml...
JOB_TR=$(tao dino create-job --kind experiment --action train \
--encryption-key tlt_encode --workspace-id $WS \
--base-experiment-id $PTM --train-dataset $TR --eval-dataset $EV \
--specs @train.yaml --output json | jq -r .id)
# poll get-job-status...
tao dino create-job --kind experiment --action evaluate \
--parent-job-id $JOB_TR --eval-dataset $EV --specs @eval.yaml
JOB_EXP=$(tao dino create-job --kind experiment --action export \
--parent-job-id $JOB_TR --specs @export.yaml --output json | jq -r .id)
JOB_TRT=$(tao dino create-job --kind experiment --action gen_trt_engine \
--parent-job-id $JOB_EXP --specs @trt.yaml --output json | jq -r .id)
tao dino create-job --kind experiment --action inference \
--parent-job-id $JOB_TRT --specs @infer.yaml
After (single agent prompt)
“Fine-tune DINO on the COCO data at s3://mybucket/data/train, evaluating against s3://mybucket/data/val. Use the default PTM. Override epochs to 10, batch_size to 4, num_classes to 5. When training finishes, evaluate, export to ONNX, build a TensorRT engine, and run TRT inference on the test images at s3://mybucket/data/test. Run everything on the local Docker daemon.”
The agent will: confirm the GPU host is ready
(docker run --runtime=nvidia --gpus all ubuntu nvidia-smi), read
models/tao-train-dino/SKILL.md and applications/tao-train-single-step/SKILL.md, build
the train spec from the model’s spec template overlaid with your three
values, run the DINO container with --gpus all and the S3 creds plumbed
in, stream the logs, then chain evaluate → export → gen_trt_engine →
inference using the previous step’s artifact path each time. Artifacts land
in ./runs/dino-<timestamp>/.
15. CLI Commands with No Equivalent
These CLI verbs managed FTMS server state that no longer exists. None of them have a direct prompt equivalent — instead, the underlying need is met by the cloud (aws/azure/gcp CLIs), the local filesystem, the Docker daemon, or simply by not needing the abstraction any more.
| CLI verb | Why it goes away / what to do instead |
|---|---|
| … | No workspace registry. Export cloud credentials (…). |
| … | No workspace registry. Cloud-side operations via … |
| … | No workspace DB to back up. Snapshot your local artifact dir and cloud bucket the usual way. |
| … | No dataset registry. Cloud paths replace dataset IDs; cloud CLIs replace listing/inspection. |
| … | No job DB. Tag artifacts on the filesystem or in your own notes. |
| … | No central job DB. The agent invokes the platform skill’s native cleanup: Local Docker … |
16. Gotchas During Migration
- Identifiers are paths, not UUIDs. Where you used to script JOB_ID=$(...) and pass it through --parent-job-id, the agent works with artifact directories and container names. Name your runs (“train DINO; tag the run dino-smoke-01”) and the agent will use that name as the artifact-dir suffix and the container name.
- State lives where the work runs. Local-docker: ./runs/<name>/. SLURM: under $SLURM_SUBMIT_DIR. Kubernetes: on the cloud bucket the job wrote to. Don’t expect a single dashboard; ask the agent to list runs on a given platform.
- Concurrent training is up to you. The FTMS server serialised jobs through its queue. Now nothing prevents you from kicking off two trainings at once on the same GPU, so watch your GPU memory.
- AutoML still wants TAO_SKILL_BANK_PATH. applications/tao-run-automl reads model skills from the bank; if TAO_SKILL_BANK_PATH isn’t set, it errors with “No skill config found.” The session-start hook sets it automatically; if you script your own runner, export it yourself.
- AutoML LLM endpoints don’t auto-default. For the llm, hybrid, and autoresearch algorithms, the agent will prompt you for llm_endpoint, llm_model, and llm_api_key before launching. To skip the prompt, export them as environment variables (AUTOML_LLM_ENDPOINT, AUTOML_LLM_MODEL, AUTOML_LLM_API_KEY) before launching the agent; do not save them to a file.
- Remote platforms still need their preflight. SLURM needs an SSH-reachable head node and your SSH agent loaded. Kubernetes needs a kubeconfig context with the GPU Operator installed. Brev needs BREV_API_TOKEN exported in your shell. The agent checks env-var presence before launching and will tell you exactly what is missing.
- Secrets in env vars, never in files. All TAO secrets (NGC_KEY, ACCESS_KEY, SECRET_KEY, HF_TOKEN, WANDB_API_KEY, AUTOML_LLM_API_KEY, etc.) must be exported in your shell before you launch the agent. Files on disk are leaky: they end up in shell history, in backups, in Spotlight / Windows Search indexes, and occasionally in accidental git add . commits. To materialise a secret into the shell at launch time, use your password manager’s CLI (1Password op, Bitwarden bw, Keychain security find-generic-password, AWS SSM / Secrets Manager, HashiCorp Vault). Let the value live only in the shell process; when the shell exits, the secret is gone.
- Logs are platform-native. tao get-job-logs used to return a single text blob via REST. Now the agent runs docker logs / tail -f slurm-<id>.out / kubectl logs from your local shell, and the output stays where it natively lives.
- Spec files still matter. You can either describe overrides in prose (“epochs=10, batch_size=4”) or point the agent at an existing YAML (“use the spec at ./train.yaml”). Both are merged onto the model SKILL’s default template before running.
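A run name doubles as the Docker container name, so it has to satisfy Docker's naming rule: a first character from [a-zA-Z0-9], followed by at least one character from [a-zA-Z0-9_.-]. The portable sketch below checks a name before you use it. valid_run_name is hypothetical, and the rule comes from Docker, not from this guide.

```shell
# Hypothetical check: succeed only if NAME satisfies Docker's container-name
# rule, [a-zA-Z0-9][a-zA-Z0-9_.-]+ (at least two characters).
valid_run_name() {
  case "$1" in
    '' | ?) return 1 ;;            # too short
    [!a-zA-Z0-9]*) return 1 ;;     # must start with a letter or digit
    *[!a-zA-Z0-9_.-]*) return 1 ;; # only letters, digits, _ . - allowed
  esac
  return 0
}
```

For example, valid_run_name dino-smoke-01 succeeds, while a name containing a space or a slash fails.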
17. Reference
- Skill bank root: ~/tao-skills-bank/ (cloned by the tao-skills plugin).
- Per-network skills: models/<network>/SKILL.md + references/skill_info.yaml + references/spec_template_<action>.yaml.
- Standard fine-tune workflow: applications/tao-train-single-step/SKILL.md.
- Hyperparameter optimization: applications/tao-run-automl/SKILL.md.
- Local Docker conventions: platform/tao-run-on-local-docker/SKILL.md.
- Remote platforms: platform/tao-run-on-slurm/, platform/tao-run-on-kubernetes/, platform/tao-run-on-brev/.
- Data preparation skills: data/* (tao-mine-aoi-images, tao-analyze-gaps-visual-changenet, tao-route-visual-changenet-samples, tao-generate-image-grounding, tao-generate-referring-expressions, tao-generate-video-reasoning-annotations).
- Credentials: export NGC_KEY, NGC_ORG, ACCESS_KEY, SECRET_KEY, S3_BUCKET_NAME, S3_ENDPOINT_URL, HF_TOKEN, WANDB_API_KEY, AUTOML_LLM_API_KEY, etc. as environment variables in the shell that launches the agent; never write them to a file on disk. The .env.example shipped with the skill bank documents variable NAMES only; treat it as a checklist, not a template to copy.
- Image and SDK version pins: ~/tao-skills-bank/versions.yaml.