# SkyPilot + Kubernetes Tutorial
This tutorial shows how to run NeMo AutoModel on a Kubernetes cluster through SkyPilot.
You will:

- Check that SkyPilot can see your Kubernetes cluster and GPUs.
- Launch a small NeMo AutoModel fine-tuning job on one GPU.
- Scale the same job to two nodes.
- Follow logs and clean everything up when you are done.
This guide is written for new AutoModel users, so it keeps the moving pieces as small as possible.
## Before you begin
You need:

- a working Kubernetes context in `kubectl`
- at least one GPU-backed node in the cluster
- SkyPilot installed with Kubernetes support
- a local NeMo AutoModel checkout
- a Hugging Face token in `HF_TOKEN` if you plan to use a gated model such as Llama
If you are setting up SkyPilot on Kubernetes for the first time, follow the official SkyPilot Kubernetes setup guide before continuing.
Install the SkyPilot Kubernetes client in your AutoModel environment:
```bash
uv pip install "skypilot[kubernetes]"
```
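To confirm the CLI landed in the right environment, you can ask for its version:

```bash
sky --version
```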
Set the token once in your shell:
```bash
export HF_TOKEN=hf_your_token_here
```
## Step 1: Verify the cluster
Start with three quick checks:
```bash
kubectl config current-context
kubectl get nodes
sky check kubernetes
```
You want `sky check kubernetes` to report that Kubernetes is enabled.
Next, ask SkyPilot which GPUs it can request from the cluster:
```bash
sky show-gpus --infra k8s
```
Example output:
```console
$ sky show-gpus --infra k8s
Kubernetes GPUs
GPU   REQUESTABLE_QTY_PER_NODE  UTILIZATION
L4    1, 2, 4                   8 of 8 free
H100  1, 2, 4, 8                8 of 8 free

Kubernetes per node GPU availability
NODE        GPU   UTILIZATION
gpu-node-a  H100  8 of 8 free
```
If you do not see any GPUs here, stop and fix the Kubernetes or SkyPilot setup first. AutoModel is ready, but SkyPilot still cannot place GPU jobs.
## Step 2: Run a single-node job
The easiest starting point is a one-GPU fine-tune using the existing Llama 3.2 1B SQuAD example.
This repository now includes a Kubernetes-flavored SkyPilot config at `examples/llm_finetune/llama3_2/llama3_2_1b_squad_skypilot_kubernetes.yaml`.
Launch it from the repo root:
```bash
automodel examples/llm_finetune/llama3_2/llama3_2_1b_squad_skypilot_kubernetes.yaml
```
The important part of that YAML is the `skypilot:` block:
```yaml
skypilot:
  cloud: kubernetes
  accelerators: H100:1
  use_spot: false
  disk_size: 200
  job_name: llama3-2-1b-k8s
  hf_token: ${HF_TOKEN}
```
What AutoModel does for you:

- writes a launcher-free copy of the training config to `skypilot_jobs/<timestamp>/job_config.yaml`
- syncs the repo to the SkyPilot workdir
- runs `torchrun` on the Kubernetes worker pod
- forwards your training config unchanged after removing the `skypilot:` section
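If it helps to see what that block maps to, here is a rough, hand-written SkyPilot task that does approximately the same thing. This is a sketch for orientation using SkyPilot's standard task fields, not the literal task AutoModel generates:

```yaml
# Sketch of an equivalent hand-written SkyPilot task; the actual
# generated task may differ in details
name: llama3-2-1b-k8s
num_nodes: 1
workdir: .
resources:
  cloud: kubernetes
  accelerators: H100:1
  use_spot: false
  disk_size: 200
envs:
  HF_TOKEN: hf_your_token_here  # populated from your shell's HF_TOKEN
run: |
  torchrun --nproc_per_node=1 \
    ~/sky_workdir/nemo_automodel/recipes/llm/train_ft.py \
    -c /tmp/automodel_job_config.yaml
```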
Example submission output:
```console
$ automodel examples/llm_finetune/llama3_2/llama3_2_1b_squad_skypilot_kubernetes.yaml
INFO Config: /workspace/Automodel/examples/llm_finetune/llama3_2/llama3_2_1b_squad_skypilot_kubernetes.yaml
INFO Recipe: nemo_automodel.recipes.llm.train_ft.TrainFinetuneRecipeForNextTokenPrediction
INFO Launching job via SkyPilot
INFO SkyPilot job artifacts in: /workspace/Automodel/skypilot_jobs/1712150400
```
Then watch the cluster come up:
```bash
sky status
sky logs llama3-2-1b-k8s
kubectl get pods
```
Example log snippet:
```console
$ sky status
Clusters
NAME             LAUNCHED  RESOURCES              STATUS
llama3-2-1b-k8s  1m ago    1x Kubernetes(H100:1)  UP

$ sky logs llama3-2-1b-k8s
...
torchrun --nproc_per_node=1 ~/sky_workdir/nemo_automodel/recipes/llm/train_ft.py -c /tmp/automodel_job_config.yaml
...
```
## Step 3: Scale to two nodes
Once the single-node job works, scaling out is just a small YAML change.
Use the two-node example at `examples/llm_finetune/llama3_2/llama3_2_1b_squad_skypilot_kubernetes_2nodes.yaml`:
```bash
automodel examples/llm_finetune/llama3_2/llama3_2_1b_squad_skypilot_kubernetes_2nodes.yaml
```
The launcher block looks like this:
```yaml
skypilot:
  cloud: kubernetes
  accelerators: H100:1
  num_nodes: 2
  use_spot: false
  disk_size: 200
  job_name: llama3-2-1b-k8s-2nodes
  hf_token: ${HF_TOKEN}
```
For multi-node jobs, AutoModel switches the generated command to a distributed `torchrun` launch that uses SkyPilot's node metadata:
```bash
torchrun \
  --nproc_per_node=1 \
  --nnodes=$SKYPILOT_NUM_NODES \
  --node_rank=$SKYPILOT_NODE_RANK \
  --rdzv_backend=c10d \
  --master_addr=$(echo "$SKYPILOT_NODE_IPS" | head -n1) \
  --master_port=12375 \
  ~/sky_workdir/nemo_automodel/recipes/llm/train_ft.py \
  -c /tmp/automodel_job_config.yaml
```
That means you do not need to hand-build rendezvous arguments yourself.
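If you want to inspect those values on a live cluster, one option is to echo them with `sky exec`, which runs a one-off command on the existing cluster (by default on a single node):

```bash
# Print the rendezvous metadata SkyPilot injects into the job environment;
# single quotes keep the variables from expanding locally
sky exec llama3-2-1b-k8s-2nodes \
  'echo "rank=$SKYPILOT_NODE_RANK nodes=$SKYPILOT_NUM_NODES ips=$SKYPILOT_NODE_IPS"'
```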
Use these commands while the job is starting:
```bash
sky status
sky logs llama3-2-1b-k8s-2nodes
kubectl get pods -o wide
```
What you want to see:

- two SkyPilot-managed worker pods
- both pods scheduled onto GPU nodes
- logs that include `--nnodes=$SKYPILOT_NUM_NODES`
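To narrow `kubectl get pods` down to SkyPilot's pods, you can filter on the label SkyPilot attaches to them. The exact label key can vary across SkyPilot versions, so fall back to the plain command if this returns nothing:

```bash
# Show only pods carrying a skypilot-cluster label, with their node placement
kubectl get pods -l skypilot-cluster -o wide
```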
## Step 4: Clean up
When the run is finished, tear the cluster down so it stops consuming resources:
```bash
sky down llama3-2-1b-k8s
sky down llama3-2-1b-k8s-2nodes
```
You can remove old local launcher artifacts too:
```bash
rm -rf skypilot_jobs
```
## Common first-run issues

### `sky check kubernetes` fails
Usually this means SkyPilot cannot use your current kubeconfig context yet. Re-check the context with `kubectl config current-context`, then compare it with SkyPilot's Kubernetes setup guide.
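If the right context exists but SkyPilot is picking up a different one, one option (assuming a reasonably recent SkyPilot release) is to pin the allowed contexts in SkyPilot's config file:

```yaml
# ~/.sky/config.yaml -- replace my-gpu-cluster with the name printed by
# `kubectl config current-context`
kubernetes:
  allowed_contexts:
    - my-gpu-cluster
```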
### `sky show-gpus --infra k8s` shows no GPUs
SkyPilot can only schedule GPUs that Kubernetes exposes. Make sure the GPU device plugin or operator is installed and the GPU nodes are healthy.
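A quick way to see what Kubernetes itself advertises, assuming NVIDIA GPUs exposed as the `nvidia.com/gpu` resource:

```bash
# A blank or <none> GPU column means the device plugin is not exposing
# GPUs on that node
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
```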
### The job starts, but model download fails
For gated models, make sure `HF_TOKEN` is exported in the shell that runs `automodel`. The SkyPilot launcher forwards it to the remote job.
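A one-line check in the launch shell:

```bash
# Prints "set" only if HF_TOKEN is non-empty in this shell
[ -n "$HF_TOKEN" ] && echo "HF_TOKEN is set" || echo "HF_TOKEN is missing"
```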
### Multi-node launch stalls during rendezvous
Start with the single-node example first. If that works, check that:

- your cluster has enough free GPU nodes for `num_nodes`
- worker pods can talk to each other over the cluster network
- the logs include the generated `torchrun` multi-node arguments shown above
## Which file should I edit?
If you want to adapt this tutorial for your own model, the quickest path is:
1. Copy `examples/llm_finetune/llama3_2/llama3_2_1b_squad_skypilot_kubernetes.yaml` (see the command after this list).
2. Change the `model` and dataset sections.
3. Keep the `skypilot:` block small until the first run succeeds.
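For example, starting from a copy keeps the known-good settings intact (the new file name here is just a placeholder):

```bash
cp examples/llm_finetune/llama3_2/llama3_2_1b_squad_skypilot_kubernetes.yaml \
   examples/llm_finetune/llama3_2/my_model_skypilot_kubernetes.yaml
```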
That way, when something goes wrong, you only have a few knobs to inspect.