NVIDIA Network Operator v26.4.0

Troubleshooting

The l8k sosreport command collects diagnostic data from the cluster, including pod logs, CRD statuses, OFED diagnostics, and node information:

l8k sosreport --kubeconfig ~/.kube/config

The output is saved to a directory (default: ./sosreport) that can be shared for offline analysis.
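
Before sharing, it can help to bundle the output directory into a single timestamped archive. The following is a minimal sketch, assuming the default ./sosreport directory and that standard tar and date are available. The function name archive_sosreport and the archive naming scheme are illustrative and not part of l8k.

```shell
#!/bin/sh
# Hypothetical helper (not part of l8k): bundle a sosreport directory
# into a timestamped .tar.gz and print the archive name.
archive_sosreport() {
  sos_dir="${1:-./sosreport}"   # default output directory of l8k sosreport
  if [ ! -d "$sos_dir" ]; then
    echo "no sosreport directory at $sos_dir" >&2
    return 1
  fi
  archive="sosreport-$(date +%Y%m%d-%H%M%S).tar.gz"
  # -C keeps paths inside the archive relative to the parent directory.
  tar -czf "$archive" -C "$(dirname "$sos_dir")" "$(basename "$sos_dir")" || return 1
  echo "$archive"
}

# Typical use, after collecting the report:
#   l8k sosreport --kubeconfig ~/.kube/config
#   archive_sosreport ./sosreport
```

The archive is written to the current working directory, so run the helper from a location you can share from.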

For the broader Network Operator sosreport workflow (parsing, web UI, what to look for), see Troubleshooting — SOS Report.

The k8s-launch-kit-troubleshoot skill (see AI Skills) can analyze sosreport data when invoked from any AI agent (Claude Code, Cursor, Codex CLI, or other agents that load Markdown context). Collect a sosreport and then ask the agent to investigate issues such as OFED driver crashes, SR-IOV VF allocation failures, pods stuck in ContainerCreating, or NIC configuration errors.

| Symptom | Likely Cause | Where to look |
| --- | --- | --- |
| l8k discover exits with code 3 | API server unreachable or RBAC missing | kubectl auth can-i and the kubeconfig |
| Discovery completes with empty clusterConfig | Default --node-selector excludes all nodes | Pass --node-selector matching a label on your nodes (see Discover Workflow) |
| Generation fails with “RA2.1 requires --network-operator-release in [26.1]” | Spectrum-X version and Network Operator release mismatch | Set --network-operator-release to match the Spectrum-X version (see Spectrum-X) |
| l8k generate --deploy exits with code 4 | Apply failed; an earlier resource is not Ready | Inspect kubectl get nicclusterpolicy and kubectl get nicnodepolicy; collect a sosreport |
| OFED driver pods CrashLoopBackOff after deploy | Storage or third-party RDMA modules block driver reload | Verify unloadStorageModules / unloadThirdPartyRDMAModules settings in your config (see Discover Workflow) |
| SR-IOV pods stuck in ContainerCreating | VF allocation failure or device plugin not ready | kubectl describe pod and SR-IOV operator logs |
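
In a script or CI job, the two exit codes listed above can be turned into a first-step hint. The following is a minimal sketch that only encodes the codes documented in this section (3 for l8k discover, 4 for l8k generate --deploy). The function name l8k_hint is hypothetical, and the fallback advice for unlisted codes is an assumption rather than documented l8k behavior.

```shell
#!/bin/sh
# Hypothetical helper (not part of l8k): map a failed l8k subcommand and
# its exit code to the first check suggested in the troubleshooting table.
l8k_hint() {
  subcommand="$1"
  code="$2"
  case "$subcommand:$code" in
    discover:3)
      echo "API server unreachable or RBAC missing: check kubectl auth can-i and the kubeconfig" ;;
    generate:4)
      echo "Apply failed: inspect kubectl get nicclusterpolicy and kubectl get nicnodepolicy, then collect a sosreport" ;;
    *:0)
      echo "no error" ;;
    *)
      # Assumption: codes not listed in the table get generic advice.
      echo "exit code $code for l8k $subcommand is not listed: collect a sosreport" ;;
  esac
}

# Example wiring (add whatever flags you normally pass to discover):
#   l8k discover
#   l8k_hint discover "$?"
```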
© Copyright 2025-2026, NVIDIA. Last updated on Jun 14, 2026