Troubleshooting
The l8k sosreport command collects diagnostic data from the cluster, including pod logs, CRD statuses, OFED diagnostics, and node information:
l8k sosreport --kubeconfig ~/.kube/config
The output is saved to a directory (default: ./sosreport) that can be shared for offline analysis.
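To share the output, it is usually easier to pack the directory into a single archive first. The sketch below is one way to do that with standard tools; `pack_sosreport` is a hypothetical helper name, not part of l8k, and it assumes the default `./sosreport` output directory unless a path is passed.

```shell
# Hypothetical helper (not part of l8k): pack a sosreport directory into a
# timestamped tarball. Assumes the default ./sosreport directory if no path is given.
pack_sosreport() {
  dir="${1:-./sosreport}"
  if [ ! -d "$dir" ]; then
    echo "sosreport directory not found: $dir" >&2
    return 1
  fi
  archive="sosreport-$(date +%Y%m%d-%H%M%S).tar.gz"
  # -C keeps the paths inside the archive relative to the parent directory.
  tar -czf "$archive" -C "$(dirname "$dir")" "$(basename "$dir")" || return 1
  # Print the archive name so callers can capture it.
  echo "$archive"
}

# Usage, after a sosreport has been collected:
#   pack_sosreport            # packs ./sosreport
#   pack_sosreport /tmp/sos   # packs a custom output directory
```

The archive name is printed on success, so it can be captured in a script and attached to a ticket or handed to an agent.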
For the broader Network Operator sosreport workflow (parsing, web UI, what to look for), see Troubleshooting — SOS Report.
The k8s-launch-kit-troubleshoot skill (see AI Skills) can analyze sosreport data when invoked from any AI agent (Claude Code, Cursor, Codex CLI, or other agents that load Markdown context). Collect a sosreport and then ask the agent to investigate issues such as OFED driver crashes, SR-IOV VF allocation failures, pods stuck in ContainerCreating, or NIC configuration errors.
| Symptom | Likely Cause | Where to look |
|---|---|---|
| l8k discover exits with code 3 | API server unreachable or RBAC missing | kubectl auth can-i and the kubeconfig |
| Discovery completes with empty clusterConfig | Default --node-selector excludes all nodes | Pass --node-selector matching a label on your nodes (see Discover Workflow) |
| Generation fails with “RA2.1 requires --network-operator-release in [26.1]” | Spectrum-X version and Network Operator release mismatch | Set --network-operator-release to match the Spectrum-X version (see Spectrum-X) |
| l8k generate --deploy exits with code 4 | Apply failed; an earlier resource is not Ready | Inspect kubectl get nicclusterpolicy and kubectl get nicnodepolicy; collect a sosreport |
| OFED driver pods CrashLoopBackOff after deploy | Storage or third-party RDMA modules block driver reload | Verify unloadStorageModules / unloadThirdPartyRDMAModules settings in your config (see Discover Workflow) |
| SR-IOV pods stuck in ContainerCreating | VF allocation failure or device plugin not ready | kubectl describe pod and SR-IOV operator logs |
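In a script, the exit-code rows of the table can be turned into a first-pass hint. This is a minimal sketch, not an l8k feature: `l8k_exit_hint` is a hypothetical function, only codes 3 and 4 are taken from the table, and treating 0 as success is the usual shell convention rather than something stated on this page.

```shell
# Hypothetical helper (not part of l8k): map an exit code to the first check
# suggested in the table. Only codes 3 and 4 come from this page; 0 is assumed
# to mean success, and every other code is reported as unrecognized.
l8k_exit_hint() {
  case "$1" in
    0) echo "ok" ;;
    3) echo "discover: API server unreachable or RBAC missing; check kubectl auth can-i and the kubeconfig" ;;
    4) echo "deploy: apply failed; inspect kubectl get nicclusterpolicy and kubectl get nicnodepolicy, then collect a sosreport" ;;
    *) echo "unrecognized exit code $1; see Automation and CI/CD" ;;
  esac
}

# Usage: run an l8k command, then pass its exit status to the helper:
#   l8k_exit_hint "$?"
```

Codes not listed in the table fall through to the Automation and CI/CD page, which documents the exit codes and structured errors.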
Troubleshooting — SOS Report — the upstream sosreport workflow for the Network Operator
AI Skills — the k8s-launch-kit-troubleshoot skill
Automation and CI/CD — exit codes and structured errors