Troubleshooting NeMo Microservices Deployment on Kubernetes

Use this documentation to troubleshoot issues that can arise while you deploy and run the NeMo microservices on Kubernetes.

General Debugging

If you need to inspect errors in your cluster, run kubectl events to list recent events in the current namespace, such as scheduling failures, image pull errors, and container restarts. To debug an individual pod further, follow Debug running pods in the Kubernetes documentation.
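As a minimal sketch of that workflow, the helper below groups the commands typically used for a first pass. The function name inspect_namespace and its arguments are hypothetical, not part of the NeMo microservices tooling, and the sketch assumes a kubectl version recent enough to include the events subcommand.

```shell
# Hypothetical helper (not part of NeMo or kubectl): first-pass inspection
# of a namespace and, optionally, one pod in it.
inspect_namespace() {
  namespace="${1:-default}"   # namespace to inspect; defaults to "default"
  pod="${2:-}"                # optional pod name

  # List recent events in the namespace.
  kubectl events --namespace "$namespace"

  # Show the status of every pod in the namespace.
  kubectl get pods --namespace "$namespace"

  # If a pod name was given, show its details and its most recent log lines.
  if [ -n "$pod" ]; then
    kubectl describe pod "$pod" --namespace "$namespace"
    kubectl logs "$pod" --namespace "$namespace" --tail=100
  fi
}
```

For example, inspect_namespace my-namespace my-pod inspects a pod named my-pod in my-namespace; both names are placeholders for your own values.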

Authentication Issues

Cluster permission issues

If kubectl commands fail due to permission issues, you might see the following error:

$ kubectl get pods
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:default:default" cannot list resource "pods" in API group "" in the namespace "default"

This error occurs when the identity making the request (here, the default service account in the default namespace) doesn’t have the necessary permissions to access the resource.

To resolve this, you can either:

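  1. Grant the service account the necessary permissions, for example by binding it to a Role that allows the operation.

  2. Run the kubectl command with the --as flag to impersonate a user or service account that already has the required permissions, for example --as=system:serviceaccount:<namespace>:<name>. Your own identity must be allowed to impersonate that account.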
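The first option can be sketched as follows, assuming read-only access to pods is all that is needed. The function name grant_pod_read and the Role and RoleBinding names (pod-reader, pod-reader-binding) are hypothetical; run the function as a user who is allowed to create RBAC objects in the namespace. The final command uses the --as flag from the second option to check the result from the service account's point of view.

```shell
# Hypothetical helper (names are illustrative): give a service account
# read-only access to pods in one namespace, then verify it.
grant_pod_read() {
  namespace="${1:-default}"         # namespace of the service account
  service_account="${2:-default}"   # service account name

  # Create a namespaced Role that allows read-only access to pods.
  kubectl create role pod-reader \
    --verb=get,list,watch --resource=pods \
    --namespace "$namespace"

  # Bind the Role to the service account.
  kubectl create rolebinding pod-reader-binding \
    --role=pod-reader \
    --serviceaccount="${namespace}:${service_account}" \
    --namespace "$namespace"

  # Check the permission by impersonating the service account.
  kubectl auth can-i list pods \
    --as="system:serviceaccount:${namespace}:${service_account}" \
    --namespace "$namespace"
}
```

Calling grant_pod_read default default targets the service account from the error message above. Adjust the verbs and resources to match the operation that was actually denied.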