Configure Resource Quotas
We recommend creating Kubernetes resource quotas for the namespace where the NVIDIA NIM Operator is deployed, as well as for any namespace where you deploy microservices managed by the NIM Operator.
The following is a list of default namespaces where you should consider adding resource quotas. Your cluster may use different namespaces.
nim-operator
: The namespace where the NIM Operator is deployed.
nim-service
: The namespace where NIM microservices are deployed.
nemo
: The namespace where NeMo microservices are deployed.
In addition to using resource quotas, make sure your cluster has enough available resources for the models and microservices you are deploying.
Resource quotas can help manage and prioritize resource consumption on your cluster. However, if the cluster’s available resources are much smaller than what your workloads require, you may still experience pod evictions even with resource quotas applied.
Refer to the Platform Support page for details on resource requirements.
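One way to gauge available headroom before deploying is to list each node's allocatable CPU, memory, and GPU count. The following is a minimal sketch, assuming kubectl is already configured for your cluster and that GPUs are advertised under the nvidia.com/gpu resource name. The function name show_allocatable is hypothetical and not part of the NIM Operator.

```shell
#!/bin/sh
# Hypothetical helper: print allocatable CPU, memory, and GPUs for every node.
# Assumption: GPUs are exposed as the extended resource "nvidia.com/gpu".
# Nodes without that resource show <none> in the GPU column.
show_allocatable() {
  kubectl get nodes -o custom-columns='NODE:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory,GPU:.status.allocatable.nvidia\.com/gpu'
}

# Usage (requires cluster access):
#   show_allocatable
```

Compare the reported values against the requirements on the Platform Support page. Allocatable values describe node capacity, not what is currently free, so pods that are already running reduce the actual headroom.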
Important
Resource quotas are required when using the NIM Operator on GKE clusters.
Create a Namespace Quota
The sample below configures a resource quota in the NIM Operator namespace, nim-operator.
Update the namespace name to the namespace where you are applying the resource quota.
Create a manifest for the resource quota like the resource-quota.yaml file below.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: nim-operator-quota
spec:
  hard:
    pods: 100
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values:
      - system-node-critical
      - system-cluster-critical
Note
This sample manifest uses PriorityClass to manage resource usage.
Refer to the Kubernetes resource quota documentation for more details on setting resource limits in your cluster.
Apply the manifest:
$ kubectl apply -f resource-quota.yaml -n nim-operator
Optional: View the resource quota.
$ kubectl describe -n nim-operator resourcequota
Example Output
Name:                   nim-operator-quota
Namespace:              nim-operator
Resource                Used   Hard
--------                ----   ----
configmaps              1      50
limits.cpu              1      40
limits.memory           256Mi  128Gi
persistentvolumeclaims  0      10
pods                    1      100
requests.cpu            500m   20
requests.memory         128Mi  64Gi
secrets                 1      50
services                1      20
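If you need the same quota in several namespaces, the sample manifest can be rendered once per namespace instead of editing the file by hand. The following is a rough sketch, not an official workflow: the render_quota function, the "<namespace>-quota" naming pattern (modeled on nim-operator-quota), and the explicit metadata.namespace field are assumptions added for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the sample ResourceQuota manifest for one namespace.
# Assumption: quotas are named "<namespace>-quota", mirroring nim-operator-quota.
render_quota() {
  ns="$1"
  cat <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ${ns}-quota
  namespace: ${ns}
spec:
  hard:
    pods: 100
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values:
      - system-node-critical
      - system-cluster-critical
EOF
}

# Preview the manifests for the default namespaces as one multi-document stream.
# Adjust the list to match the namespaces your cluster actually uses.
for ns in nim-operator nim-service nemo; do
  echo "---"
  render_quota "$ns"
done
```

The loop only prints the manifests so you can review them first. To apply one, pipe it to kubectl, for example: render_quota nemo | kubectl apply -f -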