Deploy NVIDIA NIM#
By using the NeMo Deployment Management microservice, you can deploy NVIDIA NIM microservices to your Kubernetes cluster. You can also configure deployments that route requests to external model endpoint providers, such as OpenAI and build.nvidia.com, from within your cluster.
Prerequisites#
Your cluster administrator has installed the NeMo Deployment Management microservice on your Kubernetes cluster by following the NeMo Deployment Management Setup Guide.
You have stored the NeMo Deployment Management host base URL in the DEPLOYMENT_MANAGEMENT_SERVICE_URL environment variable. If you have installed the NeMo platform, this is the same as the platform host base URL. For Helm chart installation, see Deployment Management Setup.
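For example, you might set the variable as follows; the host URL shown is a placeholder for your environment, not a real endpoint:
export DEPLOYMENT_MANAGEMENT_SERVICE_URL="https://nemo.example.com"  # replace with your host base URL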
Deployment Methods#
You can deploy a NIM in one of the following ways:
Direct deployment using the v1/deployment/model-deployments API.
Deployment with pre-defined configurations created through the v1/deployment/configs API.
Direct Deployment Using the v1/deployment/model-deployments API#
To deploy a NIM, submit a POST request to the v1/deployment/model-deployments API as shown in the following example.
Example: Deploy the Meta Llama 3.1 8B Instruct NIM from NGC
curl --location "${DEPLOYMENT_MANAGEMENT_SERVICE_URL}/v1/deployment/model-deployments" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "llama-3.1-8b-instruct",
    "namespace": "meta",
    "config": {
      "model": "meta/llama-3.1-8b-instruct",
      "nim_deployment": {
        "image_name": "nvcr.io/nim/meta/llama-3.1-8b-instruct",
        "image_tag": "1.8",
        "pvc_size": "25Gi",
        "gpu": 1,
        "additional_envs": {
          "NIM_GUIDED_DECODING_BACKEND": "fast_outlines"
        }
      }
    }
  }'
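To verify the deployment, you can poll its status. This is a minimal sketch that assumes the service exposes a GET endpoint on the same path, keyed by namespace and deployment name; adjust the path to match your API version:
curl --location "${DEPLOYMENT_MANAGEMENT_SERVICE_URL}/v1/deployment/model-deployments/meta/llama-3.1-8b-instruct" \
  -H 'accept: application/json'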
After deployment, you can access the model under the name specified in config.model (for example, meta/llama-3.1-8b-instruct).
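As a minimal sketch of calling the deployed model, assuming your cluster exposes the OpenAI-compatible NIM inference API and that NIM_PROXY_URL is a placeholder for that endpoint's base URL in your environment:
# NIM_PROXY_URL is a placeholder, not part of the deployment API above.
curl --location "${NIM_PROXY_URL}/v1/chat/completions" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64
  }'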
Deployment with Pre-defined Configurations#
You can create and manage NIM deployment configurations separately by using the v1/deployment/configs API, and then reference a configuration by name in the request body of the v1/deployment/model-deployments API.
This method is useful for:
Deploying the same NIM with different configurations.
Deploying multiple NIMs with the same configuration.
The following procedure shows how to set up a pre-defined configuration and deploy it.
Create the deployment configuration. The following example creates a configuration for a NIM deployment:
curl --location "${DEPLOYMENT_MANAGEMENT_SERVICE_URL}/v1/deployment/configs" \
  --header 'Content-Type: application/json' \
  --data '{
    "name": "llama-3.1-8b-instruct",
    "namespace": "meta",
    "config": {
      "model": "meta/llama-3.1-8b-instruct",
      "nim_deployment": {
        "image_name": "nvcr.io/nim/meta/llama-3.1-8b-instruct",
        "image_tag": "1.8",
        "pvc_size": "25Gi",
        "gpu": 1,
        "additional_envs": {
          "NIM_GUIDED_DECODING_BACKEND": "fast_outlines"
        }
      }
    }
  }'
To route requests to an external endpoint instead, create a configuration with an external_endpoint block. The following example registers OpenAI models; note the quoting, which closes the single-quoted JSON so that the shell can expand OPENAI_API_KEY:
curl -X POST "${DEPLOYMENT_MANAGEMENT_SERVICE_URL}/v1/deployment/configs" \
  --header 'Content-Type: application/json' \
  --data '{
    "name": "chatgpt",
    "namespace": "openai",
    "external_endpoint": {
      "host_url": "https://api.openai.com",
      "api_key": "'"${OPENAI_API_KEY}"'",
      "enabled_models": ["gpt-3.5-turbo"]
    }
  }'
Similarly, the following example uses a PUT request to set up a configuration for models served from build.nvidia.com, with the same quoting for the API key:
curl -X PUT "${DEPLOYMENT_MANAGEMENT_SERVICE_URL}/v1/deployment/configs" \
  --header 'Content-Type: application/json' \
  --data '{
    "name": "integrate",
    "namespace": "nvidia",
    "external_endpoint": {
      "host_url": "https://integrate.api.nvidia.com",
      "api_key": "'"${NVIDIA_INTEGRATE_API_KEY}"'",
      "enabled_models": ["meta/llama-3.1-405b-instruct"]
    }
  }'
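To confirm what you have created, you can list the configurations. This sketch assumes the configs API supports GET on its collection path:
curl --location "${DEPLOYMENT_MANAGEMENT_SERVICE_URL}/v1/deployment/configs" \
  -H 'accept: application/json'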
Deploy the NIM. After creating a configuration, deploy by referencing the configuration in the namespace/name format:
curl --location "${DEPLOYMENT_MANAGEMENT_SERVICE_URL}/v1/deployment/model-deployments" \
  --header 'Content-Type: application/json' \
  --data '{
    "name": "llama-3.1-8b-instruct",
    "namespace": "meta",
    "config": "meta/llama-3.1-8b-instruct"
  }'
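Because the configuration is stored separately, you can reuse it to deploy multiple NIMs with the same settings. In the sketch below, the deployment name llama-3.1-8b-instruct-canary is an illustrative placeholder, not a name from the original examples:
curl --location "${DEPLOYMENT_MANAGEMENT_SERVICE_URL}/v1/deployment/model-deployments" \
  --header 'Content-Type: application/json' \
  --data '{
    "name": "llama-3.1-8b-instruct-canary",
    "namespace": "meta",
    "config": "meta/llama-3.1-8b-instruct"
  }'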