Elastic NIM#

NVIDIA Inference Microservices (NIM) are pre-built containers that enable secure, high-performance AI model inferencing. This guide shows you how to deploy NIMs on NVIDIA Cloud Functions (NVCF), a fully managed service that simplifies NIM deployment and management.

Benefits of Running NIM on NVCF#

  • High Performance: Optimized for NVIDIA GPUs with automatic resource allocation and load balancing

  • Simplified Operations: Streamlined NIM function creation and one-click deployment, with no need to configure scaling policies or manage per-function Kubernetes resources by hand

  • Hardware Awareness: Automatically selects the best hardware resources based on workload requirements.

  • Cost Optimization: Built-in auto-scaling (including scale-to-zero) reduces infrastructure costs

  • Enterprise-Ready: Secure deployment with automatic updates, monitoring, and enterprise support

This guide uses the Llama 3 8B NIM as an example, but the instructions apply to any NIM available through NGC.

Prerequisites#

Before deploying a NIM through NVCF, ensure your environment meets the software, hardware, network, and Kubernetes requirements below:

System Requirements

Software Requirements

For detailed version compatibility information, including supported models and NVIDIA software requirements, please refer to the NIM Support Matrix: Supported Models.

Note

All NIM container images referenced in the support matrix must be uploaded to your NGC private registry before they can be deployed through NVCF.

Hardware Requirements

Each NIM model has specific hardware requirements based on its size and complexity:

  • Smaller models (like 8B parameter models): Generally require at least one NVIDIA GPU with sufficient VRAM

  • Larger models (like 70B parameter models): May require multiple GPUs or higher-end GPUs with more VRAM

Important

Always check the specific hardware requirements for your chosen NIM in the NIM Support Matrix. Requirements vary based on model size, batch size, and desired inference performance.

Network Requirements

  • Outbound internet access to:

    • NGC container registry (nvcr.io) for pulling NIM containers

    • NVIDIA Cloud APIs for function management and monitoring

    • Your organization’s private registry for storing copied NIM containers

  • Stable network connection with sufficient bandwidth for container operations

  • Any proxy or firewall must allow HTTPS (port 443) access to these services
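The reachability requirement above can be spot-checked from a cluster node or workstation. The following is a minimal sketch, assuming `curl` is installed. The helper names `check_endpoint` and `check_endpoints` and the `CURL` override variable are hypothetical and introduced only for illustration. The check confirms that an HTTPS connection on port 443 succeeds; it does not validate credentials.

```shell
# Hypothetical helper: report whether an HTTPS connection to a host succeeds.
# CURL can be overridden (for example with a stub) and defaults to curl.
check_endpoint() {
  host="$1"
  # --head sends a HEAD request. Without --fail, curl exits non-zero only on
  # connection, TLS, or timeout errors, so any HTTP status counts as reachable.
  if "${CURL:-curl}" --silent --head --max-time 10 --output /dev/null "https://${host}/"; then
    echo "ok   ${host}"
  else
    echo "FAIL ${host}"
    return 1
  fi
}

# Check every host passed as an argument; return non-zero if any check fails.
check_endpoints() {
  failed=0
  for h in "$@"; do
    check_endpoint "$h" || failed=1
  done
  return "$failed"
}
```

A possible invocation, using hosts that appear in URLs elsewhere in this guide, is `check_endpoints nvcr.io api.ngc.nvidia.com api.nvcf.nvidia.com helm.ngc.nvidia.com`. Add your private registry host to the list as needed.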

Kubernetes Requirements

  • A Kubernetes cluster configured with GPU-enabled nodes

  • NVIDIA device plugin installed and configured

  • Container runtime that supports NVIDIA GPUs (e.g., containerd with NVIDIA runtime)

Note

Refer to your specific Kubernetes distribution’s documentation for detailed instructions on configuring pods with GPU access:

  • For vanilla Kubernetes: kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus

  • For other distributions: Check their respective documentation for GPU support
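One way to confirm that the cluster can schedule GPU workloads is to run a short-lived pod that requests a single `nvidia.com/gpu` resource. The sketch below only prints such a manifest. The function name `gpu_smoke_pod_manifest` and the pod name `gpu-smoke-test` are hypothetical, and the container image is left as a required argument because this guide does not prescribe one; it is assumed that `nvidia-smi` is available inside the image you choose.

```shell
# Hypothetical helper: print a minimal Pod manifest that requests one GPU.
# $1 is a container image in which nvidia-smi is available (an assumption;
# for example a CUDA base image from a registry you trust).
gpu_smoke_pod_manifest() {
  image="$1"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: gpu-smoke-test
      image: "${image}"
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
}
```

To use it, pipe the output to `kubectl apply -f -`, read the result with `kubectl logs gpu-smoke-test`, and clean up with `kubectl delete pod gpu-smoke-test`. A pod that stays in Pending usually points to a missing device plugin or to a lack of free GPUs.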

Generate Personal API Key

A Personal API Key is required to authenticate with NVCF and invoke the function. To generate a Personal API Key:

  1. Click “Generate Personal API Key” in the NVCF deployments interface

  2. Fill in the key details:

    • Key Name: Choose a descriptive name

    • Expiration: Select desired expiration period (e.g., 12 months)

    • Services: Enable the following services:

      • Cloud Functions

      • Private Registry

Personal API Key Generation Dialog

Personal API Key generation dialog showing required services#

  3. Set the generated API key as an environment variable:

    export NGC_API_KEY="your-api-key"
    

    Replace your-api-key with the actual API key generated in step 2.
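Before moving on, it can help to confirm that the variable is actually exported in the current shell. The sketch below is a minimal check under one assumption taken from later in this guide: a Personal API Key begins with `nvapi`. The function name `check_ngc_api_key` is hypothetical. It never prints the key itself.

```shell
# Hypothetical helper: sanity-check NGC_API_KEY without printing its value.
check_ngc_api_key() {
  case "${NGC_API_KEY:-}" in
    "")
      echo "NGC_API_KEY is not set"
      return 1
      ;;
    nvapi-*)
      echo "NGC_API_KEY is set and has the nvapi- prefix"
      ;;
    *)
      echo "NGC_API_KEY is set but lacks the nvapi- prefix"
      return 1
      ;;
  esac
}
```

Running `check_ngc_api_key` after the `export` command returns a non-zero status if the variable is empty or does not look like a Personal API Key.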

Register a Kubernetes Cluster with NVCF

Your Kubernetes cluster must have GPU nodes available for NIM deployments. For detailed setup instructions, see Cluster Setup and Management. To register your cluster with NVCF:

  1. Navigate to Settings in the NVCF interface

  2. Click “+Register Cluster”

  3. Fill in the configuration details:

  • Cluster Name: A unique name for your cluster

  • Cluster Group: Select or create a group for organization

  • Compute Platform: Your cloud provider or on-premises platform

  • Region: The geographic region of your cluster

  • Description: Optional description of the cluster’s purpose

  • Attributes: Any additional cluster attributes

NVCF Cluster Registration Form

NVCF cluster registration form showing required configuration fields#

  4. Install the NVIDIA Cluster Agent Operator by running the helm command shown in the UI. The same command remains available later on the Settings page.

Example format of the helm command (use the actual command generated by the UI):

helm upgrade nvca-operator \
  -n nvca-operator \
  --create-namespace \
  -i \
  --reset-values \
  --wait \
  "https://helm.ngc.nvidia.com/nvidia/nvcf-byoc/charts/nvca-operator-[OPERATOR_VERSION].tgz" \
  --username='$oauthtoken' \
  --password="[REDACTED_API_KEY]" \
  --set ngcConfig.serviceKey="[REDACTED_API_KEY]" \
  --set ncaID="[REDACTED_NCA_ID]" \
  --set clusterID="[REDACTED_CLUSTER_ID]"

Note

Refer to Cluster Setup and Management for additional prerequisites and step-by-step instructions.
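After the helm command completes, you may want to confirm that the operator pods reached the Ready state before creating functions. The following is a sketch only: the `nvca-operator` namespace is taken from the helm command in this section, while the function name `wait_for_nvca_operator`, the `KUBECTL` override variable, and the default timeout are assumptions made for illustration.

```shell
# Hypothetical helper: block until every pod in the operator namespace is Ready.
# $1 is an optional timeout (default 300s). KUBECTL can be overridden and
# defaults to kubectl.
wait_for_nvca_operator() {
  timeout="${1:-300s}"
  "${KUBECTL:-kubectl}" wait --for=condition=Ready pods --all \
    --namespace nvca-operator --timeout="${timeout}"
}
```

Calling `wait_for_nvca_operator 120s` returns a non-zero status if the pods are not Ready within two minutes or if the namespace contains no pods, in which case `kubectl get pods -n nvca-operator` is a reasonable next step.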

Create and Deploy the Function#

This section guides you through deploying a NIM using NVCF. Here’s an overview of the process:

  1. Create a new function using the appropriate NIM container.

  2. Deploy a version of the function.

  3. Manage the function with the NGC CLI.

Note

Before you begin, please ensure you have completed all the requirements in the Register a Kubernetes Cluster with NVCF and Register NIM with Private Registry sections.

Follow these steps to quickly get started with your NIM deployment:

  1. Log in to your NVCF account and navigate to the Functions page.

    NVCF Functions Page

    NVCF interface showing the Functions page with available tabs#

  2. Click “Create Function” and choose “Elastic NIM” to start creating a new function.

    Create Function Button

    This image shows the button used to create a new function.#

  3. Fill in the function details. Some fields populate automatically:

    Elastic NIM Create Details

    This image shows the details for creating an Elastic NIM.#

    • NIM: Select NIM from the dropdown

    • Tag: Choose a tag

    • Model Configuration: A profile that matches available hardware

    • Prefix: (Optional) Prefix for the function name

    • Description: (Optional) Description

    • Metadata Tags: (Optional) Metadata tags for the function

Attention

Not all NIMs are currently onboarded to Elastic NIM. To manually deploy a downloadable NIM as a function, follow the Manual NIM Deployment guide.

  4. Review and Deploy a Version of the Function

    Deploy Function Version

    Configure deployment settings for your function version#

    • GPU Type: Select the appropriate GPU type for your workload (e.g., L40S, H100)

    • Min Instances: Minimum number of function instances to maintain, even when idle. Set this to 0 for scale-to-zero capability

    • Max Instances: Maximum number of function instances that can be created to handle increased load

    • Max Concurrency: Maximum number of simultaneous requests a single function instance can handle. Higher values improve throughput but require more memory

    The deployment process will begin, and NVCF will deploy the NIM container to the cluster.

    Note

    The initial deployment may take a few minutes. You can monitor the status in the NVCF UI or using the CLI.

Managing Your Function using the NGC CLI#

NVCF can be managed using the NGC CLI. These steps show how to install and configure the NGC CLI and use it to manage your functions.

Note

Before using the NGC CLI, ensure you have created a Personal API Key as described in the prerequisites section.

NGC CLI Installation and Configuration

Note

The NGC CLI is required when using NVCF command-line tools for operations like function validation and container image management.

The NGC CLI is supported on Linux, Windows, and macOS operating systems. For detailed system requirements and installation instructions, visit the NGC CLI Installation Guide.

The following installation steps are for Ubuntu Linux. For Windows, macOS, or other Linux distributions, refer to the NGC CLI Installation Guide for equivalent steps.

  1. Download and install NGC CLI:

    # Download NGC CLI
    wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/3.58.0/files/ngccli_linux.zip -O ngccli_linux.zip && unzip ngccli_linux.zip
    
  2. Verify the download integrity:

    # Check MD5 hash
    find ngc-cli/ -type f -exec md5sum {} + | LC_ALL=C sort | md5sum -c ngc-cli.md5
    
    # Check SHA256 hash
    sha256sum ngccli_linux.zip
    

    Compare the SHA256 output with: 1aa098196b26f66a1c7af3cbf5439b236acbbdcc041f2f0f6b8d2ceb28c6955f

  3. Make the NGC CLI executable and add to PATH:

    # Make executable
    chmod u+x ngc-cli/ngc
    
    # Add absolute path to NGC CLI to your PATH
    echo "export PATH=\"\$PATH:$PWD/ngc-cli\"" >> ~/.bashrc
    source ~/.bashrc
    
    # Verify NGC CLI is in path
    which ngc
    
  4. Configure NGC CLI with your NGC API Key:

    Important

    • The NGC API Key required here is different from the Personal API Key (which begins with nvapi)

    • Make sure to save your NGC API Key securely after generation, as it cannot be viewed again later

    • For detailed NGC CLI documentation and commands, refer to the NGC CLI Documentation

    # Configure NGC CLI
    ngc config set
    
  5. Verify NGC CLI cloud-function command:

    # List available functions to verify the command works
    ngc cloud-function function list
    

    If the command works, you’ll see a list of functions in your organization (which may be empty if no functions have been created yet).
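The SHA256 comparison in step 2 can be automated rather than checked by eye. The sketch below is one way to do it, assuming `sha256sum` is available (as on Ubuntu). The function name `verify_sha256` is hypothetical.

```shell
# Hypothetical helper: compare a file's SHA256 digest with an expected value.
# $1 is the file path, $2 is the expected hexadecimal digest.
verify_sha256() {
  file="$1"
  expected="$2"
  # sha256sum prints "<digest>  <file>"; keep only the digest field.
  actual="$(sha256sum "$file" | cut -d ' ' -f 1)"
  if [ "$actual" = "$expected" ]; then
    echo "SHA256 matches"
  else
    echo "SHA256 mismatch: got ${actual}"
    return 1
  fi
}
```

For the download in this guide, the call would be `verify_sha256 ngccli_linux.zip 1aa098196b26f66a1c7af3cbf5439b236acbbdcc041f2f0f6b8d2ceb28c6955f`. The expected digest is specific to the CLI version, so take it from the installation guide if you download a different release.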

Export NGC_API_KEY Environment Variable#

export NGC_API_KEY=<nvapi-...>

List All Functions#

ngc cloud-function function list

Get Function Information (Specify Version)#

ngc cloud-function function info <function-id>:<version-id>

Function ID and Version ID Screenshot

Delete a Function#

ngc cloud-function function remove <function-id>:<version-id>

Test the NIM Function with Sample Input#

Note

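This example sends an OpenAI-compatible chat completion request to the NIM through the NVCF invocation endpoint. Set FUNCTION_ID to the ID of your deployed function before running it.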

curl -X POST \
  https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/${FUNCTION_ID} \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -H "Authorization: Bearer ${NGC_API_KEY}" \
  -d '{
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [
      {
        "role": "user",
        "content": "What is machine learning?"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 100,
    "stream": false
  }'

The response will include the model’s completion of your prompt, confirming that the function is working correctly.
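For repeated testing, the request above can be wrapped in small shell functions. This is a sketch under stated assumptions: the endpoint, headers, and payload fields are copied from the curl example, while the function names (`nvcf_invoke_url`, `chat_payload`, `invoke_nim`) and the `CURL` override variable are hypothetical. The payload builder does no JSON escaping, so it is only suitable for prompts without quotes or backslashes.

```shell
# Build the invocation URL for a function ID.
nvcf_invoke_url() {
  printf 'https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/%s\n' "$1"
}

# Build a chat completion payload. $1 is the model name, $2 is the prompt.
# No JSON escaping is performed: avoid quotes and backslashes in the prompt.
chat_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"max_tokens":100,"stream":false}\n' "$1" "$2"
}

# Send the request and print the response body followed by the HTTP status.
# $1 is the function ID, $2 the model name, $3 the prompt.
# CURL can be overridden and defaults to curl.
invoke_nim() {
  "${CURL:-curl}" --silent --show-error --request POST "$(nvcf_invoke_url "$1")" \
    --header 'Content-Type: application/json' \
    --header 'Accept: application/json' \
    --header "Authorization: Bearer ${NGC_API_KEY}" \
    --data "$(chat_payload "$2" "$3")" \
    --write-out '\nHTTP %{http_code}\n'
}
```

An example call is `invoke_nim "$FUNCTION_ID" meta/llama-3.1-8b-instruct "What is machine learning?"`. Printing the HTTP status makes authentication and routing failures easier to tell apart from model errors.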

For more information about NGC CLI commands, refer to the NGC CLI Documentation.

Deployment Best Practices#

  • Scaling Configuration

    • Set appropriate minimum and maximum instance counts based on your workload

    • Configure max concurrency based on your container’s capabilities

    • Enable autoscaling to handle varying workloads efficiently

    • Monitor function request queue depth for scaling decisions
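A rough sizing calculation can help pick starting values for the instance and concurrency settings. The sketch below assumes, per the field descriptions earlier in this guide, that each instance serves up to Max Concurrency simultaneous requests, so the theoretical ceiling is instances multiplied by concurrency. The function names are hypothetical, and real capacity also depends on model size, GPU memory, and request length.

```shell
# Upper bound on simultaneous requests when every instance is at full
# concurrency. $1 is the max instance count, $2 is the max concurrency.
max_in_flight() {
  echo $(( $1 * $2 ))
}

# Instances needed to serve a target number of simultaneous requests,
# rounded up with integer ceiling division.
# $1 is the target simultaneous requests, $2 is the max concurrency.
instances_for_peak() {
  echo $(( ($1 + $2 - 1) / $2 ))
}
```

For example, `max_in_flight 4 8` prints 32, and `instances_for_peak 50 8` prints 7, since six instances at a concurrency of 8 would cover only 48 simultaneous requests.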

Note

Before deploying, it’s recommended to run the Deployment Validator to catch common configuration issues.

Troubleshooting Guide#

Function Deployment Taking Too Long

If the function deployment process is taking longer than expected:

  • Remember that a NIM must download its model weights on first start, which can amount to several GB or more.

  • Verify that your NGC Personal API Key is valid and has the appropriate permissions

  • Check the event logs for any error messages or issues:

    # Check function status and events
    ngc cloud-function function info <function-id>:<version-id>
    

    function-id and version-id can be found in the NVCF UI or using the CLI.

    # Check pod events in the nvcf-backend namespace
    kubectl get events -n nvcf-backend
    
    # View pod logs
    kubectl logs <pod-name> -n nvcf-backend
    
  • Ensure there are sufficient resources in your cluster:

    # Check node resources
    kubectl describe nodes | grep -A 5 "Allocated resources"
    
    # View GPU capacity and allocatable GPUs per node (allocatable does not subtract GPUs already in use)
    kubectl get nodes -o=custom-columns=NAME:.metadata.name,TOTAL_GPUS:.status.capacity.'nvidia\.com/gpu',AVAILABLE_GPUS:.status.allocatable.'nvidia\.com/gpu'
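To get a single cluster-wide number rather than a per-node table, the allocatable GPU counts can be summed. The following is a minimal sketch, assuming `awk` is available. The function names and the `KUBECTL` override variable are hypothetical. Nodes without GPUs produce an empty line, which the sum treats as zero.

```shell
# Sum one integer per line from standard input; blank lines count as zero.
sum_gpu_counts() {
  awk '{ total += $1 } END { print total + 0 }'
}

# Print the allocatable nvidia.com/gpu count of each node, one per line.
# KUBECTL can be overridden and defaults to kubectl.
allocatable_gpus_per_node() {
  "${KUBECTL:-kubectl}" get nodes \
    -o jsonpath='{range .items[*]}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'
}
```

The two functions combine as `allocatable_gpus_per_node | sum_gpu_counts`. Compare the result with the number of GPUs your deployment needs at its maximum instance count.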
    

GPU Scheduling Issues

If pods cannot be scheduled:

  • Verify that GPU-enabled pods can be scheduled in your cluster:

    # Check if pods are scheduled and running
    kubectl get pods -n nvcf-system
    
    # View detailed pod status and events
    kubectl describe pods -n nvcf-system
    
    # Check GPU device plugin pods (adjust the namespace if the GPU Operator is installed elsewhere)
    kubectl get pods -n gpu-operator
    
    # View GPU operator status (ClusterPolicy is cluster-scoped, so no namespace is needed)
    kubectl get clusterpolicy
    
    # Check if nodes have required GPU labels (should see nvidia.com/gpu.present=true)
    kubectl get nodes --show-labels
    
    # Check for taints that might prevent pod scheduling
    kubectl describe nodes | grep Taint
    
  • Check that GPUs are available and properly configured:

    # Get the actual device plugin pod name
    kubectl get pods -n gpu-operator -l app=nvidia-device-plugin-daemonset
    
    # Check GPU devices on nodes (replace xxxx with the pod name suffix from the previous command)
    kubectl exec -it -n gpu-operator nvidia-device-plugin-daemonset-xxxx -- nvidia-smi
    
    # Verify GPU feature discovery is running
    kubectl get pods -n gpu-operator -l app=gpu-feature-discovery
    kubectl logs -n gpu-operator -l app=gpu-feature-discovery
    
    # Check GPU metrics from DCGM
    kubectl get pods -n gpu-operator -l app=nvidia-dcgm-exporter
    kubectl logs -n gpu-operator -l app=nvidia-dcgm-exporter
    
  • Ensure no taints exist that would block the scheduler:

    # List node taints
    kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
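Pods that cannot be scheduled stay in the Pending phase and usually carry a FailedScheduling event that names the cause, such as insufficient GPUs or an untolerated taint. The sketch below narrows the output to those two signals. The function names and the `KUBECTL` override variable are hypothetical, and the default namespace `nvcf-backend` is taken from the commands earlier in this guide; pass a different namespace as the first argument if your function pods run elsewhere.

```shell
# List pods that are still waiting to be scheduled or started.
# $1 is an optional namespace (default nvcf-backend).
list_pending_pods() {
  "${KUBECTL:-kubectl}" get pods --namespace "${1:-nvcf-backend}" \
    --field-selector=status.phase=Pending
}

# List scheduler failure events, which include the reason a pod did not fit.
# $1 is an optional namespace (default nvcf-backend).
list_scheduling_failures() {
  "${KUBECTL:-kubectl}" get events --namespace "${1:-nvcf-backend}" \
    --field-selector=reason=FailedScheduling
}
```

Run `list_pending_pods` first. If it returns any pods, `list_scheduling_failures` typically shows whether the blocker is GPU capacity, a node selector, or a taint.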
    

Related Resources

NVIDIA Cloud Functions

NVIDIA NIM Microservices