Deploying NVIDIA NIM with Red Hat OpenShift AI#

Note

This section requires the deployment of Red Hat OpenShift AI. Refer to the official documentation for steps on installing OpenShift AI.

Red Hat OpenShift AI (RHOAI) is a flexible, scalable MLOps platform built on top of Red Hat OpenShift that provides a standardized environment for developing, training, and deploying artificial intelligence models. It integrates a set of open-source and NVIDIA technologies into a single, managed lifecycle. While the NIM Operator handles the optimization of NVIDIA microservices, OpenShift AI acts as the orchestration layer that organizes these services into functional data science projects.

By using Red Hat OpenShift AI with NVIDIA, organizations can move from managing disparate GPU resources to a structured MLOps workflow. Deploying your NVIDIA NIMs through the OpenShift AI interface offers several strategic advantages over a standard Kubernetes deployment:

  • Unified Control Plane: Manage your data connections, Jupyter notebooks, and NIM inference endpoints from a single, cohesive graphical user interface (GUI).

  • Serverless Efficiency: Utilize the built-in Knative integration to “scale-to-zero,” ensuring that expensive GPU resources are only consumed when active requests are hitting your NIM endpoints.

  • Production Guardrails: Easily implement Service Mesh (Istio) for secure, encrypted communication between your applications and your AI models without writing complex network policies.

  • Hardware Profiles: Standardize how data scientists access NVIDIA GPUs, ensuring that NIMs are always scheduled on nodes with the correct CUDA version and memory capacity.

NVIDIA NIM microservices are deployed with KServe as InferenceServices within RHOAI, utilizing the NVIDIA GPU Operator for hardware acceleration.
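To make the KServe relationship concrete, a NIM deployed through RHOAI ultimately resolves to an InferenceService resource. The sketch below is illustrative only — the resource is normally generated by the dashboard, and the deployment name and ServingRuntime reference are hypothetical:

```yaml
# Illustrative sketch -- RHOAI generates the real resource from the dashboard.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: nim-llama3-8b              # hypothetical deployment name
spec:
  predictor:
    model:
      runtime: nvidia-nim-runtime  # hypothetical NIM ServingRuntime
      resources:
        limits:
          nvidia.com/gpu: "1"      # GPU scheduled via the NVIDIA GPU Operator
```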

Refer to the official documentation that details the NIM integration in Red Hat OpenShift AI.

Experiment with NVIDIA NIM in the Gen AI Studio Playground in Red Hat OpenShift AI#

Note

This section requires the deployment of Red Hat OpenShift AI. Refer to the official documentation for steps on installing the Red Hat OpenShift AI Operator.

Note

This feature is currently available in Red Hat OpenShift AI 3.2 as a Technology Preview feature.

Refer to the official documentation that details the NIM integration in Red Hat OpenShift AI.

Once you have deployed an NVIDIA NIM through Red Hat OpenShift AI, you can validate and interact with your model using the Gen AI Playground. This integrated sandbox environment allows you to test model responses, adjust inference parameters, and experiment with different system prompts without writing any code. By providing a chat-based interface directly within the OpenShift AI dashboard, the playground simplifies the “inner loop” of AI development, helping you ensure that your NIM is correctly configured and responding as expected before it is integrated into a downstream application.

Enable the Playground feature in Gen AI Studio#

This is a one-time setup, performed by an administrator, that activates the Gen AI Studio interface and the underlying Llama Stack required for the playground in an OpenShift AI deployment. RHOAI must be installed with a deployed DataScienceCluster.

Refer to the documented prerequisites for enabling this feature, then follow the steps below to verify the configuration.

Update Dashboard Configuration#

You must update the OdhDashboardConfig to reveal the Gen AI Studio and AI asset endpoints in the RHOAI dashboard.

  • Via CLI:

    oc patch odhdashboardconfig odh-dashboard-config -n redhat-ods-applications --type=merge -p '{"spec":{"dashboardConfig":{"genAiStudio":true,"modelAsService":true,"enablement":true}}}'
    
  • Via Web Console:

    1. Navigate to Operators > Installed Operators > Red Hat OpenShift AI.

    2. Open the OdhDashboardConfig tab and edit odh-dashboard-config.

    3. Under spec.dashboardConfig, set genAiStudio: true, modelAsService: true, and enablement: true.
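With either method, the relevant section of the OdhDashboardConfig should end up looking like the following. This is a sketch of only the fields touched above; the apiVersion shown is an assumption and may vary by release:

```yaml
apiVersion: opendatahub.io/v1alpha   # assumption -- verify against your release
kind: OdhDashboardConfig
metadata:
  name: odh-dashboard-config
  namespace: redhat-ods-applications
spec:
  dashboardConfig:
    genAiStudio: true
    modelAsService: true
    enablement: true
```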

Enable Llama Stack Operator#

The Gen AI Playground utilizes a unified API layer based on Llama Stack. This must be enabled within the DataScienceCluster resource.

  • Via CLI:

    oc patch datasciencecluster default-dsc --type=merge -p '{"spec":{"components":{"llamastackoperator":{"managementState":"Managed"}}}}'
    
  • Verification: Ensure the Llama Stack operator pod is running in the redhat-ods-applications namespace:

    oc get pods -n redhat-ods-applications -l app.kubernetes.io/name=llama-stack-operator
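Equivalently, the DataScienceCluster manifest should contain the component in a Managed state. A sketch of the relevant section only:

```yaml
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    llamastackoperator:
      managementState: Managed
```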
    

Labeling Models as Gen AI Assets#

Models will only appear in the Gen AI Studio model list and Playground if they carry the specific discovery label. Include the following labels in the metadata section to ensure immediate registration as an AI asset.

Example YAML Snippet:

metadata:
  name: nim-llama3-8b
  labels:
    opendatahub.io/dashboard: "true"
    opendatahub.io/genai-asset: "true"

Labeling Existing Models#

If a model has already been deployed via the UI without these labels, apply them using the CLI:

oc label inferenceservice <model-name> -n <project-namespace> opendatahub.io/dashboard=true opendatahub.io/genai-asset=true --overwrite

Troubleshooting#

The Gen AI Playground identifies models by their InferenceService name. If the NIM backend serves the model under a different internal ID (e.g., nvidia/nemotron-3-nano), the Playground may return a “404 Not Found” error.

  1. Update ServingRuntime: Add the NIM_SERVED_MODEL_NAME environment variable to your NIM ServingRuntime so it matches the InferenceService name.

    oc patch servingruntime <runtime-name> -n <project> --type=json -p '[{"op":"add","path":"/spec/containers/0/env/-","value":{"name":"NIM_SERVED_MODEL_NAME","value":"<inferenceservice-name>"}}]'
    
  2. Restart Predictor: Restart the NIM deployment to pick up the new environment variable.

    oc rollout restart deployment/<inferenceservice-name>-predictor -n <project>
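After the patch and restart, the ServingRuntime container spec should include the new variable. A sketch of the relevant section (the container name is illustrative, and `<inferenceservice-name>` remains a placeholder):

```yaml
spec:
  containers:
    - name: kserve-container            # illustrative container name
      env:
        - name: NIM_SERVED_MODEL_NAME
          value: <inferenceservice-name>  # must match the InferenceService name
```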
    

Starting the Playground and Prompting NIM#

Once you have enabled Gen AI Studio and properly labeled your models, you can use the built-in Playground to verify the deployment and experiment with model behavior.

Access the Playground#

The playground is a web-based chat interface integrated directly into the OpenShift AI Dashboard, providing a “private ChatGPT-like” experience for your own hosted models.

  1. Open the Red Hat OpenShift AI Dashboard.

  2. From the side navigation menu, select Gen AI Studio > Playground.

  3. Click Create playground.

    1. The first time the playground is opened, its container image must be downloaded, so startup may take longer than usual.


Configure the Chat Session#

Before prompting, you must select your specific NIM-backed model from the catalog of registered AI assets.

  • Select Project: Choose the Data Science Project containing your NIM deployment from the dropdown list.

  • Select Model: Choose your model from the Model list.

    Note

    If the model is greyed out, verify that both opendatahub.io/genai-asset and opendatahub.io/dashboard labels are applied to the InferenceService.

  • Adjust Parameters: Fine-tune the generation settings as needed:

    • Temperature: Set between 0.1–0.3 for factual/deterministic responses, or higher (0.7+) for more creative outputs.

    • Streaming: Enable this to view the model’s response in real-time as it is generated.

You can now prompt the model and experiment using the other features of the Playground.
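The playground settings map directly onto the model's OpenAI-compatible chat completions API, so you can reproduce a playground session with a direct request. The sketch below is a template, not a runnable command: the route URL is a placeholder you must replace with your model's external endpoint, and the model name is illustrative:

```shell
# Replace <model-route> with your model's external endpoint.
curl -s https://<model-route>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "nim-llama3-8b",
        "messages": [{"role": "user", "content": "What is NVIDIA NIM?"}],
        "temperature": 0.2,
        "stream": false
      }'
```

The `temperature` and `stream` fields correspond to the Temperature and Streaming controls described above.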

Troubleshooting: If you don’t receive a response or get a 404 Not Found, ensure you have set the NIM_SERVED_MODEL_NAME environment variable, as described in the troubleshooting steps above, so that it matches your InferenceService name.