Managing NIM Services as a NIM Pipeline

About NIM Pipelines

As an alternative to managing NIM services individually using multiple NIMService custom resources, you can manage multiple NIM services using one NIMPipeline custom resource.

The following sample manifest deploys NIM for LLMs only.

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMPipeline
metadata:
  name: pipeline-all
spec:
  services:
    - name: meta-llama3-8b-instruct
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/meta/llama3-8b-instruct
          tag: 1.0.3
          pullPolicy: IfNotPresent
          pullSecrets:
          - ngc-secret
        authSecret: ngc-api-secret
        storage:
          nimCache:
            name: meta-llama3-8b-instruct
            profile: ''
        replicas: 1
        resources:
          limits:
            nvidia.com/gpu: 1
        expose:
          service:
            type: ClusterIP
            port: 8000

Refer to the following table for information about the commonly modified fields:

Field                  Description                                                                   Default Value
spec.services.enabled  When set to true, the Operator deploys the NIM service.                       false
spec.services.name     Specifies a name for the NIM service.                                         None
spec.services.spec     Specifies a NIMService custom resource that represents the NIM microservice.  None
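The enabled field lets you stage a service in the pipeline without deploying it. As a minimal sketch (the service name is illustrative and the remainder of the service spec is elided), a disabled service keeps its definition in the manifest but is not created by the Operator:

    apiVersion: apps.nvidia.com/v1alpha1
    kind: NIMPipeline
    metadata:
      name: pipeline-all
    spec:
      services:
        - name: meta-llama3-8b-instruct
          enabled: false   # service is defined but not deployed
          spec:
            # ... NIMService spec unchanged ...

Setting enabled back to true and reapplying the manifest deploys the service again.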

Prerequisites

  • A NIM cache for each NIM microservice or a PVC that you can specify in the spec.storage.pvc field of the NIM service specification.
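If you use a PVC rather than a NIM cache, the storage stanza of the service spec references the claim instead. The following is a hedged sketch only; the claim name is illustrative, and it assumes a PersistentVolumeClaim that already holds the model files:

    storage:
      pvc:
        name: model-store-pvc   # illustrative name of an existing PersistentVolumeClaim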

Procedure

  1. Create a file, such as pipeline-all.yaml, with contents like the following example:

    apiVersion: apps.nvidia.com/v1alpha1
    kind: NIMPipeline
    metadata:
      name: pipeline-all
    spec:
      services:
        - name: meta-llama3-8b-instruct
          enabled: true
          spec:
            image:
              repository: nvcr.io/nim/meta/llama3-8b-instruct
              tag: 1.0.3
              pullPolicy: IfNotPresent
              pullSecrets:
              - ngc-secret
            authSecret: ngc-api-secret
            storage:
              nimCache:
                name: meta-llama3-8b-instruct
                profile: ''
            replicas: 1
            resources:
              limits:
                nvidia.com/gpu: 1
            expose:
              service:
                type: ClusterIP
                port: 8000
        - name: nv-embedqa-e5-v5
          enabled: true
          spec:
            image:
              repository: nvcr.io/nim/nvidia/nv-embedqa-e5-v5
              tag: 1.0.4
              pullPolicy: IfNotPresent
              pullSecrets:
              - ngc-secret
            authSecret: ngc-api-secret
            storage:
              nimCache:
                name: nv-embedqa-e5-v5
                profile: ''
            replicas: 1
            resources:
              limits:
                nvidia.com/gpu: 1
            expose:
              service:
                type: ClusterIP
                port: 8000
        - name: nv-rerank-mistral-4b-v3
          enabled: true
          spec:
            image:
              repository: nvcr.io/nim/nvidia/nv-rerankqa-mistral-4b-v3
              tag: 1.0.4
              pullPolicy: IfNotPresent
              pullSecrets:
              - ngc-secret
            authSecret: ngc-api-secret
            storage:
              nimCache:
                name: nv-rerankqa-mistral-4b-v3
                profile: ''
            replicas: 1
            resources:
              limits:
                nvidia.com/gpu: 1
            expose:
              service:
                type: ClusterIP
                port: 8000
    
  2. Apply the manifest:

    $ kubectl apply -n nim-service -f pipeline-all.yaml
    
  3. Optional: View information about the pipeline:

    $ kubectl describe nimpipelines.apps.nvidia.com -n nim-service
    

Refer to Verification to confirm that the NIM microservices are available.

Deleting NIM Pipelines

To delete a pipeline and remove the resources and objects associated with the services, perform the following steps:

  1. View the pipeline custom resources:

    $ kubectl get nimpipelines.apps.nvidia.com -A
    

    Example Output

    NAMESPACE    NAME          STATUS
    nim-service  pipeline-all  deployed
    
  2. Delete the custom resource:

    $ kubectl delete nimpipelines.apps.nvidia.com -n nim-service pipeline-all
    

Next Steps