NeMo NIM Proxy Target

The following target references the Llama 3.1 Nemotron Nano 8B V1 model served by NVIDIA NeMo NIM Proxy in the same Kubernetes namespace. NIM Proxy proxies the inference endpoints of NIM for LLMs instances that NeMo Deployment Management deploys in the same cluster.

When you configure a target for NIM Proxy, the proxy can load-balance connections from NeMo Auditor across multiple instances of NIM for LLMs. For information about scaling NIM for LLMs, refer to the installation tips for Deployment Management on enabling metrics and horizontal pod autoscaling.
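
Before you create the target, you can confirm that NIM Proxy is reachable and serving the model. The following is a minimal sketch that assumes the nemo-nim-proxy service name and port 8000 used later in this example, that NIM Proxy exposes the OpenAI-compatible /v1/models endpoint, and that you run the check from a pod in the same cluster.

import requests

# In-cluster check: list the models that NIM Proxy currently serves.
# The host and port match the uri in the target options below.
response = requests.get("http://nemo-nim-proxy:8000/v1/models", timeout=10)
response.raise_for_status()
for model in response.json().get("data", []):
    print(model["id"])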

Refer to garak.generators.nim.NVOpenAIChat for the parameters that you can specify in the options.nim field. The values that you specify override the defaults listed in DEFAULT_PARAMS in the API reference.
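
If the garak package is installed in your local environment, you can inspect those defaults directly before deciding which values to override. This is a minimal sketch; the exact contents of DEFAULT_PARAMS depend on your garak version.

from garak.generators.nim import NVOpenAIChat

# Print the defaults that values in options.nim override.
for key, value in NVOpenAIChat.DEFAULT_PARAMS.items():
    print(f"{key}: {value!r}")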

Important

Export the NIM_API_KEY environment variable when you start the microservice container, set either to your API key or to any placeholder value. The variable must be set even if it is not used to access build.nvidia.com.

Set the AUDITOR_BASE_URL environment variable to the NeMo Auditor service endpoint. Refer to Accessing the Microservice for more information.
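
The Python example below reads AUDITOR_BASE_URL with os.getenv. If you want the script to fail with a clear message when the variable is missing, a minimal client-side check looks like the following; the endpoint in the error message is only a placeholder.

import os

# Fail fast if AUDITOR_BASE_URL is not set; the endpoint shown is a placeholder.
if not os.getenv("AUDITOR_BASE_URL"):
    raise RuntimeError(
        "Set AUDITOR_BASE_URL to the NeMo Auditor endpoint, for example http://nemo-auditor:8000"
    )

With the environment configured, create the target with the Python SDK: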

import os
from nemo_microservices import NeMoMicroservices

client = NeMoMicroservices(base_url=os.getenv("AUDITOR_BASE_URL"))

target = client.beta.audit.targets.create(
    namespace="default",
    name="demo-nemo-platform-target",
    type="nim.NVOpenAIChat",
    model="nvidia/llama-3.1-nemotron-nano-8b-v1",
    options={
        "nim": {
            "skip_seq_start": "<think>",
            "skip_seq_end": "</think>",
            "max_tokens": 3200,
            "uri": "http://nemo-nim-proxy:8000/v1/"
        }
    }
)

print(target.model_dump_json(indent=2))

The equivalent request with curl:

curl -X POST "${AUDITOR_BASE_URL}/v1beta1/audit/targets" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "namespace": "default",
    "name": "demo-nemo-platform-target",
    "type": "nim.NVOpenAIChat",
    "model": "nvidia/llama-3.1-nemotron-nano-8b-v1",
    "options": {
      "nim": {
          "skip_seq_start": "<think>",
          "skip_seq_end": "</think>",
          "max_tokens": 3200,
          "uri": "http://nemo-nim-proxy:8000/v1/"
      }
    }
  }' | jq

Example Output

The Python example prints a response similar to the following:

{
  "model": "nvidia/llama-3.1-nemotron-nano-8b-v1",
  "type": "nim.NVOpenAIChat",
  "id": "audit_target-RVkwYHMoNQgfNsK7iR7bMv",
  "created_at": "2025-10-23T18:08:50.895699",
  "custom_fields": {},
  "description": null,
  "entity_id": "audit_target-RVkwYHMoNQgfNsK7iR7bMv",
  "name": "demo-nemo-platform-target",
  "namespace": "default",
  "options": {
    "nim": {
      "skip_seq_start": "<think>",
      "skip_seq_end": "</think>",
      "max_tokens": 3200,
      "uri": "http://nemo-nim-proxy:8000/v1/"
    }
  },
  "ownership": null,
  "project": null,
  "schema_version": "1.0",
  "type_prefix": null,
  "updated_at": "2025-10-23T18:08:50.895705"
}

The curl command returns a response similar to the following:

{
  "schema_version": "1.0",
  "id": "audit_target-3cMByWwx3XL4329zsdEDUw",
  "description": null,
  "type_prefix": null,
  "namespace": "default",
  "project": null,
  "created_at": "2025-10-22T20:15:04.448066",
  "updated_at": "2025-10-22T20:15:04.448072",
  "custom_fields": {},
  "ownership": null,
  "name": "demo-nemo-platform-target",
  "entity_id": "audit_target-3cMByWwx3XL4329zsdEDUw",
  "type": "nim.NVOpenAIChat",
  "model": "nvidia/llama-3.1-nemotron-nano-8b-v1",
  "options": {
    "nim": {
      "skip_seq_start": "<think>",
      "skip_seq_end": "</think>",
      "max_tokens": 3200,
      "uri": "http://nemo-nim-proxy:8000/v1/"
    }
  }
}
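
The id, namespace, and name fields in the response identify the target when you reference it from other Auditor resources, such as an audit job. A small sketch that keeps these identifiers from the Python example for later use:

# Keep the identifiers returned by the create call; for example, an audit job
# configuration can reference this target later (illustrative usage only).
target_id = target.id                             # e.g. audit_target-RVkwYHMoNQgfNsK7iR7bMv
target_ref = f"{target.namespace}/{target.name}"  # e.g. default/demo-nemo-platform-target
print(target_id, target_ref)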