Model Configuration#

Metrics that use an LLM — such as LLM-as-a-Judge, RAG, and Agentic metrics — require model configuration for their model, judge_model, or embeddings_model fields.

You can specify models in two ways: inline or by reference.

Inline Model#

Define the model endpoint directly in the metric definition:

"model": {
    "url": "<nim-endpoint-url>/v1",
    "name": "meta/llama-3.1-70b-instruct",
    "format": "nim",
    "api_key_secret": "my-secret",  # optional
}

| Field | Required | Description |
|---|---|---|
| url | Yes | The base URL of the inference endpoint. |
| name | Yes | The model name to send in inference requests. |
| format | No | The API format ("nim", "openai", "llama_stack"). Defaults to "nim". |
| api_key_secret | No | Name of a secret containing the API key; the secret must be in the same workspace. |

Use inline models when you want explicit control over the endpoint URL and model name, or when connecting to external APIs.
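For instance, pointing at an external OpenAI-compatible API means overriding the default format. A minimal sketch, where the URL, model name, and secret name are illustrative placeholders rather than values from this documentation:

```python
# Hedged sketch: all values below are placeholders for an external,
# OpenAI-compatible endpoint; substitute your own endpoint details.
inline_model = {
    "url": "https://api.example.com/v1",   # OpenAI-compatible base URL
    "name": "my-chat-model",               # model name the endpoint expects
    "format": "openai",                    # overrides the "nim" default
    "api_key_secret": "my-api-key-secret", # optional; secret in same workspace
}
```

The dictionary can then be passed wherever a model, judge_model, or embeddings_model field accepts an inline model.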

Model Reference#

Reference a model entity that has been registered with the NeMo Platform Models API:

"model": "my-workspace/my-judge-model"

A model reference is a string in the format workspace/model_name that points to an existing model entity. When you use a model reference, the evaluator:

  1. Validates the model entity exists through the Models API.

  2. Builds the Inference Gateway route URL for the model.

  3. Routes all inference requests through the Inference Gateway.
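The resolution steps above can be sketched roughly as follows. This is illustration only: the Models API validation is omitted, and the gateway route pattern shown is an assumption, not the documented URL scheme.

```python
def resolve_model_reference(reference: str, gateway_base: str) -> str:
    # Split "workspace/model_name" into its two parts. The real
    # evaluator would first validate the entity via the Models API.
    workspace, model_name = reference.split("/", 1)
    # Build a gateway route URL; this pattern is illustrative only.
    return f"{gateway_base}/v1/{workspace}/{model_name}"

route = resolve_model_reference(
    "my-workspace/my-judge-model",
    "https://inference-gateway.example.com",  # hypothetical gateway host
)
```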

Model references are useful when you have models registered as model entities and want to:

  • Reuse the same model across multiple metrics without repeating endpoint details.

  • Route inference through the Inference Gateway for centralized model management.

  • Avoid embedding endpoint URLs and credentials directly in metric definitions.
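As an example of the reuse point above, a single reference string can be shared by several metric definitions (a sketch using the metric types shown on this page):

```python
JUDGE = "my-workspace/my-judge-model"  # defined once, reused below

llm_judge_metric = {
    "type": "llm-judge",
    "model": JUDGE,        # LLM-as-a-Judge metrics use the model field
}
rag_metric = {
    "type": "topic_adherence",
    "judge_model": JUDGE,  # RAG/Agentic metrics use judge_model
}
```

Changing the registered model entity later updates every metric that points at it, with no endpoint details repeated.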

Supported Fields#

Both inline models and model references are supported for the following fields:

| Field | Used By |
|---|---|
| model | LLM-as-a-Judge metrics, online benchmark jobs |
| judge_model | RAG and Agentic metrics |
| embeddings_model | RAG metrics that require embeddings (for example, Response Relevancy) |
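A metric that needs both a judge and embeddings sets the two fields side by side. A hedged sketch: the type string "response_relevancy" below is an assumption, so check your metric catalog for the exact identifier.

```python
# Hedged sketch: the metric type string is assumed, not documented here.
metric = {
    "type": "response_relevancy",
    "judge_model": "my-workspace/my-judge-model",           # model reference
    "embeddings_model": "my-workspace/my-embedding-model",  # model reference
}
```

Either field could equally be an inline model dictionary instead of a reference.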

Examples#

LLM-as-a-Judge with Inline Model#

result = client.evaluation.metrics.evaluate(
    metric={
        "type": "llm-judge",
        "model": {
            "url": "https://integrate.api.nvidia.com/v1",
            "name": "meta/llama-3.1-70b-instruct",
            "format": "nim",
            "api_key_secret": "nvidia-api-key",
        },
        "scores": [...],
        "prompt_template": {...},
    },
    dataset={...},
)

LLM-as-a-Judge with Model Reference#

result = client.evaluation.metrics.evaluate(
    metric={
        "type": "llm-judge",
        "model": "my-workspace/my-judge-model",
        "scores": [...],
        "prompt_template": {...},
    },
    dataset={...},
)

RAGAS Metric with Model Reference#

result = client.evaluation.metrics.evaluate(
    metric={
        "type": "topic_adherence",
        "judge_model": "my-workspace/my-judge-model",
    },
    dataset={...},
)