Bring Your Own Metric
NeMo Platform offers built-in metrics that you can configure to evaluate your custom data. Remote metrics let you bring your own metric logic into the NeMo Platform evaluation workflow by serving that logic from a REST API.
A remote metric gives you control over the evaluation logic, request payload, and reported scores while the Evaluator plugin SDK handles dataset iteration, result aggregation, retries, and job execution.
Overview
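Remote metrics support two types:
- Generic remote metric: you control the request payload and how scores are extracted from the response.
- NeMo Agent Toolkit (NAT) remote metric: integrates with NeMo Agent Toolkit evaluator endpoints using a fixed request and response format.
NeMo Evaluator supports two execution modes through the Evaluator plugin SDK:
- Local execution: immediate results for rapid iteration while you develop and test your metrics.
- Durable remote jobs: the same metric and dataset submitted as a platform job, for production workloads.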
Prerequisites
Before running remote metric evaluations:
- Workspace: Have a workspace created.
- Remote endpoint: Have your evaluation endpoint running and accessible.
- API key (if required): If your endpoint requires authentication, create a secret to store the API key.
- Initialize the SDK:
Local Execution
Local execution provides immediate results for rapid iteration when developing and testing your metrics.
Generic Remote Metric
Use a generic remote metric when you need full control over the request payload and score extraction:
Key configuration:
- body: Jinja template for the request payload. Use {{ item.<column> }} to access dataset columns.
- scores: List of score definitions, each with a parser object containing a JSONPath expression for extracting values from the response.
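To make the body and scores contract concrete, here is a minimal standard-library sketch of what happens for each row: {{ item.<column> }} placeholders are filled from a dataset row, and a score is read back from the response with a JSONPath-style lookup. It is an illustration under stated assumptions, not the Evaluator plugin SDK's implementation: it handles only plain item.<column> placeholders (real Jinja also supports expressions and filters such as tojson for safe JSON quoting) and only dotted $.a.b paths (no wildcards or indexes). The row, template, and response are hypothetical.

```python
import json
import re

# Matches {{ item.<column> }} placeholders only (a tiny subset of Jinja syntax).
PLACEHOLDER = re.compile(r"\{\{\s*item\.(\w+)\s*\}\}")


def render_body(template: str, item: dict) -> dict:
    """Fill {{ item.<column> }} placeholders from a dataset row and parse the JSON body."""
    # Raw substitution, like plain Jinja: values containing quotes would break the JSON.
    rendered = PLACEHOLDER.sub(lambda m: str(item[m.group(1)]), template)
    return json.loads(rendered)


def extract(response: dict, path: str):
    """Resolve a dotted JSONPath such as $.scores.exact_match (no wildcards or indexes)."""
    if not path.startswith("$."):
        raise ValueError(f"unsupported JSONPath: {path}")
    value = response
    for key in path[2:].split("."):
        value = value[key]
    return value


# Hypothetical dataset row, body template, and endpoint response.
row = {"question": "What is 2 + 2?", "answer": "4", "reference": "4"}
template = '{"prediction": "{{ item.answer }}", "reference": "{{ item.reference }}"}'
body = render_body(template, row)
response = {"scores": {"exact_match": 1.0}}
score = extract(response, "$.scores.exact_match")
print(body, score)
```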
NeMo Agent Toolkit Remote Metric
Use the NAT remote metric type when integrating with NeMo Agent Toolkit evaluator endpoints:
The NAT metric automatically:
- Sends the payload {"evaluator_name": "<name>", "item": <row_data>}.
- Extracts the score from $.result.score.
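The NAT contract can be mirrored in a few lines, which is handy for unit-testing an evaluator before wiring it up. This sketch only restates the documented payload shape and the $.result.score location; the helper names, evaluator name, row, and response values are hypothetical.

```python
import json


def build_nat_payload(evaluator_name: str, row: dict) -> str:
    """Build the JSON body a NAT remote metric sends: {"evaluator_name": ..., "item": ...}."""
    return json.dumps({"evaluator_name": evaluator_name, "item": row})


def nat_score(response_text: str) -> float:
    """Read the score from $.result.score in a NAT evaluator response."""
    return float(json.loads(response_text)["result"]["score"])


# Hypothetical evaluator name, dataset row, and endpoint response.
payload = build_nat_payload("answer_accuracy", {"question": "Capital of France?", "answer": "Paris"})
score = nat_score('{"result": {"score": 0.9}}')
```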
Durable Remote Jobs
For production workloads, submit the same metric and dataset as a durable platform job. The returned job resource can wait for completion and download the final EvaluationResult.
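The SDK's job resource handles waiting for you. Purely to illustrate what "wait for completion" involves, here is a generic polling loop with a timeout. Everything in it is hypothetical: get_status stands in for however a job reports its state, and the terminal status names are assumptions, not the platform's actual values.

```python
import time
from typing import Callable

# Assumed terminal states; the platform's real status names may differ.
TERMINAL = {"completed", "failed", "cancelled"}


def wait_for_job(get_status: Callable[[], str], poll_seconds: float = 5.0,
                 timeout_seconds: float = 3600.0) -> str:
    """Poll get_status() until it returns a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_seconds} s")
        time.sleep(poll_seconds)


# Usage with a fake job that finishes on the third poll.
states = iter(["pending", "running", "completed"])
final = wait_for_job(lambda: next(states), poll_seconds=0.01)
```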
Generic Remote Metric
NAT Remote Metric
Using API Key Secrets
If your remote endpoint requires authentication, store the API key as a platform secret and reference it from your metric:
For how api_key_secret behaves in local runs versus remote job submissions, see Model API Authentication.
The API key is sent in the Authorization: Bearer <key> header. For local execution, the SDK resolves the key according to the local api_key_secret behavior. For durable remote jobs, the job runtime receives the secret securely.
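On the wire, the resolved key is an ordinary bearer token. The sketch below builds such a request with the standard library so you can test your endpoint's authentication handling directly. The SDK does this for you when api_key_secret is set; the endpoint URL and the MY_METRIC_API_KEY environment variable are hypothetical.

```python
import json
import os
import urllib.request
from typing import Optional


def build_request(url: str, body: dict, api_key: Optional[str]) -> urllib.request.Request:
    """Build a JSON POST request, adding Authorization: Bearer <key> when a key is given."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


# Hypothetical endpoint and environment variable; the request is built but not sent.
req = build_request(
    "http://localhost:8000/score",
    {"prediction": "4", "reference": "4"},
    os.environ.get("MY_METRIC_API_KEY", "test-key"),
)
# To actually call the endpoint: urllib.request.urlopen(req)
```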
Endpoint Requirements
Your remote endpoint must:
- Accept POST requests with Content-Type: application/json.
- Return a JSON response containing the score values.
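A minimal endpoint that meets both requirements can be written with only the Python standard library, which is useful for checking the request and response contract before moving to a framework such as FastAPI. The prediction and reference fields, the exact-match scoring, and the response shape are hypothetical choices, not a required schema.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def score_item(body: dict) -> dict:
    """Hypothetical metric: exact match of prediction and reference, ignoring outer whitespace."""
    prediction = str(body.get("prediction", "")).strip()
    reference = str(body.get("reference", "")).strip()
    return {"scores": {"exact_match": 1.0 if prediction == reference else 0.0}}


class MetricHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Requirement 1: accept POST requests with Content-Type: application/json.
        content_type = self.headers.get("Content-Type", "").split(";")[0].strip()
        if content_type != "application/json":
            self.send_error(415, "expected application/json")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers invalid JSON and undecodable bytes
            self.send_error(400, "invalid JSON")
            return
        if not isinstance(body, dict):
            self.send_error(400, "expected a JSON object")
            return
        # Requirement 2: return a JSON response containing the score values.
        payload = json.dumps(score_item(body)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host: str = "0.0.0.0", port: int = 8000) -> None:
    """Block and serve requests; 0.0.0.0 listens on all interfaces."""
    HTTPServer((host, port), MetricHandler).serve_forever()
```

With this endpoint running via serve(), a generic remote metric's score parser would point at $.scores.exact_match. Binding to 0.0.0.0 rather than 127.0.0.1 matters for job-based evaluation, because the platform runtime may not share your machine's localhost.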
Example Endpoint (FastAPI)
NAT Endpoint Format
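NAT endpoints receive:
{"evaluator_name": "<name>", "item": <row_data>}
And must return a JSON object with the score at $.result.score:
{"result": {"score": <score>}}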
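Because the NAT contract is fixed, the scoring logic can live in a plain function that takes the received payload and returns the required shape, independent of the web framework that serves it. The evaluator name answer_accuracy, the expected_answer column, and the case-insensitive match rule are hypothetical.

```python
def handle_nat_request(payload: dict) -> dict:
    """Map a NAT request {"evaluator_name", "item"} to a response with result.score."""
    evaluator = payload["evaluator_name"]
    item = payload["item"]
    if evaluator == "answer_accuracy":
        # Hypothetical rule: case-insensitive match of answer against expected_answer.
        expected = str(item.get("expected_answer", "")).strip().lower()
        actual = str(item.get("answer", "")).strip().lower()
        score = 1.0 if expected and expected == actual else 0.0
    else:
        raise ValueError(f"unknown evaluator: {evaluator}")
    return {"result": {"score": score}}
```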
Configuration Options
Metric Parameters
Score Configuration (Generic Remote Only)
Each RemoteScore supports:
Parser configuration:
Example with all fields:
Limitations
- Network access: For job-based evaluation, remote metric endpoints must be reachable from the local platform runtime. Use a host or service URL that the platform can access.
- Response format: Scores must be extractable using JSONPath from the response. Ensure your endpoint returns properly structured JSON.
- Live evaluation limits: Live evaluations are limited to 10 rows. Use job-based evaluation for larger datasets.
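Two of these limitations can be checked before you launch anything: whether a sample response exposes a numeric score at the path you configured, and whether the dataset fits the 10-row live limit. The sketch below is a hypothetical pre-flight helper, not part of the SDK, and it understands only dotted $.a.b paths.

```python
import json

LIVE_ROW_LIMIT = 10  # Live evaluations are limited to 10 rows.


def preflight(rows: list, sample_response: str, score_path: str) -> list:
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    keys = score_path[2:] if score_path.startswith("$.") else score_path
    try:
        value = json.loads(sample_response)
        for key in keys.split("."):
            value = value[key]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{score_path} resolved to non-numeric {value!r}")
    except json.JSONDecodeError:
        problems.append("sample response is not valid JSON")
    except (KeyError, TypeError):
        problems.append(f"{score_path} not found in sample response")
    if len(rows) > LIVE_ROW_LIMIT:
        problems.append(
            f"{len(rows)} rows exceeds the live limit of {LIVE_ROW_LIMIT}; use job-based evaluation"
        )
    return problems
```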
See also:
- Evaluation Results - Understanding and downloading results
- LLM-as-a-Judge - Use an LLM to evaluate outputs
- Agentic Evaluation - Evaluate agent workflows