Bring-Your-Own-Endpoint#

Deploy and manage model serving yourself, then point NeMo Evaluator to your endpoint. This approach gives you full control over deployment infrastructure while still leveraging NeMo Evaluator’s evaluation capabilities.

Overview#

With bring-your-own-endpoint, you:

  • Handle model deployment and serving independently

  • Provide an OpenAI-compatible API endpoint

  • Use either the launcher or core library for evaluations

  • Maintain full control over infrastructure and scaling

When to Use This Approach#

Choose bring-your-own-endpoint when you:

  • Have existing model serving infrastructure

  • Need custom deployment configurations

  • Want to deploy once and run many evaluations

  • Have specific security or compliance requirements

  • Use enterprise Kubernetes or MLOps pipelines

Deployment Approaches#

Choose the approach that best fits your infrastructure and requirements:

Manual Deployment

Deploy using vLLM, TensorRT-LLM, or custom serving frameworks for full control.

Hosted Services

Use NVIDIA Build, OpenAI API, or other cloud providers for instant availability.


Quick Examples#

Using Launcher with Existing Endpoint#

# Point launcher to your deployed model
nemo-evaluator-launcher run \
    --config-dir packages/nemo-evaluator-launcher/examples \
    --config-name local_llama_3_1_8b_instruct \
    -o target.api_endpoint.url=http://your-endpoint:8080/v1/completions \
    -o target.api_endpoint.model_id=your-model-name \
    -o deployment.type=none  # No launcher deployment
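The -o flags override values in the selected config file; deployment.type=none tells the launcher to skip its own deployment step and run the evaluation directly against the endpoint you provide.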

Using Core Library#

from nemo_evaluator import (
    ApiEndpoint, EvaluationConfig, EvaluationTarget, evaluate
)

# Configure your endpoint
api_endpoint = ApiEndpoint(
    url="http://your-endpoint:8080/v1/completions",
    model_id="your-model-name"
)
target = EvaluationTarget(api_endpoint=api_endpoint)

# Run evaluation
config = EvaluationConfig(type="mmlu_pro", output_dir="results")
results = evaluate(eval_cfg=config, target_cfg=target)
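In this snippet, the EvaluationConfig type field selects the benchmark to run (mmlu_pro here) and output_dir is where results are written; the EvaluationTarget simply wraps the endpoint definition.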

Endpoint Requirements#

Your endpoint must provide OpenAI-compatible APIs:

Required Endpoints#

  • Completions: /v1/completions (POST) - For text completion tasks

  • Chat Completions: /v1/chat/completions (POST) - For conversational tasks

  • Health Check: /v1/triton_health (GET) - For monitoring (recommended)

Request/Response Format#

Requests and responses must follow the OpenAI API specification so that the evaluation frameworks can parse them correctly.
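
As a quick sanity check before launching a full benchmark, you can send a single OpenAI-style completion request to your endpoint. The snippet below is a minimal sketch using the requests library; the URL, model name, and API_KEY environment variable mirror the placeholders used elsewhere on this page and should be replaced with your own values.

# Minimal smoke test for an OpenAI-compatible completions endpoint.
# The URL and model name are placeholders; adapt headers if your
# deployment requires an API key.
import os
import requests

url = "http://your-endpoint:8080/v1/completions"
headers = {"Content-Type": "application/json"}
if os.environ.get("API_KEY"):
    headers["Authorization"] = f"Bearer {os.environ['API_KEY']}"

payload = {
    "model": "your-model-name",
    "prompt": "The capital of France is",
    "max_tokens": 8,
    "temperature": 0.0,
}

response = requests.post(url, json=payload, headers=headers, timeout=30)
response.raise_for_status()
data = response.json()

# An OpenAI-compatible completions response returns generated text
# under choices[0].text.
print(data["choices"][0]["text"])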

Configuration Management#

Basic Configuration#

# config/bring_your_own.yaml
deployment:
  type: none  # No launcher deployment

target:
  api_endpoint:
    url: http://your-endpoint:8080/v1/completions
    model_id: your-model-name
    api_key: ${API_KEY}  # Optional

evaluation:
  tasks:
    - name: mmlu_pro
    - name: gsm8k
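
Assuming the file above is saved as config/bring_your_own.yaml, you can run it with the launcher exactly as in the quick example, pointing --config-dir at config and --config-name at bring_your_own. If your endpoint requires authentication, make sure the API_KEY value referenced in the config is available in your environment before launching.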

With Adapters#

target:
  api_endpoint:
    url: http://your-endpoint:8080/v1/completions
    model_id: your-model-name
    
    adapter_config:
      # Caching for efficiency
      use_caching: true
      caching_dir: ./cache
      
      # Request logging for debugging
      use_request_logging: true
      max_logged_requests: 10
      
      # Custom processing
      use_reasoning: true
      start_reasoning_token: "<think>"
      end_reasoning_token: "</think>"
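
Conceptually, the reasoning options strip everything between the configured start and end tokens from a model response before it is scored, so chain-of-thought text does not leak into the graded answer. The sketch below only illustrates that effect with a regular expression; it is not the adapter's actual implementation.

# Illustration only: the effect of start/end reasoning tokens on a
# response string. The adapter_config above handles this inside
# NeMo Evaluator.
import re

def strip_reasoning(text: str, start: str = "<think>", end: str = "</think>") -> str:
    """Remove any <think>...</think> spans so only the final answer remains."""
    pattern = re.escape(start) + r".*?" + re.escape(end)
    return re.sub(pattern, "", text, flags=re.DOTALL).strip()

raw = "<think>The user asks for 2+2, which is 4.</think>The answer is 4."
print(strip_reasoning(raw))  # -> "The answer is 4."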

Key Benefits#

Infrastructure Control#

  • Custom configurations: Tailor deployment to your specific needs

  • Resource optimization: Optimize for your hardware and workloads

  • Security compliance: Meet your organization’s security requirements

  • Cost management: Control costs through efficient resource usage

Operational Flexibility#

  • Deploy once, evaluate many: Reuse deployments across multiple evaluations

  • Integration ready: Works with existing infrastructure and workflows

  • Technology choice: Use any serving framework or cloud provider

  • Scaling control: Scale according to your requirements

Getting Started#

  1. Choose your approach: Select from manual deployment, hosted services, or enterprise integration

  2. Deploy your model: Set up your OpenAI-compatible endpoint

  3. Configure NeMo Evaluator: Point to your endpoint with proper configuration

  4. Run evaluations: Use launcher or core library to run benchmarks

  5. Monitor and optimize: Track performance and optimize as needed

Next Steps#