Bring-Your-Own-Endpoint#
Deploy and manage model serving yourself, then point NeMo Evaluator to your endpoint. This approach gives you full control over deployment infrastructure while still leveraging NeMo Evaluator’s evaluation capabilities.
Overview#
With bring-your-own-endpoint, you:
Handle model deployment and serving independently
Provide an OpenAI-compatible API endpoint
Use either the launcher or core library for evaluations
Maintain full control over infrastructure and scaling
When to Use This Approach#
Choose bring-your-own-endpoint when you:
Have existing model serving infrastructure
Need custom deployment configurations
Want to deploy once and run many evaluations
Have specific security or compliance requirements
Use enterprise Kubernetes or MLOps pipelines
Deployment Approaches#
Choose the approach that best fits your infrastructure and requirements:
Manual deployment: Deploy using vLLM, TensorRT-LLM, or custom serving frameworks for full control.
Hosted services: Use NVIDIA Build, OpenAI API, or other cloud providers for instant availability.
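For self-hosted deployments, vLLM's OpenAI-compatible server is one common starting point; for example, running vllm serve <model> --port 8080 exposes the /v1/completions and /v1/chat/completions routes this page relies on (the model name and port are placeholders). See the pages linked under Next Steps for detailed instructions.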
Quick Examples#
Using Launcher with Existing Endpoint#
# Point launcher to your deployed model
nemo-evaluator-launcher run \
    --config-dir examples \
    --config-name local_llama_3_1_8b_instruct \
    -o target.api_endpoint.url=http://your-endpoint:8080/v1/completions \
    -o target.api_endpoint.model_id=your-model-name \
    -o deployment.type=none  # No launcher deployment
Using Core Library#
from nemo_evaluator import (
    ApiEndpoint, EvaluationConfig, EvaluationTarget, evaluate
)

# Configure your endpoint
api_endpoint = ApiEndpoint(
    url="http://your-endpoint:8080/v1/completions",
    model_id="your-model-name"
)
target = EvaluationTarget(api_endpoint=api_endpoint)

# Run evaluation
config = EvaluationConfig(type="mmlu_pro", output_dir="results")
results = evaluate(eval_cfg=config, target_cfg=target)
Endpoint Requirements#
Your endpoint must provide OpenAI-compatible APIs:
Required Endpoints#
Completions: /v1/completions (POST) - For text completion tasks
Chat Completions: /v1/chat/completions (POST) - For conversational tasks
Health Check: /v1/triton_health (GET) - For monitoring (recommended)
Request/Response Format#
Requests and responses must follow the OpenAI API specification so that the evaluation frameworks can build requests and parse model outputs.
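As a quick compatibility check, you can send a hand-built completions request and confirm the response carries the usual OpenAI-style choices list. This is a minimal sketch, assuming the placeholder endpoint URL and model name used elsewhere on this page, plus the requests package; adjust it to your deployment.

# Minimal sanity check against an OpenAI-compatible completions endpoint.
# The URL, model name, and prompt below are placeholders.
import requests

payload = {
    "model": "your-model-name",
    "prompt": "The capital of France is",
    "max_tokens": 8,
    "temperature": 0.0,
}
response = requests.post(
    "http://your-endpoint:8080/v1/completions",
    json=payload,
    timeout=30,
)
response.raise_for_status()
# OpenAI-compatible responses return generated text under choices[0].text
print(response.json()["choices"][0]["text"])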
Configuration Management#
Basic Configuration#
# config/bring_your_own.yaml
deployment:
  type: none  # No launcher deployment

target:
  api_endpoint:
    url: http://your-endpoint:8080/v1/completions
    model_id: your-model-name
    api_key: ${API_KEY}  # Optional

evaluation:
  tasks:
    - name: mmlu_pro
    - name: gsm8k
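With the file saved, run it the same way as the quick launcher example above, for instance nemo-evaluator-launcher run --config-dir config --config-name bring_your_own (the directory and file names here are illustrative; use your own).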
With Adapters#
target:
  api_endpoint:
    url: http://your-endpoint:8080/v1/completions
    model_id: your-model-name
    adapter_config:
      # Caching for efficiency
      use_caching: true
      caching_dir: ./cache
      # Request logging for debugging
      use_request_logging: true
      max_logged_requests: 10
      # Custom processing
      use_reasoning: true
      start_reasoning_token: "<think>"
      end_reasoning_token: "</think>"
Key Benefits#
Infrastructure Control#
Custom configurations: Tailor deployment to your specific needs
Resource optimization: Optimize for your hardware and workloads
Security compliance: Meet your organization’s security requirements
Cost management: Control costs through efficient resource usage
Operational Flexibility#
Deploy once, evaluate many: Reuse a single deployment across multiple evaluations (see the sketch after this list)
Integration ready: Works with existing infrastructure and workflows
Technology choice: Use any serving framework or cloud provider
Scaling control: Scale according to your requirements
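For example, the core-library snippet shown earlier extends naturally to running several benchmarks against one long-lived endpoint. This is an illustrative sketch; the endpoint URL, model name, task list, and output paths are placeholders.

from nemo_evaluator import (
    ApiEndpoint, EvaluationConfig, EvaluationTarget, evaluate
)

# One endpoint, reused across benchmarks (URL and model name are placeholders)
target = EvaluationTarget(
    api_endpoint=ApiEndpoint(
        url="http://your-endpoint:8080/v1/completions",
        model_id="your-model-name",
    )
)

# Run several benchmarks without redeploying the model
for task in ["mmlu_pro", "gsm8k"]:
    config = EvaluationConfig(type=task, output_dir=f"results/{task}")
    evaluate(eval_cfg=config, target_cfg=target)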
Getting Started#
Choose your approach: Select from manual deployment, hosted services, or enterprise integration
Deploy your model: Set up your OpenAI-compatible endpoint
Configure NeMo Evaluator: Point to your endpoint with proper configuration
Run evaluations: Use launcher or core library to run benchmarks
Monitor and optimize: Track performance and optimize as needed
Next Steps#
Manual Deployment: Learn Manual Deployment techniques
Hosted Services: Explore Hosted Services options
Configure Adapters: Set up Evaluation Adapters for custom processing