Bring-Your-Own-Endpoint#

Deploy and manage model serving yourself, then point NeMo Evaluator to your endpoint. This approach gives you full control over deployment infrastructure while still leveraging NeMo Evaluator’s evaluation capabilities.

Overview#

With bring-your-own-endpoint, you:

  • Handle model deployment and serving independently

  • Provide an OpenAI-compatible API endpoint

  • Use either the launcher or core library for evaluations

  • Maintain full control over infrastructure and scaling

When to Use This Approach#

Choose bring-your-own-endpoint when you:

  • Have existing model serving infrastructure

  • Need custom deployment configurations

  • Want to deploy once and run many evaluations

  • Have specific security or compliance requirements

  • Use enterprise Kubernetes or MLOps pipelines

Deployment Approaches#

Choose the approach that best fits your infrastructure and requirements:

Manual Deployment

Deploy using vLLM, TensorRT-LLM, or custom serving frameworks for full control.

Hosted Services

Use NVIDIA Build, OpenAI API, or other cloud providers for instant availability.


Quick Examples#

Using Launcher with Existing Endpoint#

# Point launcher to your deployed model
nemo-evaluator-launcher run \
    --config-dir packages/nemo-evaluator-launcher/examples \
    --config-name local_llama_3_1_8b_instruct \
    -o target.api_endpoint.url=http://your-endpoint:8080/v1/completions \
    -o target.api_endpoint.model_id=your-model-name \
    -o deployment.type=none  # No launcher deployment
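The -o flags override values in the selected config file; deployment.type=none tells the launcher to skip its own deployment step and run the evaluation directly against the endpoint you provide.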

Using Core Library#

from nemo_evaluator import (
    ApiEndpoint, EvaluationConfig, EvaluationTarget, evaluate
)

# Configure your endpoint
api_endpoint = ApiEndpoint(
    url="http://your-endpoint:8080/v1/completions",
    model_id="your-model-name"
)
target = EvaluationTarget(api_endpoint=api_endpoint)

# Run evaluation
config = EvaluationConfig(type="mmlu_pro", output_dir="results")
results = evaluate(eval_cfg=config, target_cfg=target)
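In this snippet, the EvaluationConfig type field selects the benchmark to run (mmlu_pro here) and output_dir is where results are written; the EvaluationTarget simply wraps the endpoint definition.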

Endpoint Requirements#

Your endpoint must provide OpenAI-compatible APIs:

Required Endpoints#

  • Completions: /v1/completions (POST) - For text completion tasks

  • Chat Completions: /v1/chat/completions (POST) - For conversational tasks

  • Health Check: /v1/triton_health (GET) - For monitoring (recommended)

Request/Response Format#

Requests and responses must follow the OpenAI API specification so that the evaluation frameworks can parse them correctly.
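
As a quick sanity check before launching a full benchmark, you can send a single OpenAI-style completion request to your endpoint. The snippet below is a minimal sketch using the requests library; the URL, model name, and API_KEY environment variable mirror the placeholders used elsewhere on this page and should be replaced with your own values.

# Minimal smoke test for an OpenAI-compatible completions endpoint.
# The URL and model name are placeholders; adapt headers if your
# deployment requires an API key.
import os
import requests

url = "http://your-endpoint:8080/v1/completions"
headers = {"Content-Type": "application/json"}
if os.environ.get("API_KEY"):
    headers["Authorization"] = f"Bearer {os.environ['API_KEY']}"

payload = {
    "model": "your-model-name",
    "prompt": "The capital of France is",
    "max_tokens": 8,
    "temperature": 0.0,
}

response = requests.post(url, json=payload, headers=headers, timeout=30)
response.raise_for_status()
data = response.json()

# An OpenAI-compatible completions response returns generated text
# under choices[0].text.
print(data["choices"][0]["text"])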

Configuration Management#

Basic Configuration#

# config/bring_your_own.yaml
deployment:
  type: none  # No launcher deployment

target:
  api_endpoint:
    url: http://your-endpoint:8080/v1/completions
    model_id: your-model-name
    api_key: ${API_KEY}  # Optional

evaluation:
  tasks:
    - name: mmlu_pro
    - name: gsm8k
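
Assuming the file above is saved as config/bring_your_own.yaml, you can run it with the launcher exactly as in the quick example, pointing --config-dir at config and --config-name at bring_your_own. If your endpoint requires authentication, make sure the API_KEY value referenced in the config is available in your environment before launching.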

With Adapters#

target:
  api_endpoint:
    url: http://your-endpoint:8080/v1/completions
    model_id: your-model-name
    
    adapter_config:
      # Caching for efficiency
      use_caching: true
      caching_dir: ./cache
      
      # Request logging for debugging
      use_request_logging: true
      max_logged_requests: 10
      
      # Custom processing
      use_reasoning: true
      start_reasoning_token: "<think>"
      end_reasoning_token: "</think>"
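
Conceptually, the reasoning options strip everything between the configured start and end tokens from a model response before it is scored, so chain-of-thought text does not leak into the graded answer. The sketch below only illustrates that effect with a regular expression; it is not the adapter's actual implementation.

# Illustration only: the effect of start/end reasoning tokens on a
# response string. The adapter_config above handles this inside
# NeMo Evaluator.
import re

def strip_reasoning(text: str, start: str = "<think>", end: str = "</think>") -> str:
    """Remove any <think>...</think> spans so only the final answer remains."""
    pattern = re.escape(start) + r".*?" + re.escape(end)
    return re.sub(pattern, "", text, flags=re.DOTALL).strip()

raw = "<think>The user asks for 2+2, which is 4.</think>The answer is 4."
print(strip_reasoning(raw))  # -> "The answer is 4."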

Key Benefits#

Infrastructure Control#

  • Custom configurations: Tailor deployment to your specific needs

  • Resource optimization: Optimize for your hardware and workloads

  • Security compliance: Meet your organization’s security requirements

  • Cost management: Control costs through efficient resource usage

Operational Flexibility#

  • Deploy once, evaluate many: Reuse deployments across multiple evaluations

  • Integration ready: Works with existing infrastructure and workflows

  • Technology choice: Use any serving framework or cloud provider

  • Scaling control: Scale according to your requirements

Getting Started#

  1. Choose your approach: Select from manual deployment, hosted services, or enterprise integration

  2. Deploy your model: Set up your OpenAI-compatible endpoint

  3. Configure NeMo Evaluator: Point to your endpoint with proper configuration

  4. Run evaluations: Use launcher or core library to run benchmarks

  5. Monitor and optimize: Track performance and optimize as needed

Next Steps#