# Launcher-Orchestrated Deployment
Let NeMo Evaluator Launcher handle both model deployment and evaluation orchestration automatically. This is the recommended approach for most users, providing automated lifecycle management, multi-backend support, and integrated monitoring.
## Overview
Launcher-orchestrated deployment means the launcher:
- Deploys your model using the specified deployment type
- Manages the model serving lifecycle
- Runs evaluations against the deployed model
- Handles cleanup and resource management
The launcher supports multiple deployment backends and execution environments.
## Quick Start

```bash
# Deploy model and run evaluation in one command (Slurm example)
nv-eval run \
    --config-dir examples \
    --config-name slurm_llama_3_1_8b_instruct \
    -o deployment.checkpoint_path=/path/to/your/model
```
## Execution Backends

Choose the execution backend that matches your infrastructure (a sketch of selecting a backend by config name follows this list):

- **Local**: Run evaluations on your local machine against existing endpoints. The local executor does not deploy models; use Slurm or Lepton when the launcher should handle deployment.
- **Slurm**: Deploy on HPC clusters with the Slurm workload manager. Ideal for large-scale evaluations with multi-node parallelism.
- **Lepton AI**: Deploy on the Lepton AI cloud platform. Best for cloud-native deployments with managed infrastructure and auto-scaling.
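In practice, switching backends is mostly a matter of pointing `nv-eval run` at a different configuration. The Slurm config name below comes from the Quick Start above; the Lepton and local config names are illustrative placeholders, so substitute the configs that actually exist in your `--config-dir`:

```bash
# Slurm cluster: the launcher deploys the model and runs the evaluation jobs
nv-eval run --config-dir examples --config-name slurm_llama_3_1_8b_instruct \
    -o deployment.checkpoint_path=/path/to/your/model

# Lepton AI: managed cloud deployment (illustrative config name)
nv-eval run --config-dir examples --config-name lepton_llama_3_1_8b_instruct

# Local machine: evaluate an existing endpoint, no deployment (illustrative config name)
nv-eval run --config-dir examples --config-name local_existing_endpoint
```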
## Deployment Types
The launcher supports multiple deployment types:
### vLLM Deployment

- Fast inference with optimized attention mechanisms
- Continuous batching for high throughput
- Tensor parallelism support for large models
- Memory optimization with configurable GPU utilization
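As a rough sketch, a vLLM deployment block combines the model path with standard vLLM serving options. The `tensor_parallel_size` and `gpu_memory_utilization` keys below mirror common vLLM server arguments and are assumptions here; check the packaged `deployment: vllm` config for the exact option names:

```yaml
deployment:
  checkpoint_path: /path/to/model     # or a Hugging Face model ID
  served_model_name: my-model
  tensor_parallel_size: 2             # assumption: shard the model across 2 GPUs
  gpu_memory_utilization: 0.9         # assumption: fraction of GPU memory vLLM may use
```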
### NIM Deployment

- Production-grade reliability with enterprise features
- NVIDIA-optimized containers for maximum performance
- Built-in monitoring and logging capabilities
- Enterprise security features
### SGLang Deployment

- Structured generation support for complex tasks
- Function calling capabilities
- JSON mode for structured outputs
- Efficient batching for high throughput
### No Deployment

- Use existing endpoints without launcher deployment
- Bring-your-own-endpoint integration
- Flexible configuration for any OpenAI-compatible API
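With no deployment, the launcher only needs to know where to send requests. The sketch below assumes a `local` execution config and a `target.api_endpoint` section for the endpoint details; treat both as assumptions and verify the key names against the packaged `deployment: none` examples:

```yaml
defaults:
  - execution: local          # assumption: run the evaluation client locally
  - deployment: none          # do not deploy; evaluate an existing endpoint
  - _self_

target:
  api_endpoint:
    url: https://your-endpoint/v1/chat/completions   # any OpenAI-compatible API
    model_id: my-model                                # model name the endpoint expects
    api_key_name: MY_API_KEY                          # env var holding the API key (assumption)
```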
## Configuration Overview
Basic configuration structure for launcher-orchestrated deployment:
```yaml
# Use Hydra defaults to compose config
defaults:
  - execution: slurm/default   # or lepton/default; local does not deploy
  - deployment: vllm           # or nim, sglang, none
  - _self_

# Deployment configuration
deployment:
  checkpoint_path: /path/to/model   # or a Hugging Face model ID
  served_model_name: my-model
  # ... deployment-specific options

# Execution backend configuration
execution:
  account: my-account
  output_dir: /path/to/results
  # ... backend-specific options

# Evaluation tasks
evaluation:
  tasks:
    - name: mmlu_pro
    - name: gsm8k
```
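Assuming the file above is saved as `my_config.yaml` inside a local `configs/` directory (hypothetical names), it can be launched and selectively overridden from the command line in the same way as the Quick Start example:

```bash
nv-eval run \
    --config-dir configs \
    --config-name my_config \
    -o deployment.served_model_name=my-model \
    -o execution.output_dir=/path/to/results
```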
## Key Benefits
### Automated Lifecycle Management

- **Deployment automation**: No manual setup required
- **Resource management**: Automatic allocation and cleanup
- **Error handling**: Built-in retry and recovery mechanisms
- **Monitoring integration**: Real-time status and logging
### Multi-Backend Support

- **Consistent interface**: Same commands work across all backends
- **Environment flexibility**: Local development to production clusters
- **Resource optimization**: Backend-specific optimizations
- **Scalability**: From single GPU to multi-node deployments
### Integrated Workflows

- **End-to-end automation**: From model to results in one command
- **Configuration management**: Version-controlled, reproducible configs
- **Result integration**: Built-in export and analysis tools
- **Monitoring and debugging**: Comprehensive logging and status tracking
## Getting Started

1. **Choose your backend**: Start with Local Execution for development
2. **Configure your model**: Set the deployment type and model path
3. **Run evaluation**: Use the launcher to deploy and evaluate
4. **Monitor progress**: Check status and logs during execution
5. **Analyze results**: Export and analyze evaluation outcomes (see the workflow sketch below)
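A minimal end-to-end workflow might look like the following. The `status` and `export` subcommands and their argument shapes are assumptions here; run `nv-eval --help` to confirm the exact commands your installed version provides:

```bash
# 1. Deploy the model and launch the evaluation tasks
nv-eval run --config-dir examples --config-name slurm_llama_3_1_8b_instruct \
    -o deployment.checkpoint_path=/path/to/your/model

# 2. Check job status using the invocation ID printed by `run` (assumed subcommand)
nv-eval status <invocation_id>

# 3. Export results once all tasks finish (assumed subcommand)
nv-eval export <invocation_id>
```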
## Next Steps

- **Local Development**: Start with Local Execution for testing
- **Scale Up**: Move to Slurm Deployment via Launcher for production workloads
- **Cloud Native**: Try Lepton AI Deployment via Launcher for managed infrastructure
- **Configure Adapters**: Set up Evaluation Adapters for custom processing