# None Deployment
The "none" deployment option performs no model deployment. Instead, you point the launcher at an existing OpenAI-compatible endpoint, and it runs the evaluation tasks against that endpoint.
## When to Use None Deployment
- **Existing Endpoints**: You have a running model endpoint to evaluate
- **Third-Party Services**: Testing models from the NVIDIA API Catalog, OpenAI, or other providers
- **Custom Infrastructure**: Using your own deployment solution outside the launcher
- **Cost Optimization**: Reusing existing deployments across multiple evaluation runs
- **Separation of Concerns**: Keeping model deployment and evaluation as separate processes
## Key Benefits
- **No Resource Management**: No need to provision or manage model deployment resources
- **Platform Flexibility**: Works with Local, Lepton, and SLURM execution platforms
- **Quick Setup**: Minimal configuration required: just point to your endpoint
- **Cost Effective**: Leverage existing deployments without additional infrastructure
## Universal Configuration
These configuration patterns apply to all execution platforms when using “none” deployment.
### Target Endpoint Setup
```yaml
target:
  api_endpoint:
    model_id: meta/llama-3.1-8b-instruct                 # Model identifier (required)
    url: https://your-endpoint.com/v1/chat/completions   # Endpoint URL (required)
    api_key_name: API_KEY                                # Environment variable name (recommended)
```
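Before wiring this into an evaluation, it can be worth confirming that the endpoint actually speaks the OpenAI chat completions protocol. A minimal smoke test, assuming the key is exported under the variable named by `api_key_name`:

```bash
# Send a single chat completion request to the endpoint (sketch).
# Assumes API_KEY is exported and the endpoint follows the OpenAI chat schema.
curl -s https://your-endpoint.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta/llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 8
      }'
```

A JSON response containing a `choices` array indicates the endpoint is ready for evaluation.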
/// note | Legacy Adapter Configuration
The following adapter configuration parameters use the legacy format and are maintained for backward compatibility. For new configurations, use the modern interceptor-based system documented in System Messages and Reasoning.
```yaml
target:
  api_endpoint:
    # Legacy adapter configuration (supported but not recommended for new configs)
    adapter_config:
      use_reasoning: false                          # Strip reasoning tokens if true
      use_system_prompt: true                       # Enable system prompt support
      custom_system_prompt: "Think step by step."   # Custom system prompt
```
///
### Evaluation Configuration
```yaml
evaluation:
  # Global overrides (apply to all tasks)
  overrides:
    config.params.request_timeout: 3600
    config.params.temperature: 0.7
  # Task-specific configuration
  tasks:
    - name: gpqa_diamond
      overrides:
        config.params.temperature: 0.6
        config.params.max_new_tokens: 8192
        config.params.parallelism: 32
      env_vars:
        HF_TOKEN: HF_TOKEN_FOR_GPQA_DIAMOND
    - name: mbpp
      overrides:
        config.params.extra.n_samples: 5
```
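Note the `env_vars` mapping above: the key (`HF_TOKEN`) is the variable name the task sees, while the value (`HF_TOKEN_FOR_GPQA_DIAMOND`) names the variable in your shell that supplies it. A sketch of the setup, assuming that mapping behavior:

```bash
# The launcher reads HF_TOKEN_FOR_GPQA_DIAMOND from your shell and exposes it
# to the gpqa_diamond task as HF_TOKEN (illustrative token value).
export HF_TOKEN_FOR_GPQA_DIAMOND="hf_your_token_here"
nemo-evaluator-launcher run --config-name your_config
```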
## Platform Examples
Choose your execution platform and see the specific configuration needed:
### Local

**Best for**: Development, testing, small-scale evaluations
```yaml
defaults:
  - execution: local
  - deployment: none
  - _self_

execution:
  output_dir: results

target:
  api_endpoint:
    model_id: meta/llama-3.1-8b-instruct
    url: https://integrate.api.nvidia.com/v1/chat/completions
    api_key_name: API_KEY

evaluation:
  tasks:
    - name: gpqa_diamond
```
**Key Points**:
- Minimal configuration required
- Set environment variables in your shell (see the sketch below)
- Limited by local machine resources
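A typical invocation, sketched with an illustrative config name and placeholder key:

```bash
# Export the key the config references via api_key_name, then launch locally.
export API_KEY="your-endpoint-token"
nemo-evaluator-launcher run --config-name local_none_example
```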
### Lepton

**Best for**: Production evaluations, team environments, scalable workloads
```yaml
defaults:
  - execution: lepton/default
  - deployment: none
  - _self_

execution:
  lepton_platform:
    tasks:
      env_vars:
        HF_TOKEN:
          value_from:
            secret_name_ref: "HUGGING_FACE_HUB_TOKEN_read"
        API_KEY: "UNIQUE_ENDPOINT_TOKEN"
      node_group: "your-node-group"
      mounts:
        - from: "node-nfs:shared-fs"
          path: "/workspace/path"
          mount_path: "/workspace"

target:
  api_endpoint:
    model_id: meta/llama-3.1-8b-instruct
    url: https://your-endpoint.lepton.run/v1/chat/completions
    api_key_name: API_KEY

evaluation:
  tasks:
    - name: gpqa_diamond
```
**Key Points**:
- Requires Lepton credentials (`lep login`; see the sketch below)
- Use `secret_name_ref` for secure credential storage
- Configure node groups and storage mounts
- Handles larger evaluation workloads
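The end-to-end flow might look like this (the config name is illustrative):

```bash
# Authenticate against your Lepton workspace, then submit the evaluation.
lep login
nemo-evaluator-launcher run --config-name lepton_none_example
```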
### SLURM

**Best for**: HPC environments, large-scale evaluations, batch processing
```yaml
defaults:
  - execution: slurm/default
  - deployment: none
  - _self_

execution:
  account: your-slurm-account
  output_dir: /shared/filesystem/results
  walltime: "02:00:00"
  partition: cpu_short
  gpus_per_node: null   # No GPUs needed

target:
  api_endpoint:
    model_id: meta/llama-3.1-8b-instruct
    url: https://integrate.api.nvidia.com/v1/chat/completions
    api_key_name: API_KEY

evaluation:
  tasks:
    - name: gpqa_diamond
```
**Key Points**:
- Requires a SLURM account and an accessible output directory
- Creates one job per benchmark evaluation
- Uses CPU partitions (no GPUs needed for none deployment)
- Supports CLI overrides for flexible job submission (example below)
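For example, account and walltime can be adjusted at submission time without editing the config (a sketch with an illustrative config name):

```bash
# Submit with per-run overrides of the SLURM settings.
nemo-evaluator-launcher run --config-name slurm_none_example \
  execution.account=other-account \
  execution.walltime="04:00:00"
```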
## Advanced Features
### CLI Overrides
Override any configuration value from the command line using dot notation:
```bash
# Override execution settings
nemo-evaluator-launcher run --config-name your_config execution.walltime="1:00:00"

# Override endpoint URL
nemo-evaluator-launcher run --config-name your_config target.api_endpoint.url="https://new-endpoint.com/v1/chat/completions"

# Override evaluation parameters
nemo-evaluator-launcher run --config-name your_config evaluation.overrides.config.params.temperature=0.8
```
### Common Configuration Overrides
**Request Parameters**:

- `config.params.temperature`: Control randomness (0.0-1.0)
- `config.params.max_new_tokens`: Maximum response length
- `config.params.parallelism`: Concurrent request limit
- `config.params.request_timeout`: Request timeout in seconds

**Task-Specific**:

- `config.params.extra.n_samples`: Number of samples per prompt (for code tasks)
- Environment variables for dataset access (such as `HF_TOKEN`)
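Several of these can be combined in a single run; for example, tightening generation settings across all tasks (a sketch using the global `evaluation.overrides` path shown earlier, with an illustrative config name):

```bash
# Combine request-parameter overrides in one invocation.
nemo-evaluator-launcher run --config-name your_config \
  evaluation.overrides.config.params.temperature=0.6 \
  evaluation.overrides.config.params.max_new_tokens=4096 \
  evaluation.overrides.config.params.parallelism=16
```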
### Automatic Result Export
Automatically export evaluation results to multiple destinations for experiment tracking and collaboration.
**Supported Destinations**: W&B, MLflow, Google Sheets
#### Basic Configuration
```yaml
execution:
  auto_export:
    destinations: ["wandb", "mlflow", "gsheets"]
    configs:
      wandb:
        entity: "your-team"
        project: "llm-evaluation"
        name: "experiment-name"
        tags: ["llama-3.1", "baseline"]
        log_metrics: ["accuracy", "pass@1"]
      mlflow:
        tracking_uri: "http://mlflow.company.com:5000"
        experiment_name: "LLM-Baselines-2024"
        log_metrics: ["accuracy", "pass@1"]
      gsheets:
        spreadsheet_name: "LLM Evaluation Results"
        log_mode: "multi_task"
```
/// note
For detailed exporter configuration, see Exporters.
///
#### Key Configuration Options
- `log_metrics`: Filter which metrics to export (e.g., `["accuracy", "pass@1"]`)
- `log_mode`: `"multi_task"` (all tasks together) or `"per_task"` (separate entries)
- `extra_metadata`: Additional experiment metadata and tags
- Environment variables: Use `${oc.env:VAR_NAME}` for secure credential handling
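For credentials, the `${oc.env:VAR_NAME}` resolver keeps secrets out of the config file; values are read from the environment when the config is loaded. A sketch, where the variable name is an assumption:

```yaml
# Values are resolved from environment variables at load time rather than
# stored in the config file (WANDB_ENTITY is an assumed variable name).
execution:
  auto_export:
    destinations: ["wandb"]
    configs:
      wandb:
        entity: ${oc.env:WANDB_ENTITY}
        project: "llm-evaluation"
```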