None Deployment#
The “none” deployment option performs no model deployment. Instead, you provide an existing OpenAI-compatible endpoint, and the launcher runs only the evaluation tasks, connecting to that endpoint.
When to Use None Deployment#
Existing Endpoints: You have a running model endpoint to evaluate
Third-Party Services: Testing models from NVIDIA API Catalog, OpenAI, or other providers
Custom Infrastructure: Using your own deployment solution outside the launcher
Cost Optimization: Reusing existing deployments across multiple evaluation runs
Separation of Concerns: Keeping model deployment and evaluation as separate processes
Key Benefits#
No Resource Management: No need to provision or manage model deployment resources
Platform Flexibility: Works with Local, Lepton, and SLURM execution platforms
Quick Setup: Minimal configuration required; just point to your endpoint
Cost Effective: Leverage existing deployments without additional infrastructure
Universal Configuration#
These configuration patterns apply to all execution platforms when using “none” deployment.
Target Endpoint Setup#
target:
  api_endpoint:
    model_id: meta/llama-3.1-8b-instruct                # Model identifier (required)
    url: https://your-endpoint.com/v1/chat/completions  # Endpoint URL (required)
    api_key_name: API_KEY                               # Environment variable name (recommended)
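Before running an evaluation, it can be worth smoke-testing the endpoint directly. The sketch below uses a placeholder URL and model and assumes the API key is already exported in the variable named by api_key_name (API_KEY here); substitute your own values:

# Placeholder URL and model; API_KEY must be exported in your shell.
curl -s https://your-endpoint.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta/llama-3.1-8b-instruct", "messages": [{"role": "user", "content": "Hello"}]}'

A valid JSON chat completion in the response confirms the endpoint is OpenAI-compatible and the key works.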
Platform Examples#
Choose your execution platform and see the specific configuration needed:
Local#
Best for: Development, testing, small-scale evaluations
defaults:
  - execution: local
  - deployment: none
  - _self_

execution:
  output_dir: results

target:
  api_endpoint:
    model_id: meta/llama-3.1-8b-instruct
    url: https://integrate.api.nvidia.com/v1/chat/completions
    api_key_name: NGC_API_KEY

evaluation:
  tasks:
    - name: gpqa_diamond
      env_vars:
        HF_TOKEN: HF_TOKEN_FOR_GPQA_DIAMOND  # Request access to GPQA-Diamond: https://huggingface.co/datasets/Idavidrein/gpqa
Key Points:
Minimal configuration required
Set environment variables in your shell (see the example after this list)
Limited by local machine resources
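The local executor reads credentials from your shell environment, so export every variable the config references before launching. A minimal example with placeholder values:

# Placeholder values; substitute your real keys.
export NGC_API_KEY="nvapi-..."                 # resolved via api_key_name
export HF_TOKEN_FOR_GPQA_DIAMOND="hf_..."      # mapped to HF_TOKEN for the gpqa_diamond task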
Lepton#
Best for: Production evaluations, team environments, scalable workloads
defaults:
  - execution: lepton/default
  - deployment: none
  - _self_

execution:
  output_dir: results
  lepton_platform:
    tasks:
      api_tokens:
        - value_from:
            token_name_ref: "ENDPOINT_API_KEY"
      env_vars:
        HF_TOKEN:
          value_from:
            secret_name_ref: "HUGGING_FACE_HUB_TOKEN"
        API_KEY:
          value_from:
            secret_name_ref: "ENDPOINT_API_KEY"
      node_group: "your-node-group"
      mounts:
        - from: "node-nfs:shared-fs"
          path: "/workspace/path"
          mount_path: "/workspace"

target:
  api_endpoint:
    model_id: meta/llama-3.1-8b-instruct
    url: https://your-endpoint.lepton.run/v1/chat/completions
    api_key_name: API_KEY

evaluation:
  tasks:
    - name: gpqa_diamond
Key Points:
Requires Lepton credentials (lep login)
Use secret_name_ref for secure credential storage (see the sketch after this list)
Configure node groups and storage mounts
Handles larger evaluation workloads
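The secret_name_ref entries resolve against secrets stored in your Lepton workspace. As a rough sketch (the exact lep secret syntax is an assumption here; check lep secret --help for your CLI version), creating them could look like:

# Assumed flags (-n name, -v value); verify with `lep secret --help`.
lep login
lep secret create -n HUGGING_FACE_HUB_TOKEN -v "hf_..."
lep secret create -n ENDPOINT_API_KEY -v "your-endpoint-key"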
SLURM#
Best for: HPC environments, large-scale evaluations, batch processing
defaults:
  - execution: slurm/default
  - deployment: none
  - _self_

execution:
  account: your-slurm-account
  output_dir: /shared/filesystem/results
  walltime: "02:00:00"
  partition: cpu_short
  gpus_per_node: null  # No GPUs needed for none deployment

target:
  api_endpoint:
    model_id: meta/llama-3.1-8b-instruct
    url: https://integrate.api.nvidia.com/v1/chat/completions
    api_key_name: NGC_API_KEY  # API key with access to build.nvidia.com

evaluation:
  tasks:
    - name: gpqa_diamond
      env_vars:
        HF_TOKEN: HF_TOKEN_FOR_GPQA_DIAMOND  # Request access to GPQA-Diamond: https://huggingface.co/datasets/Idavidrein/gpqa
Key Points:
Requires a SLURM account and an output directory on a shared filesystem
Creates one job per benchmark evaluation (track them as shown below)
Uses CPU partitions (no GPUs needed for none deployment)
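Because each benchmark runs as its own SLURM job, standard SLURM tooling can track progress, for example:

squeue -u $USER                    # queued and running evaluation jobs
sacct -u $USER --starttime today   # finished jobs with exit codes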