# Hosted Services
Use existing hosted model APIs from cloud providers without managing your own infrastructure. This approach offers the fastest path to evaluation with minimal setup requirements.
## Overview

Hosted services provide:

- Pre-deployed models accessible via API
- No infrastructure management required
- Pay-per-use pricing
- Instant availability and global access
- Professional SLAs and support
## NVIDIA Build
NVIDIA’s catalog of ready-to-use AI models with OpenAI-compatible APIs.
### NVIDIA Build Setup and Authentication

```bash
# Get your NGC API key from https://build.nvidia.com
export NGC_API_KEY="nvapi-your-ngc-api-key"

# Test authentication
curl -H "Authorization: Bearer $NGC_API_KEY" \
  "https://integrate.api.nvidia.com/v1/models"
```
Refer to the NVIDIA Build catalog for available models.
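The same models-list check can be scripted. A minimal standard-library sketch (the endpoint URL matches the curl command above; error handling is deliberately omitted):

```python
import json
import os
import urllib.request

MODELS_URL = "https://integrate.api.nvidia.com/v1/models"

def build_headers(api_key: str) -> dict:
    """Bearer-token header expected by OpenAI-compatible endpoints."""
    return {"Authorization": f"Bearer {api_key}"}

def list_models(url: str, api_key: str) -> list:
    """Return the model ids advertised by an OpenAI-compatible /v1/models endpoint."""
    req = urllib.request.Request(url, headers=build_headers(api_key))
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return [m["id"] for m in data.get("data", [])]

if __name__ == "__main__":
    key = os.environ.get("NGC_API_KEY")
    if key:
        print("\n".join(list_models(MODELS_URL, key)))
    else:
        print("NGC_API_KEY is not set")
```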
### NVIDIA Build Configuration

#### Basic NVIDIA Build Evaluation

```yaml
# config/nvidia_build_basic.yaml
defaults:
  - execution: local
  - deployment: none  # No deployment needed
  - _self_

target:
  api_endpoint:
    url: https://integrate.api.nvidia.com/v1/chat/completions
    model_id: meta/llama-3.1-8b-instruct
    api_key_name: NGC_API_KEY  # Name of the environment variable

execution:
  output_dir: ./results

evaluation:
  overrides:
    config.params.limit_samples: 100
  tasks:
    - name: ifeval
```
### Multi-Model Comparison
For multi-model comparison, run separate evaluations for each model and compare results:
```bash
# Evaluate the first model
nemo-evaluator-launcher run \
  --config packages/nemo-evaluator-launcher/examples/local_llama_3_1_8b_instruct.yaml \
  -o target.api_endpoint.model_id=meta/llama-3.1-8b-instruct \
  -o execution.output_dir=./results/llama-3.1-8b

# Evaluate the second model
nemo-evaluator-launcher run \
  --config packages/nemo-evaluator-launcher/examples/local_llama_3_1_8b_instruct.yaml \
  -o target.api_endpoint.model_id=meta/llama-3.1-70b-instruct \
  -o execution.output_dir=./results/llama-3.1-70b

# Gather the results
nemo-evaluator-launcher export <first-job-id> <second-job-id> --dest local --format json
```
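Once both runs are exported, comparing them comes down to diffing the JSON files. The exact export schema is not reproduced here, so the sketch below is schema-agnostic: it walks two parsed JSON structures and reports numeric fields that differ (the file paths and dictionary shapes in the demo are hypothetical placeholders for whatever `export` produced):

```python
import json

def diff_metrics(a, b, path=""):
    """Recursively compare two parsed JSON structures, yielding
    (path, value_a, value_b) for numeric leaves that differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(set(a) & set(b)):
            yield from diff_metrics(a[key], b[key], f"{path}.{key}".lstrip("."))
    elif isinstance(a, (int, float)) and isinstance(b, (int, float)):
        if a != b:
            yield (path, a, b)

# Hypothetical shapes -- substitute the real exported files, e.g.:
#   a = json.load(open("results/llama-3.1-8b/results.json"))
a = {"ifeval": {"score": 0.61}}
b = {"ifeval": {"score": 0.74}}
for p, va, vb in diff_metrics(a, b):
    print(f"{p}: {va} -> {vb}")
```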
## OpenAI API
Direct integration with OpenAI’s GPT models for comparison and benchmarking.
### OpenAI Setup and Authentication

```bash
# Get API key from https://platform.openai.com/api-keys
export OPENAI_API_KEY="your-openai-api-key"

# Test authentication
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
  "https://api.openai.com/v1/models"
```
Refer to the OpenAI model documentation for available models.
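Beyond listing models, you can smoke-test the chat-completions endpoint itself. A minimal standard-library sketch (the payload shape follows the OpenAI chat API; the model and prompt are arbitrary examples):

```python
import json
import os
import urllib.request

def chat_request(url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-compatible chat-completions request."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

if __name__ == "__main__":
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        req = chat_request(
            "https://api.openai.com/v1/chat/completions", key, "gpt-4", "Say hello."
        )
        with urllib.request.urlopen(req) as resp:
            print(json.load(resp)["choices"][0]["message"]["content"])
```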
### OpenAI Configuration

#### Basic OpenAI Evaluation

```yaml
# config/openai_basic.yaml
defaults:
  - execution: local
  - deployment: none
  - _self_

target:
  api_endpoint:
    url: https://api.openai.com/v1/chat/completions
    model_id: gpt-4
    api_key_name: OPENAI_API_KEY  # Name of the environment variable

execution:
  output_dir: ./results

evaluation:
  overrides:
    config.params.limit_samples: 100
  tasks:
    - name: ifeval
```
#### Cost-Optimized Configuration

```yaml
# config/openai_cost_optimized.yaml
defaults:
  - execution: local
  - deployment: none
  - _self_

target:
  api_endpoint:
    url: https://api.openai.com/v1/chat/completions
    model_id: gpt-3.5-turbo  # Less expensive model
    api_key_name: OPENAI_API_KEY

execution:
  output_dir: ./results

evaluation:
  overrides:
    config.params.limit_samples: 50  # Smaller sample size
    config.params.parallelism: 2     # Lower parallelism to respect rate limits
  tasks:
    - name: mmlu_pro
```
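The trade-off above (fewer samples, cheaper model) is easy to quantify with a back-of-the-envelope helper. The token counts and per-1k-token prices below are illustrative placeholders, not current OpenAI pricing:

```python
def estimate_cost(samples: int, in_tokens: int, out_tokens: int,
                  in_price_per_1k: float, out_price_per_1k: float) -> float:
    """Rough evaluation cost in dollars:
    (tokens per sample / 1000) * per-1k-token price, summed over all samples."""
    per_sample = (in_tokens / 1000) * in_price_per_1k + (out_tokens / 1000) * out_price_per_1k
    return samples * per_sample

# Hypothetical numbers for illustration only:
# 50 samples, ~500 input / ~200 output tokens each, made-up prices.
print(f"${estimate_cost(50, 500, 200, 0.0005, 0.0015):.4f}")
```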
## Troubleshooting

### Authentication Errors

Verify that your API key is set and valid:
```bash
# Verify NVIDIA Build API key
curl -H "Authorization: Bearer $NGC_API_KEY" \
  "https://integrate.api.nvidia.com/v1/models"

# Verify OpenAI API key
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
  "https://api.openai.com/v1/models"
```
### Rate Limiting

If you encounter rate-limit errors (HTTP 429), reduce the parallelism parameter in your configuration:

```yaml
evaluation:
  overrides:
    config.params.parallelism: 2  # Lower parallelism to respect rate limits
```
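If lowering parallelism is not enough, retrying with exponential backoff on 429 responses is the usual complement. A generic sketch, not launcher functionality: the `send` callable stands in for whatever HTTP call your client makes, and a `RuntimeError` containing "429" stands in for a real rate-limit exception:

```python
import time

def with_backoff(send, max_retries: int = 5, base_delay: float = 1.0):
    """Call send(); on a rate-limit error, sleep base_delay * 2**attempt
    and retry, up to max_retries attempts in total."""
    for attempt in range(max_retries):
        try:
            return send()
        except RuntimeError as exc:  # stand-in for an HTTP 429 error type
            if "429" not in str(exc) or attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Demo: fail twice with 429, then succeed.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("HTTP 429: rate limited")
    return "ok"

print(with_backoff(flaky, base_delay=0.01))  # -> ok
```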