LLM Client Configuration
NeMo Curator’s synthetic data generation uses OpenAI-compatible clients to communicate with LLM inference servers. This guide covers client configuration, performance tuning, and integration with various endpoints.
Overview
Two client types are available:
- AsyncOpenAIClient: Recommended for high-throughput batch processing with concurrent requests
- OpenAIClient: Synchronous client for simpler use cases or debugging
For most SDG workloads, use AsyncOpenAIClient to maximize throughput.
Basic Configuration
NVIDIA API Endpoints
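A minimal sketch of the values involved in targeting NVIDIA's hosted, OpenAI-compatible API. The `base_url` below is the standard NVIDIA NIM endpoint; the exact constructor signature of `AsyncOpenAIClient` may differ across NeMo Curator versions, so this is shown as a plain configuration mapping rather than a definitive call:

```python
# Illustrative client configuration for NVIDIA's hosted endpoint.
# max_concurrent_requests is the per-worker in-flight cap described in
# "Performance Tuning" below; treat the exact constructor arguments of
# AsyncOpenAIClient as version-dependent and check your API reference.
nvidia_client_config = {
    "base_url": "https://integrate.api.nvidia.com/v1",  # NVIDIA NIM API
    "api_key": "nvapi-...",         # replace with your key, or read from env
    "max_concurrent_requests": 10,  # cap on simultaneous API calls
}
```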
Environment Variables
Set your API key as an environment variable to avoid hardcoding credentials:
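For example (`NVIDIA_API_KEY` is an assumed variable name used for illustration; the built-in lookup described below uses `OPENAI_API_KEY`):

```shell
# Export the key once per shell session instead of embedding it in code.
export NVIDIA_API_KEY="nvapi-..."
```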
The underlying OpenAI client automatically uses the OPENAI_API_KEY environment variable if no api_key is provided. For NVIDIA APIs, explicitly pass the key:
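A small sketch of reading the key from the environment and failing fast when it is missing; the helper name `load_api_key` is ours, not part of NeMo Curator:

```python
import os

def load_api_key(var: str = "NVIDIA_API_KEY") -> str:
    """Read the API key from the environment, failing fast if it is unset."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set {var} before starting the pipeline")
    return key  # pass this as api_key when constructing the client
```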
Generation Parameters
Configure LLM generation behavior using GenerationConfig:
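As an illustration, the sketch below mirrors `GenerationConfig` with the standard OpenAI-compatible sampling parameters; the actual field set is version-dependent, so confirm the attribute names against your NeMo Curator release:

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in for GenerationConfig with typical OpenAI-compatible fields.
@dataclass
class GenerationConfigSketch:
    temperature: float = 0.7        # sampling randomness
    top_p: float = 0.95             # nucleus-sampling cutoff
    max_tokens: int = 1024          # cap on response length
    seed: Optional[int] = None      # fix for reproducible generations

# Lower temperature plus a fixed seed gives more deterministic outputs.
config = GenerationConfigSketch(temperature=0.2, seed=42)
```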
Performance Tuning
Concurrency vs. Parallelism
The max_concurrent_requests parameter controls how many API requests the client can have in-flight simultaneously. This interacts with Ray’s distributed workers:
- Client-level concurrency: max_concurrent_requests limits concurrent API calls per worker
- Worker-level parallelism: Ray distributes tasks across multiple workers
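The client-level cap can be pictured as a semaphore guarding the API call: only `max_concurrent_requests` coroutines hold a slot at once, while Ray would run one such client per worker. This is a self-contained sketch with a placeholder in place of the real API call:

```python
import asyncio

async def bounded_requests(prompts, max_concurrent_requests=4):
    # The semaphore plays the role of max_concurrent_requests: at most
    # this many requests are in flight at any moment.
    sem = asyncio.Semaphore(max_concurrent_requests)

    async def send(prompt):
        async with sem:             # wait for a free slot
            await asyncio.sleep(0)  # stands in for the real API round trip
            return f"response:{prompt}"

    # gather preserves input order even though requests overlap.
    return await asyncio.gather(*(send(p) for p in prompts))

results = asyncio.run(bounded_requests(["a", "b", "c"]))
```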
Retry Configuration
The client includes automatic retry with exponential backoff for transient errors:
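The knobs involved, shown as a plain mapping: `base_delay` is named in the troubleshooting section below, while `max_retries` is an assumed name for the "configurable retry attempts":

```python
# Illustrative retry settings; confirm exact parameter names against your
# NeMo Curator version's client constructor.
retry_config = {
    "max_retries": 5,   # assumed name for the retry-attempt count
    "base_delay": 1.0,  # seconds; starting point for exponential backoff
}
```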
The retry logic handles:
- Rate limit errors (429): Automatic backoff with jitter
- Connection errors: Retry with exponential delay
- Transient failures: Configurable retry attempts
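The policy above can be sketched as a generic retry loop: delays grow as `base_delay * 2**attempt`, jitter spreads retries out so concurrent workers do not hammer the server in lockstep, and the final failure is re-raised. A real client would await the API call instead of invoking a plain callable:

```python
import random
import time

def retry_with_backoff(call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying transient failures with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise                       # out of attempts: surface the error
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))  # backoff plus jitter
```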
Using Other OpenAI-Compatible Endpoints
The AsyncOpenAIClient works with any OpenAI-compatible API endpoint. Simply configure the base_url and api_key parameters:
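For example, pointing at a locally hosted server (the vLLM OpenAI-compatible server listens on `http://localhost:8000/v1` by default, and local servers typically accept a dummy key):

```python
# Swapping base_url is all that changes for a self-hosted endpoint.
local_client_config = {
    "base_url": "http://localhost:8000/v1",  # vLLM's default serving address
    "api_key": "not-needed-locally",         # placeholder for local servers
}
```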
Complete Example
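An end-to-end sketch tying the pieces together. `fake_generate` stands in for the real `AsyncOpenAIClient` call; a real pipeline would construct the client from `base_url` and an environment-sourced `api_key` and invoke it inside the semaphore:

```python
import asyncio

async def run_pipeline(prompts, max_concurrent_requests=8):
    # In a real pipeline: build AsyncOpenAIClient here (base_url, api_key)
    # and call it where fake_generate sleeps below.
    sem = asyncio.Semaphore(max_concurrent_requests)

    async def fake_generate(prompt):
        async with sem:             # respect the concurrency cap
            await asyncio.sleep(0)  # placeholder for the network round trip
            return {"prompt": prompt, "text": prompt.upper()}

    return await asyncio.gather(*(fake_generate(p) for p in prompts))

outputs = asyncio.run(run_pipeline(["hello", "world"]))
```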
Troubleshooting
Rate Limit Errors
If you encounter frequent 429 errors:
- Reduce max_concurrent_requests
- Increase base_delay for longer backoff
- Consider using a local deployment for high-volume workloads
Connection Timeouts
For slow networks or high-latency endpoints:
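The settings below are illustrative, modelled on common OpenAI-client options (a request timeout in seconds plus a retry count); verify the exact names against your client's constructor:

```python
# Assumed parameter names for high-latency endpoints.
timeout_config = {
    "timeout": 120.0,   # seconds; allow slow generations to finish
    "max_retries": 3,   # retry dropped connections a few times
}
```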
Next Steps
- Multilingual Q&A: Generate multilingual Q&A pairs
- Nemotron-CC: Advanced text transformation pipelines