Configure Models#
Connect and customize the models that power your synthetic datasets in NeMo Data Designer. You can use model aliases for easy reference, and tune inference parameters to get the output quality and style you need.
Model Configuration Structure#
Model configurations are defined using the ModelConfig class, which connects your Data Designer columns to specific AI models and their settings. Each configuration acts as a named template that columns reference via the model_alias parameter.
Core components:
ModelConfig: Main configuration container
Model: Contains API endpoint details
ApiEndpoint: Defines connection parameters
InferenceParameters: Controls generation behavior
Each configuration specifies how to connect to and use one model, combining an alias (name), a model endpoint, and optional inference parameters:
YAML:

model_configs:
  - alias: "text"
    model:
      api_endpoint:
        url: "https://integrate.api.nvidia.com/v1"
        model_id: "meta/llama-3.3-70b-instruct"
        api_key: "your-api-key"
    inference_parameters:
      temperature: 0.7
      top_p: 0.9
      max_tokens: 1024
  - alias: "code"
    model:
      api_endpoint:
        url: "https://integrate.api.nvidia.com/v1"
        model_id: "qwen/qwen2.5-coder-32b-instruct"
        api_key: "your-api-key"
    inference_parameters:
      temperature: 0.3
      top_p: 0.9
      max_tokens: 1500
JSON:

{
  "model_configs": [
    {
      "alias": "text",
      "model": {
        "api_endpoint": {
          "url": "https://integrate.api.nvidia.com/v1",
          "model_id": "meta/llama-3.3-70b-instruct",
          "api_key": "your-api-key"
        }
      },
      "inference_parameters": {
        "temperature": 0.7,
        "top_p": 0.9,
        "max_tokens": 1024
      }
    },
    {
      "alias": "code",
      "model": {
        "api_endpoint": {
          "url": "https://integrate.api.nvidia.com/v1",
          "model_id": "qwen/qwen2.5-coder-32b-instruct",
          "api_key": "your-api-key"
        }
      },
      "inference_parameters": {
        "temperature": 0.3,
        "top_p": 0.9,
        "max_tokens": 1500
      }
    }
  ]
}
Required Fields#
Field | Description | Required
---|---|---
alias | Unique name to reference this model configuration | Yes
model.api_endpoint.url | API endpoint URL for the model service | Yes
model.api_endpoint.model_id | Model identifier (e.g., meta/llama-3.3-70b-instruct) | Yes
model.api_endpoint.api_key | Authentication key | When required by endpoint
inference_parameters | Model generation settings | Yes
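For reference, here is a configuration covering just the fields marked required above, expressed with the SDK classes from the Usage Examples section below (the endpoint URL and model ID are the placeholders used throughout this page):

from nemo_microservices.beta.data_designer.config import params as P

# A minimal ModelConfig with only the required fields from the table above
minimal_config = P.ModelConfig(
    alias="default-text",
    model=P.Model(
        api_endpoint=P.ApiEndpoint(
            url="https://integrate.api.nvidia.com/v1",
            model_id="meta/llama-3.3-70b-instruct",
            # api_key is only needed when the endpoint enforces authentication
        )
    ),
    inference_parameters=P.InferenceParameters(
        temperature=0.7,
        top_p=0.9,
        max_tokens=1024,
    ),
)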
Model Aliases#
Model aliases provide a convenient way to reference configured models from your data columns.
Referencing Aliases in Columns#
Once defined, reference your model aliases in column configurations:
YAML:

model_configs:
  - alias: "creative-writer"
    model:
      api_endpoint:
        url: "https://integrate.api.nvidia.com/v1"
        model_id: "meta/llama-3.1-8b-instruct"
        api_key: "your-api-key"
    inference_parameters:
      temperature: 0.8
      top_p: 0.95
      max_tokens: 1000

columns:
  - name: "story"
    type: "llm-text"
    prompt: "Write an engaging story about {{theme}}"
    model_alias: "creative-writer"
JSON:

{
  "model_configs": [
    {
      "alias": "creative-writer",
      "model": {
        "api_endpoint": {
          "url": "https://integrate.api.nvidia.com/v1",
          "model_id": "meta/llama-3.1-8b-instruct",
          "api_key": "your-api-key"
        }
      },
      "inference_parameters": {
        "temperature": 0.8,
        "top_p": 0.95,
        "max_tokens": 1000
      }
    }
  ],
  "columns": [
    {
      "name": "story",
      "type": "llm-text",
      "prompt": "Write an engaging story about {{theme}}",
      "model_alias": "creative-writer"
    }
  ]
}
Usage Examples#
Python:

from nemo_microservices.beta.data_designer.config import columns as C
from nemo_microservices.beta.data_designer.config import params as P

# Define model configuration
model_config = P.ModelConfig(
    alias="creative-writer",
    model=P.Model(
        api_endpoint=P.ApiEndpoint(
            url="https://integrate.api.nvidia.com/v1",
            model_id="meta/llama-3.1-8b-instruct",
            api_key="your-api-key"
        )
    ),
    inference_parameters=P.InferenceParameters(
        temperature=0.8,
        top_p=0.95,
        max_tokens=1000
    )
)
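# Reference the configured model from a column via its alias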
column = C.LLMTextColumn(
    name="story",
    prompt="Write an engaging story about {{theme}}",
    model_alias="creative-writer"
)
curl:

curl -X POST "https://your-nemo-endpoint/api/v1/data-designer/generate" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "model_configs": [
      {
        "alias": "creative-writer",
        "model": {
          "api_endpoint": {
            "url": "https://integrate.api.nvidia.com/v1",
            "model_id": "meta/llama-3.1-8b-instruct",
            "api_key": "your-api-key"
          }
        },
        "inference_parameters": {
          "temperature": 0.8,
          "top_p": 0.95,
          "max_tokens": 1000
        }
      }
    ],
    "columns": [
      {
        "name": "story",
        "type": "llm-text",
        "prompt": "Write an engaging story about {{theme}}",
        "model_alias": "creative-writer"
      }
    ],
    "num_rows": 10
  }'
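In the Python SDK, the model configuration and column are registered with a config builder before generation. A minimal sketch, assuming the DataDesignerConfigBuilder class and its add_column method from the NeMo microservices SDK (verify the names against your installed SDK version):

from nemo_microservices.beta.data_designer import DataDesignerConfigBuilder

# Assumes model_config and column from the Python example above;
# DataDesignerConfigBuilder is the assumed builder entry point.
config_builder = DataDesignerConfigBuilder(model_configs=[model_config])
config_builder.add_column(column)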
Inference Parameters#
Inference parameters control how your models behave during text generation. They are defined using the InferenceParameters class and can be set as fixed values or as dynamic distributions that vary across your dataset. The InferenceParameters class supports the following core parameters:
Core Parameters#
Temperature#
Controls randomness and creativity in outputs.
Range: 0.0 - 2.0
Low values (0.0-0.3): Deterministic, consistent outputs (good for code, technical content)
Medium values (0.3-0.7): Balanced creativity (good for general content)
High values (0.7-2.0): Creative, varied outputs (good for stories, marketing)
inference_parameters:
  temperature: 0.7  # Balanced creativity
Top-p (Nucleus Sampling)#
Limits token selection to most probable choices.
Range: 0.0 - 1.0
Low values (0.1-0.5): Focused, conservative word choices
High values (0.8-1.0): Diverse vocabulary and phrasing
inference_parameters:
  top_p: 0.9  # Good balance of diversity and focus
Max Tokens#
Sets the limit for generated content length.
Range: 1+ (model endpoints have their own limits)
Guidelines: ~100 tokens ≈ 75 words, ~500 tokens ≈ 375 words
inference_parameters:
  max_tokens: 1000  # ~750 words
Max Parallel Requests#
Caps the number of concurrent requests sent to your model endpoints.
Guidelines: Start with 2-8 and scale up based on how much concurrent load the endpoints hosting your LLMs can handle
inference_parameters:
  max_parallel_requests: 8
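In the Python SDK, the core parameters map onto fields of the InferenceParameters class shown earlier; the values here are illustrative:

from nemo_microservices.beta.data_designer.config import params as P

# All four core parameters set as fixed values
params = P.InferenceParameters(
    temperature=0.7,          # balanced creativity
    top_p=0.9,                # diverse but focused sampling
    max_tokens=1024,          # roughly 770 words of output
    max_parallel_requests=8,  # concurrency cap per model endpoint
)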
Dynamic Parameters#
Use distributions for varied outputs across records:
inference_parameters:
  temperature:
    distribution_type: "uniform"
    params:
      low: 0.5
      high: 0.9
  top_p:
    distribution_type: "manual"
    params:
      values: [0.8, 0.9, 0.95]
      weights: [0.3, 0.4, 0.3]
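The per-record effect of these two distribution types, sketched in plain Python (an illustration of the sampling semantics, not the service's implementation):

import random

# "uniform": each record draws a value from [low, high]
def sample_uniform(low: float, high: float) -> float:
    return random.uniform(low, high)

# "manual": each record picks one of the listed values with the given weights
def sample_manual(values: list[float], weights: list[float]) -> float:
    return random.choices(values, weights=weights, k=1)[0]

temperature = sample_uniform(0.5, 0.9)                    # varies per record
top_p = sample_manual([0.8, 0.9, 0.95], [0.3, 0.4, 0.3])  # weighted choice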
Configuration Examples#
Content Specialization#
model_configs:
  # Marketing content
  - alias: "marketing"
    model:
      api_endpoint:
        url: "https://integrate.api.nvidia.com/v1"
        model_id: "meta/llama-3.1-8b-instruct"
        api_key: "your-api-key"
    inference_parameters:
      temperature: 0.8
      top_p: 0.95
      max_tokens: 800

  # Technical documentation
  - alias: "technical"
    model:
      api_endpoint:
        url: "https://integrate.api.nvidia.com/v1"
        model_id: "meta/llama-3.3-70b-instruct"
        api_key: "your-api-key"
    inference_parameters:
      temperature: 0.3
      top_p: 0.85
      max_tokens: 1200

  # Code generation
  - alias: "code"
    model:
      api_endpoint:
        url: "https://integrate.api.nvidia.com/v1"
        model_id: "qwen/qwen2.5-coder-32b-instruct"
        api_key: "your-api-key"
    inference_parameters:
      temperature: 0.2
      top_p: 0.9
      max_tokens: 1500
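When several specializations share an endpoint, the configurations can also be built programmatically with the SDK classes shown earlier; the aliases and values below mirror the YAML above:

from nemo_microservices.beta.data_designer.config import params as P

# alias: (model_id, temperature, top_p, max_tokens)
SPECIALIZATIONS = {
    "marketing": ("meta/llama-3.1-8b-instruct", 0.8, 0.95, 800),
    "technical": ("meta/llama-3.3-70b-instruct", 0.3, 0.85, 1200),
    "code": ("qwen/qwen2.5-coder-32b-instruct", 0.2, 0.9, 1500),
}

model_configs = [
    P.ModelConfig(
        alias=alias,
        model=P.Model(
            api_endpoint=P.ApiEndpoint(
                url="https://integrate.api.nvidia.com/v1",
                model_id=model_id,
                api_key="your-api-key",
            )
        ),
        inference_parameters=P.InferenceParameters(
            temperature=temperature, top_p=top_p, max_tokens=max_tokens
        ),
    )
    for alias, (model_id, temperature, top_p, max_tokens) in SPECIALIZATIONS.items()
]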
Best Practices#
Naming Conventions#
Use descriptive, hyphenated names for aliases:
# ✅ Good: Clear purpose and scope
alias: "technical-documentation"
alias: "creative-storytelling"
alias: "precise-code-generation"
# ❌ Avoid: Vague or unclear names
alias: "model1"
alias: "good"
alias: "temp_high"
Parameter Tuning#
Start with these baseline values and adjust based on your results (a lookup helper is sketched after the list):

Code generation: temperature: 0.2, top_p: 0.9
Technical writing: temperature: 0.3, top_p: 0.85
General content: temperature: 0.6, top_p: 0.9
Creative writing: temperature: 0.8, top_p: 0.95
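One way to keep these baselines handy in Python is a small lookup helper; the function and content-type keys are illustrative, not part of the SDK:

from nemo_microservices.beta.data_designer.config import params as P

# Baseline (temperature, top_p) pairs from the list above
BASELINES = {
    "code": (0.2, 0.90),
    "technical": (0.3, 0.85),
    "general": (0.6, 0.90),
    "creative": (0.8, 0.95),
}

def baseline_params(content_type: str, max_tokens: int = 1024) -> P.InferenceParameters:
    temperature, top_p = BASELINES[content_type]
    return P.InferenceParameters(
        temperature=temperature, top_p=top_p, max_tokens=max_tokens
    )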
Testing#
Always test your configurations with preview mode before production:
# Add test columns to validate behavior
columns:
  - name: "test_output"
    type: "llm-text"
    prompt: "Test prompt for {{topic}}"
    model_alias: "your-custom-alias"
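From the Python SDK, a preview run looks roughly like the sketch below. It assumes the DataDesignerClient and preview workflow from the NeMo microservices tutorials, plus the config_builder assembled earlier; treat the exact names as version-dependent and check your SDK:

from nemo_microservices import NeMoMicroservices
from nemo_microservices.beta.data_designer import DataDesignerClient

# Assumed client setup; base_url points at your NeMo microservices deployment
data_designer = DataDesignerClient(
    client=NeMoMicroservices(base_url="https://your-nemo-endpoint")
)

# Generate a few records and inspect them before launching a full job
preview = data_designer.preview(config_builder)
preview.display_sample_record()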
Common Issues#
“Alias not found” error: Ensure you define the alias in model_configs before using it in columns.
“Model not available” error: Verify the model_id exists and is accessible via your endpoint.
Invalid parameter ranges: Check that temperature (0.0-2.0), top_p (0.0-1.0), and max_tokens (≥1) are within valid ranges.
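A quick client-side check of these documented ranges can catch the last issue before a job is submitted (a hypothetical helper, not part of the SDK):

def validate_inference_parameters(temperature: float, top_p: float, max_tokens: int) -> None:
    # Ranges documented above: temperature 0.0-2.0, top_p 0.0-1.0, max_tokens >= 1
    if not 0.0 <= temperature <= 2.0:
        raise ValueError(f"temperature {temperature} is outside 0.0-2.0")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError(f"top_p {top_p} is outside 0.0-1.0")
    if max_tokens < 1:
        raise ValueError(f"max_tokens {max_tokens} must be >= 1")

validate_inference_parameters(temperature=0.7, top_p=0.9, max_tokens=1024)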