Understanding NeMo Customizer Configurations and Models#
Learn the fundamentals of NeMo Customizer configurations and models to make informed decisions about your fine-tuning projects. This tutorial covers what configurations are available, which models you can use, and how to choose the right approach for your use case.
Understanding these basics will help you navigate the fine-tuning process more effectively and avoid common configuration issues. Once you complete this tutorial, continue to Format Training Dataset to start fine-tuning.
Note
The time to complete this tutorial is approximately 15 minutes. This tutorial focuses on understanding and discovery—no actual training jobs are created.
Prerequisites#
Platform Prerequisites#
New to using NeMo microservices?
NeMo microservices use an entity management system to organize all resources—including datasets, models, and job artifacts—into namespaces and projects. Without setting up these organizational entities first, you cannot use the microservices.
If you’re new to the platform, complete these foundational tutorials first:
- Get Started Tutorials: Learn how to deploy, customize, and evaluate models using the platform end-to-end 
- Set Up Organizational Entities: Learn how to create namespaces and projects to organize your work 
If you’re already familiar with namespaces, projects, and how to upload datasets to the platform, you can proceed directly with this tutorial.
Learn more: Entity Concepts
NeMo Customizer Prerequisites#
Microservice Setup Requirements and Environment Variables
Before starting, make sure you have:
- Access to NeMo Customizer 
- The huggingface_hub Python package installed
- (Optional) Weights & Biases account and API key for enhanced visualization 
Set up environment variables:
# Set up environment variables
export CUSTOMIZER_BASE_URL="<your-customizer-service-url>"
export ENTITY_HOST="<your-entity-store-url>"
export DS_HOST="<your-datastore-url>"
export NAMESPACE="default"
export DATASET_NAME="test-dataset"
# Hugging Face environment variables (for dataset/model file management)
export HF_ENDPOINT="${DS_HOST}/v1/hf"
export HF_TOKEN="dummy-unused-value"  # Or your actual HF token
# Optional monitoring
export WANDB_API_KEY="<your-wandb-api-key>"
Replace the placeholder values with your actual service URLs and credentials.
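Before moving on, it can help to confirm that none of the variables above is unset or still holds a `<placeholder>` value. The following is a minimal sketch and not part of NeMo Customizer: the helper name `find_unset_variables` and the choice of which variables count as required are assumptions made for illustration.

```python
import os

# Variables from the setup block above; WANDB_API_KEY is optional and omitted.
REQUIRED_VARS = [
    "CUSTOMIZER_BASE_URL",
    "ENTITY_HOST",
    "DS_HOST",
    "NAMESPACE",
    "DATASET_NAME",
    "HF_ENDPOINT",
    "HF_TOKEN",
]


def find_unset_variables(env, required=REQUIRED_VARS):
    """Return names that are missing, empty, or still look like <placeholders>."""
    problems = []
    for name in required:
        value = env.get(name, "").strip()
        if not value or (value.startswith("<") and value.endswith(">")):
            problems.append(name)
    return problems


# Hypothetical usage: check the environment of the current process.
unset = find_unset_variables(os.environ)
if unset:
    print("Set these variables before continuing:", ", ".join(unset))
else:
    print("All required variables are set.")
```

Passing the environment in as a mapping keeps the check easy to try against a plain dictionary before running it on a real shell session.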
What Are Customization Configurations?#
Customization configurations are pre-built recipes that combine three key elements:
- Model: The AI model you want to customize (Llama, Phi, embedding models, etc.) 
- Hardware: The GPU requirements and parallelization settings 
- Training Options: Available training types (LoRA, Full SFT, DPO, etc.) 
Think of configurations like cooking recipes—they specify the ingredients (model), equipment (hardware), and cooking methods (training types) needed to achieve your desired result.
Understanding Model Names vs. Configuration Names#
It’s important to understand the difference between how models are referenced and how configurations are named. These serve different purposes in the NeMo Customizer ecosystem.
Model names include the organization prefix and identify the actual AI model:
meta/llama-3.1-8b-instruct
│    │
│    └─ Model name and variant
└─ Organization (Hugging Face namespace)
This is how the model is referenced in Hugging Face and in the configuration’s base_model field.
Configuration names follow a specific pattern that tells you important information:
llama-3.1-8b-instruct@v1.0.0+A100
│                    │      │
│                    │      └─ Hardware target
│                    └─ Version
└─ Model identifier (without org prefix)
Configuration names use simplified model identifiers for brevity and consistency across the platform.
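The naming pattern above can be split into its parts programmatically. This is a sketch based only on the examples in this tutorial, not an official parser: the `ConfigName` type and `parse_config_name` function are hypothetical, and the optional `org/` prefix is handled because names in the example response later in this tutorial include one.

```python
from typing import NamedTuple, Optional


class ConfigName(NamedTuple):
    prefix: Optional[str]    # text before "/", if the name carries one
    model: str               # model identifier
    version: str             # text between "@" and "+"
    hardware: Optional[str]  # hardware target after "+", if present


def parse_config_name(name: str) -> ConfigName:
    """Split '<model>@<version>+<hardware>' into its components."""
    model_part, separator, rest = name.partition("@")
    if not separator or not model_part or not rest:
        raise ValueError(f"expected '<model>@<version>[+<hardware>]', got {name!r}")
    version, _, hardware = rest.partition("+")
    prefix, _, model = model_part.rpartition("/")
    return ConfigName(prefix or None, model, version, hardware or None)


parsed = parse_config_name("llama-3.1-8b-instruct@v1.0.0+A100")
print(parsed.model, parsed.version, parsed.hardware)
```

Treat the result as a convenience for display and filtering only; the service itself identifies a configuration by its full name.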
Note
Namespace vs. Organization: Configuration namespaces and model organizations serve different purposes and are independent:
- Configuration namespace: User/admin-defined namespace where the config is stored (often defaults to "default")
- Model organization: The Hugging Face organization that owns the model (like meta/ in the base_model)
Important
Hardware Compatibility: Configurations marked as +A100 are fully compatible with B200 GPUs. The naming reflects the original target hardware, but the underlying resource requirements work across compatible GPU families.
Discovering Available Configurations#
List All Enabled Configurations#
Start by seeing what configurations are immediately available to you:
from nemo_microservices import NeMoMicroservices
import os
# Initialize the client
client = NeMoMicroservices(
    base_url=os.environ['CUSTOMIZER_BASE_URL']
)
# Get all enabled configurations
configs = client.customization.configs.list(
    filter={"enabled": True}
)
print(f"You have {len(configs.data)} configurations available:")
for config in configs.data:
    print(f"  • {config.name}")
    print(f"    Description: {config.description}")
    print(f"    Training options: {len(config.training_options)}")
    for option in config.training_options:
        print(f"      - {option.training_type}/{option.finetuning_type}: {option.num_gpus} GPUs")
    print()
# List all enabled configurations (-G sends the url-encoded values as query parameters)
curl -G "${CUSTOMIZER_BASE_URL}/v1/customization/configs" \
  --data-urlencode "filter[enabled]=true" \
  --data-urlencode "page_size=50" | jq '.data[] | {name, description, training_options}'
Discover All Configurations (Including Disabled)#
To see the full range of possibilities, including configurations that might be available but currently disabled:
# List ALL configurations (enabled and disabled)
all_configs = client.customization.configs.list(page_size=50)
print(f"Total configurations in your environment: {len(all_configs.data)}")
enabled_count = sum(1 for config in all_configs.data if config.target.enabled)
disabled_count = len(all_configs.data) - enabled_count
print(f"  ✓ Enabled: {enabled_count}")
print(f"  ✗ Disabled: {disabled_count}")
# Show disabled configurations
print(f"\nDisabled configurations (contact admin to enable):")
for config in all_configs.data:
    if not config.target.enabled:
        print(f"  • {config.name} - {config.description}")
# List all configurations (-G sends the url-encoded values as query parameters)
curl -G "${CUSTOMIZER_BASE_URL}/v1/customization/configs" \
  --data-urlencode "page_size=50" | jq
# List only disabled configurations
curl -G "${CUSTOMIZER_BASE_URL}/v1/customization/configs" \
  --data-urlencode "filter[enabled]=false" \
  --data-urlencode "page_size=50" | jq '.data[] | {name, description}'
Example Response
{
  "object": "list",
  "data": [
    {
      "name": "meta/llama-3.2-1b-instruct@v1.0.0+A100",
      "namespace": "default",
      "dataset_schemas": [
        {
          "title": "Newline-Delimited JSON File",
          "type": "array",
          "items": {
            "description": "Schema for Supervised Fine-Tuning (SFT) training data items.",
            "properties": {
              "prompt": {
                "description": "The prompt for the entry",
                "title": "Prompt",
                "type": "string"
              },
              "completion": {
                "description": "The completion to train on",
                "title": "Completion",
                "type": "string"
              }
            },
            "required": ["prompt", "completion"],
            "title": "SFTDatasetItemSchema",
            "type": "object"
          }
        }
      ],
      "training_options": [
        {
          "training_type": "sft",
          "finetuning_type": "lora",
          "num_gpus": 1,
          "num_nodes": 1,
          "tensor_parallel_size": 1,
          "use_sequence_parallel": false
        },
        {
          "training_type": "sft",
          "finetuning_type": "all_weights",
          "num_gpus": 1,
          "num_nodes": 1,
          "tensor_parallel_size": 1,
          "use_sequence_parallel": false
        }
      ]
    },
    {
      "name": "nvidia/llama-3.2-nv-embedqa-1b@v2+A100",
      "namespace": "nvidia",
      "dataset_schemas": [
        {
          "title": "Newline-Delimited JSON File",
          "type": "array",
          "items": {
            "description": "Schema for embedding training data items.",
            "properties": {
              "query": {
                "description": "The query to use as an anchor",
                "title": "Query",
                "type": "string"
              },
              "pos_doc": {
                "description": "A document that should match positively with the anchor",
                "title": "Positive Document",
                "type": "string"
              },
              "neg_doc": {
                "description": "Documents that should not match with the anchor",
                "title": "Negative Documents",
                "type": "array",
                "items": {"type": "string"}
              }
            },
            "required": ["query", "pos_doc", "neg_doc"],
            "title": "EmbeddingDatasetItemSchema",
            "type": "object"
          }
        }
      ],
      "training_options": [
        {
          "training_type": "sft",
          "finetuning_type": "lora_merged",
          "num_gpus": 1,
          "num_nodes": 1,
          "tensor_parallel_size": 1,
          "use_sequence_parallel": false
        }
      ]
    }
  ]
}
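The `dataset_schemas` entries describe what each line of a newline-delimited JSON training file should contain. As a rough illustration, the sketch below checks sample lines against the `required` and `type` fields of the SFT item schema. It is not a full JSON Schema validator and is not how the service validates data; the function name `check_jsonl_lines` and the sample records are assumptions.

```python
import json

# Item schema trimmed from the first configuration in the example response.
SFT_ITEM_SCHEMA = {
    "properties": {
        "prompt": {"type": "string"},
        "completion": {"type": "string"},
    },
    "required": ["prompt", "completion"],
}

# Only the JSON types that appear in the example schemas are mapped here.
JSON_TYPES = {"string": str, "array": list}


def check_jsonl_lines(lines, item_schema):
    """Return (line_number, message) pairs for records that break the schema."""
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "not valid JSON"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        for field in item_schema.get("required", []):
            if field not in record:
                problems.append((number, f"missing required field '{field}'"))
                continue
            declared = item_schema.get("properties", {}).get(field, {}).get("type")
            expected = JSON_TYPES.get(declared)
            if expected is not None and not isinstance(record[field], expected):
                problems.append((number, f"field '{field}' should be of type {declared}"))
    return problems


# Hypothetical sample data: the second record lacks a completion.
sample_lines = [
    '{"prompt": "Write a greeting.", "completion": "Hello!"}',
    '{"prompt": "Missing completion"}',
]
for number, message in check_jsonl_lines(sample_lines, SFT_ITEM_SCHEMA):
    print(f"line {number}: {message}")
```

The same function can be pointed at the embedding schema by passing its `items` object instead, since `query`, `pos_doc`, and `neg_doc` use only string and array types.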
Note
Configuration vs. Target Architecture: Configurations don’t have their own enabled field. Instead, they inherit their availability from their underlying customization targets. When you see config.target.enabled, you’re checking whether the target model that the configuration references is enabled. This architecture allows administrators to control model availability at the target level, which affects all configurations that use that target.
Understanding Model Types and Capabilities#
Language Models#
These models are designed for text generation, instruction following, and conversational AI:
| Model Family | Description | Examples | 
|---|---|---|
| Llama Models | General-purpose language models excellent for instruction following, conversation, and text generation tasks | | 
| Llama Nemotron Models | NVIDIA’s specialized variants optimized for specific use cases with enhanced reasoning capabilities | Various Nano and Super variants | 
| Phi Models | Microsoft’s efficient models designed for strong reasoning with optimized deployment characteristics | Phi model family configurations | 
| GPT-OSS Models | Open-source GPT-based models supporting Full SFT customization workflows | Various GPT-OSS configurations | 
Specialized Models#
| Model Type | Status | Details | 
|---|---|---|
| Embedding Models | ✅ Supported | Model: Llama 3.2 NV EmbedQA 1B for question-answering and retrieval tasks. Use Cases: Semantic search, document retrieval, question-answering systems, RAG pipelines. Note: Typically disabled by default; contact your administrator for access. | 
| Reranking Models | ❌ Not Supported | Alternative: Use embedding models for retrieval tasks, or implement reranking in your application layer | 
Custom Models#
You can import and fine-tune models from the Hugging Face Transformers library:
- Supported: Any model compatible with the Hugging Face Transformers architecture 
- Process: Import via the private HuggingFace model tutorial 
- Limitations: Some architectures (like Conv1D-based models) are not compatible 
Training Types and Resource Requirements#
Available Training Approaches#
| Training Type | Resource Usage | Training Speed | Flexibility | Best For | 
|---|---|---|---|---|
| LoRA | Low (1-2 GPUs) | Fast | Good | Experiments, quick iterations | 
| Full SFT | High (2-8 GPUs) | Slower | Maximum | Production, maximum performance | 
| DPO | Medium (2-4 GPUs) | Medium | Specialized | Preference alignment | 
| Knowledge Distillation | Medium (varies) | Medium | Specialized | Model compression | 
Checking Resource Requirements#
Use this approach to understand the GPU requirements for different configurations:
# Analyze resource requirements across configurations
configs = client.customization.configs.list(filter={"enabled": True})
print("Resource Requirements Summary:")
print("=" * 50)
for config in configs.data:
    print(f"\n📋 {config.name}")
    print(f"   Base Model: {config.target.base_model}")
    for option in config.training_options:
        gpu_total = option.num_gpus * option.num_nodes
        print(f"   • {option.training_type.upper()}/{option.finetuning_type.upper()}: {gpu_total} total GPUs")
        print(f"     ({option.num_gpus} GPUs × {option.num_nodes} nodes)")
# Find configurations that fit your hardware
available_gpus = 2  # Adjust based on your setup
print(f"\n🔍 Configurations that fit {available_gpus} GPUs:")
for config in configs.data:
    for option in config.training_options:
        if option.num_gpus * option.num_nodes <= available_gpus:
            print(f"   ✓ {config.name} - {option.training_type}/{option.finetuning_type}")
# Get resource information for all configurations
curl "${CUSTOMIZER_BASE_URL}/v1/customization/configs" \
  --data-urlencode "filter[enabled]=true" | \
  jq '.data[] | {
    name: .name,
    base_model: .target.base_model,
    training_options: .training_options | map({
      type: "\(.training_type)/\(.finetuning_type)",
      gpus: .num_gpus,
      nodes: .num_nodes,
      total_gpus: (.num_gpus * .num_nodes)
    })
  }'
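The same summary can be computed offline from a saved response, which is handy when you want to experiment without calling the service. The sketch below works on plain dictionaries shaped like the example response earlier in this tutorial; the function names are hypothetical, and the second configuration in the sample data is invented purely to show a multi-node option.

```python
def summarize_training_options(response):
    """Flatten a configs response into one row per training option."""
    rows = []
    for config in response.get("data", []):
        for option in config.get("training_options", []):
            rows.append({
                "name": config["name"],
                "type": f'{option["training_type"]}/{option["finetuning_type"]}',
                # Total GPUs = GPUs per node x number of nodes.
                "total_gpus": option["num_gpus"] * option.get("num_nodes", 1),
            })
    return rows


def options_that_fit(rows, available_gpus):
    """Keep only the rows whose total GPU count fits the given budget."""
    return [row for row in rows if row["total_gpus"] <= available_gpus]


sample_response = {
    "data": [
        {   # Trimmed from the example response.
            "name": "meta/llama-3.2-1b-instruct@v1.0.0+A100",
            "training_options": [
                {"training_type": "sft", "finetuning_type": "lora", "num_gpus": 1, "num_nodes": 1},
                {"training_type": "sft", "finetuning_type": "all_weights", "num_gpus": 1, "num_nodes": 1},
            ],
        },
        {   # Hypothetical entry, not a real configuration.
            "name": "example-large-model@v0+A100",
            "training_options": [
                {"training_type": "sft", "finetuning_type": "all_weights", "num_gpus": 8, "num_nodes": 2},
            ],
        },
    ]
}

rows = summarize_training_options(sample_response)
for row in options_that_fit(rows, available_gpus=2):
    print(f'{row["name"]}: {row["type"]} ({row["total_gpus"]} GPUs)')
```

Comparing against the total GPU count rather than the per-node count avoids listing multi-node options that a single small machine cannot run.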
Training Configuration Impact on Deployment#
Understanding how your training configuration choices affect deployment is crucial for planning your fine-tuning strategy. The parallelism and resource settings you choose during training have direct implications for how your models can be deployed and used.
Parallelism Parameters Explained#
When you examine training options in configurations, you’ll see several parallelism parameters that control how training workloads are distributed across GPUs:
| Parameter | Purpose | Impact on Training | 
|---|---|---|
| tensor_parallel_size | Distributes model tensors across GPUs | Higher values reduce memory per GPU but require more GPUs | 
| pipeline_parallel_size | Distributes model layers across GPUs | Enables training larger models by splitting layers | 
| use_sequence_parallel | Enables sequence-level parallelism | Reduces memory usage for long sequences | 
Resource Allocation Rules#
Training configurations must satisfy mathematical constraints to work properly:
Important
GPU Allocation Rule: The total number of GPUs (num_gpus × num_nodes) must be a multiple of:
tensor_parallel_size × pipeline_parallel_size × expert_model_parallel_size
If this constraint isn’t met, your training job will fail with a validation error.
Example Calculations:
- 8 GPUs with tensor_parallel_size=4, pipeline_parallel_size=2: ✅ Valid (8 is a multiple of 4 × 2 × 1 = 8)
- 4 GPUs with tensor_parallel_size=4, pipeline_parallel_size=2: ❌ Invalid (4 is not a multiple of 4 × 2 × 1 = 8)
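The rule is simple enough to check before submitting a job. The following sketch encodes the multiple-of constraint exactly as stated above; the function name and its defaults of 1 for unspecified parallelism values are assumptions, and the service's own validation may apply further checks.

```python
def satisfies_gpu_allocation_rule(
    num_gpus,
    num_nodes=1,
    tensor_parallel_size=1,
    pipeline_parallel_size=1,
    expert_model_parallel_size=1,
):
    """Return True if total GPUs is a multiple of the combined parallel sizes."""
    values = (
        num_gpus,
        num_nodes,
        tensor_parallel_size,
        pipeline_parallel_size,
        expert_model_parallel_size,
    )
    if any(value < 1 for value in values):
        raise ValueError("all sizes must be positive integers")
    total_gpus = num_gpus * num_nodes
    parallel_group = (
        tensor_parallel_size * pipeline_parallel_size * expert_model_parallel_size
    )
    return total_gpus % parallel_group == 0


# The two example calculations from the list above.
print(satisfies_gpu_allocation_rule(8, tensor_parallel_size=4, pipeline_parallel_size=2))  # True
print(satisfies_gpu_allocation_rule(4, tensor_parallel_size=4, pipeline_parallel_size=2))  # False
```

Note that 16 GPUs with the same settings would also pass, because the rule requires a multiple of the product, not equality.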
Model Artifact Types and Deployment Paths#
Your training choices determine how the resulting model can be deployed:
| Training Type | Model Artifact | Deployment Method | Key Environment Variable | 
|---|---|---|---|
| LoRA | Adapter weights only | Uses base model + adapters | NIM_PEFT_SOURCE | 
| Full SFT | Complete model weights | Standalone model deployment | NIM_FT_MODEL | 
| DPO | Complete model weights | Standalone model deployment | | 
Deployment Architecture Overview#
The platform uses different deployment strategies based on your training approach:
Architecture (LoRA): Base model + adapter loading
- Base model remains unchanged 
- Adapters loaded dynamically from Entity Store 
- Multiple adapters can share the same base model 
- Lower storage and memory requirements 
Environment Configuration:
NIM_PEFT_SOURCE=http://nemo-entity-store:8000
NIM_PEFT_REFRESH_INTERVAL=30
Architecture (Full SFT): Complete model replacement
- Entire model weights replaced with fine-tuned version 
- Requires dedicated deployment resources 
- Higher storage and memory requirements 
- Maximum customization flexibility 
Environment Configuration:
NIM_FT_MODEL=/model-store
NIM_CUSTOM_MODEL=/model-store
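To make the split between the two deployment paths concrete, the sketch below maps a configuration's `finetuning_type` to the environment variables listed above. The function name is hypothetical, the values are the example values from this section rather than settings for your cluster, and other types such as `lora_merged` are deliberately left unhandled because this tutorial does not describe their deployment path.

```python
def deployment_environment(finetuning_type):
    """Return example NIM environment variables for a fine-tuning type."""
    if finetuning_type == "lora":
        # Adapter path: the base model stays unchanged and adapters are
        # loaded from the Entity Store.
        return {
            "NIM_PEFT_SOURCE": "http://nemo-entity-store:8000",
            "NIM_PEFT_REFRESH_INTERVAL": "30",
        }
    if finetuning_type == "all_weights":
        # Full SFT path: the complete fine-tuned weights replace the model.
        return {
            "NIM_FT_MODEL": "/model-store",
            "NIM_CUSTOM_MODEL": "/model-store",
        }
    raise ValueError(f"no deployment mapping sketched for {finetuning_type!r}")


for name, value in deployment_environment("lora").items():
    print(f"{name}={value}")
```

Raising an error for unknown types keeps the sketch from silently guessing a deployment path that the documentation does not confirm.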
Planning Your Training Strategy#
Consider these factors when choosing training configurations:
For Experimentation:
- Choose LoRA with lower parallelism settings 
- Faster iteration cycles 
- Lower resource requirements 
- Easy to compare multiple approaches 
For Production Deployment:
- Consider full SFT for maximum performance 
- Plan for higher deployment resource requirements 
- Factor in model storage and loading times 
- Evaluate whether adapter flexibility is needed 
Note
Deployment Guidance: For detailed information about deploying your fine-tuned models, including manual deployment options outside the NeMo platform, refer to the inference deployment documentation.
Making Configuration Decisions#
Decision Framework#
Use this framework to choose the right configuration for your project:
Example: Choosing a Configuration#
Let’s walk through a realistic example:
Scenario: You want to create an email writing assistant
Step 1: Identify Task Type
- Task: Text generation (email writing) 
- Model family: Language models (Llama, Phi, etc.) 
Step 2: Assess Resources
- Available hardware: 2 A100 GPUs 
- Timeline: Need results within a day 
- Budget: Limited GPU hours 
Step 3: Choose Training Type
- Constraint: Limited resources and time 
- Choice: LoRA training (parameter-efficient) 
Step 4: Find Matching Configuration
# Find LoRA configurations that fit 2 GPUs
suitable_configs = []
for config in configs.data:
    for option in config.training_options:
        if (option.finetuning_type == "lora" and
            option.num_gpus * option.num_nodes <= 2):
            suitable_configs.append({
                'name': config.name,
                'base_model': config.target.base_model,
                'gpus': option.num_gpus * option.num_nodes
            })
print("Suitable configurations for your use case:")
for config in suitable_configs:
    print(f"  ✓ {config['name']} ({config['gpus']} GPUs)")
Result: Choose llama-3.2-1b-instruct@v1.0.0+A100 with LoRA training
Getting Help with Configurations#
When Configurations Are Disabled#
If you find a configuration that meets your needs but is disabled:
- Note the exact configuration name (e.g., llama-3.1-8b-instruct@v1.0.0+A100)
- Contact your cluster administrator with a specific request 
- Provide context about your use case and why you need this configuration 
Example Request Email:
Subject: Enable NeMo Customizer Configuration Request
Hi [Admin Name],
I need access to the following configuration for my project:
Configuration: llama-3.1-8b-instruct@v1.0.0+A100
Use Case: Fine-tuning a customer support chatbot
Training Type: LoRA (low resource requirements)
Timeline: Need to start training this week
This configuration appears to be available but disabled. Could you please enable it for our team?
Thanks!
Administrator Resources#
If you’re an administrator, refer to the configuration management documentation for guidance on:
- Creating new configurations 
- Enabling/disabling configurations 
- Managing hardware resource allocation 
- Setting up configurations for different user groups 
Next Steps#
Now that you understand NeMo Customizer configurations and models, you’re ready to proceed with fine-tuning:
- Learn how to prepare your data for the model type you’ve chosen.
- Begin with parameter-efficient fine-tuning if you chose a LoRA configuration.
- Use full supervised fine-tuning if you chose an all_weights configuration.
- Import and fine-tune private Hugging Face models.
Key Takeaways#
✅ Configurations combine model + hardware + training options in pre-built recipes
✅ A100 configurations work on B200 hardware - compatibility is built-in
✅ LoRA requires fewer resources than Full SFT but offers less customization flexibility
✅ Training parallelism settings affect deployment requirements - plan accordingly
✅ LoRA uses adapters (NIM_PEFT_SOURCE) while Full SFT uses complete models (NIM_FT_MODEL)
✅ GPU allocation must satisfy mathematical constraints for training to succeed
✅ Disabled configurations can often be enabled by contacting your administrator
✅ Embedding models are supported for Q&A and retrieval tasks
✅ Reranking models are not currently supported - use embedding models instead
✅ Custom Hugging Face models can be imported with some architectural limitations
Quick Reference Commands#
# List enabled configurations
client.customization.configs.list(filter={"enabled": True})
# List all configurations (including disabled)  
client.customization.configs.list()
# List disabled configurations
client.customization.configs.list(filter={"enabled": False})
# Check resource requirements
for config in configs.data:
    for option in config.training_options:
        print(f"{config.name}: {option.finetuning_type} needs {option.num_gpus} GPUs")
You now have the foundation to make informed decisions about your fine-tuning projects and navigate the NeMo Customizer ecosystem effectively.