Understanding NeMo Customizer: Models, Training, and Resources
Learn the fundamentals of how NeMo Customizer works to make informed decisions about your fine-tuning projects. This tutorial covers how models are organized, how adapters attach to base models, training types and GPU requirements, and how to choose the right approach for your use case.
Understanding these basics will help you navigate the fine-tuning process more effectively and avoid common issues. If you’re ready to start fine-tuning immediately, you can jump to SFT Customization Job after completing this tutorial.
The time to complete this tutorial is approximately 15 minutes.
This tutorial focuses on understanding and discovery—no actual training jobs are created.
Prerequisites
New to using NeMo Platform?
All platform resources—models, datasets, and more—must belong to a workspace. Workspaces provide organizational and authorization boundaries for your work. Within a workspace, you can optionally use projects to group related resources.
If you’re new to the platform, start with the Setup guide to learn how to deploy and evaluate models, and optimize agents using the platform end-to-end.
If you’re already familiar with workspaces and how to upload datasets to the platform, you can proceed directly with this tutorial.
For more information, see Workspaces and Projects.
Platform Setup Requirements and Environment Variables
Before starting, make sure you have:
- NeMo Platform installed and deployed (see Setup)
- The PyPI nemo-platform wrapper package installed (pip install "nemo-platform[all]"). If you are working from a source checkout, run make bootstrap from the repository root instead.
- (Optional) Weights & Biases account and API key for enhanced visualization
Set up environment variables:
Initialize the SDK:
Core Concepts
What is a Model Entity?
A Model Entity represents a model registered in the NeMo Platform. It contains:
- FileSet Reference: Points to the model checkpoint files (weights, config, tokenizer)
- Model Spec: Auto-populated metadata about the model architecture (layers, parameters, etc.)
- Adapters: LoRA or other parameter-efficient fine-tuning weights attached to this model
- Base Model Link: Optional reference to a parent model (for fine-tuned models)
Think of a Model Entity as a “model card” that tracks everything about a model—where its files are, what architecture it uses, and what adapters have been trained for it.
What is an Adapter?
An Adapter is a set of parameter-efficient fine-tuning weights (like LoRA) that are attached to a Model Entity. Adapters:
- Are nested within the parent Model Entity
- Are enabled for inference by default once training completes
- Have their own FileSet for storing the adapter weights
- Track metadata like finetuning type, rank, and alpha values
What is a FileSet?
A FileSet is a collection of files stored in the platform’s file service. For customization:
- Model FileSet: Contains the base model checkpoint (config, weights, tokenizer)
- Adapter FileSet: Contains the LoRA adapter weights
- Dataset FileSet: Contains training and validation data
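The relationship between these three concepts can be pictured with a small data-structure sketch. This is an illustration only: the class and field names below are hypothetical and are not the platform's actual schema or SDK types. It simply mirrors the descriptions above, with adapters nested under a Model Entity and each pointing to its own FileSet.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FileSet:
    """Hypothetical stand-in for a collection of files in the file service."""
    name: str
    files: list[str] = field(default_factory=list)


@dataclass
class Adapter:
    """Hypothetical stand-in for an adapter nested under a Model Entity."""
    name: str
    fileset: FileSet                # the adapter has its own FileSet
    finetuning_type: str = "lora"
    rank: Optional[int] = None
    alpha: Optional[int] = None
    enabled: bool = True            # enabled for inference by default


@dataclass
class ModelEntity:
    """Hypothetical stand-in for a registered model."""
    name: str
    fileset: FileSet                              # checkpoint files
    spec: dict = field(default_factory=dict)      # architecture metadata
    adapters: list[Adapter] = field(default_factory=list)
    base_model: Optional[str] = None              # parent model, if any

    def enabled_adapters(self) -> list[str]:
        """Return the names of adapters currently enabled for inference."""
        return [a.name for a in self.adapters if a.enabled]


# Illustrative names only.
base = ModelEntity(
    name="my-base-model",
    fileset=FileSet("my-base-model-files", ["config.json"]),
)
base.adapters.append(
    Adapter(name="support-lora", fileset=FileSet("support-lora-files"), rank=8, alpha=16)
)
```

In this picture, disabling an adapter flips a flag on the nested object; its FileSet and metadata stay in place.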
The Customization Workflow
Step-by-Step Breakdown
1. Create a FileSet for your base model
Upload your model checkpoint files (from HuggingFace, NGC, or local storage) to a FileSet:
2. Create a Model Entity pointing to the FileSet
3. Create a Customization Job
4. Access the Result
After training completes:
Understanding Adapters and Deployment
How Adapters Work
When you run a LoRA customization job:
- Training produces adapter weights (small compared to base model)
- Adapter created and attached to the parent Model Entity
- FileSet created with the adapter weights
- Enabled by default so NIMs serving the base model automatically load the adapter
Viewing Adapters on a Model
Disabling and Re-enabling Adapters
Adapters are enabled by default, but you can disable an adapter to remove it from inference without deleting it. When you set enabled=False, the sidecar running alongside the NIM automatically removes the adapter’s files on its next reconciliation pass (every few seconds). Re-enabling the adapter causes the sidecar to re-download and serve it again.
To re-enable:
Creating Adapters Manually
You can also create adapters manually (e.g., from externally trained weights):
Training Types and Resource Requirements
Available Training Approaches
GPU Memory Guidelines
Storage Requirements
Customization jobs consume disk space on the platform’s shared persistent volume for model files, finetuning checkpoints, and the final output artifact. Required space depends on the training type:
These estimates cover model weights only and do not include training dataset size.
If the platform disk fills during a job, the job fails with an I/O error and the job service may return a 500 status when you retrieve logs.
Ensure your platform’s shared persistent volume has at least 3× the base model size of free space before starting a full SFT job, or 1.5× for LoRA jobs.
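As a rough pre-flight check, the free-space guideline above can be expressed as a short calculation. This is a minimal sketch, not a platform utility: the function names and the training-type keys are hypothetical, and the disk check assumes you run it somewhere the shared persistent volume is mounted at a local path.

```python
import math
import shutil

# Multipliers from the guideline above: 3x for full SFT, 1.5x for LoRA.
# The dictionary keys are illustrative labels, not platform identifiers.
FREE_SPACE_MULTIPLIER = {"full_sft": 3.0, "lora": 1.5}


def required_free_bytes(base_model_bytes: int, training_type: str) -> int:
    """Return the recommended free space for a job of the given type."""
    return math.ceil(base_model_bytes * FREE_SPACE_MULTIPLIER[training_type])


def has_enough_space(mount_path: str, base_model_bytes: int, training_type: str) -> bool:
    """Compare free space at mount_path against the recommendation."""
    free = shutil.disk_usage(mount_path).free
    return free >= required_free_bytes(base_model_bytes, training_type)
```

For example, a 16 GB base model calls for roughly 48 GB free for full SFT and 24 GB for LoRA. As noted above, these figures exclude the training dataset.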
For troubleshooting disk-related failures, see customizer.
Parallelism Parameters Explained
Parallelism is configured via training.parallelism. These parameters control how training workloads are distributed across GPUs:
data_parallel_size is automatically derived as total_gpus / (TP × PP × CP) and is not set directly.
Recommended parallelism for Mixture of Experts (MoE) models:
The expert_parallel_size parameter parallelizes a Mixture of Experts (MoE) model’s experts across GPUs. For non-MoE models, this parameter is ignored. A model’s model card indicates whether it is a Mixture of Experts model and specifies its number of experts.
The number of experts in the model must be divisible by expert_parallel_size. For example, if a model has 8 experts, setting expert_parallel_size=4 results in each GPU processing 2 experts.
Also, the value of expert_parallel_size must evenly divide the derived data_parallel_size, which is automatically calculated as data_parallel_size = total GPUs / (tensor_parallel_size × pipeline_parallel_size × context_parallel_size).
For example, with 8 total GPUs, tensor_parallel_size=2, and pipeline_parallel_size=1:
- Derived data_parallel_size = 8 / (2 × 1 × 1) = 4
- Valid expert_parallel_size values: 1, 2, or 4 (must evenly divide 4)
- Invalid expert_parallel_size value: 3 (does not evenly divide 4)
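The two divisibility constraints on expert_parallel_size can be checked before submitting a job. The sketch below is a standalone helper with hypothetical function names, not part of the SDK. It derives data_parallel_size from the formula above and lists every expert_parallel_size that divides both the derived data_parallel_size and the model's expert count.

```python
def derived_data_parallel_size(total_gpus: int, tp: int = 1, pp: int = 1, cp: int = 1) -> int:
    """Compute data_parallel_size = total_gpus / (TP x PP x CP)."""
    model_parallel = tp * pp * cp
    if total_gpus % model_parallel != 0:
        raise ValueError(
            f"total_gpus={total_gpus} is not a multiple of TP x PP x CP = {model_parallel}"
        )
    return total_gpus // model_parallel


def valid_expert_parallel_sizes(
    total_gpus: int, num_experts: int, tp: int = 1, pp: int = 1, cp: int = 1
) -> list[int]:
    """List expert_parallel_size values that satisfy both constraints."""
    dp = derived_data_parallel_size(total_gpus, tp, pp, cp)
    return [
        ep
        for ep in range(1, dp + 1)
        if dp % ep == 0 and num_experts % ep == 0
    ]


# The example above: 8 GPUs, tensor_parallel_size=2, and a model with 8 experts.
print(valid_expert_parallel_sizes(total_gpus=8, num_experts=8, tp=2))  # [1, 2, 4]
```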
Resource Allocation Rules
Training configurations must satisfy mathematical constraints to work properly:
GPU Allocation Rule: The total number of GPUs (num_gpus_per_node × num_nodes) must be a multiple of:
tensor_parallel_size × pipeline_parallel_size × context_parallel_size
If this constraint isn’t met, your training job will fail with a validation error.
Example Calculations:
- 8 GPUs with tensor_parallel_size=4, pipeline_parallel_size=2 ✅ Valid (8 = 4 × 2 × 1)
- 4 GPUs with tensor_parallel_size=4, pipeline_parallel_size=2 ❌ Invalid (4 is not a multiple of 4 × 2 × 1 = 8)
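The GPU allocation rule is simple enough to check locally before a job reaches platform validation. The helper below is a sketch with a hypothetical name. It reuses the parameter names from this section but is not an SDK function.

```python
def satisfies_gpu_allocation_rule(
    num_gpus_per_node: int,
    num_nodes: int,
    tensor_parallel_size: int = 1,
    pipeline_parallel_size: int = 1,
    context_parallel_size: int = 1,
) -> bool:
    """Return True if total GPUs is a multiple of TP x PP x CP."""
    total_gpus = num_gpus_per_node * num_nodes
    model_parallel = tensor_parallel_size * pipeline_parallel_size * context_parallel_size
    return total_gpus % model_parallel == 0


# The two example calculations above, on a single node.
print(satisfies_gpu_allocation_rule(8, 1, tensor_parallel_size=4, pipeline_parallel_size=2))  # True
print(satisfies_gpu_allocation_rule(4, 1, tensor_parallel_size=4, pipeline_parallel_size=2))  # False
```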
Choosing Your Training Approach
Decision Framework
When to Use LoRA
✅ Choose LoRA when:
- You have limited GPU resources (1-2 GPUs)
- You want fast training iterations
- You need multiple specialized versions of the same base model
- You want to auto-deploy adapters to existing NIM deployments
When to Use Full Fine-Tuning
✅ Choose Full SFT when:
- You need maximum model performance
- You have sufficient GPU resources (4+ GPUs)
- You want complete control over all model weights
- You’re preparing a production deployment
Model Types and Capabilities
Supported Language Models
Specialized Models
Importing Custom Models
You can import any HuggingFace-compatible model:
For detailed guidance, see Import HuggingFace Model.
Next Steps
Now that you understand how Model Entities and Adapters work, you’re ready to proceed:
Learn how to prepare your data for fine-tuning.
Create a parameter-efficient LoRA adapter.
Use full supervised fine-tuning for maximum performance.
Import and fine-tune private HuggingFace models.
Key Takeaways
✅ Model Entities contain model metadata and point to FileSet with checkpoint files
✅ Adapters (LoRA) are attached to Model Entities, not stored separately
✅ FileSet is where actual model/adapter files are stored
✅ LoRA training creates an adapter on the parent Model Entity
✅ Full SFT training creates a new Model Entity with full weights
✅ Adapters are enabled by default and automatically loaded by NIMs serving the base model
✅ GPU requirements vary significantly between LoRA and full fine-tuning
✅ Custom HuggingFace models can be imported via FileSet + Model Entity
Quick Reference Commands
You now have the foundation to make informed decisions about your fine-tuning projects!