Understanding NeMo Customizer: Models, Training, and Resources

Learn the fundamentals of how NeMo Customizer works to make informed decisions about your fine-tuning projects. This tutorial covers how models are organized, how adapters attach to base models, training types and GPU requirements, and how to choose the right approach for your use case.

Understanding these basics will help you navigate the fine-tuning process more effectively and avoid common issues. If you’re ready to start fine-tuning immediately, you can jump to SFT Customization Job after completing this tutorial.

This tutorial takes approximately 15 minutes to complete.

This tutorial focuses on understanding and discovery—no actual training jobs are created.

Prerequisites

New to using NeMo Platform?

All platform resources—models, datasets, and more—must belong to a workspace. Workspaces provide organizational and authorization boundaries for your work. Within a workspace, you can optionally use projects to group related resources.

If you’re new to the platform, start with the Setup guide to learn how to deploy and evaluate models, and optimize agents using the platform end-to-end.

If you’re already familiar with workspaces and how to upload datasets to the platform, you can proceed directly with this tutorial.

For more information, see Workspaces and Projects.

Platform Setup Requirements and Environment Variables

Before starting, make sure you have:

NeMo Platform installed and deployed (see Setup)
The PyPI nemo-platform wrapper package installed (pip install "nemo-platform[all]"). If you are working from a source checkout, run make bootstrap from the repository root instead.
(Optional) Weights & Biases account and API key for enhanced visualization

Set up environment variables:

# Set the base URL for NeMo Platform
export NMP_BASE_URL="http://localhost:8080" # Or your deployed platform URL

# Optional: Weights & Biases for experiment tracking
export WANDB_API_KEY="<your-wandb-api-key>"

Initialize the SDK:

import os
from nemo_platform import NeMoPlatform

client = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)

Core Concepts

What is a Model Entity?

A Model Entity represents a model registered in the NeMo Platform. It contains:

FileSet Reference: Points to the model checkpoint files (weights, config, tokenizer)
Model Spec: Auto-populated metadata about the model architecture (layers, parameters, etc.)
Adapters: LoRA or other parameter-efficient fine-tuning weights attached to this model
Base Model Link: Optional reference to a parent model (for fine-tuned models)

Think of a Model Entity as a “model card” that tracks everything about a model—where its files are, what architecture it uses, and what adapters have been trained for it.

What is an Adapter?

An Adapter is a set of parameter-efficient fine-tuning weights (like LoRA) that are attached to a Model Entity. Adapters:

Are nested within the parent Model Entity
Are enabled for inference by default after training completes
Have their own FileSet for storing the adapter weights
Track metadata like fine-tuning type, rank, and alpha values

What is a FileSet?

A FileSet is a collection of files stored in the platform’s file service. For customization:

Model FileSet: Contains the base model checkpoint (config, weights, tokenizer)
Adapter FileSet: Contains the LoRA adapter weights
Dataset FileSet: Contains training and validation data
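
The relationship between these three concepts can be pictured with a few plain Python dataclasses. This is only a mental model: the class and field names below are illustrative stand-ins chosen for this sketch, not the SDK's actual types.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FileSet:
    """Illustrative stand-in for a collection of files in the file service."""
    name: str
    files: list[str] = field(default_factory=list)


@dataclass
class Adapter:
    """Illustrative stand-in for an adapter nested under a Model Entity."""
    name: str
    fileset: str                  # reference to the FileSet holding the adapter weights
    finetuning_type: str = "lora"
    enabled: bool = True          # adapters are enabled by default after training


@dataclass
class ModelEntity:
    """Illustrative stand-in for a registered model."""
    name: str
    fileset: str                              # reference to the checkpoint FileSet
    base_model: Optional[str] = None          # set only for fine-tuned models
    adapters: list[Adapter] = field(default_factory=list)

    def enabled_adapters(self) -> list[str]:
        """Names of the adapters that would be served for inference."""
        return [a.name for a in self.adapters if a.enabled]


base = ModelEntity(name="llama-3-2-1b", fileset="default/llama-3-2-1b")
base.adapters.append(Adapter(name="my-email-assistant-lora", fileset="default/my-lora-weights"))
base.adapters.append(Adapter(name="old-experiment", fileset="default/old-weights", enabled=False))
print(base.enabled_adapters())
```

The point of the sketch is the nesting: adapters live inside their parent Model Entity, while both the model and each adapter refer to a FileSet by name rather than holding files directly.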

The Customization Workflow

Step-by-Step Breakdown

1. Create a FileSet for your base model

Upload your model checkpoint files (from Hugging Face, NGC, or local storage) to a FileSet:

import os
from nemo_platform import NeMoPlatform
from nemo_platform.types.files import HuggingfaceStorageConfigParam

client = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)

# Create a FileSet from Hugging Face
fileset = client.files.filesets.create(
    workspace="default",
    name="llama-3-2-1b",
    description="Llama 3.2 1B base model",
    storage=HuggingfaceStorageConfigParam(
        type="huggingface",
        repo_id="meta-llama/Llama-3.2-1B-Instruct",
        repo_type="model",
        token_secret="my-hf-token",  # Secret containing Hugging Face token
    ),
)

2. Create a Model Entity pointing to the FileSet

import time

model = client.models.create(
    workspace="default",
    name="llama-3-2-1b",
    fileset="default/llama-3-2-1b",  # Reference to the FileSet
    description="Llama 3.2 1B base model",
)

# Wait for model spec to be auto-populated
while not model.spec:
    time.sleep(5)
    model = client.models.retrieve(workspace="default", name="llama-3-2-1b")

print(f"Model architecture: {model.spec.family}")
print(f"Parameters: {model.spec.base_num_parameters:,}")
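
The polling loop above keeps running until the spec appears, so it never returns if the spec is never populated. A bounded wait is one way to guard against that. The helper below is a generic sketch, not part of the SDK: the `wait_for` name, its parameters, and the default timeout are assumptions made for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(fetch: Callable[[], Optional[T]], timeout_s: float = 300.0, interval_s: float = 5.0) -> T:
    """Call fetch() until it returns a truthy value or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        value = fetch()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result after {timeout_s} seconds")
        time.sleep(interval_s)


# Stand-in for a retrieve call whose result is empty on the first two polls.
responses = iter([None, None, {"family": "llama"}])
spec = wait_for(lambda: next(responses), timeout_s=1.0, interval_s=0.01)
print(spec)
```

With the client from this tutorial, `fetch` could be a lambda that calls `client.models.retrieve(workspace="default", name="llama-3-2-1b")` and returns its `spec` attribute.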

3. Create a Customization Job

from nemo_automodel_plugin.schema import AutomodelJobInput

spec = AutomodelJobInput(
    model="default/llama-3-2-1b",
    dataset={"training": "default/email-training-data"},
    training={
        "training_type": "sft",
        "finetuning_type": "lora",
        "lora": {"rank": 8, "alpha": 32},
    },
    schedule={"epochs": 3},
    batch={"global_batch_size": 32, "micro_batch_size": 1},
)

job = client.customization.automodel.jobs.create(
    workspace="default",
    name="my-email-assistant-lora",
    spec=spec,
)

4. Access the Result

After training completes:

# For LoRA jobs - adapter is attached to the model
model = client.models.retrieve(workspace="default", name="llama-3-2-1b")

# Guard against a model that has no adapters yet
for adapter in model.adapters or []:
    print(f"Adapter: {adapter.name}")
    print(f"  Type: {adapter.finetuning_type}")
    print(f"  Enabled: {adapter.enabled}")
    print(f"  Files: {adapter.fileset}")

Understanding Adapters and Deployment

How Adapters Work

When you run a LoRA customization job:

Training produces adapter weights, which are small compared to the base model.
An adapter is created and attached to the parent Model Entity.
A FileSet is created to store the adapter weights.
The adapter is enabled by default, so NIMs serving the base model load it automatically.
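
To see why adapter weights are small, recall that LoRA replaces the update to a weight matrix with two low-rank factors. The sketch below counts the added parameters for a single matrix. The 2048 × 2048 projection size is a hypothetical value picked for illustration, and `lora_added_params` is a helper defined here, not an SDK function. The rank and alpha values match the example job earlier in this tutorial.

```python
def lora_added_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns two low-rank factors, A (rank x d_in) and B (d_out x rank),
    so the added parameter count is rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


d_in = d_out = 2048      # hypothetical projection size, for illustration only
rank, alpha = 8, 32      # same values as the example LoRA job

full = d_in * d_out
added = lora_added_params(d_in, d_out, rank)
print(f"full matrix: {full:,} params")
print(f"LoRA factors: {added:,} params ({added / full:.2%} of the matrix)")
print(f"scaling factor alpha / rank = {alpha / rank}")
```

For this matrix the LoRA factors hold 32,768 parameters against 4,194,304 in the frozen weight, which is under 1%. In the standard LoRA formulation the low-rank update is scaled by alpha / rank, so rank 8 with alpha 32 gives a scaling factor of 4.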

Viewing Adapters on a Model

model = client.models.retrieve(workspace="default", name="llama-3-2-1b")

print(f"Model: {model.name}")
print("Adapters:")
for adapter in model.adapters or []:
    print(f"  - {adapter.name}")
    print(f"    Type: {adapter.finetuning_type}")
    print(f"    Enabled: {adapter.enabled}")
    print(f"    Created: {adapter.created_at}")

Disabling and Re-enabling Adapters

Adapters are enabled by default, but you can disable an adapter to remove it from inference without deleting it. When you set enabled=False, the sidecar running alongside the NIM automatically removes the adapter’s files on its next reconciliation pass (every few seconds). Re-enabling the adapter causes the sidecar to re-download and serve it again.

client.models.adapters.update(
    adapter="my-custom-lora",
    model_name="llama-3-2-1b",
    workspace="default",
    enabled=False,
)

To re-enable:

client.models.adapters.update(
    adapter="my-custom-lora",
    model_name="llama-3-2-1b",
    workspace="default",
    enabled=True,
)
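
Conceptually, a reconciliation pass compares the adapters that should be served (those with enabled=True) against the adapter files currently on disk, then downloads or removes files to close the gap. The sketch below illustrates that idea with plain sets. It is not the sidecar's actual implementation, and the `reconcile` function and the adapter name `support-bot-lora` are hypothetical.

```python
def reconcile(desired: dict[str, bool], on_disk: set[str]) -> tuple[set[str], set[str]]:
    """Return (to_download, to_remove) for one reconciliation pass.

    desired maps adapter name -> enabled flag; on_disk is the set of adapter
    names whose files are currently present next to the serving process.
    """
    enabled = {name for name, is_enabled in desired.items() if is_enabled}
    return enabled - on_disk, on_disk - enabled


# my-custom-lora was just disabled; support-bot-lora is enabled but not yet downloaded.
desired = {"my-custom-lora": False, "support-bot-lora": True}
on_disk = {"my-custom-lora"}

to_download, to_remove = reconcile(desired, on_disk)
print("download:", sorted(to_download))
print("remove:", sorted(to_remove))
```

Because each pass works from the full desired state rather than from individual change events, disabling and later re-enabling an adapter converges to the right result even if a pass is missed.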

Creating Adapters Manually

You can also create adapters manually (e.g., from externally trained weights):

model = client.models.adapters.create(
    model_name="llama-3-2-1b",
    workspace="default",
    name="my-custom-lora",
    fileset="default/my-lora-weights",
    finetuning_type="lora",
)

Training Types and Resource Requirements

Available Training Approaches

Approach	Description	GPU Requirements	Output
LoRA (Low-Rank Adaptation)	Trains small adapter weights while keeping base model frozen	1-2 GPUs (80GB each)	Adapter attached to parent Model Entity
Full SFT (Supervised Fine-Tuning)	Updates all model weights for maximum performance	Typically 4+ GPUs (80GB each) for models with 8B parameters or more	New Model Entity with full weights

GPU Memory Guidelines

Model Size	LoRA	Full SFT
1B parameters	1x 80GB GPU	1x 80GB GPU
8B parameters	1x 80GB GPU	4x 80GB GPUs
70B parameters	2x 80GB GPUs	8x 80GB GPUs
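
For quick planning, the guideline table can be encoded as a small lookup. This is a sketch only: `suggested_gpus` is a hypothetical helper, and rounding an unlisted model size up to the next listed row is a conservative assumption made here, not a documented platform rule.

```python
# Model size (billions of parameters) -> number of 80GB GPUs, copied from the table above.
GPU_GUIDELINES = {
    1: {"lora": 1, "full_sft": 1},
    8: {"lora": 1, "full_sft": 4},
    70: {"lora": 2, "full_sft": 8},
}


def suggested_gpus(params_b: float, approach: str) -> int:
    """Return the guideline for the smallest listed model size >= params_b."""
    for size in sorted(GPU_GUIDELINES):
        if params_b <= size:
            return GPU_GUIDELINES[size][approach]
    raise ValueError(f"no guideline listed for models larger than {max(GPU_GUIDELINES)}B")


print(suggested_gpus(8, "lora"), suggested_gpus(8, "full_sft"))
print(suggested_gpus(3, "full_sft"))  # 3B is not listed, so the 8B row is used
```

Treat the result as a starting point; actual memory use also depends on sequence length, batch size, and parallelism settings.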

Storage Requirements

Customization jobs consume disk space on the platform’s shared persistent volume for model files, fine-tuning checkpoints, and the final output artifact. Required space depends on the training type:

Training Type	Approximate Disk Usage	Notes
LoRA	~1.5× downloaded base checkpoint size	Stores base model + small adapter weights
Full SFT	~3× downloaded base checkpoint size	Stores base model + intermediate checkpoint + full output model

These estimates cover model weights only and do not include training dataset size. If the platform disk fills during a job, the job fails with an I/O error and the job service may return a 500 status when you retrieve logs.

Ensure your platform’s shared persistent volume has at least 3× the downloaded base checkpoint size of free space before starting a full SFT job, or 1.5× for LoRA jobs.
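
The multipliers above translate directly into a pre-flight check. The sketch below uses only the Python standard library. The helper names and the 16 GiB checkpoint size are hypothetical, and `has_enough_space` is only meaningful when run somewhere the shared persistent volume is mounted, which depends on your deployment.

```python
import shutil

# Approximate disk usage as a multiple of the downloaded base checkpoint size.
DISK_MULTIPLIER = {"lora": 1.5, "full_sft": 3.0}


def required_disk_bytes(checkpoint_bytes: int, training_type: str) -> int:
    """Estimate the bytes a job needs for model weights (datasets not included)."""
    return int(checkpoint_bytes * DISK_MULTIPLIER[training_type])


def has_enough_space(path: str, checkpoint_bytes: int, training_type: str) -> bool:
    """Compare free space on the volume containing path with the estimate."""
    free = shutil.disk_usage(path).free
    return free >= required_disk_bytes(checkpoint_bytes, training_type)


checkpoint_bytes = 16 * 1024**3  # hypothetical 16 GiB checkpoint
for training_type in ("lora", "full_sft"):
    needed = required_disk_bytes(checkpoint_bytes, training_type)
    print(f"{training_type}: about {needed / 1024**3:.0f} GiB needed")
```

For the 16 GiB example this gives about 24 GiB for LoRA and 48 GiB for full SFT. Add the size of your training dataset on top of these figures.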

For troubleshooting disk-related failures, see customizer.

Parallelism Parameters Explained

Parallelism is configured via the top-level Automodel parallelism block (for example, parallelism={"num_gpus_per_node": 1}). These parameters control how training workloads are distributed across GPUs:

Parameter	Description	Default
`parallelism.num_nodes`	Number of training nodes	`1`
`parallelism.num_gpus_per_node`	GPUs per node	`1`
`parallelism.tensor_parallel_size`	Number of GPUs to distribute each layer’s parameters across	`1`
`parallelism.pipeline_parallel_size`	Number of GPUs to distribute layers across sequentially	`1`
`parallelism.context_parallel_size`	Number of GPUs to distribute sequence context across	`1`
`parallelism.sequence_parallel`	Enable sequence parallelism to distribute activation memory along the sequence dimension	`false`
`parallelism.expert_parallel_size`	Number of GPUs to distribute MoE experts across (MoE models only)	`null`

data_parallel_size is automatically derived as total_gpus / (TP × PP × CP) and is not set directly.

Recommended Parallelism for Mixture of Experts (MoE) Models

The parallelism.expert_parallel_size parameter parallelizes a Mixture of Experts (MoE) model’s experts across GPUs. For non-MoE models, leave it unset (null). A model’s model card indicates whether it is a Mixture of Experts model and how many experts it has.

When you set expert_parallel_size:

The number of experts in the model must be divisible by expert_parallel_size. For example, if a model has 8 experts, expert_parallel_size=4 gives each GPU 2 experts.
(data_parallel_size × context_parallel_size) must be divisible by expert_parallel_size.
When expert_parallel_size > 1, tensor_parallel_size must be 1.

For example, with 8 total GPUs, tensor_parallel_size=1, pipeline_parallel_size=1, and context_parallel_size=1:

Derived data_parallel_size = 8 / (1 × 1 × 1) = 8
data_parallel_size × context_parallel_size = 8
Valid expert_parallel_size values: 1, 2, 4, or 8
Invalid expert_parallel_size value: 3 (does not divide 8)
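
The three constraints and the worked example can be checked with a few lines of Python. This is a sketch of the rules as stated in this section, not the platform's own validator. The function names are hypothetical, and the model is assumed to have 8 experts, as in the divisibility example above.

```python
def derive_data_parallel_size(total_gpus: int, tp: int = 1, pp: int = 1, cp: int = 1) -> int:
    """data_parallel_size = total_gpus / (TP x PP x CP)."""
    return total_gpus // (tp * pp * cp)


def expert_parallel_problems(num_experts: int, ep: int, dp: int, cp: int = 1, tp: int = 1) -> list[str]:
    """Return the list of violated constraints (empty means the setting is valid)."""
    problems = []
    if num_experts % ep != 0:
        problems.append(f"{num_experts} experts is not divisible by expert_parallel_size={ep}")
    if (dp * cp) % ep != 0:
        problems.append(f"data_parallel_size x context_parallel_size = {dp * cp} is not divisible by {ep}")
    if ep > 1 and tp != 1:
        problems.append("tensor_parallel_size must be 1 when expert_parallel_size > 1")
    return problems


dp = derive_data_parallel_size(total_gpus=8)  # 8 / (1 x 1 x 1) = 8
for ep in (1, 2, 3, 4, 8):
    problems = expert_parallel_problems(num_experts=8, ep=ep, dp=dp)
    print(ep, "valid" if not problems else problems)
```

Running this reports 1, 2, 4, and 8 as valid and lists the violated constraints for 3.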

Resource Allocation Rules

Training configurations must satisfy mathematical constraints to work properly:

GPU Allocation Rule: The total number of GPUs (num_gpus_per_node × num_nodes) must be a multiple of tensor_parallel_size × pipeline_parallel_size × context_parallel_size.

If this constraint isn’t met, your training job will fail with a validation error.

Example Calculations:

8 GPUs with tensor_parallel_size=4, pipeline_parallel_size=2 ✅ Valid (8 is a multiple of 4 × 2 × 1 = 8)
4 GPUs with tensor_parallel_size=4, pipeline_parallel_size=2 ❌ Invalid (4 is not a multiple of 4 × 2 × 1 = 8)
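
You can check a configuration against the GPU allocation rule before submitting a job. The function below is a minimal sketch of the rule as stated here. The name `gpu_allocation_is_valid` is hypothetical, and the platform's own validation may apply additional checks.

```python
def gpu_allocation_is_valid(num_nodes: int, num_gpus_per_node: int,
                            tp: int = 1, pp: int = 1, cp: int = 1) -> bool:
    """True when total GPUs is a multiple of TP x PP x CP."""
    total_gpus = num_nodes * num_gpus_per_node
    return total_gpus % (tp * pp * cp) == 0


print(gpu_allocation_is_valid(1, 8, tp=4, pp=2))  # 8 GPUs, TP x PP x CP = 8
print(gpu_allocation_is_valid(1, 4, tp=4, pp=2))  # 4 GPUs, TP x PP x CP = 8
print(gpu_allocation_is_valid(2, 8, tp=4, pp=2))  # 16 GPUs, TP x PP x CP = 8
```

The first and third calls return True and the second returns False, matching the example calculations. In the third case the remaining factor of 2 becomes the derived data_parallel_size.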

Choosing Your Training Approach

Decision Framework

When to Use LoRA

✅ Choose LoRA when:

You have limited GPU resources (1-2 GPUs)
You want fast training iterations
You need multiple specialized versions of the same base model
You want to auto-deploy adapters to existing NIM deployments

When to Use Full Fine-Tuning

✅ Choose Full SFT when:

You need maximum model performance
You have sufficient GPU resources (4+ GPUs)
You want complete control over all model weights
You’re preparing a production deployment

Model Types and Capabilities

Supported Language Models

Model Family	Description	Examples
Llama Models	General-purpose language models excellent for instruction following, conversation, and text generation tasks	`llama-3.1-8b-instruct`, `llama-3.2-1b-instruct`
Llama Nemotron Models	NVIDIA’s specialized variants optimized for specific use cases with enhanced reasoning capabilities	Various Nano and Super variants
Phi Models	Microsoft’s efficient models designed for strong reasoning with optimized deployment characteristics	Phi model family configurations
GPT-OSS Models	Open-weight reasoning models with tested Full SFT and LoRA configurations	`openai/gpt-oss-20b`

Specialized Models

Model Type	Status	Details
Embedding Models	✅ Supported	Model: Llama 3.2 NV EmbedQA 1B for question-answering and retrieval tasks. Use cases: semantic search, document retrieval, question-answering systems, RAG pipelines. Note: typically disabled by default; contact your administrator for access.
Reranking Models	❌ Not Supported	Alternative: Use embedding models for retrieval tasks, or implement reranking in your application layer

Importing Custom Models

You can import a Hugging Face checkpoint into a FileSet and register it as a Model Entity:

from nemo_platform.types.files import HuggingfaceStorageConfigParam

# Create FileSet from Hugging Face
fileset = client.files.filesets.create(
    workspace="default",
    name="my-custom-model",
    storage=HuggingfaceStorageConfigParam(
        type="huggingface",
        repo_id="organization/model-name",
        repo_type="model",
        token_secret="my-hf-token",
    ),
)

# Create Model Entity
model = client.models.create(
    workspace="default", name="my-custom-model", fileset="default/my-custom-model"
)

For detailed guidance, see Import Hugging Face Model.

Importing a checkpoint does not guarantee that every training or deployment backend supports its architecture. In particular, Automodel LoRA does not support Conv1D-based architectures such as older GPT-2 variants. Confirm the model and fine-tuning regime in the Tested Models table, and review the import tutorial’s known architecture limitations before submitting a job.

Next Steps

Now that you understand how Model Entities and Adapters work, you’re ready to proceed:

Format Training Dataset

Learn how to prepare your data for fine-tuning.

Start a LoRA Job

Create a parameter-efficient LoRA adapter.

Start a Full SFT Job

Use full supervised fine-tuning for maximum performance.

Import Custom Models

Import and fine-tune private Hugging Face models.

Key Takeaways

✅ Model Entities contain model metadata and point to a FileSet with checkpoint files
✅ Adapters (LoRA) are attached to Model Entities, not stored separately
✅ A FileSet is where the actual model or adapter files are stored
✅ LoRA training creates an adapter on the parent Model Entity
✅ Full SFT training creates a new Model Entity with full weights
✅ Adapters are enabled by default and automatically loaded by NIMs serving the base model
✅ GPU requirements vary significantly between LoRA and full fine-tuning
✅ Custom Hugging Face models can be imported via FileSet + Model Entity

Quick Reference Commands

# List all Model Entities
models = client.models.list(workspace="default")

# Get a specific Model Entity with adapters
model = client.models.retrieve(workspace="default", name="llama-3-2-1b")

# Create an Automodel training job
from nemo_automodel_plugin.schema import AutomodelJobInput

spec = AutomodelJobInput(
    model="default/llama-3-2-1b",
    dataset={"training": "default/my-dataset"},
    training={"training_type": "sft", "finetuning_type": "lora"},
)

job = client.customization.automodel.jobs.create(
    workspace="default",
    name="my-job",
    spec=spec,
)

# Add an adapter to a model
client.models.adapters.create(
    model_name="llama-3-2-1b",
    workspace="default",
    name="my-adapter",
    fileset="default/adapter-weights",
    finetuning_type="lora",
)

You now have the foundation to make informed decisions about your fine-tuning projects!