# Understanding NeMo Customizer: Models, Training, and Resources

<a id="ft-tut-understand-models" />

Learn the fundamentals of how NeMo Customizer works to make informed decisions about your fine-tuning projects. This tutorial covers how models are organized, how adapters attach to base models, training types and GPU requirements, and how to choose the right approach for your use case.

Understanding these basics will help you navigate the fine-tuning process more effectively and avoid common issues. When you finish, continue to [SFT Customization Job](sft-customization-job.ipynb) to start fine-tuning.

The time to complete this tutorial is approximately 15 minutes.

This tutorial focuses on understanding and discovery—no actual training jobs are created.

## Prerequisites

All platform resources—models, datasets, and more—must belong to a **workspace**. Workspaces provide organizational and authorization boundaries for your work. Within a workspace, you can optionally use **projects** to group related resources.

**If you're new to the platform**, start with the **[Setup guide](/documentation/get-started)** to learn how to deploy and evaluate models and optimize agents end-to-end on the platform.

**If you're already familiar** with workspaces and how to upload datasets to the platform, you can proceed directly with this tutorial.

For more information, see [Workspaces](/documentation/get-started/core-concepts/workspaces) and [Projects](/documentation/get-started/core-concepts/projects).

Before starting, make sure you have:

* NeMo Platform installed and deployed (see [Setup](/documentation/get-started))
* The PyPI `nemo-platform` wrapper package installed (`pip install "nemo-platform[all]"`). If you are working from a source checkout, run `make bootstrap` from the repository root instead.
* (Optional) Weights & Biases account and API key for enhanced visualization

**Set up environment variables:**

```bash
# Set the base URL for NeMo Platform
export NMP_BASE_URL="http://localhost:8080" # Or your deployed platform URL

# Optional: Weights & Biases for experiment tracking
export WANDB_API_KEY="<your-wandb-api-key>"
```

**Initialize the SDK:**

```python
import os
from nemo_platform import NeMoPlatform

client = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)
```

***

## Core Concepts

### What is a Model Entity?

A **Model Entity** represents a model registered in the NeMo Platform. It contains:

1. **FileSet Reference**: Points to the model checkpoint files (weights, config, tokenizer)
2. **Model Spec**: Auto-populated metadata about the model architecture (layers, parameters, etc.)
3. **Adapters**: LoRA or other parameter-efficient fine-tuning weights attached to this model
4. **Base Model Link**: Optional reference to a parent model (for fine-tuned models)

Think of a Model Entity as a "model card" that tracks everything about a model—where its files are, what architecture it uses, and what adapters have been trained for it.

### What is an Adapter?

An **Adapter** is a set of parameter-efficient fine-tuning weights (like LoRA) that are attached to a Model Entity. Adapters:

* Are **nested within** the parent Model Entity
* Are **enabled** for inference by default after training
* Have their own FileSet for storing the adapter weights
* Track metadata such as fine-tuning type, rank, and alpha values

### What is a FileSet?

A **FileSet** is a collection of files stored in the platform's file service. For customization:

* **Model FileSet**: Contains the base model checkpoint (config, weights, tokenizer)
* **Adapter FileSet**: Contains the LoRA adapter weights
* **Dataset FileSet**: Contains training and validation data
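To make these relationships concrete, here is a minimal sketch that models them with plain Python dataclasses. These classes are hypothetical stand-ins written for this explanation, not the `nemo_platform` SDK's types; the field names mirror the terms above, and the `"<workspace>/<name>"` reference format follows the strings used in this tutorial's examples.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical, simplified stand-ins; NOT the nemo_platform SDK types.
@dataclass
class FileSet:
    workspace: str
    name: str

    @property
    def ref(self) -> str:
        # Referenced as "<workspace>/<name>", as in this tutorial's examples
        return f"{self.workspace}/{self.name}"


@dataclass
class Adapter:
    name: str
    fileset: FileSet            # adapter weights live in their own FileSet
    finetuning_type: str = "lora"
    enabled: bool = True        # adapters are enabled by default


@dataclass
class ModelEntity:
    workspace: str
    name: str
    fileset: FileSet                      # base checkpoint files
    spec: Optional[dict] = None           # auto-populated architecture metadata
    base_model: Optional[str] = None      # optional link to a parent model
    adapters: list[Adapter] = field(default_factory=list)  # nested adapters

    def enabled_adapters(self) -> list[str]:
        return [a.name for a in self.adapters if a.enabled]


model = ModelEntity("default", "llama-3-2-1b", fileset=FileSet("default", "llama-3-2-1b"))
model.adapters.append(Adapter("my-custom-lora", FileSet("default", "my-lora-weights")))
print(model.fileset.ref)         # default/llama-3-2-1b
print(model.enabled_adapters())  # ['my-custom-lora']
```

The adapter is stored on the model object rather than as a separate top-level record, which is the nesting described above.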

***

## The Customization Workflow

```mermaid

flowchart LR
 A[1. Create FileSet<br />with model files] --> B[2. Create Model Entity<br />pointing to FileSet]
 B --> C[3. Create Customization Job<br />referencing Model Entity]
 C --> D{Training Type?}
 D -->|LoRA| E[Adapter created and<br />attached to Model Entity]
 D -->|Full SFT| F[New Model Entity<br />with customized weights]
 E --> G[Auto-deploy to NIM<br />if enabled]

```
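Which branch a job takes depends on its training spec. The helper below is a hypothetical sketch that reads a dictionary shaped like the `spec` passed to `client.customization.jobs.create` in this tutorial; it assumes that a `peft` section selects the adapter path and that a spec without `peft` runs Full SFT.

```python
def describe_job_output(spec: dict) -> str:
    """Hypothetical: report what a customization job spec produces, per the flowchart."""
    peft = spec.get("training", {}).get("peft")
    if peft is not None:
        # Parameter-efficient training (for example LoRA): the adapter is
        # attached to the existing Model Entity named in spec["model"].
        return f"{peft.get('type', 'peft')} adapter attached to {spec['model']}"
    # Assumption: no PEFT section means full-weight SFT
    return "new Model Entity with customized weights"


lora_spec = {"model": "default/llama-3-2-1b", "training": {"type": "sft", "peft": {"type": "lora"}}}
full_spec = {"model": "default/llama-3-2-1b", "training": {"type": "sft"}}
print(describe_job_output(lora_spec))  # lora adapter attached to default/llama-3-2-1b
print(describe_job_output(full_spec))  # new Model Entity with customized weights
```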

### Step-by-Step Breakdown

**1. Create a FileSet for your base model**

Upload your model checkpoint files (from HuggingFace, NGC, or local storage) to a FileSet:

```python
import os
from nemo_platform import NeMoPlatform
from nemo_platform.types.files import HuggingfaceStorageConfigParam

client = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)

# Create a FileSet from HuggingFace
fileset = client.files.filesets.create(
    workspace="default",
    name="llama-3-2-1b",
    description="Llama 3.2 1B base model",
    storage=HuggingfaceStorageConfigParam(
        type="huggingface",
        repo_id="meta-llama/Llama-3.2-1B-Instruct",
        repo_type="model",
        token_secret="my-hf-token",  # Secret containing HuggingFace token
    ),
)
```

**2. Create a Model Entity pointing to the FileSet**

```python
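import time

model = client.models.create(
    workspace="default",
    name="llama-3-2-1b",
    fileset="default/llama-3-2-1b",  # Reference to the FileSet
    description="Llama 3.2 1B base model",
)

# The model spec is populated asynchronously; poll until it appears,
# but stop after a bounded wait instead of looping forever.
deadline = time.monotonic() + 600  # 10-minute limit (arbitrary; adjust as needed)
while not model.spec:
    if time.monotonic() > deadline:
        raise TimeoutError("Model spec was not populated within 10 minutes")
    time.sleep(5)
    model = client.models.retrieve(workspace="default", name="llama-3-2-1b")

print(f"Model architecture: {model.spec.family}")
print(f"Parameters: {model.spec.base_num_parameters:,}")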
```

**3. Create a Customization Job**

```python
job = client.customization.jobs.create(
    workspace="default",
    name="my-email-assistant-lora",
    spec={
        "model": "default/llama-3-2-1b",
        "dataset": "fileset://default/email-training-data",
        "training": {
            "type": "sft",
            "peft": {"type": "lora", "rank": 8, "alpha": 32},
            "epochs": 3,
            "batch_size": 32,
        },
        "deployment_config": {"lora_enabled": True},
    },
)
```
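For reference, the `rank` and `alpha` values in the `peft` block correspond to the standard LoRA formulation, in which a frozen weight matrix $W_0$ is augmented by a trainable low-rank product $BA$ scaled by $\alpha / r$. This assumes Customizer applies the conventional `alpha / rank` scaling; check the Customizer configuration reference before relying on the exact factor.

```latex
h = W_0 x + \frac{\alpha}{r} B A x, \qquad
W_0 \in \mathbb{R}^{d \times k},\; B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k}

r = 8,\ \alpha = 32 \;\Rightarrow\; \frac{\alpha}{r} = \frac{32}{8} = 4
```

Only $A$ and $B$ are trained; $W_0$ stays frozen.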

**4. Access the Result**

After training completes:

```python
# For LoRA jobs - adapter is attached to the model
model = client.models.retrieve(workspace="default", name="llama-3-2-1b")

for adapter in model.adapters or []:  # adapters may be empty/None before any are attached
    print(f"Adapter: {adapter.name}")
    print(f" Type: {adapter.finetuning_type}")
    print(f" Enabled: {adapter.enabled}")
    print(f" Files: {adapter.fileset}")
```

***

## Understanding Adapters and Deployment

### How Adapters Work

When you run a LoRA customization job:

1. **Training** produces adapter weights (small compared to base model)
2. **Adapter created** and attached to the parent Model Entity
3. **FileSet created** with the adapter weights
4. **Enabled by default** so NIMs serving the base model automatically load the adapter
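To see why the adapter is small compared to the base model, here is a back-of-the-envelope count for a single weight matrix, assuming the standard LoRA factorization in which a frozen `d × k` matrix gets two trainable factors of shape `d × r` and `r × k`. The 2048 × 2048 shape is an illustrative assumption, not a value read from any model spec.

```python
def full_params(d: int, k: int) -> int:
    """Trainable parameters when the whole d x k matrix is updated (full fine-tuning)."""
    return d * k


def lora_params(d: int, k: int, rank: int) -> int:
    """Trainable parameters for LoRA factors B (d x rank) and A (rank x k)."""
    return rank * (d + k)


# Illustrative projection size (assumed); rank 8 matches the job example earlier.
d, k, rank = 2048, 2048, 8
full = full_params(d, k)
lora = lora_params(d, k, rank)
print(f"Full update: {full:,} parameters")                            # Full update: 4,194,304 parameters
print(f"LoRA rank {rank}: {lora:,} parameters ({lora / full:.2%})")   # LoRA rank 8: 32,768 parameters (0.78%)
```

The same ratio applies to every matrix the adapter targets, which is why adapter FileSets are a small fraction of the base checkpoint.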

### Viewing Adapters on a Model

```python
model = client.models.retrieve(workspace="default", name="llama-3-2-1b")

print(f"Model: {model.name}")
print("Adapters:")
for adapter in model.adapters or []:
    print(f" - {adapter.name}")
    print(f" Type: {adapter.finetuning_type}")
    print(f" Enabled: {adapter.enabled}")
    print(f" Created: {adapter.created_at}")
```

### Disabling and Re-enabling Adapters

Adapters are enabled by default, but you can disable an adapter to remove it from inference without deleting it. When you set `enabled=False`, the sidecar running alongside the NIM automatically removes the adapter's files on its next reconciliation pass (every few seconds). Re-enabling the adapter causes the sidecar to re-download and serve it again.

```python
client.models.update_adapter(
    model_name="llama-3-2-1b",
    workspace="default",
    adapter_name="my-custom-lora",
    enabled=False,
)
```

To re-enable:

```python
client.models.update_adapter(
    model_name="llama-3-2-1b",
    workspace="default",
    adapter_name="my-custom-lora",
    enabled=True,
)
```
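The disable and re-enable behavior follows a reconcile pattern: compare the adapters that should be served with those present locally, then download or remove the difference. The sketch below is a hypothetical, in-memory illustration of one reconciliation pass; the function name and data shapes are invented for this example and do not describe the sidecar's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReconcilePlan:
    to_download: frozenset[str]  # enabled adapters whose files are missing locally
    to_remove: frozenset[str]    # local adapters that are no longer enabled


def plan_reconciliation(adapter_states: dict[str, bool], local_adapters: set[str]) -> ReconcilePlan:
    """One hypothetical reconciliation pass.

    adapter_states maps adapter name -> enabled flag (as stored on the Model Entity).
    local_adapters is the set of adapter names whose files are currently on disk.
    """
    desired = {name for name, enabled in adapter_states.items() if enabled}
    return ReconcilePlan(
        to_download=frozenset(desired - local_adapters),
        to_remove=frozenset(local_adapters - desired),
    )


# "my-custom-lora" was just disabled; "email-lora" was just enabled.
plan = plan_reconciliation(
    {"my-custom-lora": False, "email-lora": True},
    local_adapters={"my-custom-lora"},
)
print(sorted(plan.to_download))  # ['email-lora']
print(sorted(plan.to_remove))    # ['my-custom-lora']
```

Because each pass recomputes the full difference, repeating it is harmless, which suits a loop that runs every few seconds.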

### Creating Adapters Manually

You can also create adapters manually (e.g., from externally trained weights):

```python
model = client.models.create_adapter(
    model_name="llama-3-2-1b",
    workspace="default",
    name="my-custom-lora",
    fileset="default/my-lora-weights",
    finetuning_type="lora",
)
```

***

## Training Types and Resource Requirements

### Available Training Approaches

| Approach                          | Description                                                  | GPU Requirements     | Output                                  |
| --------------------------------- | ------------------------------------------------------------ | -------------------- | --------------------------------------- |
| LoRA (Low-Rank Adaptation)        | Trains small adapter weights while keeping base model frozen | 1-2 GPUs (80GB each) | Adapter attached to parent Model Entity |
| Full SFT (Supervised Fine-Tuning) | Updates all model weights for maximum performance            | 4+ GPUs (80GB each) for 8B+ models | New Model Entity with full weights      |

### GPU Memory Guidelines

| Model Size     | LoRA         | Full SFT     |
| -------------- | ------------ | ------------ |
| 1B parameters  | 1x 80GB GPU  | 1x 80GB GPU  |
| 8B parameters  | 1x 80GB GPU  | 4x 80GB GPUs |
| 70B parameters | 2x 80GB GPUs | 8x 80GB GPUs |

### Storage Requirements

Customization jobs consume disk space on the platform's shared persistent volume for model files, finetuning checkpoints, and the final output artifact. Required space depends on the training type:

| Training Type | Approximate Disk Usage | Notes                                              |
| ------------- | ---------------------- | -------------------------------------------------- |
| LoRA          | \~1.5× base model size | Stores base model + small adapter weights          |
| Full SFT      | \~3× base model size   | Stores base model + full checkpoint + output model |

These estimates cover model weights only and do not include the training dataset size. If the platform disk fills during a job, the job fails with an I/O error, and the job service may return a `500` status when you retrieve logs.

Before starting a job, make sure your platform's shared persistent volume has at least **3× the base model size** of free space for full SFT, or **1.5×** for LoRA.

For troubleshooting disk-related failures, see [customizer](/documentation/reference/troubleshooting/customizer).
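A pre-flight check along these lines can catch an undersized volume before a job fails mid-run. This is a minimal sketch, assuming the shared persistent volume is mounted where the script runs; `/mnt/shared-volume` is a placeholder path, the `"lora"`/`"full_sft"` keys are labels chosen for this example, and the multipliers are the approximate values from the storage table.

```python
import shutil

# Approximate disk-usage multipliers from the storage table (model weights only).
DISK_MULTIPLIER = {"lora": 1.5, "full_sft": 3.0}


def required_free_bytes(base_model_bytes: int, training_type: str) -> int:
    """Minimum free space to request for a job, excluding dataset size."""
    return int(base_model_bytes * DISK_MULTIPLIER[training_type])


def has_enough_space(path: str, base_model_bytes: int, training_type: str) -> bool:
    """Compare free space on the filesystem that holds `path` with the estimate."""
    free = shutil.disk_usage(path).free
    return free >= required_free_bytes(base_model_bytes, training_type)


GB = 1024**3
base = 16 * GB  # assumed checkpoint size for illustration
print(required_free_bytes(base, "lora") / GB)      # 24.0
print(required_free_bytes(base, "full_sft") / GB)  # 48.0

# Placeholder mount point; replace with your platform's shared volume path.
# print(has_enough_space("/mnt/shared-volume", base, "full_sft"))
```

Remember to add the dataset size on top of these figures.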

### Parallelism Parameters Explained

Parallelism is configured via `training.parallelism`. These parameters control how training workloads are distributed across GPUs:

| Parameter                | Description                                                                              | Default |
| ------------------------ | ---------------------------------------------------------------------------------------- | ------- |
| `tensor_parallel_size`   | Number of GPUs to distribute each layer's parameters across                              | 1       |
| `pipeline_parallel_size` | Number of GPUs to distribute layers across sequentially                                  | 1       |
| `context_parallel_size`  | Number of GPUs to distribute sequence context across                                     | 1       |
| `sequence_parallel`      | Enable sequence parallelism to distribute activation memory along the sequence dimension | `false` |
| `expert_parallel_size`   | Number of GPUs to distribute MoE experts across (MoE models only)                        | 1       |

`data_parallel_size` is automatically derived as `total_gpus / (TP × PP × CP)` and is not set directly.

**Recommended parallelism for Mixture of Experts (MoE) models**:

The `expert_parallel_size` parameter distributes a Mixture of Experts (MoE) model's experts across GPUs. For non-MoE models, this parameter is ignored. The model card indicates whether a model uses a Mixture of Experts architecture and, if so, how many experts it has.

The number of experts in the model must be divisible by `expert_parallel_size`. For example, if a model has 8 experts, setting `expert_parallel_size=4` results in each GPU processing 2 experts.

Also, the value of `expert_parallel_size` must evenly divide the derived `data_parallel_size`, which is automatically calculated as `data_parallel_size = total GPUs / (tensor_parallel_size × pipeline_parallel_size × context_parallel_size)`.

For example, with 8 total GPUs, `tensor_parallel_size=2`, and `pipeline_parallel_size=1`:

* Derived `data_parallel_size = 8 / (2 × 1 × 1) = 4`
* Valid `expert_parallel_size` values: `1`, `2`, or `4` (must evenly divide 4)
* Invalid `expert_parallel_size` value: `3` (does not evenly divide 4)
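Both divisibility rules can be checked mechanically. This minimal sketch uses hypothetical function names and plain Python; the platform performs its own validation when you submit a job. The example assumes the 8-expert model from the earlier example.

```python
def derive_data_parallel_size(total_gpus: int, tp: int = 1, pp: int = 1, cp: int = 1) -> int:
    """data_parallel_size = total_gpus / (TP x PP x CP); the division must be exact."""
    model_parallel = tp * pp * cp
    if total_gpus % model_parallel != 0:
        raise ValueError(f"{total_gpus} GPUs is not a multiple of TP*PP*CP = {model_parallel}")
    return total_gpus // model_parallel


def valid_expert_parallel_sizes(num_experts: int, data_parallel_size: int) -> list[int]:
    """EP values that evenly divide both the expert count and the derived DP size."""
    return [
        ep
        for ep in range(1, data_parallel_size + 1)
        if data_parallel_size % ep == 0 and num_experts % ep == 0
    ]


# Worked example: 8 GPUs, TP=2, PP=1 (CP defaults to 1), 8 experts.
dp = derive_data_parallel_size(8, tp=2, pp=1)
print(dp)                                  # 4
print(valid_expert_parallel_sizes(8, dp))  # [1, 2, 4]
```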

### Resource Allocation Rules

Training configurations must satisfy the following divisibility constraint:

**GPU Allocation Rule**: The total number of GPUs (`num_gpus_per_node × num_nodes`) must be a multiple of
`tensor_parallel_size × pipeline_parallel_size × context_parallel_size`.

If this constraint isn't met, your training job will fail with a validation error.

**Example Calculations**:

* 8 GPUs with `tensor_parallel_size=4, pipeline_parallel_size=2` ✅ Valid (8 is a multiple of 4 × 2 × 1 = 8)
* 4 GPUs with `tensor_parallel_size=4, pipeline_parallel_size=2` ❌ Invalid (4 is not a multiple of 4 × 2 × 1 = 8)
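The same rule can be written as a check to run before submitting a job. This is a hypothetical helper, not part of the `nemo_platform` SDK; it reproduces the two example calculations above and adds a multi-node case.

```python
def satisfies_gpu_allocation(
    num_gpus_per_node: int,
    num_nodes: int,
    tensor_parallel_size: int = 1,
    pipeline_parallel_size: int = 1,
    context_parallel_size: int = 1,
) -> bool:
    """True if (num_gpus_per_node x num_nodes) is a multiple of TP x PP x CP."""
    total_gpus = num_gpus_per_node * num_nodes
    model_parallel = tensor_parallel_size * pipeline_parallel_size * context_parallel_size
    return total_gpus % model_parallel == 0


print(satisfies_gpu_allocation(8, 1, tensor_parallel_size=4, pipeline_parallel_size=2))  # True
print(satisfies_gpu_allocation(4, 1, tensor_parallel_size=4, pipeline_parallel_size=2))  # False
print(satisfies_gpu_allocation(8, 2, tensor_parallel_size=4, pipeline_parallel_size=2))  # True (16 GPUs)
```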

***

## Choosing Your Training Approach

### Decision Framework

```mermaid

flowchart TD
 A[What's your goal?] --> B{Need maximum<br />performance?}
 B -->|Yes| C{Have 4+ GPUs?}
 B -->|No| D[Choose LoRA]

 C -->|Yes| E[Choose Full SFT]
 C -->|No| D

 D --> F{Multiple<br />use cases?}
 F -->|Yes| G[Train multiple<br />LoRA adapters]
 F -->|No| H[Single LoRA<br />adapter]

 E --> I[New Model Entity<br />with full weights]
 G --> J[Multiple adapters<br />on same Model Entity]
 H --> J

```
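The flowchart's branches can also be written out as a small function. The parameter names are hypothetical, and the 4-GPU threshold comes straight from the diagram; treat it as a rule of thumb rather than a hard limit.

```python
def choose_training_approach(need_max_performance: bool, num_gpus: int, num_use_cases: int = 1) -> str:
    """Encode the decision flowchart: Full SFT only when both conditions hold."""
    if need_max_performance and num_gpus >= 4:
        return "Full SFT: new Model Entity with full weights"
    if num_use_cases > 1:
        return "LoRA: train multiple adapters on the same Model Entity"
    return "LoRA: single adapter on the Model Entity"


print(choose_training_approach(True, 8))
print(choose_training_approach(True, 2))
print(choose_training_approach(False, 8, num_use_cases=3))
```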

### When to Use LoRA

✅ **Choose LoRA when:**

* You have limited GPU resources (1-2 GPUs)
* You want fast training iterations
* You need multiple specialized versions of the same base model
* You want to auto-deploy adapters to existing NIM deployments

### When to Use Full Fine-Tuning

✅ **Choose Full SFT when:**

* You need maximum model performance
* You have sufficient GPU resources (4+ GPUs)
* You want complete control over all model weights
* You're preparing a production deployment

***

## Model Types and Capabilities

### Supported Language Models

| Model Family              | Description                                                                                                  | Examples                                         |
| ------------------------- | ------------------------------------------------------------------------------------------------------------ | ------------------------------------------------ |
| **Llama Models**          | General-purpose language models excellent for instruction following, conversation, and text generation tasks | `llama-3.1-8b-instruct`, `llama-3.2-1b-instruct` |
| **Llama Nemotron Models** | NVIDIA's specialized variants optimized for specific use cases with enhanced reasoning capabilities          | Various Nano and Super variants                  |
| **Phi Models**            | Microsoft's efficient models designed for strong reasoning with optimized deployment characteristics         | Phi model family configurations                  |
| **GPT-OSS Models**        | Open-source GPT-based models supporting Full SFT customization workflows                                     | Various GPT-OSS configurations                   |

### Specialized Models

| Model Type           | Status          | Details                                                                                                                                                                                                                                                   |
| -------------------- | --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Embedding Models** | ✅ Supported     | **Model**: Llama 3.2 NV EmbedQA 1B, for question-answering and retrieval tasks. **Use cases**: semantic search, document retrieval, question-answering systems, RAG pipelines. **Note**: typically disabled by default; contact your administrator for access. |
| **Reranking Models** | ❌ Not Supported | **Alternative**: Use embedding models for retrieval tasks, or implement reranking in your application layer                                                                                                                                               |

### Importing Custom Models

You can import any HuggingFace-compatible model:

```python
from nemo_platform.types.files import HuggingfaceStorageConfigParam

# Create FileSet from HuggingFace
fileset = client.files.filesets.create(
    workspace="default",
    name="my-custom-model",
    storage=HuggingfaceStorageConfigParam(
        type="huggingface",
        repo_id="organization/model-name",
        repo_type="model",
        token_secret="my-hf-token",
    ),
)

# Create Model Entity
model = client.models.create(
    workspace="default", name="my-custom-model", fileset="default/my-custom-model"
)
```

For detailed guidance, see [Import HuggingFace Model](/documentation/customizer-reference/tutorials/import-hugging-face-models).

***

## Next Steps

Now that you understand how Model Entities and Adapters work, you're ready to proceed:

* Learn how to prepare your data for fine-tuning.
* Create a parameter-efficient LoRA adapter.
* Use full supervised fine-tuning for maximum performance.
* Import and fine-tune private HuggingFace models.

***

## Key Takeaways

* ✅ **Model Entities** contain model metadata and point to a FileSet with checkpoint files
* ✅ **Adapters** (LoRA) are nested within their parent Model Entity rather than registered as separate Model Entities
* ✅ **FileSets** are where the actual model and adapter files are stored
* ✅ **LoRA training** creates an adapter on the parent Model Entity
* ✅ **Full SFT training** creates a new Model Entity with full weights
* ✅ **Adapters are enabled by default** and automatically loaded by NIMs serving the base model
* ✅ **GPU requirements** vary significantly between LoRA and full fine-tuning
* ✅ **Custom HuggingFace models** can be imported via a FileSet plus a Model Entity

### Quick Reference Commands

```python
# List all Model Entities
models = client.models.list(workspace="default")

# Get a specific Model Entity with adapters
model = client.models.retrieve(workspace="default", name="llama-3-2-1b")

# Create a customization job
job = client.customization.jobs.create(
    workspace="default",
    name="my-job",
    spec={
        "model": "default/llama-3-2-1b",
        "dataset": "fileset://default/my-dataset",
        "training": {"type": "sft", "peft": {"type": "lora"}},
    },
)

# Add an adapter to a model
client.models.create_adapter(
    model_name="llama-3-2-1b",
    workspace="default",
    name="my-adapter",
    fileset="default/adapter-weights",
    finetuning_type="lora",
)
```

You now have the foundation to make informed decisions about your fine-tuning projects!