Start a Full SFT Customization Job#

Learn how to use the NeMo Microservices Platform to create a supervised fine-tuning (SFT) job using a custom dataset.

About SFT Customization Jobs#

Full Supervised Fine-Tuning (SFT) shares similarities with LoRA training workflows but differs in two key ways:

  1. Training Impact: While LoRA only updates a small set of parameters and keeps most model weights frozen, full SFT modifies all weights and layers during training.

  2. Deployment Requirements: Full SFT models need a new NVIDIA Inference Microservice (NIM) deployment, unlike LoRA adapters, which can be served from an existing NIM.

Due to these differences, full SFT typically requires more computational resources and training time than LoRA-based approaches.

Prerequisites#

Before you can start an SFT customization job, make sure that you have the following:

  • Access to a NeMo Customizer Microservice.

  • Completed the Manage Entities tutorial series, or set up a dedicated project.

  • The huggingface_hub Python package installed.


Select Model#

Find Available Configs#

First, identify which model customization configurations are available for you to use. Each configuration describes a model and the corresponding training techniques you can choose from. SFT customization jobs require a model that supports the finetuning_type of all_weights.

Note

GPU requirements are typically higher for all_weights than with PEFT techniques like LoRA.

  1. Get all customization configurations.

    curl -X GET "https://${CUSTOMIZER_HOSTNAME}/v1/customization/configs" \
      -H 'Accept: application/json' | jq
    
  2. Review the response to find a model that meets your requirements.

    Example Response
    {
      "object": "list",
      "data": [
        {
          "name": "meta/llama-3.2-1b-instruct",
          "namespace": "default",
          "dataset_schema": {
            "title": "NDJSONFile",
            "type": "array",
            "items": {
              "description": "Schema for Supervised Fine-Tuning (SFT) training data items.\n\nDefines the structure for training data used in SFT.",
              "properties": {
                "prompt": {
                  "description": "The prompt for the entry",
                  "title": "Prompt",
                  "type": "string"
                },
                "completion": {
                  "description": "The completion to train on",
                  "title": "Completion",
                  "type": "string"
                }
              },
              "required": [
                "prompt",
                "completion"
              ],
              "title": "SFTDatasetItemSchema",
              "type": "object"
            },
            "description": "Newline-delimited JSON (NDJSON) file containing MyModel objects"
          },
          "training_options": [
            {
              "training_type": "sft",
              "finetuning_type": "lora",
              "num_gpus": 1,
              "num_nodes": 1,
              "tensor_parallel_size": 1,
              "use_sequence_parallel": false
            },
            {
              "training_type": "sft",
              "finetuning_type": "all_weights",
              "num_gpus": 1,
              "num_nodes": 1,
              "tensor_parallel_size": 1,
              "use_sequence_parallel": false
            }
          ]
        }
      ]
    }

Note

For more information on the response fields, review the Customization Configs schema reference.

The response shows that Llama 3.2 1B Instruct is available for full SFT and requires 1 GPU to train.
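
If you prefer Python, you can script the same check. The snippet below is a minimal sketch, assuming the requests package is installed and that CUSTOMIZER_HOSTNAME is set in your environment as in the curl example above.

import os

import requests

# List all customization configs and keep those that support full SFT
# (a training option with finetuning_type of all_weights).
resp = requests.get(
    f"https://{os.environ['CUSTOMIZER_HOSTNAME']}/v1/customization/configs",
    headers={"Accept": "application/json"},
)
resp.raise_for_status()

for config in resp.json()["data"]:
    if any(opt.get("finetuning_type") == "all_weights"
           for opt in config.get("training_options", [])):
        print(config["name"])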

Review Dataset Schema#

You can examine the dataset_schema field in the response to understand what data format your model requires.

The schema outlines the specific fields and data types your dataset needs to include, formatted as Newline-Delimited JSON (NDJSON). In the next section, we’ll walk through an example to help you understand the schema structure.

Create Datasets#

Now that we know the required shape of the dataset expected by the model configuration, we can prepare our training and validation files and upload them to the dataset.

Prepare Files#

  1. Create two files, train.jsonl and validation.jsonl.

  2. Populate the files with data that is compatible with the meta/llama-3.2-1b-instruct model.

Note

Each record should be on a single line in your .jsonl file, with no line breaks within the JSON objects.

{"prompt": "You are an email writting assistant, please help people write cogent emails.\n\nTell John I finally got the PS5 with 1TB from Target and will be home at 8 pm so we can play Call of Duty. State Jason told me it was fun.", "completion": "Dear John, I got it! I managed to find the new PS5 at Target. I will be home by 8pm so we can play Call of Duty on it. It actually is a lot of fun (or so has Jason told me). It is the 1tb one so I hope it serves us well. See you later.  Jackson Lee"}
{"prompt": "You are an email writting assistant, please help people write cogent emails.\n\nCheck in with the horseback riding facility if my daughter can go for a trial ride to see if she likes it. Ask if the horses are gentle and what the capacity is for each lesson.","completion": "Hello! My 9-year-old is interested in horseback riding lessons. She's only ridden ponies at the beach, but had taken up a kean interest in horses lately. How many children are in each lesson group? Are the horses gentle? Would she be able to try a lesson before committing to a full six-month contract?   Thank you for replying at your earliest convenience.  Heather"}

Upload Training Data#

Initialize Client#

You need to upload the training files to the training path in NeMo Data Store and the validation files to the validation path. You can have multiple files in each path; all of them will be used.

To set up the Hugging Face API client, you’ll need these configuration values:

  • Host URL for the entity store service

  • Host URL for the data storage service

  • A namespace to organize your resources

  • Name of your dataset

import requests
from huggingface_hub import HfApi

# Configuration
ENTITY_HOST = "<your Entity Store URL>"  # Replace with the public URL of your Entity Store
DS_HOST = "<your Data Store URL>"  # Replace with the public URL of your Data Store
NAMESPACE = "default"
DATASET_NAME = "test-dataset"  # Dataset name must be unique within the namespace

# Initialize the Hugging Face API client against the NeMo Data Store
HF_API = HfApi(endpoint=f"{DS_HOST}/v1/hf", token="")

Create Namespaces#

Create the namespace we defined in our configuration values in both the NeMo Entity Store and the NeMo Data Store so that they match.

def create_namespaces(entity_host, ds_host, namespace):
    # Create namespace in entity store
    entity_store_url = f"{entity_host}/v1/namespaces"
    resp = requests.post(entity_store_url, json={"id": namespace})
    assert resp.status_code in (200, 201, 409, 422), \
        f"Unexpected response from Entity Store during Namespace creation: {resp.status_code}"

    # Create namespace in datastore
    nds_url = f"{ds_host}/v1/datastore/namespaces"
    resp = requests.post(nds_url, data={"namespace": namespace})
    assert resp.status_code in (200, 201, 409, 422), \
        f"Unexpected response from datastore during Namespace creation: {resp.status_code}"

create_namespaces(ENTITY_HOST, DS_HOST, NAMESPACE)

Set Up Dataset Repository#

Create a dataset repository in NeMo Data Store.

def setup_dataset_repo(hf_api, namespace, dataset_name, entity_host):
    repo_id = f"{namespace}/{dataset_name}"

    # Create the repo in datastore
    hf_api.create_repo(repo_id, repo_type="dataset", exist_ok=True)

    # Create dataset in entity store
    entity_store_url = f"{entity_host}/v1/datasets"
    payload = {
        "name": dataset_name,
        "namespace": namespace,
        "files_url": f"hf://datasets/{repo_id}"
    }
    resp = requests.post(entity_store_url, json=payload)
    assert resp.status_code in (200, 201, 409, 422), \
        f"Unexpected response from Entity Store creating dataset: {resp.status_code}"

    return repo_id

repo_id = setup_dataset_repo(HF_API, NAMESPACE, DATASET_NAME, ENTITY_HOST)

Upload Files#

Upload the training and validation files to the dataset.

def upload_dataset_files(hf_api, repo_id):
    # Upload training file
    hf_api.upload_file(
        path_or_fileobj="train.ndjson",
        path_in_repo="training/training_file.jsonl",
        repo_id=repo_id,
        repo_type="dataset",
        revision="main",
        commit_message=f"Training file for {repo_id}"
    )

    # Upload validation file
    hf_api.upload_file(
        path_or_fileobj="validation.ndjson",
        path_in_repo="validation/validation_file.jsonl",
        repo_id=repo_id,
        repo_type="dataset",
        revision="main",
        commit_message=f"Validation file for {repo_id}"
    )

upload_dataset_files(HF_API, repo_id)
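
Optionally, confirm that both files landed in the expected paths by listing the repository contents with the standard huggingface_hub list_repo_files method.

# Verify the upload; the list should include training/training_file.jsonl
# and validation/validation_file.jsonl.
files = HF_API.list_repo_files(repo_id, repo_type="dataset")
print(files)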

Checkpoint

At this point, we’ve uploaded our training and validation files to the dataset and are ready to define the details of our customization job.

Start Model Customization Job#

Set Hyperparameters#

While model customization configurations come with default settings, you can customize your training by specifying additional hyperparameters in the hyperparameters field of your customization job.

To train with full SFT, we must:

  1. Set the training_type to sft (Supervised Fine-Tuning).

  2. Set the finetuning_type to all_weights.

Example configuration:

{
  "hyperparameters": {
    "training_type": "sft",
    "finetuning_type": "all_weights",
    "epochs": 2,
    "batch_size": 16,
    "learning_rate": 0.00005
  }
}

Note

For more information on hyperparameter options, review the Hyperparameter Options reference.

Create and Submit Training Job#

Use the following command to start a full SFT training job. Replace meta/llama-3.2-1b-instruct with your chosen model configuration and test-dataset with your dataset name.

  1. Create a job using the model configuration (config), dataset, and hyperparameters we defined in the previous sections.

    curl -X "POST" \
      "https://${CUSTOMIZER_HOSTNAME}/v1/customization/jobs" \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '
        {
        "config": "meta/llama-3.2-1b-instruct",
        "dataset": {"namespace": "default", "name": "test-dataset"},
        "hyperparameters": {
          "training_type": "sft",
          "finetuning_type": "all_weights",
          "epochs": 2,
          "batch_size": 16,
          "learning_rate": 0.00005
        },
        "output_model": "default/full_sft_llama_3@v1"
    }' | jq
    
  2. Review the response.

    Example Response
    {
      "id": "cust-S2qNunob3TNW6JjN75ESCG",
      "created_at": "2025-03-17T02:26:52.731523",
      "updated_at": "2025-03-17T02:26:52.731526",
      "namespace": "default",
      "config": {
         "base_model": "meta/llama-3.2-1b-instruct",
         "precision": "bf16-mixed",
         "num_gpus": 1,
         "num_nodes": 1,
         "micro_batch_size": 1,
         "tensor_parallel_size": 1,
         "max_seq_length": 4096,
         "prompt_template": "{prompt} {completion}"
      },
      "dataset": {"namespace": "default", "name": "test-dataset"},
      "hyperparameters": {
        "finetuning_type": "all_weights",
        "training_type": "sft",
        "batch_size": 16,
        "epochs": 3,
        "learning_rate": 0.00001,
        "sequence_packing_enabled": false
      },
      "output_model": "default/full_sft_llama_3@v1",
      "status": "created",
      "project": "test-project",
      "ownership": {
        "created_by": "me",
        "access_policies": {
          "arbitrary": "json"
        }
      }
    }
    
  3. Copy the following values from the response:

    • id

    • output_model

You can monitor the job status as detailed in getting the job status.
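
For example, you can poll the status until the job reaches a terminal state. The sketch below assumes the GET /v1/customization/jobs/{id}/status endpoint described in that section; the terminal status values are assumptions.

import os
import time

import requests

JOB_ID = "cust-S2qNunob3TNW6JjN75ESCG"  # the id copied from the response above

while True:
    resp = requests.get(
        f"https://{os.environ['CUSTOMIZER_HOSTNAME']}/v1/customization/jobs/{JOB_ID}/status"
    )
    resp.raise_for_status()
    status = resp.json().get("status")
    print(status)
    if status in ("completed", "failed", "cancelled"):  # assumed terminal states
        break
    time.sleep(30)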

Deploy the Model#

Once the job finishes, Customizer uploads the full model weights to the Data Store.
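
You can confirm the weights are in place by listing the output model’s repository. The sketch below reuses the HF_API client from the upload section; the repository name and revision match the huggingface-cli download step later in this tutorial.

# List the files stored for the output model default/full_sft_llama_3@v1.
files = HF_API.list_repo_files("default/full_sft_llama_3", revision="v1", repo_type="model")
print(files)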

Important

Unlike LoRA adapters, NIM does not deploy full weights automatically; you must deploy a new NIM with these weights. This requires having direct access to the Kubernetes cluster. If necessary, ask your cluster administrator to perform the following steps.

Create a PVC for Model Weights#

  1. Create a file named pvc_definition.yaml that contains the following code.

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: sft-custom-weights
    spec:
      accessModes:
        - ReadWriteMany
      storageClassName: <Your ReadWriteMany storage class>
      resources:
        requests:
          storage: 20Gi
    
  2. Apply this definition to create a PVC.

    kubectl apply -f pvc_definition.yaml
    

Download Weights into PVC#

Let’s create a pod that mounts the PVC so we can download our custom weights into it.

  1. Create a file named pod_definition.yaml that defines our pod.

    apiVersion: v1
    kind: Pod
    metadata:
      name: sft-pvc-pod-hf-cli
      annotations:
        sidecar.istio.io/inject: "false"
    spec:
      securityContext:
        runAsUser: 1000
        fsGroup: 1000
      containers:
      - name: sft-pvc-pod-hf-cli
        image: nvcr.io/nvidia/nemo-microservices/customizer-api:25.04
        command: [ "/bin/sh" ]
        args: ["-c", "while true; do sleep 10;done"]
        env:
        - name: HF_TOKEN
          value: "token"
        - name: HF_ENDPOINT
          value: "<Your Data Store url>/v1/hf"
        volumeMounts:
          - name: mount-models
            mountPath: /mount/models
      volumes:
        - name: mount-models
          persistentVolumeClaim:
            claimName: sft-custom-weights
      imagePullSecrets:
      - name: nvcrimagepullsecret
    
  2. Apply the pod definition.

    kubectl apply -f pod_definition.yaml
    
  3. Once the pod is running, access it.

    kubectl exec --tty --stdin sft-pvc-pod-hf-cli -- /usr/bin/bash
    
  4. Run the following command to download the weights.

    huggingface-cli download default/full_sft_llama_3 --revision v1 --local-dir /mount/models/all_weights
    

    Note

    The weights are around 15 GB and can take around 20 minutes to download.

  5. Once the download finishes, make sure the NIM can access the weights by running chmod -R 775 /mount/models/all_weights.

  6. Exit the pod.

Start a NIM#

Now let’s deploy a NIM with your custom weights. The NIM will build an optimized TensorRT-LLM engine automatically.

  1. First, create a values file named nim_sft.yaml for the Helm deployment.

    image:
      repository: nvcr.io/nim/meta/llama-3.2-1b-instruct
      tag: 1.6.0
    imagePullSecrets:
     - name: nvcrimagepullsecret
    service:
     labels:
       app.nvidia.com/nim-type: inference
    env:
     - name: NIM_FT_MODEL
       value: /model-store/all_weights
     - name: NIM_SERVED_MODEL_NAME
       value: "llama3.2-1b-custom-weights"
     - name: NIM_CUSTOM_MODEL_NAME
       value: custom_1
    persistence:
     enabled: true
     existingClaim: sft-custom-weights
     accessMode: ReadWriteMany
    
  2. Download the NIM Helm chart from NGC.

    helm fetch https://helm.ngc.nvidia.com/nim/charts/nim-llm-1.3.0.tgz --username='$oauthtoken' --password=<YOUR API KEY>
    
  3. Perform a Helm install.

    helm install nim ./nim-llm-1.3.0.tgz -f nim_sft.yaml
    
  4. Add a DNS entry for your NIM (optional).

If you are not using NIM Proxy, you need to add a DNS entry. This will depend on your cluster.

  5. Query your NIM using the custom model name you configured. The DNS entry you query depends on your setup; if you are using the Beginner Tutorial, the hostname is http://nim.test.

    curl -X POST "<Your NIM hostname>/v1/completions" \
       -H 'accept: application/json' \
       -H 'Content-Type: application/json' \
       -d '{
          "model": "llama3.2-1b-custom-weights",
          "prompt": "Extract from the following context the minimal span word for word that best answers the question.\n- If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.\n- If you do not know the answer to a question, please do not share false information.\n- If the answer is not in the context, the answer should be \"?\".\n- Your answer should not include any other text than the answer to the question. Do not include any other text like \"Here is the answer to the question:\" or \"The minimal span word for word that best answers the question is:\" or anything like that.\n\nContext: When is the upcoming GTC event? GTC 2018 attracted over 8,400 attendees. Due to the COVID pandemic of 2020, GTC 2020 was converted to a digital event and drew roughly 59,000 registrants. The 2021 GTC keynote, which was streamed on YouTube on April 12, included a portion that was made with CGI using the Nvidia Omniverse real-time rendering platform. This next GTC will take place in the middle of March, 2023. Answer:",
          "max_tokens": 128
       }' | jq
    

Conclusion#

You have successfully started an SFT customization job and deployed a NIM with your custom weights. You can now use the NIM endpoint to interact with your fine-tuned model and evaluate its performance on your specific use case.

Next Steps#

Learn how to check customization job metrics using the id.