Start a Full SFT Customization Job#

Learn how to use the NeMo Microservices Platform to create a supervised fine-tuning (SFT) job using a custom dataset.

About SFT Customization Jobs#

Full Supervised Fine-Tuning (SFT) shares similarities with LoRA training workflows but differs in two key ways:

  1. Training Impact: While LoRA only updates a small set of parameters and keeps most model weights frozen, full SFT modifies all weights and layers during training.

  2. Deployment Requirements: Full SFT models need a new NVIDIA Inference Microservice (NIM) deployment, unlike LoRA adapters, which can be served from an existing NIM.

Due to these differences, full SFT typically requires more computational resources and training time than LoRA-based approaches.

Prerequisites#

Before you can start an SFT customization job, make sure that you have the following:

  • Access to a NeMo Customizer Microservice.

  • Completed the Manage Entities tutorial series, or set up a dedicated project.

  • The huggingface_hub Python package installed.


Select Model#

Find Available Configs#

First, identify which model customization configurations are available for you to use. Each configuration describes a model and the corresponding training techniques you can choose from. SFT customization jobs require a model that supports the finetuning_type of all_weights.

Note

GPU requirements are typically higher for all_weights than with PEFT techniques like LoRA.

  1. Get all customization configurations.

    curl -X GET "https://${CUSTOMIZER_HOSTNAME}/v1/customization/configs" \
      -H 'Accept: application/json' | jq
    
  2. Review the response to find a model that meets your requirements.

    Example Response
    {
      "object": "list",
      "data": [
        {
          "name": "meta/llama-3.2-1b-instruct",
          "namespace": "default",
          "dataset_schema": {
            "title": "NDJSONFile",
            "type": "array",
            "items": {
              "description": "Schema for Supervised Fine-Tuning (SFT) training data items.\n\nDefines the structure for training data used in SFT.",
              "properties": {
                "prompt": {
                  "description": "The prompt for the entry",
                  "title": "Prompt",
                  "type": "string"
                },
                "completion": {
                  "description": "The completion to train on",
                  "title": "Completion",
                  "type": "string"
                }
              },
              "required": [
                "prompt",
                "completion"
              ],
              "title": "SFTDatasetItemSchema",
              "type": "object"
            },
            "description": "Newline-delimited JSON (NDJSON) file containing MyModel objects"
          },
          "training_options": [
            {
              "training_type": "sft",
              "finetuning_type": "lora",
              "num_gpus": 1,
              "num_nodes": 1,
              "tensor_parallel_size": 1,
              "use_sequence_parallel": false
            },
            {
              "training_type": "sft",
              "finetuning_type": "all_weights",
              "num_gpus": 1,
              "num_nodes": 1,
              "tensor_parallel_size": 1,
              "use_sequence_parallel": false
            }
          ]
        }
      ]
    }

Note

For more information on the response fields, review the Customization Configs schema reference.

The response shows that Llama 3.2 1B Instruct is available for full SFT and requires 1 GPU to train.
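
If you prefer Python, you can script the same check. The snippet below is a minimal sketch, assuming the requests package is installed and that CUSTOMIZER_HOSTNAME is set in your environment as in the curl example above.

import os

import requests

# List all customization configs and keep those that support full SFT
# (a training option with finetuning_type of all_weights).
resp = requests.get(
    f"https://{os.environ['CUSTOMIZER_HOSTNAME']}/v1/customization/configs",
    headers={"Accept": "application/json"},
)
resp.raise_for_status()

for config in resp.json()["data"]:
    if any(opt.get("finetuning_type") == "all_weights"
           for opt in config.get("training_options", [])):
        print(config["name"])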

Review Dataset Schema#

You can examine the dataset_schema field in the response to understand what data format your model requires.

The schema outlines the specific fields and data types your dataset needs to include, formatted as Newline-Delimited JSON (NDJSON). In the next section, we’ll walk through an example to help you understand the schema structure.

Create Datasets#

Now that we know the required shape of the dataset expected by the model configuration, we can prepare our training and validation files and upload them to the dataset.

Prepare Files#

  1. Create two files, train.jsonl and validation.jsonl.

  2. Populate the files with data that is compatible with the meta/llama-3.2-1b-instruct model.

Note

Each record should be on a single line in your .jsonl file, with no line breaks within the JSON objects.

{"prompt": "You are an email writting assistant, please help people write cogent emails.\n\nTell John I finally got the PS5 with 1TB from Target and will be home at 8 pm so we can play Call of Duty. State Jason told me it was fun.", "completion": "Dear John, I got it! I managed to find the new PS5 at Target. I will be home by 8pm so we can play Call of Duty on it. It actually is a lot of fun (or so has Jason told me). It is the 1tb one so I hope it serves us well. See you later.  Jackson Lee"}
{"prompt": "You are an email writting assistant, please help people write cogent emails.\n\nCheck in with the horseback riding facility if my daughter can go for a trial ride to see if she likes it. Ask if the horses are gentle and what the capacity is for each lesson.","completion": "Hello! My 9-year-old is interested in horseback riding lessons. She's only ridden ponies at the beach, but had taken up a kean interest in horses lately. How many children are in each lesson group? Are the horses gentle? Would she be able to try a lesson before committing to a full six-month contract?   Thank you for replying at your earliest convenience.  Heather"}

Upload Training Data#

Initialize Client#

You need to upload the training files to the training path in NeMo Data Store and the validation files to the validation path. You can have multiple files in each path; all of them will be used.

To set up the Hugging Face API client, you’ll need these configuration values:

  • Host URL for the entity store service

  • Host URL for the data storage service

  • A namespace to organize your resources

  • Name of your dataset

import requests
from huggingface_hub import HfApi

# Configuration
ENTITY_HOST = "<your Entity Store URL>"  # Replace with the public URL of your Entity Store
DS_HOST = "<your Data Store URL>"  # Replace with the public URL of your Data Store
NAMESPACE = "default"
DATASET_NAME = "test-dataset"  # Dataset name must be unique within the namespace

# Initialize the Hugging Face API client against the NeMo Data Store
HF_API = HfApi(endpoint=f"{DS_HOST}/v1/hf", token="")

Create Namespaces#

Create the namespace we defined in our configuration values in both the NeMo Entity Store and the NeMo Data Store so that they match.

def create_namespaces(entity_host, ds_host, namespace):
    # Create namespace in entity store
    entity_store_url = f"{entity_host}/v1/namespaces"
    resp = requests.post(entity_store_url, json={"id": namespace})
    assert resp.status_code in (200, 201, 409, 422), \
        f"Unexpected response from Entity Store during Namespace creation: {resp.status_code}"

    # Create namespace in datastore
    nds_url = f"{ds_host}/v1/datastore/namespaces"
    resp = requests.post(nds_url, data={"namespace": namespace})
    assert resp.status_code in (200, 201, 409, 422), \
        f"Unexpected response from datastore during Namespace creation: {resp.status_code}"

create_namespaces(ENTITY_HOST, DS_HOST, NAMESPACE)

Set Up Dataset Repository#

Create a dataset repository in NeMo Data Store.

def setup_dataset_repo(hf_api, namespace, dataset_name, entity_host):
    repo_id = f"{namespace}/{dataset_name}"

    # Create the repo in datastore
    hf_api.create_repo(repo_id, repo_type="dataset", exist_ok=True)

    # Create dataset in entity store
    entity_store_url = f"{entity_host}/v1/datasets"
    payload = {
        "name": dataset_name,
        "namespace": namespace,
        "files_url": f"hf://datasets/{repo_id}"
    }
    resp = requests.post(entity_store_url, json=payload)
    assert resp.status_code in (200, 201, 409, 422), \
        f"Unexpected response from Entity Store creating dataset: {resp.status_code}"

    return repo_id

repo_id = setup_dataset_repo(HF_API, NAMESPACE, DATASET_NAME, ENTITY_HOST)

Upload Files#

Upload the training and validation files to the dataset.

def upload_dataset_files(hf_api, repo_id):
    # Upload training file
    hf_api.upload_file(
        path_or_fileobj="train.ndjson",
        path_in_repo="training/training_file.jsonl",
        repo_id=repo_id,
        repo_type="dataset",
        revision="main",
        commit_message=f"Training file for {repo_id}"
    )

    # Upload validation file
    hf_api.upload_file(
        path_or_fileobj="validation.ndjson",
        path_in_repo="validation/validation_file.jsonl",
        repo_id=repo_id,
        repo_type="dataset",
        revision="main",
        commit_message=f"Validation file for {repo_id}"
    )

upload_dataset_files(HF_API, repo_id)
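
Optionally, confirm that both files landed in the expected paths by listing the repository contents with the standard huggingface_hub list_repo_files method.

# Verify the upload; the list should include training/training_file.jsonl
# and validation/validation_file.jsonl.
files = HF_API.list_repo_files(repo_id, repo_type="dataset")
print(files)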

Checkpoint

At this point, we’ve uploaded our training and validation files to the dataset and are ready to define the details of our customization job.

Start Model Customization Job#

Set Hyperparameters#

While model customization configurations come with default settings, you can customize your training by specifying additional hyperparameters in the hyperparameters field of your customization job.

To train with full SFT, we must:

  1. Set the training_type to sft (Supervised Fine-Tuning).

  2. Set the finetuning_type to all_weights.

Example configuration:

{
  "hyperparameters": {
    "training_type": "sft",
    "finetuning_type": "all_weights",
    "epochs": 2,
    "batch_size": 16,
    "learning_rate": 0.00005
  }
}

Note

For more information on hyperparameter options, review the Hyperparameter Options reference.

Create and Submit Training Job#

Use the following command to start a full SFT training job. Replace meta/llama-3.2-1b-instruct with your chosen model configuration and test-dataset with your dataset name.

  1. Create a job using the model configuration (config), dataset, and hyperparameters we defined in the previous sections.

    curl -X "POST" \
      "https://${CUSTOMIZER_HOSTNAME}/v1/customization/jobs" \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '
        {
        "config": "meta/llama-3.2-1b-instruct",
        "dataset": {"namespace": "default", "name": "test-dataset"},
        "hyperparameters": {
          "training_type": "sft",
          "finetuning_type": "all_weights",
          "epochs": 2,
          "batch_size": 16,
          "learning_rate": 0.00005
        },
        "output_model": "default/full_sft_llama_3@v1"
    }' | jq
    
  2. Review the response.

    Example Response
    {
      "id": "cust-S2qNunob3TNW6JjN75ESCG",
      "created_at": "2025-03-17T02:26:52.731523",
      "updated_at": "2025-03-17T02:26:52.731526",
      "namespace": "default",
      "config": {
         "base_model": "meta/llama-3.2-1b-instruct",
         "precision": "bf16-mixed",
         "num_gpus": 1,
         "num_nodes": 1,
         "micro_batch_size": 1,
         "tensor_parallel_size": 1,
         "max_seq_length": 4096,
         "prompt_template": "{prompt} {completion}"
      },
      "dataset": {"namespace": "default", "name": "test-dataset"},
      "hyperparameters": {
        "finetuning_type": "all_weights",
        "training_type": "sft",
        "batch_size": 16,
        "epochs": 3,
        "learning_rate": 0.00001,
        "sequence_packing_enabled": false
      },
      "output_model": "default/full_sft_llama_3@v1",
      "status": "created",
      "project": "test-project",
      "ownership": {
        "created_by": "me",
        "access_policies": {
          "arbitrary": "json"
        }
      }
    }
    
  3. Copy the following values from the response:

    • id

    • output_model

You can monitor the job status as detailed in getting the job status.
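
For example, you can poll the status until the job reaches a terminal state. The sketch below assumes the GET /v1/customization/jobs/{id}/status endpoint described in that section; the terminal status values are assumptions.

import os
import time

import requests

JOB_ID = "cust-S2qNunob3TNW6JjN75ESCG"  # the id copied from the response above

while True:
    resp = requests.get(
        f"https://{os.environ['CUSTOMIZER_HOSTNAME']}/v1/customization/jobs/{JOB_ID}/status"
    )
    resp.raise_for_status()
    status = resp.json().get("status")
    print(status)
    if status in ("completed", "failed", "cancelled"):  # assumed terminal states
        break
    time.sleep(30)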

Deploy the Model#

Once the job finishes, Customizer uploads the full model weights to the Data Store.
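
You can confirm the weights are in place by listing the output model’s repository. The sketch below reuses the HF_API client from the upload section; the repository name and revision match the huggingface-cli download step later in this tutorial.

# List the files stored for the output model default/full_sft_llama_3@v1.
files = HF_API.list_repo_files("default/full_sft_llama_3", revision="v1", repo_type="model")
print(files)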

Important

Unlike LoRA adapters, NIM does not deploy full weights automatically; you must deploy a new NIM with these weights. This requires having direct access to the Kubernetes cluster. If necessary, ask your cluster administrator to perform the following steps.

Create a PVC for Model Weights#

  1. Create a file named pvc_definition.yaml that contains the following code.

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: sft-custom-weights
    spec:
      accessModes:
        - ReadWriteMany
      storageClassName: <Your ReadWriteMany storage class>
      resources:
        requests:
          storage: 20Gi
    
  2. Apply this definition to create a PVC.

    kubectl apply -f pvc_definition.yaml
    

Download Weights into PVC#

Let’s create a pod that mounts the PVC so we can download our custom weights into it.

  1. Create a file named pod_definition.yaml that defines our pod.

    apiVersion: v1
    kind: Pod
    metadata:
      name: sft-pvc-pod-hf-cli
      annotations:
        sidecar.istio.io/inject: "false"
    spec:
      securityContext:
        runAsUser: 1000
        fsGroup: 1000
      containers:
      - name: sft-pvc-pod-hf-cli
        image: nvcr.io/nvidia/nemo-microservices/customizer-api:25.04
        command: [ "/bin/sh" ]
        args: ["-c", "while true; do sleep 10;done"]
        env:
        - name: HF_TOKEN
          value: "token"
        - name: HF_ENDPOINT
          value: "<Your Data Store url>/v1/hf"
        volumeMounts:
          - name: mount-models
            mountPath: /mount/models
      volumes:
        - name: mount-models
          persistentVolumeClaim:
            claimName: sft-custom-weights
      imagePullSecrets:
      - name: nvcrimagepullsecret
    
  2. Apply the pod definition.

    kubectl apply -f pod_definition.yaml
    
  3. Once the pod is running, access it.

    kubectl exec --tty --stdin sft-pvc-pod-hf-cli -- /usr/bin/bash
    
  4. Run the following command to download the weights.

    huggingface-cli download default/full_sft_llama_3 --revision v1 --local-dir /mount/models/all_weights
    

    Note

    The weights are around 15 GB and can take around 20 minutes to download.

  5. Once the download finishes, make sure the NIM can access the weights by running chmod -R 775 /mount/models/all_weights.

  6. Exit the pod.

Start a NIM#

Now let’s deploy a NIM with your custom weights. The NIM will build an optimized TensorRT-LLM engine automatically.

  1. First, create a values file named nim_sft.yaml for the Helm deployment.

    image:
      repository: nvcr.io/nim/meta/llama-3.2-1b-instruct
      tag: 1.6.0
    imagePullSecrets:
     - name: nvcrimagepullsecret
    service:
     labels:
       app.nvidia.com/nim-type: inference
    env:
     - name: NIM_FT_MODEL
       value: /model-store/all_weights
     - name: NIM_SERVED_MODEL_NAME
       value: "llama3.2-1b-custom-weights"
     - name: NIM_CUSTOM_MODEL_NAME
       value: custom_1
    persistence:
     enabled: true
     existingClaim: sft-custom-weights
     accessMode: ReadWriteMany
    
  2. Download the NIM Helm chart from NGC.

    helm fetch https://helm.ngc.nvidia.com/nim/charts/nim-llm-1.3.0.tgz --username='$oauthtoken' --password=<YOUR API KEY>
    
  3. Perform a Helm install.

    helm install nim ./nim-llm-1.3.0.tgz -f nim_sft.yaml
    
  4. Add a DNS entry for your NIM (optional).

If you are not using NIM Proxy, you need to add a DNS entry. This will depend on your cluster.

  5. Query your NIM using the custom model name you configured. The DNS entry you query depends on your setup; if you are using the Beginner Tutorial, the hostname is http://nim.test.

    curl -X POST "<Your NIM hostname>/v1/completions" \
       -H 'accept: application/json' \
       -H 'Content-Type: application/json' \
       -d '{
          "model": "llama3.2-1b-custom-weights",
          "prompt": "Extract from the following context the minimal span word for word that best answers the question.\n- If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.\n- If you do not know the answer to a question, please do not share false information.\n- If the answer is not in the context, the answer should be \"?\".\n- Your answer should not include any other text than the answer to the question. Do not include any other text like \"Here is the answer to the question:\" or \"The minimal span word for word that best answers the question is:\" or anything like that.\n\nContext: When is the upcoming GTC event? GTC 2018 attracted over 8,400 attendees. Due to the COVID pandemic of 2020, GTC 2020 was converted to a digital event and drew roughly 59,000 registrants. The 2021 GTC keynote, which was streamed on YouTube on April 12, included a portion that was made with CGI using the Nvidia Omniverse real-time rendering platform. This next GTC will take place in the middle of March, 2023. Answer:",
          "max_tokens": 128
       }' | jq
    

Conclusion#

You have successfully started an SFT customization job and deployed a NIM with your custom weights. You can now use the NIM endpoint to interact with your fine-tuned model and evaluate its performance on your specific use case.

Next Steps#

Learn how to check customization job metrics using the id.