Start a Knowledge Distillation (KD) Customization Job#

Learn how to use the NeMo Microservices Platform to create a Knowledge Distillation (KD) job, transferring knowledge from a large teacher model to a smaller student model using your own dataset.

About Knowledge Distillation#

Knowledge distillation is a technique for transferring knowledge from a large, high-capacity teacher model to a smaller student model. The distilled model (student) often achieves higher accuracy than models trained using standard language modeling loss alone.

KD is useful when you want to deploy smaller models without losing much accuracy compared to a large model.
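NeMo Customizer implements the distillation objective internally, so you never write this code yourself, but a conceptual sketch can make the mechanism concrete: logit-pair distillation typically trains the student to match the teacher's per-token output distribution, for example through a temperature-scaled KL divergence. The snippet below (plain PyTorch) is illustrative only and is not the service's actual implementation.

import torch
import torch.nn.functional as F

def logit_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    # student_logits, teacher_logits: [batch, seq_len, vocab_size]
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # KL(teacher || student), averaged over the batch; the T^2 factor keeps
    # gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2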

Prerequisites#

Before starting, make sure you have:

  • Access to the NeMo Customizer service, with its base URL set as the CUSTOMIZER_SERVICE_URL environment variable (used throughout the examples below).

  • A teacher model and a student model registered as customization targets (see Select Teacher and Student Models).

  • A training dataset prepared and uploaded in the SFT format (see Create Datasets).

Notes and Limitations#

  • Only logit-pair distillation is currently supported.

  • LoRA adapters can’t be used as teacher models.


Select Teacher and Student Models#

You need two models available as customization targets:

  • Teacher model: A large, fine-tuned model

  • Student model: A smaller model you want to distill knowledge into

Both models must use the same tokenizer. Only GPT-based NeMo 2.0 checkpoints are supported for now.
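Both models must already be registered as customization targets before you create the job. The sketch below lists the registered targets so you can confirm the teacher and student are both present. The /v1/customization/targets path and the response field names are assumptions modeled on the configs listing shown later in this tutorial; check the API reference for your deployment if they differ.

import os
import requests

CUSTOMIZER_SERVICE_URL = os.environ["CUSTOMIZER_SERVICE_URL"]

response = requests.get(
    f"{CUSTOMIZER_SERVICE_URL}/v1/customization/targets",
    headers={"Accept": "application/json"},
)
response.raise_for_status()

# Print the registered targets so you can confirm the teacher and student
# appear. The "data"/"namespace"/"name" field names are assumed here.
for target in response.json().get("data", []):
    print(target.get("namespace"), target.get("name"))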


Select Customization Config#

You can either find an existing customization config to use or create a new one.

Find Available Configs#

Identify which model customization configurations support distillation training. KD customization jobs require a configuration whose training_options include a training_type of distillation with a finetuning_type of all_weights.

  1. Get all customization configurations.

    curl -X GET "${CUSTOMIZER_SERVICE_URL}/v1/customization/configs" \
      -H 'Accept: application/json' | jq
    
    # Python alternative to the curl request above.
    import os
    import requests

    # Assumes the Customizer base URL is exported as CUSTOMIZER_SERVICE_URL.
    CUSTOMIZER_SERVICE_URL = os.environ["CUSTOMIZER_SERVICE_URL"]

    response = requests.get(
        f"{CUSTOMIZER_SERVICE_URL}/v1/customization/configs",
        headers={"Accept": "application/json"}
    )
    print(response.json())
    
  2. Review the response to find a model configuration that includes distillation in its training_options. A sketch that filters the listing programmatically follows this list.

    Example Response

    {
      "object": "list",
      "data": [
        {
          "name": "meta/llama-3.2-1b-instruct@v1.0.0+A100",
          "namespace": "default",
          "training_options": [
            {
              "training_type": "sft",
              "finetuning_type": "lora",
              "num_gpus": 1,
              "num_nodes": 1,
              "tensor_parallel_size": 1,
              "use_sequence_parallel": false
            },
            {
              "training_type": "distillation",
              "finetuning_type": "all_weights",
              "num_gpus": 1,
              "num_nodes": 1,
              "tensor_parallel_size": 1,
              "use_sequence_parallel": false
            }
          ]
        }
      ]
    }
    
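If you prefer to filter the listing programmatically, the following sketch selects configurations that advertise a distillation training option with all_weights fine-tuning. It assumes the response shape shown above (a data list whose entries carry training_options) and that CUSTOMIZER_SERVICE_URL is set in your environment.

import os
import requests

CUSTOMIZER_SERVICE_URL = os.environ["CUSTOMIZER_SERVICE_URL"]

response = requests.get(
    f"{CUSTOMIZER_SERVICE_URL}/v1/customization/configs",
    headers={"Accept": "application/json"},
)
response.raise_for_status()

# Keep only configs that offer distillation with all_weights fine-tuning.
distillation_configs = [
    cfg
    for cfg in response.json()["data"]
    if any(
        opt.get("training_type") == "distillation"
        and opt.get("finetuning_type") == "all_weights"
        for opt in cfg.get("training_options", [])
    )
]

for cfg in distillation_configs:
    # The "name" field (for example meta/llama-3.2-1b-instruct@v1.0.0+A100)
    # is the value you pass as "config" when creating the job later on.
    print(cfg["name"])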

Create Config#

If no appropriate configuration is available, create one that includes a distillation training option:

curl -X POST \
  "${CUSTOMIZER_SERVICE_URL}/v1/customization/configs" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "llama-3.2-1b-instruct@v1.0.0+A100",
    "namespace": "default",
    "description": "Configuration for Llama 3.2 1B with distillation support",
    "target": "meta/llama-3.2-1b-instruct",
    "training_options": [
       {
          "training_type": "sft",
          "finetuning_type": "lora",
          "num_gpus": 1,
          "tensor_parallel_size": 1,
          "pipeline_parallel_size": 1,
          "use_sequence_parallel": false,
          "micro_batch_size": 1
      },
       {
          "training_type": "distillation",
          "finetuning_type": "all_weights",
          "num_gpus": 1,
          "tensor_parallel_size": 1,
          "pipeline_parallel_size": 1,
          "use_sequence_parallel": false,
          "micro_batch_size": 1
      }
    ],
    "training_precision": "bf16",
    "max_seq_length": 2048
  }' | jq
import os
import requests

# Assumes the Customizer base URL is exported as CUSTOMIZER_SERVICE_URL.
CUSTOMIZER_SERVICE_URL = os.environ["CUSTOMIZER_SERVICE_URL"]

url = f"{CUSTOMIZER_SERVICE_URL}/v1/customization/configs"
payload = {
    "name": "llama-3.2-1b-instruct@v1.0.0+A100",
    "namespace": "default",
    "description": "Configuration for Llama 3.2 1B with distillation support",
    "target": "meta/llama-3.2-1b-instruct",
    "training_options": [
        {
            "training_type": "sft",
            "finetuning_type": "lora",
            "num_gpus": 1,
            "tensor_parallel_size": 1,
            "pipeline_parallel_size": 1,
            "use_sequence_parallel": False,
            "micro_batch_size": 1
        },
        {
            "training_type": "distillation",
            "finetuning_type": "all_weights",
            "num_gpus": 1,
            "tensor_parallel_size": 1,
            "pipeline_parallel_size": 1,
            "use_sequence_parallel": False,
            "micro_batch_size": 1
        }
    ],
    "training_precision": "bf16",
    "max_seq_length": 2048
}
headers = {
    "accept": "application/json",
    "Content-Type": "application/json"
}
response = requests.post(url, json=payload, headers=headers)
print(response.json())

For detailed information about creating configs, see Create Customization Config.


Create Datasets#

Prepare your training and validation datasets in the same format required for SFT jobs. The dataset should be the same as (or similar to) the one used to fine-tune the teacher model.

Refer to the Format Training Datasets tutorial for details on dataset structure and upload instructions.
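As a quick illustration only (the Format Training Datasets tutorial is the authoritative reference for field names and upload steps), SFT-style training data is commonly newline-delimited JSON with prompt/completion pairs. The sketch below writes such a file locally; the record contents are hypothetical, and you should adjust the schema to whatever the tutorial specifies.

import json

# Hypothetical records; the prompt/completion field names follow a common
# SFT convention and should be checked against the Format Training Datasets tutorial.
records = [
    {"prompt": "Summarize: NeMo Customizer supports distillation jobs.", "completion": "It supports KD jobs."},
    {"prompt": "Translate to French: good morning", "completion": "bonjour"},
]

with open("training.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")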


Start Model Customization Job#

Set Hyperparameters#

When creating a KD job, set the following in your job configuration:

  • training_type: distillation

  • finetuning_type: all_weights (the only supported option)

  • distillation.teacher: The name of the teacher Target (must already exist)

Example hyperparameters section:

"hyperparameters": {
    "training_type": "distillation",
    "finetuning_type": "all_weights",
    "epochs": 2,
    "batch_size": 16,
    "learning_rate": 0.0001,
    "distillation": {
        "teacher": "meta/finetuned-llama-3_1-8b@v1"
    }
}

Create and Submit Customization Job#

curl -X POST "${CUSTOMIZER_SERVICE_URL}/v1/customization/jobs" \
  -H "Content-Type: application/json" \
  -d '{
    "target": "meta/llama-3.2-1b-instruct",
    "config": "meta/llama-3.2-1b-instruct@v1.0.0+A100",
    "hyperparameters": {
      "training_type": "distillation",
      "finetuning_type": "all_weights",
      "epochs": 2,
      "batch_size": 16,
      "learning_rate": 0.0001,
      "distillation": {
        "teacher": "<namespace>/<finetuned_model_name>"
      }
    },
    "dataset": {
      "name": "<dataset_name>",
      "namespace": "<namespace>"
    },
    "output_model": "default/my-distilled-3.2-1b@v1"
  }'
import os
import requests

# Assumes the Customizer base URL is exported as CUSTOMIZER_SERVICE_URL.
CUSTOMIZER_SERVICE_URL = os.environ["CUSTOMIZER_SERVICE_URL"]

response = requests.post(
    f"{CUSTOMIZER_SERVICE_URL}/v1/customization/jobs",
    json={
        "target": "meta/llama-3.2-1b-instruct",  # student
        "config": "meta/llama-3.2-1b-instruct@v1.0.0+A100",  # must support distillation
        "hyperparameters": {
            "training_type": "distillation",
            "finetuning_type": "all_weights",
            "epochs": 2,
            "batch_size": 16,
            "learning_rate": 0.0001,
            "distillation": {
                "teacher": "<namespace>/<finetuned_model_name>"  # teacher
            },
        },
        "dataset": {"name": "<dataset_name>", "namespace": "<namespace>"},
        "output_model": "default/my-distilled-3.2-1b@v1",
    }
)
print(response.json())

Important

The config field must include a version, for example: meta/llama-3.2-1b-instruct@v1.0.0+A100. Omitting the version will result in an error like:

{ "detail": "Version is not specified in the config URN: meta/llama-3.2-1b-instruct" }

You can find the correct config URN (with version) by inspecting the output of the /v1/customization/configs endpoint. Use the name and version fields to construct the URN as name@version.
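After the job is accepted, you will typically want to track its progress. The sketch below polls the job until it reaches a terminal state. It assumes the creation response returns the job identifier in an id field and that the job can be read back from /v1/customization/jobs/{id}; the exact status endpoint, field names, and status values may differ, so confirm them against the API reference for your deployment.

import os
import time
import requests

CUSTOMIZER_SERVICE_URL = os.environ["CUSTOMIZER_SERVICE_URL"]

# "response" is the job-creation response from the previous step;
# the "id" field name is an assumption about that response shape.
job_id = response.json()["id"]

while True:
    job = requests.get(f"{CUSTOMIZER_SERVICE_URL}/v1/customization/jobs/{job_id}").json()
    status = job.get("status")
    print("status:", status)
    # Terminal status values are assumed; adjust to the actual API values.
    if status in ("completed", "failed", "cancelled"):
        break
    time.sleep(30)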