Create Customization Config#

Prerequisites#

Before you can create a customization config, make sure that you have set the base URL of your Customizer service as an environment variable:

export CUSTOMIZER_BASE_URL="https://your-customizer-service-url"

To Create a Customization Config#

You can create a customization configuration with the Python SDK or by calling the REST API directly, as shown in the examples below.

The following example defines a pod_spec that allows jobs to be scheduled on nodes tainted with app=a100-workload by specifying the required toleration. It also restricts jobs to nodes whose "nvidia.com/gpu.product" label matches "NVIDIA-A100-80GB". These values are specific to your Kubernetes cluster; if the taints and labels don't match nodes in your cluster, jobs that use this config will be unschedulable. You can verify them with the sketch below.
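Before you create the config, you can confirm that the taint and label actually exist in your cluster. The following is a minimal sketch using the official kubernetes Python client (an assumption on our part; it is not part of the NeMo Microservices SDK) that reads your local kubeconfig and prints the taints on matching GPU nodes:

from kubernetes import client, config

# Load credentials from the local kubeconfig (the same access kubectl uses).
config.load_kube_config()
v1 = client.CoreV1Api()

for node in v1.list_node().items:
    gpu_product = node.metadata.labels.get("nvidia.com/gpu.product")
    taints = node.spec.taints or []
    # Show A100 nodes and the taints that a job's toleration must match.
    if gpu_product == "NVIDIA-A100-80GB":
        print(node.metadata.name,
              [f"{t.key}={t.value}:{t.effect}" for t in taints])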

In detail, this create command consists of three sections: general configuration, configuration for the specific training algorithm, and Kubernetes-specific configuration that determines where on the cluster the job will run.


General Configuration#

| Parameter | Value (Example) | Description |
|---|---|---|
| name | "llama-3.1-8b-instruct@v1.0.0+A100" | A unique, descriptive name for this specific configuration. This example indicates the model (llama-3.1-8b-instruct), a version (v1.0.0), and the hardware for which it’s optimized. The main hardware concern is GPU memory. Our provided A100 configs are compatible with other 80GB GPUs. |
| namespace | "default" | The NeMo microservice namespace where the customization jobs will run. |
| description | "Configuration for Llama 3.1 8B on A100 GPUs" | A brief, human-readable summary of the configuration’s purpose. |
| target | "meta/llama-3.1-8b-instruct@2.0" | Specifies the target model to be fine-tuned. This is a reference to a previously created target that identifies the base model and its stored checkpoint. |
| training_precision | "bf16" | Sets the numerical precision to be used during the training process. Bfloat16 (Brain Floating Point, 16-bit) is recommended. |
| max_seq_length | 2048 | The maximum sequence length (in tokens) that the model will process during training. Longer sequences require more GPU memory. |
| prompt_template | "{prompt} {completion}" | Prompt template used to extract keys from the dataset. |
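For example, with the prompt_template shown above, each training record must supply the prompt and completion keys, which are substituted into the template to produce the training text. A minimal sketch of that substitution (the record contents are hypothetical):

# Hypothetical training record; its keys must match the placeholders
# in prompt_template.
record = {"prompt": "What is the capital of France?", "completion": "Paris."}

prompt_template = "{prompt} {completion}"
training_text = prompt_template.format(**record)
print(training_text)  # What is the capital of France? Paris.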


Training Configuration#

| Parameter | Value (Example) | Description |
|---|---|---|
| training_type | "sft" | Specifies the high-level fine-tuning method, such as SFT (Supervised Fine-Tuning). |
| finetuning_type | "lora" | The technique for adjusting model weights. Specifies whether to use parameter-efficient fine-tuning (LoRA) or to train all weights (full fine-tuning). |
| num_gpus | 2 | The number of GPUs that will be utilized for this fine-tuning job. |
| micro_batch_size | 8 | The batch size processed by a single GPU before gradients are accumulated. |
| tensor_parallel_size | 1 | The number of devices (GPUs) used for tensor parallelism. A value of 1 means the model’s tensor layers are not split across devices. |
| pipeline_parallel_size | 1 | The number of devices (GPUs) used for pipeline parallelism. A value of 1 means the model is not split into stages across devices. |
| use_sequence_parallel | False | A flag to enable or disable sequence parallelism, a technique to distribute the sequence dimension of activation tensors across devices to save memory. |
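In the usual Megatron-style layout, GPUs not consumed by tensor or pipeline parallelism form data-parallel replicas, so these settings determine how many samples each training step processes. A rough sanity check for the values above (this arithmetic is illustrative; gradient accumulation, which is not part of this config, would scale it further):

# Values from the training configuration above.
num_gpus = 2
tensor_parallel_size = 1
pipeline_parallel_size = 1
micro_batch_size = 8

# GPUs not used for tensor/pipeline parallelism become data-parallel replicas.
model_parallel_size = tensor_parallel_size * pipeline_parallel_size
data_parallel_size = num_gpus // model_parallel_size  # 2

# Samples processed per forward/backward pass across all replicas.
samples_per_step = micro_batch_size * data_parallel_size
print(samples_per_step)  # 16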


Kubernetes Configuration (pod_spec)#

| Parameter | Value (Example) | Description |
|---|---|---|
| tolerations | List of one toleration | A Kubernetes scheduling constraint. This example defines a toleration that allows the Pod to be scheduled on nodes with the taint app=a100-workload:NoSchedule. Taints are often used to reserve a pool of nodes for specific, high-resource workloads like A100 jobs. |
| node_selectors | {"nvidia.com/gpu.product": "NVIDIA-A100-80GB"} | A set of key-value pairs used to select target nodes for the Pod. This ensures the job is only scheduled on nodes that have a matching label, specifically, those equipped with the NVIDIA A100 80GB GPU. |
| annotations | {"sidecar.istio.io/inject": "false"} | Arbitrary metadata for the Kubernetes Pod. This example explicitly tells the Istio service mesh not to inject a sidecar container, which is often done for performance-critical or resource-intensive jobs. |

For more information about GPU cluster configurations, see Configure Cluster GPUs.

import os
from nemo_microservices import NeMoMicroservices

# Initialize the client
client = NeMoMicroservices(
    base_url=os.environ['CUSTOMIZER_BASE_URL']
)

# Create a customization config
config = client.customization.configs.create(
    name="llama-3.1-8b-instruct@v1.0.0+A100",
    namespace="default",
    description="Configuration for Llama 3.1 8B on A100 GPUs",
    target="meta/llama-3.1-8b-instruct@2.0",
    training_options=[
        {
            "training_type": "sft",
            "finetuning_type": "lora",
            "num_gpus": 2,
            "micro_batch_size": 8,
            "tensor_parallel_size": 1,
            "pipeline_parallel_size": 1,
            "use_sequence_parallel": False
        }
    ],
    training_precision="bf16",
    max_seq_length=2048,
    prompt_template="{prompt} {completion}",
    pod_spec={
        "tolerations": [
            {
                "key": "app",
                "operator": "Equal",
                "value": "a100-workload",
                "effect": "NoSchedule"
            }
        ],
        "node_selectors": {
            "nvidia.com/gpu.product": "NVIDIA-A100-80GB"
        },
        "annotations": {
            "sidecar.istio.io/inject": "false"
        }
    }
)

print(f"Created config: {config.name}")
print(f"Config ID: {config.id}")
curl -X POST \
  "${CUSTOMIZER_BASE_URL}/v1/customization/configs" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "llama-3.1-8b-instruct@v1.0.0+A100",
    "namespace": "default",
    "description": "Configuration for Llama 3.1 8B on A100 GPUs",
    "target": "meta/llama-3.1-8b-instruct@2.0",
    "training_options": [
       {
          "training_type": "sft",
          "finetuning_type": "lora",
          "num_gpus": 2,
          "micro_batch_size": 8,
          "tensor_parallel_size": 1,
          "pipeline_parallel_size": 1,
          "use_sequence_parallel": false
      }
    ],
    "training_precision": "bf16",
    "max_seq_length": 2048,
    "prompt_template": "{prompt} {completion}",
    "pod_spec": {
      "tolerations": [
        {
          "key": "app",
          "operator": "Equal",
          "value": "a100-workload",
          "effect": "NoSchedule"
        }
      ],
      "node_selectors": {
        "nvidia.com/gpu.product": "NVIDIA-A100-80GB"
      },
      "annotations": {
        "sidecar.istio.io/inject": "false"
      }
    }
  }' | jq
Example Response
{
    "id": "customization_config-MedVscVbr4pgLhLgKTLbv9",
    "name": "llama-3.1-8b-instruct@v1.0.0+A100",
    "namespace": "default",
    "description": "Configuration for Llama 3.1 8B on A100 GPUs",
    "target": {
        "id": "customization_target-AbCdEfGhIjKlMnOpQrStUv",
        "name": "meta/llama-3.1-8b-instruct@2.0",
        "namespace": "default",
        "base_model": "meta/llama-3.1-8b-instruct",
        "enabled": true,
        "num_parameters": 8000000000,
        "precision": "bf16",
        "status": "ready"
    },
    "training_options": [
        {
            "training_type": "sft",
            "finetuning_type": "lora",
            "num_gpus": 2,
            "num_nodes": 1,
            "micro_batch_size": 8,
            "tensor_parallel_size": 1,
            "pipeline_parallel_size": 1,
            "use_sequence_parallel": false
        }
    ],
    "training_precision": "bf16",
    "max_seq_length": 2048,
    "pod_spec": {
        "tolerations": [
            {
                "key": "app",
                "operator": "Equal",
                "value": "a100-workload",
                "effect": "NoSchedule"
            }
        ],
        "node_selectors": {
            "nvidia.com/gpu.product": "NVIDIA-A100-80GB"
        },
        "annotations": {
            "sidecar.istio.io/inject": "false"
        }
    },
    "prompt_template": "{prompt} {completion}",
    "chat_prompt_template": null,
    "dataset_schemas": [],
    "custom_fields": {},
    "ownership": {},
    "created_at": "2024-01-15T10:30:00.000Z",
    "updated_at": "2024-01-15T10:30:00.000Z"
}
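After creation, you can verify that the config is registered. Assuming your deployment exposes a list operation at the same collection path used by the create request above (an unverified assumption; check your API reference), a quick check with the requests library might look like this:

import os
import requests

base_url = os.environ["CUSTOMIZER_BASE_URL"]

# Hypothetical list call against the same collection path used for creation;
# verify the path and response shape for your deployment.
resp = requests.get(f"{base_url}/v1/customization/configs")
resp.raise_for_status()
print(resp.json())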