Create Job#

Important

Config values for a customization job now require a version, denoted by the string following the @. For example, config: meta/llama-3.2-1b-instruct@v1.0.0+A100.
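To make the versioned config format concrete, the snippet below splits an example config value into its parts. This is purely illustrative string handling, not an API call; the part names (namespace, model, version) are descriptive labels, not fields the service defines.

```python
# Split a versioned config value into its parts.
# "meta/llama-3.2-1b-instruct@v1.0.0+A100" ->
#   namespace "meta", model "llama-3.2-1b-instruct", version "v1.0.0+A100"
config = "meta/llama-3.2-1b-instruct@v1.0.0+A100"

name, _, version = config.partition("@")  # version is everything after the @
namespace, model = name.split("/", 1)

print(namespace, model, version)
```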

Prerequisites#

Before you can create a customization job, make sure that you have:

  • Obtained the base URL of your NeMo Customizer service.

  • Retrieved the list of available customization configurations and identified the configuration you want to use.

  • Determined the hyperparameters you want to use for the customization job.

  • Set the CUSTOMIZER_BASE_URL environment variable to your NeMo Customizer service endpoint.

```shell
export CUSTOMIZER_BASE_URL="https://your-customizer-service-url"
```
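If you are scripting against the service in Python, you can read the same variable defensively; a minimal sketch (the fallback URL is a placeholder, not a real endpoint):

```python
import os

# Read the service endpoint, falling back to a placeholder if it is unset.
base_url = os.environ.get(
    "CUSTOMIZER_BASE_URL", "https://your-customizer-service-url"
)

# Normalize a trailing slash so later path joins stay predictable.
base_url = base_url.rstrip("/")
```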

To Create a Customization Job#

Choose one of the following options to create a customization job.

Python

```python
import os
from nemo_microservices import NeMoMicroservices

# Initialize the client
client = NeMoMicroservices(
    base_url=os.environ['CUSTOMIZER_BASE_URL']
)

# Create a customization job
job = client.customization.jobs.create(
    name="my-custom-model",
    description="Fine-tuning Llama model for specific use case",
    project="my-project",
    config="meta/llama-3.2-1b-instruct@v1.0.0+A100",
    dataset={
        "name": "my-dataset",
        "namespace": "default"
    },
    hyperparameters={
        "finetuning_type": "lora",
        "training_type": "sft",
        "batch_size": 8,
        "epochs": 50,
        "learning_rate": 0.0001,
        "log_every_n_steps": 0,
        "val_check_interval": 0.01,
        "weight_decay": 0,
        "sft": {
            "hidden_dropout": 1,
            "attention_dropout": 1,
            "ffn_dropout": 1
        },
        "lora": {
            "adapter_dim": 8,
            "adapter_dropout": 1
        }
    },
    output_model="my-custom-model@v1",
    ownership={
        "created_by": "",
        "access_policies": {}
    },
    # Optional: Add W&B integration
    integrations=[
        {
            "type": "wandb",
            "wandb": {
                "project": "custom-wandb-project",
                "entity": "my-team",
                "notes": "Custom fine-tuning experiment",
                "tags": ["fine-tuning", "llama"]
            }
        }
    ],
    # Include W&B API key in headers if using W&B
    wandb_api_key="YOUR_WANDB_API_KEY"
)

print(f"Created job with ID: {job.id}")
print(f"Job status: {job.status}")
```
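A customization job runs asynchronously, so after creating it you will typically poll its status. The helper below is a hedged sketch: the job-detail URL path and the terminal status names are assumptions inferred from the creation request shown above, not confirmed API details, and the status-fetching function is injected as a callable so the loop itself stays easy to test.

```python
import time
from typing import Callable

# Assumed terminal states; check your service's actual status values.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def job_url(base_url: str, job_id: str) -> str:
    # Assumption: job detail lives one level under the creation route.
    return f"{base_url.rstrip('/')}/customization/jobs/{job_id}"


def wait_for_job(fetch_status: Callable[[], str],
                 poll_seconds: float = 30.0,
                 max_polls: int = 120) -> str:
    """Poll fetch_status() until a terminal state or the poll budget runs out."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("customization job did not finish in time")
```

With the SDK client above, `fetch_status` could wrap whatever job-retrieval call your client exposes; with raw HTTP it could issue a GET against `job_url(...)`.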
curl

```bash
curl -X POST \
  "${CUSTOMIZER_BASE_URL}/customization/jobs" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H 'wandb-api-key: <YOUR_WANDB_API_KEY>' \
  -d '{
    "name": "<NAME>",
    "description": "<DESCRIPTION>",
    "project": "<PROJECT_NAME>",
    "config": "<CONFIG_NAME>",
    "hyperparameters": {
      "finetuning_type": "lora",
      "training_type": "sft",
      "batch_size": 8,
      "epochs": 50,
      "learning_rate": 0.0001,
      "log_every_n_steps": 0,
      "val_check_interval": 0.01,
      "weight_decay": 0,
      "sft": {
        "hidden_dropout": 1,
        "attention_dropout": 1,
        "ffn_dropout": 1
      },
      "lora": {
        "adapter_dim": 8,
        "adapter_dropout": 1
      }
    },
    "output_model": "<OUTPUT_MODEL_NAME>",
    "dataset": "<DATASET_NAME>",
    "ownership": {
      "created_by": "",
      "access_policies": {}
    }
  }' | jq
```
Example Response

```json
{
  "id": "cust-JGTaMbJMdqjJU8WbQdN9Q2",
  "created_at": "2024-12-09T04:06:28.542884",
  "updated_at": "2024-12-09T04:06:28.542884",
  "config": {
    "schema_version": "1.0",
    "id": "af783f5b-d985-4e5b-bbb7-f9eec39cc0b1",
    "created_at": "2024-12-09T04:06:28.542657",
    "updated_at": "2024-12-09T04:06:28.569837",
    "custom_fields": {},
    "name": "meta/llama-3_1-8b-instruct",
    "base_model": "meta/llama-3_1-8b-instruct",
    "model_path": "llama-3_1-8b-instruct",
    "training_types": [],
    "finetuning_types": [
      "lora"
    ],
    "precision": "bf16",
    "num_gpus": 4,
    "num_nodes": 1,
    "micro_batch_size": 1,
    "tensor_parallel_size": 1,
    "max_seq_length": 4096
  },
  "dataset": {
    "schema_version": "1.0",
    "id": "dataset-XU4pvGzr5tvawnbVxeJMTb",
    "created_at": "2024-12-09T04:06:28.542657",
    "updated_at": "2024-12-09T04:06:28.542660",
    "custom_fields": {},
    "name": "default/sample-basic-test",
    "version_id": "main",
    "version_tags": []
  },
  "hyperparameters": {
    "finetuning_type": "lora",
    "training_type": "sft",
    "batch_size": 16,
    "epochs": 10,
    "learning_rate": 0.0001,
    "lora": {
      "adapter_dim": 16
    }
  },
  "output_model": "test-example-model@v1",
  "status": "created",
  "project": "test-project",
  "custom_fields": {},
  "ownership": {
    "created_by": "me",
    "access_policies": {
      "arbitrary": "json"
    }
  }
}
```
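The response body is plain JSON, so the fields you usually need can be pulled out directly once it is parsed. A minimal sketch using a trimmed copy of the id, status, and output_model fields from the example response above:

```python
import json

# A trimmed version of the example response above.
response_text = """
{
  "id": "cust-JGTaMbJMdqjJU8WbQdN9Q2",
  "status": "created",
  "output_model": "test-example-model@v1"
}
"""

job = json.loads(response_text)
print(f"Job {job['id']} is {job['status']}; output model: {job['output_model']}")
```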