Create Job#

Important

Config values for a customization job now require a version, denoted by the string following the @. For example, config: meta/llama-3.2-1b-instruct@v1.0.0+80GB.

Prerequisites#

Before you can create a customization job, make sure that you have:

  • Obtained the base URL of your NeMo Customizer service.

  • Obtained a list of customization configurations to find the configuration you want to use, as shown in the example after this list.

  • Determined the hyperparameters you want to use for the customization job.

  • Set the CUSTOMIZER_BASE_URL environment variable to your NeMo Customizer service endpoint:

export CUSTOMIZER_BASE_URL="https://your-customizer-service-url"
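
To find a configuration (the second prerequisite above), the following is a minimal sketch of listing the available configurations with the Python SDK. It assumes the SDK exposes a customization.configs.list method that mirrors the GET /v1/customization/configs endpoint; check the SDK reference for the exact call and response shape.

import os
from nemo_microservices import NeMoMicroservices

# Initialize the client against the Customizer endpoint
client = NeMoMicroservices(base_url=os.environ["CUSTOMIZER_BASE_URL"])

# List available customization configurations (assumed method name;
# mirrors GET /v1/customization/configs)
configs = client.customization.configs.list()
for config in configs.data:  # "data" attribute assumed from typical list responses
    print(config.name)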

To Create a Customization Job#

Choose one of the following options to create a customization job: use the Python SDK, or call the REST API directly with curl.

Python SDK

import os
from nemo_microservices import NeMoMicroservices

# Initialize the client
client = NeMoMicroservices(
    base_url=os.environ['CUSTOMIZER_BASE_URL']
)

# Create a customization job
job = client.customization.jobs.create(
    name="my-custom-model",
    description="Fine-tuning Llama model for specific use case",
    project="my-project",
    config="meta/llama-3.2-1b-instruct@v1.0.0+80GB",
    dataset={
        "name": "my-dataset",
        "namespace": "default"
    },
    hyperparameters={
        "finetuning_type": "lora",
        "training_type": "sft",
        "batch_size": 8,
        "epochs": 50,
        "learning_rate": 0.0001,
        "log_every_n_steps": 0,
        "val_check_interval": 0.01,
        "weight_decay": 0,
        "sft": {
            "hidden_dropout": 0.01,
            "attention_dropout": 0.01
        },
        "lora": {
            "adapter_dim": 8,
            "adapter_dropout": 0.01
        }
    },
    output_model="my-custom-model@v1",
    ownership={
        "created_by": "",
        "access_policies": {}
    },
    # Optional: Add W&B integration
    integrations=[
        {
            "type": "wandb",
            "wandb": {
                "project": "custom-wandb-project",
                "entity": "my-team",
                "notes": "Custom fine-tuning experiment",
                "tags": ["fine-tuning", "llama"]
            }
        }
    ],
    # Optional: Include W&B API key to use W&B
    wandb_api_key="YOUR_WANDB_API_KEY"
)

print(f"Created job with ID: {job.id}")
print(f"Job status: {job.status}")

cURL

curl -X POST \
  "${CUSTOMIZER_BASE_URL}/v1/customization/jobs" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H 'wandb-api-key: <YOUR_WANDB_API_KEY>' \
  -d '{
    "name": "<NAME>",
    "description": "<DESCRIPTION>",
    "project": "<PROJECT_NAME>",
    "config": "<CONFIG_NAME>",
    "hyperparameters": {
      "finetuning_type": "lora",
      "training_type": "sft",
      "batch_size": 8,
      "epochs": 50,
      "learning_rate": 0.0001,
      "log_every_n_steps": 0,
      "val_check_interval": 0.01,
      "weight_decay": 0,
      "sft": {
        "hidden_dropout": 0.01,
        "attention_dropout": 0.01
      },
      "lora": {
        "adapter_dim": 8,
        "adapter_dropout": 0.01
      }
    },
    "output_model": "<OUTPUT_MODEL_NAME>",
    "dataset": "<DATASET_NAME>",
    "ownership": {
      "created_by": "",
      "access_policies": {}
    },
    "integrations": [
      {
        "type": "wandb",
        "wandb": {
          "project": "custom-wandb-project",
          "entity": "custom-team-or-username",
          "notes": "Custom notes about this run",
          "tags": ["fine-tuning", "llama"]
        }
      }
    ]
  }' | jq

Example Response
{
  "id": "cust-JGTaMbJMdqjJU8WbQdN9Q2",
  "created_at": "2024-12-09T04:06:28.542884",
  "updated_at": "2024-12-09T04:06:28.542884",
  "config": {
    "schema_version": "1.0",
    "id": "af783f5b-d985-4e5b-bbb7-f9eec39cc0b1",
    "created_at": "2024-12-09T04:06:28.542657",
    "updated_at": "2024-12-09T04:06:28.569837",
    "custom_fields": {},
    "name": "meta/llama-3_1-8b-instruct",
    "base_model": "meta/llama-3_1-8b-instruct",
    "model_path": "llama-3_1-8b-instruct",
    "training_types": [],
    "finetuning_types": [
      "lora"
    ],
    "precision": "bf16",
    "num_gpus": 4,
    "num_nodes": 1,
    "micro_batch_size": 1,
    "tensor_parallel_size": 1,
    "max_seq_length": 4096
  },
  "dataset": {
    "schema_version": "1.0",
    "id": "dataset-XU4pvGzr5tvawnbVxeJMTb",
    "created_at": "2024-12-09T04:06:28.542657",
    "updated_at": "2024-12-09T04:06:28.542660",
    "custom_fields": {},
    "name": "default/sample-basic-test",
    "version_id": "main",
    "version_tags": []
  },
  "hyperparameters": {
    "finetuning_type": "lora",
    "training_type": "sft",
    "batch_size": 16,
    "epochs": 10,
    "learning_rate": 0.0001,
    "lora": {
      "adapter_dim": 16
    }
  },
  "output_model": "test-example-model@v1",
  "status": "created",
  "project": "test-project",
  "custom_fields": {},
  "ownership": {
    "created_by": "me",
    "access_policies": {
      "arbitrary": "json"
    }
  }
}
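
After creation, the job starts in the created status. The following is a minimal sketch for checking on it later, assuming the SDK exposes a customization.jobs.retrieve method that mirrors GET /v1/customization/jobs/{id}; the job ID is taken from the example response above.

import os
from nemo_microservices import NeMoMicroservices

client = NeMoMicroservices(base_url=os.environ["CUSTOMIZER_BASE_URL"])

# Retrieve the job by the ID returned at creation time
# (assumed method name; mirrors GET /v1/customization/jobs/{id})
job = client.customization.jobs.retrieve("cust-JGTaMbJMdqjJU8WbQdN9Q2")
print(job.status)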

Complete API Reference#

For a complete reference of all customization job fields with constraints and types:

CustomizationJobInput object
The customization job creation Input supported by the Customizer.
Properties
name string
The name of the entity. Must be unique inside the namespace. If not specified, it will be the same as the automatically generated id.
namespace string
The namespace of the entity. This can be missing for namespace entities or in deployments that don't use namespaces.
Default: default
description string
The description of the entity.
project string
The URN of the project associated with this entity.
config * string | object
The customization configuration to be used.
Any of:
Option 1: string - A reference to CustomizationConfig.
Option 2: object - A customization configuration template supported by the Customizer.
hyperparameters * object
The hyperparameters to be used for customization.
Properties
finetuning_type * string
The finetuning type for the customization job.
Allowed values:
lora, lora_merged, all_weights
training_type string
The training type for the customization job.
Allowed values:
dpo, sft, distillation, grpo
Default: sft
warmup_steps integer
Learning rate schedulers gradually increase the learning rate from a small initial value to the target value in `learning_rate` over this number of steps
Default: 200
seed integer
This is the seed used to initialize all underlying PyTorch and Triton trainers. By default, this will be randomly initialized. Caution: a number of processes still introduce variance between training runs for models trained from an HF checkpoint.
Default: 42
max_steps integer
If this parameter is provided and is greater than 0, execution stops after this number of steps. This number cannot be less than val_check_interval; if it is, val_check_interval is set to max_steps - 1.
Default: -1
optimizer string
The optimizer to use for customization. The cosine annealing LR scheduler starts at min_learning_rate and moves toward learning_rate over warmup_steps. Note: for models listed as NeMo checkpoint type, the only Adam implementation is Fused AdamW.
Allowed values:
adam_with_cosine_annealing, adam_with_flat_lr, adamw_with_cosine_annealing, adamw_with_flat_lr
Default: adamw_with_cosine_annealing
adam_beta1 number
Controls the exponential decay rate for the moving average of past gradients (momentum), only used with cosine_annealing learning rate schedulers
Default: 0.9
adam_beta2 number
Controls the decay rate for the moving average of past squared gradients (adaptive learning rate scaling), only used with cosine_annealing learning rate schedulers
Default: 0.99
min_learning_rate number
Starting point for learning_rate scheduling, only used with cosine_annealing learning rate schedulers. Must be lower than learning_rate if provided. If not provided, or 0, this will default to 0.1 * learning_rate.
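
As a hedged illustration of how these scheduler-related fields fit together, the fragment below warms the learning rate up from min_learning_rate to learning_rate over warmup_steps using the default adamw_with_cosine_annealing optimizer. The values are illustrative, not recommendations.

# Illustrative hyperparameter fragment (values are examples only)
hyperparameters = {
    "finetuning_type": "lora",
    "training_type": "sft",
    "optimizer": "adamw_with_cosine_annealing",
    "warmup_steps": 200,           # steps to ramp from min_learning_rate to learning_rate
    "learning_rate": 0.0001,
    "min_learning_rate": 0.00001,  # must be lower than learning_rate
}
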
batch_size integer
Batch size is the number of training samples used in a single forward and backward pass. This is related to gradient_accumulation_steps in the HF documentation, where gradient_accumulation_steps = batch_size // micro_batch_size. The default batch size for DPO, when not provided, is 16. For GRPO this parameter is ignored; the training batch size is calculated as num_prompts_per_step * num_generations_per_prompt.
Default: 8
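
For example, with the default batch_size of 8 and a micro_batch_size of 1 from the configuration, the relationship described above works out as follows (a worked example, not SDK code):

batch_size = 8        # requested in hyperparameters
micro_batch_size = 1  # reported by the customization config
# Equivalent to gradient_accumulation_steps in Hugging Face terminology
gradient_accumulation_steps = batch_size // micro_batch_size  # -> 8
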
epochs integer
Epochs is the number of complete passes through the training dataset. The default for DPO, when not provided, is 1.
Default: 50
learning_rate number
How much to adjust the model parameters in response to the loss gradient. The default for DPO, when not provided, is 9e-06.
Default: 0.0001
log_every_n_steps integer
Controls logging frequency for metrics tracking. Logging on every single batch may slow down training. By default, logs every 10 training steps. This parameter corresponds to log_frequency in HF.
Default: 10
val_check_interval number
Controls how often to check the validation set and how often to check for the best checkpoint. You can check after a fixed number of training batches by passing an integer value, or pass a float in the range [0.1, 1.0] to check after a fraction of the training epoch. If the best checkpoint is found after validation, it is saved temporarily at that time; it is currently only uploaded at the end of the training run. Note: early stopping monitors the validation loss and stops training when no improvement is observed after 10 epochs with a minimum delta of 0.001. If val_check_interval is greater than the number of training batches, validation runs every epoch.
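
A short sketch of the two accepted forms (values are illustrative):

# Check validation every 100 training batches (integer form)
hyperparameters = {"val_check_interval": 100}

# Check validation after every 25% of a training epoch (float form, range [0.1, 1.0])
hyperparameters = {"val_check_interval": 0.25}
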
weight_decay number
An additional penalty term added to the gradient descent to keep weights low and mitigate overfitting.
sft object
SFT specific parameters
Properties
hidden_dropout number
Dropout probability applied to the hidden states in transformer layers. Randomly zeros a fraction of hidden state activations during training to prevent overfitting. Typical values range from 0.0 (no dropout) to 0.1. Set to None to use model defaults. Higher values increase regularization but may slow convergence.
attention_dropout number
Dropout probability applied to attention weights in the self-attention mechanism. Randomly zeros a fraction of attention scores during training to improve generalization. Typical values range from 0.0 (no dropout) to 0.1. Set to None to use model defaults. Higher values can help prevent the model from over-relying on specific token relationships.
dpo object
DPO specific parameters
Properties
max_grad_norm number
Maximum gradient norm for gradient clipping during training. Prevents exploding gradients by scaling down gradients that exceed this threshold. Lower this value (e.g., 0.5) if you observe training instability, NaN losses, or erratic loss spikes. Increase it (e.g., 5.0) if training seems overly conservative or progress is too slow. Typical values range from 0.5 to 5.0.
Default: 1.0
ref_policy_kl_penalty number
Controls how strongly the trained policy is penalized for deviating from the reference policy. Increasing this value encourages the policy to stay closer to the reference (more conservative learning), while decreasing it allows more freedom to explore user-preferred behavior. Parameter is called `beta` in the original paper
Default: 0.05
preference_loss_weight number
Scales the contribution of the preference loss to the overall training objective. Increasing this value emphasizes learning from preference comparisons more strongly.
Default: 1
preference_average_log_probs boolean
If set to true, the preference loss uses average log-probabilities, making the loss less sensitive to sequence length. Setting it to false (default) uses total log-probabilities, giving more influence to longer sequences.
Default: False
sft_loss_weight number
Scales the contribution of the supervised fine-tuning loss. Setting this to 0 disables SFT entirely, allowing training to focus exclusively on preference-based optimization.
Default: 0
sft_average_log_probs boolean
If set to true, the supervised fine-tuning (SFT) loss normalizes by sequence length, treating all examples equally regardless of length. If false (default), longer examples contribute more to the loss.
Default: False
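
A hedged sketch of a DPO hyperparameter block using the fields above; the values mirror the documented DPO defaults, and finetuning_type is shown as lora for illustration only.

hyperparameters = {
    "finetuning_type": "lora",   # illustrative choice
    "training_type": "dpo",
    "epochs": 1,                 # documented DPO default
    "learning_rate": 9e-06,      # documented DPO default
    "dpo": {
        "ref_policy_kl_penalty": 0.05,
        "preference_loss_weight": 1,
        "sft_loss_weight": 0,
    },
}
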
grpo object
GRPO specific parameters
Properties
environment * object
Task-specific environment configuration defining the training context, including dataset specification and reward function.
Discriminator: property: name
One of six environment object variants, selected by the name property.
normalize_rewards boolean
Normalize advantages by dividing by their standard deviation across responses to each prompt. Default is True for improved training stability and consistent gradient magnitudes regardless of reward scale variations. This prevents prompts with high reward variance from dominating updates. Disable (False) only if: (1) rewards are already well-scaled and consistent, (2) you want to preserve reward magnitude information where higher-value tasks should have stronger learning signals, or (3) using very few generations per prompt (<4) where standard deviation estimates are noisy. Recommended: keep enabled for most use cases.
Default: True
use_rloo boolean
Use leave-one-out baseline (Reinforcement, Leave One Out) for computing advantages. When True, each sample's baseline excludes its own reward, providing an unbiased estimate of expected reward. Default is True as it's theoretically correct and works well with typical num_generations_per_prompt values (4-8). Disable (False) for: (1) very few generations per prompt (≤3) where leave-one-out baselines become too noisy, (2) faster training by avoiding per-sample baseline computation, or (3) replicating original GRPO paper. The tradeoff: True gives unbiased but higher variance estimates; False gives biased but lower variance, which can improve stability with small generation counts.
Default: True
overlong_filtering boolean
Exclude truncated sequences (those that hit max_total_sequence_length without producing end-of-text) from loss computation. Truncated samples still contribute to advantage baseline calculations but don't receive gradient updates. Enable (True) for long-form tasks like mathematical proofs or extended reasoning where correct answers may legitimately exceed length limits and shouldn't be penalized for incompleteness. Default is False to maintain standard GRPO behavior where the model learns to complete responses within sequence limits, which is appropriate for most tasks and production systems with length constraints.
Default: False
num_prompts_per_step integer
Number of unique prompts to process per training step. This controls the batch size for sampling prompts from the dataset. Total samples per step = num_prompts_per_step * num_generations_per_prompt. Increase for better gradient estimates and training stability (at the cost of memory). Typical values: 8-64 depending on available GPU memory.
Default: 1
num_generations_per_prompt integer
Number of responses to generate for each prompt. Used to compute the advantage baseline by comparing multiple responses to the same prompt. Higher values (e.g., 4-8) provide better advantage estimates but increase computational cost. Typical range: 4-16.
Default: 1
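
The total number of samples generated per GRPO training step follows directly from these two fields (a worked example with illustrative values):

num_prompts_per_step = 8        # unique prompts sampled per step
num_generations_per_prompt = 4  # responses generated per prompt
samples_per_step = num_prompts_per_step * num_generations_per_prompt  # -> 32
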
ref_policy_kl_penalty number
KL divergence penalty coefficient (β) that controls how strongly the trained policy is penalized for deviating from the reference policy. Higher values (e.g., 0.05-0.1) encourage the policy to stay closer to the reference (more conservative learning), while lower values (e.g., 0.001-0.01) allow more freedom to explore user-preferred behavior. Typical range: 0.001-0.1. Also known as 'beta' in the original GRPO paper and 'kl_penalty_coefficient' in some implementations.
Default: 0.01
max_grad_norm number
Maximum gradient norm for gradient clipping during training. Prevents exploding gradients by scaling down gradients that exceed this threshold. Lower this value (e.g., 0.5) if you observe training instability, NaN losses, or erratic loss spikes. Increase it (e.g., 5.0) if training seems overly conservative or progress is too slow. Typical values range from 0.5 to 5.0.
Default: 1.0
ratio_clip_min number
Lower bound for clipping the policy update ratio in GRPO loss. Limits how much the policy can change per update, preventing instability. The policy ratio is clipped to stay within [1-epsilon, 1+epsilon]. Standard value: 0.2 (clips to [0.8, 1.2]). Lower values (e.g., 0.1) make training more conservative; higher values (e.g., 0.3) allow larger updates. Typically set equal to ratio_clip_max for symmetric clipping.
Default: 0.2
ratio_clip_max number
Upper bound for clipping the policy update ratio in GRPO loss. Limits how much the policy can change per update, preventing instability. Standard value: 0.2 (clips to [0.8, 1.2]). Usually set equal to ratio_clip_min (symmetric clipping), but can differ for asymmetric clipping strategies where you want to limit increases differently than decreases.
Default: 0.2
ratio_clip_c number
Dual-clipping parameter that adds extra protection against large policy updates when rewards are negative. Must be greater than 1 (typically 3). Set to None to disable. This helps prevent the policy from changing too aggressively on poor-performing samples.
use_on_policy_kl_approximation boolean
Use importance-weighted KL divergence estimation between current and reference policies. This provides a more accurate, always-positive estimate of how much the policy has changed by accounting for the difference between the policy used for sampling and the current policy being trained. Enable when you need precise KL tracking. Default: False for efficiency.
Default: False
use_importance_sampling_correction boolean
Correct for numerical differences between the inference backend (used for generation) and training framework (used for learning). This accounts for precision differences, backend variations, etc. that can cause the same model to produce slightly different probabilities. Recommended for async GRPO and when using FP8 inference.
Default: False
token_level_loss boolean
Whether to compute loss at token level (True) or sequence level (False). Token-level averages over all tokens; sequence-level averages per-sequence losses. Sequence-level is used for GSPO-style training.
Default: True
logprob_chunk_size integer
Chunk size for processing logprobs in distributed settings. Larger values improve efficiency but require more memory. Used for chunked distributed operations during loss computation.
Default: 1024
generation_batch_size integer
Batch size for generation during rollouts. Controls how many sequences are generated in parallel.
Default: 32
generation_temperature number
Sampling temperature for generation. Higher values (e.g., 1.0) increase randomness, lower values (e.g., 0.1) make output more deterministic. Temperature of 0 is equivalent to greedy sampling (always selecting the most likely token).
Constraints: minimum: 0.0, maximum: 1.0
Default: 1.0
generation_top_p number
Nucleus sampling parameter (top-p). Only tokens with cumulative probability >= top_p are considered. 1.0 means no filtering; lower values (e.g., 0.9) increase quality by filtering unlikely tokens.
Constraints: minimum: 0.0, maximum: 1.0
Default: 0.9
generation_top_k integer
Top-k sampling parameter. Only the k most likely tokens are considered at each step. None means no top-k filtering is applied. Typically used with values like 50 to balance diversity and quality.
Default: 50
generation_pipeline_parallel_size integer
Number of GPUs to use for pipeline parallelism during generation, splitting model layers across devices (inter-layer parallelism).
Default: 1
lora object
LoRa specific parameters
Properties
adapter_dim integer
Size of adapter layers added throughout the model. This is the size of the tunable layers that LoRA adds to various transformer blocks in the base model. This value must be a power of 2.
Default: 8
alpha integer
Scaling factor for the LoRA update. Controls the magnitude of the low-rank approximation. A higher alpha value increases the impact of the LoRA weights, effectively amplifying the changes made to the original model. Proper tuning of alpha is essential, as it balances the adaptation's impact, ensuring neither underfitting nor overfitting. This is often a multiple of adapter_dim.
Default: 16
adapter_dropout number
Dropout probability in the adapter layer.
target_modules array
Target specific layers in the model architecture to apply LoRA. A subset of the layers is selected by default, but specific layers can also be selected. For example:
  • `linear_qkv`: Apply LoRA to the fused linear layer used for query, key, and value projections in self-attention.
  • `linear_proj`: Apply LoRA to the linear layer used for projecting the output of self-attention.
  • `linear_fc1`: Apply LoRA to the first fully-connected layer in MLP.
  • `linear_fc2`: Apply LoRA to the second fully-connected layer in MLP.
  • `*_proj`: Apply LoRA to all layers used for projecting the output of self-attention.
Target modules can also contain wildcards. For example, you can specify `target_modules=['*.layers.0.*.linear_qkv', '*.layers.1.*.linear_qkv']` to add LoRA to only linear_qkv on the first two layers. Our framework only supports a Fused LoRA implementation; Canonical LoRA is not supported.
Array items:
item string
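
A minimal sketch of a LoRA hyperparameter block tying these fields together; adapter_dim is a power of 2, alpha is shown as a multiple of adapter_dim, and the target_modules wildcards come from the description above. Values are illustrative.

hyperparameters = {
    "finetuning_type": "lora",
    "training_type": "sft",
    "lora": {
        "adapter_dim": 16,   # power of 2
        "alpha": 32,         # often a multiple of adapter_dim
        "adapter_dropout": 0.01,
        # Apply LoRA only to the fused QKV projection on the first two layers
        "target_modules": ["*.layers.0.*.linear_qkv", "*.layers.1.*.linear_qkv"],
    },
}
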
distillation object
Knowledge Distillation specific parameters
Properties
teacher * string
Target to be used as teacher for distillation.
sequence_packing_enabled boolean
Sequence packing can improve training speed by letting training work on multiple rows at the same time. Experimental and not supported by all models. If a model is not supported, a warning is returned in the response body and training proceeds with sequence packing disabled. Not recommended for production use. This flag may be removed in the future. See https://docs.nvidia.com/nemo-framework/user-guide/latest/sft_peft/packed_sequence.html for more details.
Default: False
output_model string
The output model. If not specified, no output model is created, only the artifact files written.
dataset * string | object
The dataset to be used for customization.
Any of:
Option 1: string - A reference to Dataset.
Option 2: object
dataset_parameters object
Additional parameters to configure a dataset
Allows additional properties: No
Properties
tools array
A list of tools that are available for training with tool calling
Array items:
item object
Allows additional properties: No
Properties
type string
Type of tool - currently only 'function' is supported
Default: function
function * object
Schema defining the function
Allows additional properties: No
Properties
name * string
Name of the function.
description * string
Description of what the function does.
parameters * object
Parameters schema for the function.
Allows additional properties: Yes
Properties
type string
Type of parameters - currently only 'object' is supported.
Default: object
properties object
Dictionary of parameter names to their type definitions.
Additional properties schema:
[key: string] object
Allows additional properties: Yes
Properties
type * string
The type of the parameter provided.
description string
The description of this parameter.
required array
List of required parameter names.
Array items:
item string
additionalProperties boolean
Additional properties are allowed.
Default: True
required array
Required parameters for the function
Array items:
item string
strict boolean
Whether the verification is in strict mode.
Default: False
num_hard_negatives integer
Number of negative documents to include per query for contrastive training. Embedding models only.
negative_sample_strategy string
How to select negatives when more are available than needed. Embedding models only. 'first' picks the first N; 'random' samples N negatives randomly.
in_batch_negatives boolean
In-batch negatives treats every other example in a training batch as a negative sample during contrastive learning. When enabled, the model learns to distinguish the correct positive pair not just from explicitly provided hard negatives, but from all other examples in the same batch. This can improve training without adding extra labeled negative data.
Default: False
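
A hedged sketch of a dataset_parameters object that defines one tool for tool-calling training; the function name and parameters are hypothetical. The embedding-only fields (num_hard_negatives, negative_sample_strategy, in_batch_negatives) would apply to embedding-model training instead.

dataset_parameters = {
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical function for illustration
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name."}
                    },
                    "required": ["city"],
                },
                "required": ["city"],
                "strict": False,
            },
        }
    ]
}
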
ownership object
Ownership information for the entity
Properties
created_by string
The ID of the user that created this entity.
Default: ""
updated_by string
The ID of the user that last updated this entity.
access_policies object
A general object for capturing access policies which can be used by an external service to determine ACLs
Default: {}
Additional properties schema:
[key: string] string
integrations array
A list of third party integrations for a job. Example: Weights & Biases integration.
Examples: [{'type': 'wandb', 'wandb': {'entity': 'my-entity', 'notes': 'experiment with long context', 'project': 'my-project', 'tags': ['long-context', 'reasoning']}}]
Array items:
item object
Discriminator: property: type
One of:
Option 1: object - WandB integration configuration
custom_fields object
A set of custom fields that the user can define and use for various purposes.
Additional properties schema:
[key: string] string

Weights & Biases Integration#

To enable W&B integration, add the wandb_api_key and integrations settings to the customization job creation request as shown in the example above.

Available Weights & Biases (W&B) settings are mapped to the Weights & Biases Python SDK settings.

Note

W&B integration is optional. You can create jobs without W&B by omitting the API key header.

W&B Configuration Priority#

W&B settings are applied with the following priority (highest priority overrides lower):

  1. Lowest Priority - Job Fields:

    • job.project → sets W&B project name

    • job.description → sets W&B notes

  2. Medium Priority - Application Deployment Settings:

    • Deployment-specific settings from Helm chart values.yaml under wandb section

  3. Highest Priority - Job Integrations:

    • Settings from job.integrations array with type: "wandb"

    • These settings will override all other W&B configurations
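
For example, with the hedged fragment below, the W&B run is logged to custom-wandb-project because the integrations entry overrides both the deployment-level wandb settings and the job.project field; job.description is only used for W&B notes when no higher-priority value is set.

import os
from nemo_microservices import NeMoMicroservices

client = NeMoMicroservices(base_url=os.environ["CUSTOMIZER_BASE_URL"])

job = client.customization.jobs.create(
    name="my-custom-model",
    project="my-project",                  # lowest priority: W&B project fallback
    description="Fine-tuning experiment",  # lowest priority: W&B notes fallback
    config="meta/llama-3.2-1b-instruct@v1.0.0+80GB",
    dataset={"name": "my-dataset", "namespace": "default"},
    hyperparameters={"finetuning_type": "lora", "training_type": "sft"},
    integrations=[
        {
            "type": "wandb",
            "wandb": {"project": "custom-wandb-project"},  # highest priority
        }
    ],
    wandb_api_key="YOUR_WANDB_API_KEY",
)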