Customization Config Reference

Tip

Looking for a step-by-step guide? Check out Create Customization Config.

For a complete reference of all customization configuration parameters with constraints and types:

CustomizationConfigInput object

A customization configuration template supported by the Customizer.

Properties

name string

The name of the entity. Must be unique inside the namespace. If not specified, it will be the same as the automatically generated id.

Constraints: max length: 255, pattern: ^[\w\-\+.@:]*$

Default:
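The name constraints above can be checked on the client before a config is submitted. The following is a minimal sketch, assuming Python's `re` module interprets the pattern the same way the service does (in Python, `\w` also matches non-ASCII word characters). `is_valid_name` is a hypothetical helper, not part of any SDK.

```python
import re

# Pattern and length limit copied from the constraints listed above.
NAME_PATTERN = re.compile(r"^[\w\-\+.@:]*$")
MAX_NAME_LENGTH = 255


def is_valid_name(name: str) -> bool:
    """Return True if `name` satisfies the documented length and pattern constraints."""
    return len(name) <= MAX_NAME_LENGTH and NAME_PATTERN.fullmatch(name) is not None


print(is_valid_name("example-config@v1.0.0+A100"))  # only allowed characters
print(is_valid_name("my config"))                    # contains a space
```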

namespace string

The namespace of the entity. This can be missing for namespace entities or in deployments that don't use namespaces.

Default: default

description string

The description of the entity.

target string | object

The target on which to perform the customization.

Any of:

Option 1: string - A reference to CustomizationTarget.

Option 2: object - Optional model_path

training_options * array

Resource configuration for each training option for the model.

Array items:

item object

Resource configuration for model training. Specifies the hardware and parallelization settings for training.

Properties

training_type * string

Allowed values:

dpo, sft, distillation

finetuning_type * string

Allowed values:

lora, lora_merged, all_weights

num_gpus * integer

The number of GPUs per node to use for the specified training.

num_nodes integer

The number of nodes to use for the specified training.

Default: 1

tensor_parallel_size integer

Number of GPUs used to split individual layers for tensor model parallelism (intra-layer).

Default: 1

data_parallel_size integer

Number of model replicas that process different data batches in parallel, with gradient synchronization across GPUs. Only available on HF checkpoint models. data_parallel_size must equal num_gpus * num_nodes and is set to this value automatically if not provided.

pipeline_parallel_size integer

Number of GPUs used to split the model across layers for pipeline model parallelism (inter-layer). Only available on NeMo 2 checkpoint models. pipeline_parallel_size * tensor_parallel_size must equal num_gpus * num_nodes.

Default: 1
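The two sizing rules stated above for data_parallel_size and pipeline_parallel_size can be sketched as simple checks. This is an illustrative sketch only: the function names are hypothetical, and the service's own validation may apply additional rules.

```python
from typing import Optional


def resolve_data_parallel_size(num_gpus: int, num_nodes: int = 1,
                               data_parallel_size: Optional[int] = None) -> int:
    """HF checkpoint models: data_parallel_size must equal num_gpus * num_nodes."""
    total_gpus = num_gpus * num_nodes
    if data_parallel_size is None:
        # Mirrors the documented behavior: the value is filled in when omitted.
        return total_gpus
    if data_parallel_size != total_gpus:
        raise ValueError(
            f"data_parallel_size={data_parallel_size} must equal "
            f"num_gpus * num_nodes = {total_gpus}"
        )
    return data_parallel_size


def check_model_parallel_sizes(num_gpus: int, num_nodes: int = 1,
                               tensor_parallel_size: int = 1,
                               pipeline_parallel_size: int = 1) -> None:
    """NeMo 2 checkpoint models: pipeline * tensor must equal num_gpus * num_nodes."""
    total_gpus = num_gpus * num_nodes
    product = pipeline_parallel_size * tensor_parallel_size
    if product != total_gpus:
        raise ValueError(
            f"pipeline_parallel_size * tensor_parallel_size = {product} "
            f"must equal num_gpus * num_nodes = {total_gpus}"
        )


# 8 GPUs on each of 2 nodes gives 16 data-parallel replicas.
print(resolve_data_parallel_size(num_gpus=8, num_nodes=2))

# 4-way tensor parallelism times 2-way pipeline parallelism uses all 8 GPUs.
check_model_parallel_sizes(num_gpus=8, tensor_parallel_size=4, pipeline_parallel_size=2)
```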

expert_model_parallel_size integer

Number of GPUs used to parallelize expert (MoE) components of the model. This controls distribution of expert computation across devices for models that use Mixture-of-Experts. If omitted (null), expert parallelism is not enabled or assumed by default. Setting this value for models that do not use MoE can cause failures during training.

use_sequence_parallel boolean

If set, sequences are distributed over multiple GPUs.

Default: False

micro_batch_size * integer

The number of examples per data-parallel rank. More details at: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/nemo_megatron/batching.html

training_precision string

The precision to train the model with. Defaults to the target's precision.

Allowed values:

int8, bf16, fp16, fp32, fp8-mixed, bf16-mixed

max_seq_length * integer

The largest context used for training. Datasets are truncated based on the maximum sequence length.

pod_spec object

Additional parameters to ensure these training jobs get run on the appropriate hardware.

Examples:

{
  'annotations': {'nmp/job-type': 'customization'},
  'node_affinity': {
    'preferredDuringSchedulingIgnoredDuringExecution': [
      {
        'preference': {
          'matchExpressions': [
            {'key': 'nvidia.com/gpu.count', 'operator': 'Gt', 'values': ['4']}
          ]
        },
        'weight': 100
      },
      {
        'preference': {
          'matchExpressions': [
            {'key': 'topology.kubernetes.io/zone', 'operator': 'In', 'values': ['us-west-2a', 'us-west-2b']}
          ]
        },
        'weight': 50
      }
    ],
    'requiredDuringSchedulingIgnoredDuringExecution': {
      'nodeSelectorTerms': [
        {
          'matchExpressions': [
            {'key': 'nvidia.com/gpu.product', 'operator': 'In', 'values': ['NVIDIA-A100-SXM4-80GB', 'NVIDIA-H100-80GB-HBM3']},
            {'key': 'node.kubernetes.io/instance-type', 'operator': 'In', 'values': ['p4d.24xlarge', 'p5.48xlarge']}
          ]
        }
      ]
    }
  },
  'node_selectors': {'kubernetes.io/hostname': 'minikube'},
  'tolerations': [
    {'effect': 'NoSchedule', 'key': 'app', 'operator': 'Equal', 'value': 'customizer'}
  ]
}

Properties

node_selectors object

Additional arguments for node selector

Additional properties schema:

[key: string] string

annotations object

Additional arguments for annotations

Additional properties schema:

[key: string] string

tolerations array

Additional arguments for tolerations

Array items:

item object

Properties

key string

Taint key that the toleration applies to

operator string

Operator: "Exists" or "Equal"

Default: Equal

value string

Value to match

effect string

Taint effect to match: "NoSchedule", "PreferNoSchedule", or "NoExecute"

tolerationSeconds integer

Only for NoExecute; how long the toleration lasts
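To see how the toleration fields above interact, the sketch below approximates the standard Kubernetes rule for deciding whether a toleration matches a node taint. It is a simplified illustration, not the scheduler's code: `tolerates` is a hypothetical helper, and tolerationSeconds is ignored because it affects eviction timing rather than matching.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if `toleration` matches `taint` under Kubernetes matching rules."""
    key = toleration.get("key")
    operator = toleration.get("operator", "Equal")  # documented default
    effect = toleration.get("effect")

    # An empty effect on the toleration matches every taint effect.
    if effect and effect != taint["effect"]:
        return False
    # An empty key (only valid with Exists) matches every taint key.
    if key and key != taint["key"]:
        return False
    if operator == "Exists":
        return True
    if operator == "Equal":
        return bool(key) and toleration.get("value", "") == taint.get("value", "")
    return False  # unknown operator


# Toleration taken from the pod_spec example; the taint is an assumed node taint.
toleration = {"effect": "NoSchedule", "key": "app", "operator": "Equal", "value": "customizer"}
taint = {"key": "app", "value": "customizer", "effect": "NoSchedule"}
print(tolerates(toleration, taint))
```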

node_affinity object

The Kubernetes node affinity to apply to the training pods.

Properties

requiredDuringSchedulingIgnoredDuringExecution object

Properties

nodeSelectorTerms * array

Array items:

item object

Properties

matchExpressions array

Array items:

item object

Properties

key * string

operator * string

Allowed values:

In, NotIn, Exists, DoesNotExist, Gt, Lt

values array

Array items:

item string

preferredDuringSchedulingIgnoredDuringExecution array

Array items:

item object

Properties

weight * integer

preference * object

Properties

matchExpressions array

Array items:

item object

Properties

key * string

operator * string

Allowed values:

In, NotIn, Exists, DoesNotExist, Gt, Lt

values array

Array items:

item string
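The node_affinity schema above follows the Kubernetes node affinity model: expressions inside one term are ANDed, the terms in nodeSelectorTerms are ORed, and the weights of matching preferences are summed. The sketch below approximates that evaluation against a node's labels. All function names are hypothetical, the node labels are made up for illustration, and the real scheduler handles more cases (for example matchFields).

```python
def expression_matches(expr: dict, labels: dict) -> bool:
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op, values = expr["key"], expr["operator"], expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    if op in ("Gt", "Lt"):
        # Gt/Lt take exactly one value and compare as integers.
        if key not in labels or len(values) != 1:
            return False
        try:
            label_value, bound = int(labels[key]), int(values[0])
        except ValueError:
            return False
        return label_value > bound if op == "Gt" else label_value < bound
    raise ValueError(f"unsupported operator: {op}")


def term_matches(term: dict, labels: dict) -> bool:
    """All expressions in a term must match; an empty term matches nothing."""
    expressions = term.get("matchExpressions", [])
    return bool(expressions) and all(expression_matches(e, labels) for e in expressions)


def required_matches(required: dict, labels: dict) -> bool:
    """A node qualifies if any one of the nodeSelectorTerms matches."""
    return any(term_matches(t, labels) for t in required["nodeSelectorTerms"])


def preferred_score(preferred: list, labels: dict) -> int:
    """Sum the weights of every preference the node satisfies."""
    return sum(p["weight"] for p in preferred if term_matches(p["preference"], labels))


# Hypothetical node labels, evaluated against the pod_spec example values.
node_labels = {
    "nvidia.com/gpu.product": "NVIDIA-H100-80GB-HBM3",
    "node.kubernetes.io/instance-type": "p5.48xlarge",
    "nvidia.com/gpu.count": "8",
    "topology.kubernetes.io/zone": "us-west-2a",
}
required = {"nodeSelectorTerms": [{"matchExpressions": [
    {"key": "nvidia.com/gpu.product", "operator": "In",
     "values": ["NVIDIA-A100-SXM4-80GB", "NVIDIA-H100-80GB-HBM3"]},
    {"key": "node.kubernetes.io/instance-type", "operator": "In",
     "values": ["p4d.24xlarge", "p5.48xlarge"]},
]}]}
preferred = [
    {"weight": 100, "preference": {"matchExpressions": [
        {"key": "nvidia.com/gpu.count", "operator": "Gt", "values": ["4"]}]}},
    {"weight": 50, "preference": {"matchExpressions": [
        {"key": "topology.kubernetes.io/zone", "operator": "In",
         "values": ["us-west-2a", "us-west-2b"]}]}},
]
print(required_matches(required, node_labels), preferred_score(preferred, node_labels))
```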

prompt_template string

Prompt template used to extract keys from the dataset. For example, if prompt_template='{input} {output}' and a sample looks like '{"input": "Q: 2x2 A:", "output": "4"}', then the model sees 'Q: 2x2 A: 4'. This parameter is only used for the "SFT" and "Distillation" training types on non-embedding models.

Default: {prompt} {completion}
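The substitution described for prompt_template can be reproduced with Python's `str.format`. This is only a sketch of the idea, assuming plain placeholder substitution; the Customizer's actual rendering logic is not shown in this reference and may differ.

```python
import json

# Template and sample taken from the prompt_template description.
prompt_template = "{input} {output}"
sample = json.loads('{"input": "Q: 2x2 A:", "output": "4"}')

# Each placeholder is replaced by the value of the matching dataset key.
rendered = prompt_template.format(**sample)
print(rendered)  # Q: 2x2 A: 4
```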

chat_prompt_template string

Chat prompt template to apply to the model to make it compatible with chat datasets, or to train it on a different template for your use case. This parameter is only used for the "SFT" and "Distillation" training types on non-embedding models.

dataset_schemas array

JSON Schema used for validating datasets that can be used with the configured finetuning jobs.

Array items:

item object

Allows additional properties: Yes

project string

The URN of the project associated with this entity.

custom_fields object

A set of custom fields that the user can define and use for various purposes.

Allows additional properties: Yes

ownership object

Ownership information for the entity

Properties

created_by string

The ID of the user that created this entity.

Default:

updated_by string

The ID of the user that last updated this entity.

access_policies object

A general object for capturing access policies, which can be used by an external service to determine ACLs.

Default: {}

Additional properties schema:

[key: string] string
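Putting the properties together, the sketch below assembles a CustomizationConfigInput-shaped payload and checks that the fields marked with * are present. Every value is a hypothetical placeholder: the config name, the target reference and its format, and the training settings are assumptions for illustration, not recommended values.

```python
import json

# Fields marked * on each training_options item in this reference.
REQUIRED_TRAINING_OPTION_KEYS = {
    "training_type", "finetuning_type", "num_gpus",
    "micro_batch_size", "max_seq_length",
}

config = {
    "name": "example-config",                      # hypothetical name
    "namespace": "default",
    "description": "Example LoRA SFT configuration.",
    "target": "example-namespace/example-target",  # hypothetical reference; format assumed
    "training_options": [
        {
            "training_type": "sft",
            "finetuning_type": "lora",
            "num_gpus": 1,
            "num_nodes": 1,
            "tensor_parallel_size": 1,
            "micro_batch_size": 8,
            "max_seq_length": 4096,
            "training_precision": "bf16-mixed",
        }
    ],
    "prompt_template": "{prompt} {completion}",
}

# training_options itself is required, as are the starred keys on each item.
missing = [
    (index, key)
    for index, option in enumerate(config["training_options"])
    for key in sorted(REQUIRED_TRAINING_OPTION_KEYS - option.keys())
]
if missing:
    raise ValueError(f"missing required training option fields: {missing}")

payload = json.dumps(config, indent=2)
print(payload)
```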