nat.plugins.customizer.dpo.config

Configuration classes for DPO training with NeMo Customizer.

This module provides configuration for:

1. DPO Trajectory Builder - collecting preference data from workflows
2. NeMo Customizer TrainerAdapter - submitting DPO training jobs

Classes

DPOTrajectoryBuilderConfig

Configuration for the DPO (Direct Preference Optimization) Trajectory Builder.

NeMoCustomizerTrainerConfig

Configuration for the NeMo Customizer Trainer.

DPOSpecificHyperparameters

DPO-specific hyperparameters for NeMo Customizer.

NeMoCustomizerHyperparameters

Hyperparameters for NeMo Customizer training jobs.

NIMDeploymentConfig

Configuration for NIM deployment after training.

NeMoCustomizerTrainerAdapterConfig

Configuration for the NeMo Customizer TrainerAdapter.

Module Contents

class DPOTrajectoryBuilderConfig(/, **data: Any)

Bases: nat.data_models.finetuning.TrajectoryBuilderConfig

Configuration for the DPO (Direct Preference Optimization) Trajectory Builder.

This builder collects preference pairs from workflows that produce TTC_END intermediate steps carrying TTCEventData. It reads turn_id, candidate_index, score, input (prompt), and output (response) directly from the structured TTCEventData model, so no dictionary key configuration is needed.

The builder groups candidates by turn_id and creates preference pairs based on score differences.

Example YAML configuration:

trajectory_builders:
  dpo_builder:
    _type: dpo_traj_builder
    ttc_step_name: dpo_candidate_move
    exhaustive_pairs: true
    min_score_diff: 0.05
    max_pairs_per_turn: 5
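
The grouping and pairing described above can be sketched roughly as follows. This is an illustration only, not the plugin's implementation: the function `build_pairs`, the dict-based candidate records, and the exact meaning given to `exhaustive_pairs`, `min_score_diff`, and `max_pairs_per_turn` are assumptions inferred from the field names.

```python
from collections import defaultdict
from itertools import combinations


def build_pairs(candidates, exhaustive_pairs=True, min_score_diff=0.05,
                max_pairs_per_turn=5):
    """Group candidates by turn_id and emit (chosen, rejected) pairs.

    Each candidate is a plain dict standing in for the fields read from
    TTCEventData (hypothetical representation).
    """
    by_turn = defaultdict(list)
    for cand in candidates:
        by_turn[cand["turn_id"]].append(cand)

    pairs = []
    for group in by_turn.values():
        # Sort best-first so the higher-scored candidate is always "chosen".
        ranked = sorted(group, key=lambda c: c["score"], reverse=True)
        if len(ranked) < 2:
            continue  # a single candidate cannot form a preference pair
        if exhaustive_pairs:
            candidate_pairs = list(combinations(ranked, 2))
        else:
            candidate_pairs = [(ranked[0], ranked[-1])]  # best vs. worst only
        turn_pairs = [
            {
                "prompt": chosen["input"],
                "chosen": chosen["output"],
                "rejected": rejected["output"],
                "score_diff": chosen["score"] - rejected["score"],
            }
            for chosen, rejected in candidate_pairs
            if chosen["score"] - rejected["score"] >= min_score_diff
        ]
        # Assumption: when capped, keep the pairs with the largest score gap.
        turn_pairs.sort(key=lambda p: p["score_diff"], reverse=True)
        if max_pairs_per_turn is not None:
            turn_pairs = turn_pairs[:max_pairs_per_turn]
        pairs.extend(turn_pairs)
    return pairs


candidates = [
    {"turn_id": "t1", "candidate_index": 0, "score": 0.9, "input": "p", "output": "a"},
    {"turn_id": "t1", "candidate_index": 1, "score": 0.5, "input": "p", "output": "b"},
    {"turn_id": "t1", "candidate_index": 2, "score": 0.48, "input": "p", "output": "c"},
    {"turn_id": "t2", "candidate_index": 0, "score": 0.7, "input": "q", "output": "d"},
]
pairs = build_pairs(candidates)
```

In this toy input, turn `t2` has a single candidate and yields no pair, and the `b`/`c` pair in turn `t1` is dropped because its score gap is below `min_score_diff`.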

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

ttc_step_name: str = None
exhaustive_pairs: bool = None
min_score_diff: float = None
max_pairs_per_turn: int | None = None
reward_from_score_diff: bool = None
require_multiple_candidates: bool = None
validate_config() → DPOTrajectoryBuilderConfig

Validate configuration consistency.

class NeMoCustomizerTrainerConfig(/, **data: Any)

Bases: nat.data_models.finetuning.TrainerConfig

Configuration for the NeMo Customizer Trainer.

This trainer orchestrates DPO data collection and training job submission. Unlike epoch-based trainers, it runs the trajectory builder multiple times to collect data, then submits a single training job to NeMo Customizer.

Example YAML configuration:

trainers:
  nemo_dpo:
    _type: nemo_customizer_trainer
    num_runs: 5
    wait_for_completion: true
    deduplicate_pairs: true
    max_pairs: 10000
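
A minimal sketch of the collect-then-submit flow described above, under stated assumptions: `collect_pairs` and `fake_builder` are hypothetical names, and the way `deduplicate_pairs`, `max_pairs`, and `continue_on_collection_error` are applied here is inferred from the field names rather than taken from the trainer's source.

```python
def collect_pairs(run_builder, num_runs=5, deduplicate_pairs=True,
                  max_pairs=10000, continue_on_collection_error=False):
    """Run the builder num_runs times and return one merged pair list."""
    collected = []
    seen = set()
    for run_index in range(num_runs):
        try:
            run_pairs = run_builder(run_index)
        except Exception:
            if continue_on_collection_error:
                continue  # skip the failed run, keep data from other runs
            raise
        for pair in run_pairs:
            key = (pair["prompt"], pair["chosen"], pair["rejected"])
            if deduplicate_pairs and key in seen:
                continue
            seen.add(key)
            collected.append(pair)
    # Assumption: the cap is applied once, after all runs have finished.
    if max_pairs is not None:
        collected = collected[:max_pairs]
    return collected


def fake_builder(run_index):
    """Hypothetical stand-in for one trajectory-builder run."""
    if run_index == 1:
        raise RuntimeError("simulated collection failure")
    return [
        {"prompt": "p", "chosen": "a", "rejected": "b"},
        {"prompt": "p", "chosen": f"run-{run_index}", "rejected": "b"},
    ]


dataset = collect_pairs(fake_builder, num_runs=3, continue_on_collection_error=True)
```

The merged `dataset` is what a single training job would then be submitted with; the submission itself is left out because it depends on the NeMo Customizer client.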

num_runs: int = None
continue_on_collection_error: bool = None
deduplicate_pairs: bool = None
max_pairs: int | None = None
wait_for_completion: bool = None

class DPOSpecificHyperparameters(/, **data: Any)

Bases: pydantic.BaseModel

DPO-specific hyperparameters for NeMo Customizer.

ref_policy_kl_penalty: float = None
preference_loss_weight: float = None
preference_average_log_probs: bool = None
sft_loss_weight: float = None
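
To make the roles of these four fields concrete, here is a sketch of a standard DPO objective for a single preference pair. It assumes that `ref_policy_kl_penalty` acts as the usual DPO beta, that the two weights scale a preference term and an SFT term, and that `preference_average_log_probs` switches between summed and token-averaged log-probabilities. The function `dpo_loss` and its default values are illustrative, not the library defaults.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected,
             ref_policy_kl_penalty=0.1, preference_loss_weight=1.0,
             sft_loss_weight=0.0, preference_average_log_probs=False):
    """Combined preference + SFT loss for one pair, from per-token log-probs."""

    def reduce(token_logprobs):
        total = sum(token_logprobs)
        if preference_average_log_probs:
            return total / len(token_logprobs)
        return total

    # Implicit rewards: how much more the policy favours a response
    # than the frozen reference policy does.
    chosen_reward = reduce(policy_chosen) - reduce(ref_chosen)
    rejected_reward = reduce(policy_rejected) - reduce(ref_rejected)
    margin = ref_policy_kl_penalty * (chosen_reward - rejected_reward)

    # -log(sigmoid(margin)), written as a numerically stable softplus(-margin).
    preference_loss = math.log1p(math.exp(-abs(margin))) + max(-margin, 0.0)

    # SFT term: mean negative log-likelihood of the chosen response.
    sft_loss = -sum(policy_chosen) / len(policy_chosen)

    return preference_loss_weight * preference_loss + sft_loss_weight * sft_loss


# When policy and reference agree, the margin is zero and the loss is log(2).
even = dpo_loss([-1.0, -2.0], [-1.0, -2.0], [-1.0, -2.0], [-1.0, -2.0])
# A policy that favours the chosen response more than the reference scores lower.
better = dpo_loss([-0.5, -1.0], [-2.0, -3.0], [-1.0, -2.0], [-1.0, -2.0])
```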

class NeMoCustomizerHyperparameters(/, **data: Any)

Bases: pydantic.BaseModel

Hyperparameters for NeMo Customizer training jobs.

These map to the hyperparameters argument in client.customization.jobs.create().

training_type: Literal['sft', 'dpo'] = None
finetuning_type: Literal['lora', 'all_weights'] = None
epochs: int = None
batch_size: int = None
learning_rate: float = None
dpo: DPOSpecificHyperparameters = None
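
The nested shape of these fields can be illustrated with standard-library dataclasses that mirror the two models. The classes `DPOSpecific` and `Hyperparameters`, the `to_payload` helper, and every default value below are hypothetical placeholders; the real classes are Pydantic models with their own defaults and validation, and dropping the `dpo` block for SFT jobs is an assumption.

```python
from dataclasses import asdict, dataclass, field
from typing import Literal


@dataclass
class DPOSpecific:
    # Placeholder values chosen for illustration, not the library defaults.
    ref_policy_kl_penalty: float = 0.1
    preference_loss_weight: float = 1.0
    preference_average_log_probs: bool = False
    sft_loss_weight: float = 0.0


@dataclass
class Hyperparameters:
    training_type: Literal["sft", "dpo"] = "dpo"
    finetuning_type: Literal["lora", "all_weights"] = "lora"
    epochs: int = 5
    batch_size: int = 8
    learning_rate: float = 1e-4
    dpo: DPOSpecific = field(default_factory=DPOSpecific)

    def to_payload(self):
        """Return a plain nested dict suitable for a hyperparameters argument."""
        payload = asdict(self)
        if self.training_type != "dpo":
            payload.pop("dpo")  # assumption: DPO settings only apply to DPO jobs
        return payload


payload = Hyperparameters(epochs=5, batch_size=8).to_payload()
```

Unlike the Pydantic models, dataclasses do not enforce the `Literal` choices at runtime, so this sketch shows structure only.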

class NIMDeploymentConfig(/, **data: Any)

Bases: pydantic.BaseModel

Configuration for NIM deployment after training.

These settings are used when deploy_on_completion is True.

image_name: str = None
image_tag: str = None
gpu: int = None
deployment_name: str | None = None
description: str = None

class NeMoCustomizerTrainerAdapterConfig(/, **data: Any)

Bases: nat.data_models.finetuning.TrainerAdapterConfig

Configuration for the NeMo Customizer TrainerAdapter.

This adapter submits DPO/SFT training jobs to NeMo Customizer and optionally deploys the trained model.

Example YAML configuration:

trainer_adapters:
  nemo_customizer:
    _type: nemo_customizer_trainer_adapter
    entity_host: https://nmp.example.com
    datastore_host: https://datastore.example.com
    namespace: my-project
    customization_config: meta/llama-3.2-1b-instruct@v1.0.0+A100
    hyperparameters:
      training_type: dpo
      epochs: 5
      batch_size: 8
    use_full_message_history: true
    deploy_on_completion: true

entity_host: str = None
datastore_host: str = None
hf_token: str = None
namespace: str = None
dataset_name: str = None
dataset_output_dir: str | None = None
create_namespace_if_missing: bool = None
customization_config: str = None
hyperparameters: NeMoCustomizerHyperparameters = None
use_full_message_history: bool = None
deploy_on_completion: bool = None
deployment_config: NIMDeploymentConfig = None
poll_interval_seconds: float = None
deployment_timeout_seconds: float = None
max_consecutive_status_failures: int = None
validate_config() → NeMoCustomizerTrainerAdapterConfig

Validate configuration consistency.
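
The polling-related fields of this class (poll_interval_seconds, deployment_timeout_seconds, max_consecutive_status_failures) suggest a wait loop along the following lines. This is a guess at the pattern, not the adapter's code: `wait_for_job`, the `get_status` callable, the generic `timeout_seconds` parameter, and the status strings `"completed"` and `"failed"` are all hypothetical.

```python
import time


def wait_for_job(get_status, poll_interval_seconds=10.0, timeout_seconds=1800.0,
                 max_consecutive_status_failures=3,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until a terminal state, a timeout, or repeated errors."""
    deadline = clock() + timeout_seconds
    failures = 0
    while clock() < deadline:
        try:
            status = get_status()
            failures = 0  # any successful status call resets the counter
        except Exception:
            failures += 1
            if failures >= max_consecutive_status_failures:
                raise  # give up after too many status errors in a row
            status = None
        if status in ("completed", "failed"):
            return status
        sleep(poll_interval_seconds)
    raise TimeoutError("job did not reach a terminal state before the timeout")


# Simulated status endpoint: one transient error between normal responses.
responses = iter(["running", ConnectionError("blip"), "running", "completed"])


def fake_status():
    item = next(responses)
    if isinstance(item, Exception):
        raise item
    return item


result = wait_for_job(fake_status, poll_interval_seconds=0.0, sleep=lambda _: None)
```

Counting only consecutive failures lets a long-running job survive occasional network errors while still failing fast when the service is unreachable.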