nat.plugins.customizer.dpo.register#
Registration module for DPO components.
This module registers the DPO trajectory builder and NeMo Customizer trainer adapter
with NAT’s finetuning harness:
- _type: dpo_traj_builder - DPO Trajectory Builder
- _type: nemo_customizer_trainer_adapter - NeMo Customizer TrainerAdapter
Functions#
|
Register the DPO (Direct Preference Optimization) trajectory builder. |
|
Register the NeMo Customizer trainer adapter. |
|
Register the NeMo Customizer trainer. |
Module Contents#
- async dpo_trajectory_builder(
- config: nat.plugins.customizer.dpo.config.DPOTrajectoryBuilderConfig,
- builder: nat.builder.builder.Builder,
Register the DPO (Direct Preference Optimization) trajectory builder.
This builder collects preference data from workflows that produce scored candidate intermediate steps (TTC_END events with TTCEventData).
The builder: 1. Runs evaluation to collect intermediate steps 2. Filters for TTC_END steps with the configured name 3. Groups candidates by turn_id 4. Generates preference pairs based on score differences 5. Builds trajectories with DPOItem episodes
Example YAML configuration:
trajectory_builders: dpo_builder: _type: dpo_traj_builder ttc_step_name: dpo_candidate_move exhaustive_pairs: true min_score_diff: 0.05 max_pairs_per_turn: 5 finetuning: enabled: true trajectory_builder: dpo_builder # ... other finetuning config
- Args:
config: The trajectory builder configuration. builder: The NAT workflow builder (for accessing other components).
- Yields:
A configured DPOTrajectoryBuilder instance.
- async nemo_customizer_trainer_adapter(
- config: nat.plugins.customizer.dpo.config.NeMoCustomizerTrainerAdapterConfig,
- builder: nat.builder.builder.Builder,
Register the NeMo Customizer trainer adapter.
This adapter submits DPO/SFT training jobs to NeMo Customizer and optionally deploys the trained model.
The adapter: 1. Converts trajectories to JSONL format for DPO training 2. Uploads datasets to NeMo Datastore 3. Submits customization jobs to NeMo Customizer 4. Monitors job progress and status 5. Optionally deploys trained models
Example YAML configuration:
trainer_adapters: nemo_customizer: _type: nemo_customizer_trainer_adapter entity_host: https://nmp.example.com datastore_host: https://datastore.example.com namespace: my-project customization_config: meta/llama-3.2-1b-instruct@v1.0.0+A100 hyperparameters: training_type: dpo epochs: 5 batch_size: 8 use_full_message_history: true deploy_on_completion: true finetuning: enabled: true trainer_adapter: nemo_customizer # ... other finetuning config
- Args:
config: The trainer adapter configuration. builder: The NAT workflow builder (for accessing other components).
- Yields:
A configured NeMoCustomizerTrainerAdapter instance.
- async nemo_customizer_trainer(
- config: nat.plugins.customizer.dpo.config.NeMoCustomizerTrainerConfig,
- builder: nat.builder.builder.Builder,
Register the NeMo Customizer trainer.
This trainer orchestrates DPO data collection and training job submission. Unlike epoch-based trainers, it: 1. Runs the trajectory builder multiple times (num_runs) to collect data 2. Aggregates all trajectories into a single dataset 3. Submits the dataset to NeMo Customizer for training 4. Monitors the training job until completion
Example YAML configuration:
trainers: nemo_dpo: _type: nemo_customizer_trainer num_runs: 5 wait_for_completion: true deduplicate_pairs: true max_pairs: 10000 finetuning: enabled: true trainer: nemo_dpo # ... other finetuning config
- Args:
config: The trainer configuration. builder: The NAT workflow builder (for accessing other components).
- Yields:
A configured NeMoCustomizerTrainer instance.