nat.plugins.openpipe.trainer#
Attributes#

- MATPLOTLIB_AVAILABLE
- logger

Classes#

- ARTTrainer: Concrete implementation of Trainer for the OpenPipe ART backend.
Module Contents#
- MATPLOTLIB_AVAILABLE = True#
- logger#
- class ARTTrainer(trainer_config: nat.plugins.openpipe.config.ARTTrainerConfig, **kwargs)#

Bases: nat.finetuning.interfaces.finetuning_runner.Trainer

Concrete implementation of Trainer for the OpenPipe ART backend.

This runner orchestrates the finetuning process using:

- ARTTrajectoryBuilder to collect trajectories from evaluations
- ARTTrainerAdapter to submit trajectories to the ART training backend
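A minimal end-to-end sketch, assuming ARTTrainerConfig and FinetuneConfig can be constructed here with defaults (their required fields are defined in their own modules, not on this page):

```python
import asyncio

from nat.data_models.finetuning import FinetuneConfig
from nat.plugins.openpipe.config import ARTTrainerConfig
from nat.plugins.openpipe.trainer import ARTTrainer


async def main() -> None:
    # Assumption: default-constructible configs; fill in required
    # fields from ARTTrainerConfig / FinetuneConfig as needed.
    trainer = ARTTrainer(trainer_config=ARTTrainerConfig())

    # Verify backend connectivity and prepare the trajectory builder.
    await trainer.initialize(run_config=FinetuneConfig())

    # Run the full workflow and inspect the resulting job statuses.
    statuses = await trainer.run(num_epochs=3)
    for status in statuses:
        print(status)


asyncio.run(main())
```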
Initialize the OpenPipe ART Runner.
- Args:
trainer_config: Configuration for the ART trainer backend
- trainer_config: nat.plugins.openpipe.config.ARTTrainerConfig#
- _job_refs: list[nat.data_models.finetuning.TrainingJobRef] = []#
- async initialize(run_config: nat.data_models.finetuning.FinetuneConfig)#

Initialize the runner and its components.
This will:

- Initialize the TrainerAdapter and verify connectivity
- Prepare the TrajectoryBuilder for collecting trajectories
- async run_epoch(epoch: int, run_id: str) → nat.data_models.finetuning.TrainingJobRef | None#
Run a single epoch of training.
- Args:
epoch: The current epoch number (0-indexed)
run_id: Unique identifier for this training run
- Returns:
TrainingJobRef: Reference to the submitted training job, or None if no job was submitted
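When driving epochs manually rather than through run(), a loop like the following applies; this sketch continues from the example above, and the run_id scheme is an assumption (the docstring only requires a unique identifier):

```python
import uuid

# Inside an async function, with `trainer` already initialized as above.
run_id = f"art-run-{uuid.uuid4().hex[:8]}"  # any unique string (assumption)

for epoch in range(3):  # epochs are 0-indexed per the docstring
    job_ref = await trainer.run_epoch(epoch=epoch, run_id=run_id)
    if job_ref is None:
        continue  # nothing was submitted for this epoch
    print(f"epoch {epoch}: submitted training job {job_ref}")
```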
- async run(num_epochs: int) → list[nat.data_models.finetuning.TrainingJobStatus]#
Run the complete finetuning workflow for the specified number of epochs.
- Args:
num_epochs: Number of epochs to train
- Returns:
list[TrainingJobStatus]: Status of all training jobs
- async get_metrics(run_id: str) → dict[str, Any]#
Get training metrics for a specific run.
- Args:
run_id: The run identifier
- Returns:
dict: Metrics from the training run
- log_progress(epoch: int, metrics: dict[str, Any], output_dir: pathlib.Path | None = None) → None#
Log training progress and create visualizations.
- Args:
epoch: Current epoch number
metrics: Dictionary of metrics to log
output_dir: Optional output directory override
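A sketch pairing get_metrics with log_progress; reusing the run_id from the epoch loop above, and passing the returned metrics dict straight to log_progress, are both assumptions:

```python
# Inside the same async context as the epoch loop above.
metrics = await trainer.get_metrics(run_id)

# Log progress; output_dir may be passed to redirect plots (assumption).
trainer.log_progress(epoch=2, metrics=metrics)
```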
- apply_curriculum_learning(trajectory_collection: nat.data_models.finetuning.TrajectoryCollection, epoch: int) → nat.data_models.finetuning.TrajectoryCollection#
Apply curriculum learning to filter trajectory groups based on difficulty.
This method:

1. Sorts trajectory groups by average reward (difficulty)
2. Filters out groups with no reward variance (no learning signal)
3. Selects appropriate groups based on curriculum progression
4. Expands curriculum at specified intervals
- Args:
trajectory_collection: The complete collection of trajectories
epoch: Current epoch number
- Returns:
TrajectoryCollection: Filtered trajectories for training
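The TrajectoryCollection schema is not shown on this page, so the following standalone sketch illustrates the four steps on plain lists of per-trajectory rewards; the group representation, the quartile-sized expansion slice, and the expand_every interval are all assumptions, not the actual implementation:

```python
from statistics import mean, pvariance

Group = list[float]  # hypothetical stand-in: one group's per-trajectory rewards


def curriculum_filter(groups: list[Group], epoch: int,
                      expand_every: int = 2) -> list[Group]:
    # 1. Sort groups by average reward, easiest (highest reward) first.
    ranked = sorted(groups, key=mean, reverse=True)

    # 2. Drop groups whose rewards are all identical: zero variance
    #    means no learning signal for advantage-style training.
    ranked = [g for g in ranked if pvariance(g) > 0.0]

    # 3. + 4. Admit more groups as training progresses, expanding the
    #    curriculum by one slice every `expand_every` epochs.
    stages = epoch // expand_every + 1
    cutoff = min(len(ranked), stages * max(1, len(ranked) // 4))
    return ranked[:cutoff]


groups = [[0.9, 0.7], [0.5, 0.5], [0.2, 0.8], [0.1, 0.3]]
print(curriculum_filter(groups, epoch=0))  # -> [[0.9, 0.7]]
```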
- _create_reward_plot(epoch: int, output_dir: pathlib.Path) → None#
Create PNG plot showing reward progression and curriculum learning status.
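Since the module guards plotting behind MATPLOTLIB_AVAILABLE, here is a sketch of how such a guarded reward plot can be produced; the plot contents and file naming are assumptions, not the actual implementation:

```python
from pathlib import Path

try:
    import matplotlib
    matplotlib.use("Agg")  # headless backend, safe for servers
    import matplotlib.pyplot as plt
    MATPLOTLIB_AVAILABLE = True
except ImportError:
    MATPLOTLIB_AVAILABLE = False


def create_reward_plot(mean_rewards: list[float], epoch: int,
                       output_dir: Path) -> None:
    """Write a reward-progression PNG; silently no-op without matplotlib."""
    if not MATPLOTLIB_AVAILABLE:
        return
    fig, ax = plt.subplots()
    ax.plot(range(len(mean_rewards)), mean_rewards, marker="o")
    ax.set_xlabel("epoch")
    ax.set_ylabel("mean reward")
    ax.set_title(f"Reward progression through epoch {epoch}")
    fig.savefig(output_dir / f"rewards_epoch_{epoch}.png")
    plt.close(fig)
```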