nat.plugins.openpipe.trainer#
Attributes#

- MATPLOTLIB_AVAILABLE
- logger

Classes#

- ARTTrainer: Concrete implementation of Trainer for the OpenPipe ART backend.
Module Contents#
- MATPLOTLIB_AVAILABLE = True#
- logger#
- class ARTTrainer(trainer_config: nat.plugins.openpipe.config.ARTTrainerConfig, **kwargs)#

Bases: nat.finetuning.interfaces.finetuning_runner.Trainer

Concrete implementation of Trainer for the OpenPipe ART backend.

This runner orchestrates the finetuning process using:

- ARTTrajectoryBuilder to collect trajectories from evaluations
- ARTTrainerAdapter to submit trajectories to the ART training backend
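A minimal end-to-end sketch, assuming ARTTrainerConfig and FinetuneConfig can be constructed here with defaults (their required fields are defined in their own modules, not on this page):

```python
import asyncio

from nat.data_models.finetuning import FinetuneConfig
from nat.plugins.openpipe.config import ARTTrainerConfig
from nat.plugins.openpipe.trainer import ARTTrainer


async def main() -> None:
    # Assumption: default-constructible configs; fill in required
    # fields from ARTTrainerConfig / FinetuneConfig as needed.
    trainer = ARTTrainer(trainer_config=ARTTrainerConfig())

    # Verify backend connectivity and prepare the trajectory builder.
    await trainer.initialize(run_config=FinetuneConfig())

    # Run the full workflow and inspect the resulting job statuses.
    statuses = await trainer.run(num_epochs=3)
    for status in statuses:
        print(status)


asyncio.run(main())
```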
Initialize the OpenPipe ART Runner.
- Args:
trainer_config: Configuration for the ART trainer backend
- trainer_config: nat.plugins.openpipe.config.ARTTrainerConfig#
- _job_refs: list[nat.data_models.finetuning.TrainingJobRef] = []#
- async initialize(run_config: nat.data_models.finetuning.FinetuneConfig)#

Initialize the runner and its components.
This will:

- Initialize the TrainerAdapter and verify connectivity
- Prepare the TrajectoryBuilder for collecting trajectories
- async run_epoch(epoch: int, run_id: str) → nat.data_models.finetuning.TrainingJobRef | None#
Run a single epoch of training.
- Args:
epoch: The current epoch number (0-indexed)
run_id: Unique identifier for this training run
- Returns:
TrainingJobRef: Reference to the submitted training job, or None if no job was submitted
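When driving epochs manually rather than through run(), a loop like the following applies; this sketch continues from the example above, and the run_id scheme is an assumption (the docstring only requires a unique identifier):

```python
import uuid

# Inside an async function, with `trainer` already initialized as above.
run_id = f"art-run-{uuid.uuid4().hex[:8]}"  # any unique string (assumption)

for epoch in range(3):  # epochs are 0-indexed per the docstring
    job_ref = await trainer.run_epoch(epoch=epoch, run_id=run_id)
    if job_ref is None:
        continue  # nothing was submitted for this epoch
    print(f"epoch {epoch}: submitted training job {job_ref}")
```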
- async run(num_epochs: int) → list[nat.data_models.finetuning.TrainingJobStatus]#
Run the complete finetuning workflow for the specified number of epochs.
- Args:
num_epochs: Number of epochs to train
- Returns:
list[TrainingJobStatus]: Status of all training jobs
- async get_metrics(run_id: str) → dict[str, Any]#
Get training metrics for a specific run.
- Args:
run_id: The run identifier
- Returns:
dict: Metrics from the training run
- log_progress(epoch: int, metrics: dict[str, Any], output_dir: pathlib.Path | None = None) → None#
Log training progress and create visualizations.
- Args:
epoch: Current epoch number
metrics: Dictionary of metrics to log
output_dir: Optional output directory override
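A sketch pairing get_metrics with log_progress; reusing the run_id from the epoch loop above, and passing the returned metrics dict straight to log_progress, are both assumptions:

```python
# Inside the same async context as the epoch loop above.
metrics = await trainer.get_metrics(run_id)

# Log progress; output_dir may be passed to redirect plots (assumption).
trainer.log_progress(epoch=2, metrics=metrics)
```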
- apply_curriculum_learning(trajectory_collection: nat.data_models.finetuning.TrajectoryCollection, epoch: int) → nat.data_models.finetuning.TrajectoryCollection#
Apply curriculum learning to filter trajectory groups based on difficulty.
This method:

1. Sorts trajectory groups by average reward (difficulty)
2. Filters out groups with no reward variance (no learning signal)
3. Selects appropriate groups based on curriculum progression
4. Expands curriculum at specified intervals
- Args:
trajectory_collection: The complete collection of trajectories
epoch: Current epoch number
- Returns:
TrajectoryCollection: Filtered trajectories for training
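The TrajectoryCollection schema is not shown on this page, so the following standalone sketch illustrates the four steps on plain lists of per-trajectory rewards; the group representation, the quartile-sized expansion slice, and the expand_every interval are all assumptions, not the actual implementation:

```python
from statistics import mean, pvariance

Group = list[float]  # hypothetical stand-in: one group's per-trajectory rewards


def curriculum_filter(groups: list[Group], epoch: int,
                      expand_every: int = 2) -> list[Group]:
    # 1. Sort groups by average reward, easiest (highest reward) first.
    ranked = sorted(groups, key=mean, reverse=True)

    # 2. Drop groups whose rewards are all identical: zero variance
    #    means no learning signal for advantage-style training.
    ranked = [g for g in ranked if pvariance(g) > 0.0]

    # 3. + 4. Admit more groups as training progresses, expanding the
    #    curriculum by one slice every `expand_every` epochs.
    stages = epoch // expand_every + 1
    cutoff = min(len(ranked), stages * max(1, len(ranked) // 4))
    return ranked[:cutoff]


groups = [[0.9, 0.7], [0.5, 0.5], [0.2, 0.8], [0.1, 0.3]]
print(curriculum_filter(groups, epoch=0))  # -> [[0.9, 0.7]]
```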
- _create_reward_plot(epoch: int, output_dir: pathlib.Path) → None#
Create PNG plot showing reward progression and curriculum learning status.
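Since the module guards plotting behind MATPLOTLIB_AVAILABLE, here is a sketch of how such a guarded reward plot can be produced; the plot contents and file naming are assumptions, not the actual implementation:

```python
from pathlib import Path

try:
    import matplotlib
    matplotlib.use("Agg")  # headless backend, safe for servers
    import matplotlib.pyplot as plt
    MATPLOTLIB_AVAILABLE = True
except ImportError:
    MATPLOTLIB_AVAILABLE = False


def create_reward_plot(mean_rewards: list[float], epoch: int,
                       output_dir: Path) -> None:
    """Write a reward-progression PNG; silently no-op without matplotlib."""
    if not MATPLOTLIB_AVAILABLE:
        return
    fig, ax = plt.subplots()
    ax.plot(range(len(mean_rewards)), mean_rewards, marker="o")
    ax.set_xlabel("epoch")
    ax.set_ylabel("mean reward")
    ax.set_title(f"Reward progression through epoch {epoch}")
    fig.savefig(output_dir / f"rewards_epoch_{epoch}.png")
    plt.close(fig)
```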