nat.finetuning.interfaces.trainer_adapter

Classes

Adapter that sends trajectories to a remote training cluster for weight updates.

async initialize(run_config: nat.data_models.finetuning.FinetuneConfig) → None

Asynchronously initialize any resources needed for the trainer adapter.

abstractmethod is_healthy() → bool

Check the health of the remote training backend.

Submit trajectories to remote training backend.

Get the status of a submitted training job.

Wait until the training job is complete.

Args:
    ref (TrainingJobRef): Reference to the training job.
    poll_interval (float): Time in seconds between status checks.

Returns:
    TrainingJobStatus: The final status of the training job.
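
The wait-until-complete behaviour described above can be pictured as a simple polling loop. The sketch below is illustrative only: `JobRef`, `JobStatus`, `JobState`, `FakeBackend`, and the free function `wait_until_complete` are hypothetical stand-ins invented for this example, not the actual `nat` classes or method signatures, and the real `TrainingJobStatus` may expose different fields and states.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum


class JobState(Enum):
    # Hypothetical states; the real TrainingJobStatus may differ.
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class JobRef:
    # Hypothetical stand-in for TrainingJobRef.
    job_id: str


@dataclass
class JobStatus:
    # Hypothetical stand-in for TrainingJobStatus.
    job_id: str
    state: JobState


class FakeBackend:
    """In-memory backend that reports RUNNING a fixed number of times."""

    def __init__(self, polls_until_done: int) -> None:
        self._remaining = polls_until_done
        self.polls = 0

    async def status(self, ref: JobRef) -> JobStatus:
        self.polls += 1
        if self._remaining > 0:
            self._remaining -= 1
            return JobStatus(ref.job_id, JobState.RUNNING)
        return JobStatus(ref.job_id, JobState.COMPLETED)


async def wait_until_complete(
    backend: FakeBackend, ref: JobRef, poll_interval: float = 10.0
) -> JobStatus:
    """Poll the backend until the job leaves the RUNNING state."""
    while True:
        status = await backend.status(ref)
        if status.state is not JobState.RUNNING:
            return status
        # Yield to the event loop between status checks.
        await asyncio.sleep(poll_interval)


backend = FakeBackend(polls_until_done=2)
final = asyncio.run(wait_until_complete(backend, JobRef("job-1"), poll_interval=0.01))
print(final.state, backend.polls)
```

Using `asyncio.sleep` rather than `time.sleep` keeps the event loop free for other coroutines while the job is still running. A production loop would likely also want a timeout and error handling, which this sketch omits.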

abstractmethod log_progress(ref: nat.data_models.finetuning.TrainingJobRef, metrics: dict[str, Any], output_dir: str | None = None) → None

Log training adapter progress.

Args:
    ref: Training job reference.
    metrics: Dictionary of metrics to log.
    output_dir: Optional output directory override.