nemo_microservices.types.customization.customization_status_details
Module Contents
Classes
API
- class nemo_microservices.types.customization.customization_status_details.CustomizationStatusDetails(/, **data: typing.Any)
Bases: nemo_microservices._models.BaseModel
- best_epoch: Optional[int]
The epoch number of the best checkpoint for the customized model.
- created_at: datetime.datetime
The time when training started.
- elapsed_time: Optional[float]
Time, in seconds, that the job has been running so far, or that it took to run if it has finished.
- epochs_completed: Optional[int]
The total number of epochs completed during training.
- metrics: Optional[nemo_microservices.types.customization.customization_metrics.CustomizationMetrics]
- percentage_done: Optional[float]
Percentage tracking the training progress of the customization.
The progress is calculated as the number of completed epochs divided by the total number of epochs, multiplied by 100. The progress may be below 100 after training completes if early stopping was triggered (the validation loss did not improve significantly over time) or the job's time limit was reached.
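As a minimal plain-Python sketch of the percentage_done formula, the function below is hypothetical, and so is its total_epochs parameter: the configured number of epochs is a training hyperparameter, not a field of this model. The second call shows how early stopping leaves a completed job below 100.

```python
from typing import Optional


def percentage_done(epochs_completed: int, total_epochs: int) -> Optional[float]:
    """Completed epochs divided by the total number of epochs, multiplied by 100.

    Returns None if the total number of epochs is not positive.
    """
    if total_epochs <= 0:
        return None
    # Multiply the integers first so only one floating-point rounding step occurs.
    return 100 * epochs_completed / total_epochs


print(percentage_done(10, 10))  # 100.0: all configured epochs ran
print(percentage_done(7, 10))   # 70.0: e.g. early stopping after 7 of 10 epochs
```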
- status: nemo_microservices.types.shared.job_status.JobStatus
Normalized statuses for all jobs.
CREATED: The job is created, but not yet scheduled.
PENDING: The job is waiting for resource allocation.
RUNNING: The job is currently running.
CANCELLING: The job is being cancelled at the user’s request.
CANCELLED: The job has been cancelled by the user.
FAILED: The job failed to execute and terminated.
COMPLETED: The job has completed successfully.
READY: The job is ready to be used.
UNKNOWN: The job status is unknown.
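When polling a customization job, it can help to know whether the status is still one of the in-progress values listed above. The sketch below is a hypothetical helper, not part of nemo_microservices. It assumes the status can be read as a plain string or an Enum member whose value matches one of the listed names (compared case-insensitively), and it treats UNKNOWN as not yet finished so a poller keeps checking.

```python
from enum import Enum
from typing import Union

# Statuses from the list above in which the job is still in progress.
IN_PROGRESS = frozenset({"CREATED", "PENDING", "RUNNING", "CANCELLING"})


def _status_name(status: Union[str, Enum]) -> str:
    """Return the status name in upper case from a plain string or an Enum member."""
    raw = status.value if isinstance(status, Enum) else status
    return str(raw).upper()


def is_finished(status: Union[str, Enum]) -> bool:
    """Return True once the job has left the in-progress statuses.

    UNKNOWN is deliberately treated as not finished so a poller keeps checking
    (an assumption; adapt it to your own retry policy).
    """
    name = _status_name(status)
    return name not in IN_PROGRESS and name != "UNKNOWN"


print(is_finished("running"))    # False
print(is_finished("COMPLETED"))  # True
```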
- status_logs: Optional[List[nemo_microservices.types.customization.status_log.StatusLog]]
Detailed log of changes to the status of the customization job.
- steps_completed: Optional[int]
The number of steps completed during training.
The total number of steps is determined by the hyperparameters (the number of epochs and the batch size) and the number of samples in the training dataset:
total_steps = epochs * ceil(training_samples / batch_size)
when both training and validation datasets are used, or
total_steps = epochs * ceil(ceil(training_samples * 0.9) / batch_size)
when only a training dataset is used.
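The two total_steps formulas can be checked with a short plain-Python sketch. The function name and its parameters are hypothetical; the arithmetic follows the formulas above, including the 0.9 factor applied when only a training dataset is used.

```python
import math


def total_steps(epochs: int, training_samples: int, batch_size: int,
                has_validation_dataset: bool) -> int:
    """Total training steps, following the two formulas documented for steps_completed."""
    if has_validation_dataset:
        # Training and validation datasets: all training samples count.
        samples = training_samples
    else:
        # Training dataset only: the formula uses ceil(training_samples * 0.9).
        samples = math.ceil(training_samples * 0.9)
    return epochs * math.ceil(samples / batch_size)


print(total_steps(3, 1000, 16, has_validation_dataset=True))   # 189
print(total_steps(3, 1000, 16, has_validation_dataset=False))  # 171
```

With 1,000 samples and a batch size of 16, each epoch takes ceil(62.5) = 63 steps when a validation dataset is supplied, but ceil(900 / 16) = 57 steps when only a training dataset is used.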
- steps_per_epoch: Optional[int]
The number of steps per epoch.
Calculated as follows:
steps_per_epoch = total_steps / epochs
that is, ceil(training_samples / batch_size) when both training and validation datasets are used (see steps_completed for total_steps). If null, Customizer does not know the value.
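Because steps_per_epoch may be null, code that combines it with steps_completed should handle None. The hypothetical helper below estimates how many epochs' worth of steps have been completed and returns None when that cannot be determined; it is a sketch, not part of the SDK.

```python
from typing import Optional


def epochs_elapsed(steps_completed: Optional[int],
                   steps_per_epoch: Optional[int]) -> Optional[float]:
    """Estimate how many epochs' worth of steps have been completed.

    Returns None when either value is null (None) or steps_per_epoch is zero,
    because the estimate cannot be made in that case.
    """
    if steps_completed is None or not steps_per_epoch:
        return None
    return steps_completed / steps_per_epoch


print(epochs_elapsed(150, 60))    # 2.5
print(epochs_elapsed(150, None))  # None
```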
- train_loss: Optional[float]
The training loss of the best checkpoint for the customized model.
- updated_at: datetime.datetime
The last time the status was updated.
- val_loss: Optional[float]
The validation loss of the best checkpoint for the customized model.
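The fields of CustomizationStatusDetails can be combined into a short progress report. This is a sketch under assumptions: the summarize helper is hypothetical, it relies only on attribute names listed on this page, and the SimpleNamespace object stands in for a real CustomizationStatusDetails instance purely for illustration. Each Optional field is checked for None before it is formatted.

```python
from types import SimpleNamespace
from typing import Any


def summarize(details: Any) -> str:
    """Build a one-line progress summary from a CustomizationStatusDetails-like object."""
    status = details.status
    status_name = getattr(status, "value", status)  # works for enums and plain strings
    parts = [f"status={status_name}"]
    # Optional fields may be None, so include each one only when it is set.
    if details.percentage_done is not None:
        parts.append(f"progress={details.percentage_done:.1f}%")
    if details.epochs_completed is not None:
        parts.append(f"epochs={details.epochs_completed}")
    if details.best_epoch is not None:
        parts.append(f"best_epoch={details.best_epoch}")
    if details.val_loss is not None:
        parts.append(f"val_loss={details.val_loss:.4f}")
    return ", ".join(parts)


# Hypothetical stand-in object; in practice, pass the status details object from the SDK.
example = SimpleNamespace(status="running", percentage_done=40.0, epochs_completed=2,
                          best_epoch=2, val_loss=0.41234)
print(summarize(example))  # status=running, progress=40.0%, epochs=2, best_epoch=2, val_loss=0.4123
```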