nemo_microservices.types.customization.customization_status_details#

Module Contents#

Classes#

API#

class nemo_microservices.types.customization.customization_status_details.CustomizationStatusDetails(/, **data: typing.Any)#

Bases: nemo_microservices._models.BaseModel

best_epoch: Optional[int]#

None

The epoch at which the best checkpoint for the customized model was produced.

created_at: datetime.datetime#

None

The time when training started.

elapsed_time: Optional[float]#

None

Time in seconds that the job has been running, or took to run if it has finished.

epochs_completed: Optional[int]#

None

The total number of epochs completed during training.

metrics: Optional[nemo_microservices.types.customization.customization_metrics.CustomizationMetrics]#

None

percentage_done: Optional[float]#

None

Percentage tracking the training progress of the customization.

The progress is calculated as the number of completed epochs divided by the total number of epochs, multiplied by 100. The value may be below 100 after training completes if training ended through early stopping (validation loss did not improve significantly over time) or because the job time limit was reached.
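The calculation described above can be sketched as follows. This is a minimal illustration, not code from the library: the function name `compute_percentage_done` and its `total_epochs` parameter are hypothetical, since `CustomizationStatusDetails` only exposes the resulting `percentage_done` value.

```python
def compute_percentage_done(epochs_completed: int, total_epochs: int) -> float:
    """Completed epochs as a percentage of the configured epochs.

    Hypothetical helper: mirrors the documented formula only.
    """
    if total_epochs <= 0:
        raise ValueError("total_epochs must be positive")
    return epochs_completed / total_epochs * 100


# Early stopping after 3 of 4 configured epochs leaves progress below 100.
print(compute_percentage_done(3, 4))  # 75.0
```

A finished job can therefore report a value under 100, so `status` is a more reliable completion signal than `percentage_done`.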

status: nemo_microservices.types.shared.job_status.JobStatus#

None

Normalized statuses for all jobs.

  • CREATED: The job is created, but not yet scheduled.

  • PENDING: The job is waiting for resource allocation.

  • RUNNING: The job is currently running.

  • CANCELLING: The job is being cancelled at the user’s request.

  • CANCELLED: The job has been cancelled by the user.

  • FAILED: The job failed to execute and terminated.

  • COMPLETED: The job has completed successfully.

  • READY: The job is ready to be used.

  • UNKNOWN: The job status is unknown.
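When polling a job, it is often useful to know whether a status can still change. The sketch below is an assumption-laden illustration: it treats CANCELLED, FAILED, and COMPLETED as terminal based on the descriptions above, leaves READY and UNKNOWN out, and compares case-insensitively because the exact string form of `JobStatus` values is not shown here. The names `TERMINAL_STATUSES` and `is_terminal` are hypothetical.

```python
# Assumption: these three statuses describe jobs that will not progress further.
TERMINAL_STATUSES = frozenset({"CANCELLED", "FAILED", "COMPLETED"})


def is_terminal(status: str) -> bool:
    """Return True if the status string names a finished job.

    Case-insensitive, since the serialized casing is not confirmed here.
    """
    return status.upper() in TERMINAL_STATUSES


print(is_terminal("RUNNING"))    # False
print(is_terminal("completed"))  # True
```

Whether READY should also stop a polling loop depends on how the caller uses the job, so adjust the set as needed.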

status_logs: Optional[List[nemo_microservices.types.customization.status_log.StatusLog]]#

None

Detailed log for changes to the status of the customization job.

steps_completed: Optional[int]#

None

The number of steps completed during training.

The total number of steps is determined by the hyperparameters (number of epochs and batch size) and the number of samples in the training dataset: total_steps = epochs * ceil(training_samples / batch_size) when both training and validation datasets are used, or total_steps = epochs * ceil(ceil(training_samples * 0.9) / batch_size) when only a training dataset is used.
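The two formulas above can be written out as a small function to check expected step counts. This is a sketch under stated assumptions: the function name `total_steps` and the `has_validation_dataset` flag are hypothetical and exist only to select between the two documented cases.

```python
import math


def total_steps(
    epochs: int,
    training_samples: int,
    batch_size: int,
    has_validation_dataset: bool,
) -> int:
    """Evaluate the documented total_steps formula for either dataset setup."""
    if not has_validation_dataset:
        # Documented case for a training-only dataset: 90% of samples, rounded up.
        training_samples = math.ceil(training_samples * 0.9)
    return epochs * math.ceil(training_samples / batch_size)


# 1000 samples, batch size 32, 3 epochs.
print(total_steps(3, 1000, 32, has_validation_dataset=True))   # 96
print(total_steps(3, 1000, 32, has_validation_dataset=False))  # 87
```

Comparing `steps_completed` against this total gives a finer-grained progress measure than epoch counts alone.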

steps_per_epoch: Optional[int]#

None

The number of steps per epoch.

Calculated as follows: steps_per_epoch = total_steps / epochs, for example ceil(training_samples / batch_size) when both training and validation datasets are used. If null, the value is not known to Customizer.

train_loss: Optional[float]#

None

The training loss of the best checkpoint for the customized model.

updated_at: datetime.datetime#

None

The last time the status was updated.

val_loss: Optional[float]#

None

The validation loss of the best checkpoint for the customized model.
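Because most fields on this class are Optional, code that reads them should handle None. The sketch below shows one way to build a progress summary; it uses a `SimpleNamespace` as a stand-in for a `CustomizationStatusDetails` instance, and the `summarize` helper is hypothetical rather than part of the library.

```python
from types import SimpleNamespace


def summarize(details) -> str:
    """Build a one-line summary, skipping Optional fields that are None."""
    parts = [f"status={details.status}"]
    if details.percentage_done is not None:
        parts.append(f"progress={details.percentage_done:.1f}%")
    if details.epochs_completed is not None:
        parts.append(f"epochs={details.epochs_completed}")
    if details.val_loss is not None:
        parts.append(f"val_loss={details.val_loss:.4f}")
    return ", ".join(parts)


# Stand-in object carrying only the attributes the helper reads.
details = SimpleNamespace(
    status="running",
    percentage_done=50.0,
    epochs_completed=1,
    val_loss=None,
)
print(summarize(details))  # status=running, progress=50.0%, epochs=1
```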