bridge.models.decorators.torchrun

Module Contents

Functions

torchrun_main

A decorator that wraps the main function of a torchrun script, recording exceptions and cleaning up the distributed process group.

API

bridge.models.decorators.torchrun.torchrun_main(fn)

A decorator that wraps the main function of a torchrun script. It applies the torch.distributed.elastic.multiprocessing.errors.record decorator so that any exception is recorded, and it destroys the distributed process group when the function completes successfully. If the function raises an exception, the decorator prints the traceback and performs a hard exit, which lets torchrun terminate all other processes.
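
The control flow described above can be illustrated with a minimal sketch. This is not the library's implementation: the name torchrun_main_sketch, the exit code 1, the use of os._exit for the hard exit, and the fallback for machines without PyTorch are all assumptions made for illustration. Only record, is_available, is_initialized, and destroy_process_group are real PyTorch calls.

```python
import functools
import os
import sys
import traceback

try:
    import torch.distributed as dist
    from torch.distributed.elastic.multiprocessing.errors import record
except ImportError:
    # Assumption for illustration only: without PyTorch, fall back to a
    # no-op `record` so the control flow can still be exercised.
    dist = None

    def record(fn):
        return fn


def torchrun_main_sketch(fn):
    """Hypothetical re-creation of the behaviour documented for torchrun_main."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            # `record` writes exception details where torchrun can find them,
            # then re-raises the exception.
            result = record(fn)(*args, **kwargs)
        except Exception:
            traceback.print_exc()
            sys.stderr.flush()  # os._exit skips normal interpreter cleanup
            os._exit(1)  # assumed exit code; a hard exit, no atexit handlers
        # Success path: tear down the process group if one was created.
        if dist is not None and dist.is_available() and dist.is_initialized():
            dist.destroy_process_group()
        return result

    return wrapper


@torchrun_main_sketch
def main(scale=1):
    # A real entry point would initialize the process group and train here.
    return 21 * scale


if __name__ == "__main__":
    main()
```

A hard exit is used in this sketch because a rank that fails while its peers are blocked in a collective operation may otherwise hang during normal interpreter shutdown. Exiting immediately lets torchrun notice the failed worker and stop the remaining ones.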