core._rank_utils
Low-level rank utilities with minimal dependencies to avoid circular imports.
Module Contents
Functions
safe_get_rank | Get the distributed rank safely, even if torch.distributed is not initialized.
safe_get_world_size | Get the distributed world size safely, even if torch.distributed is not initialized.
log_single_rank | Log a message only on a single rank.
API
- core._rank_utils.safe_get_rank() → int
Get the distributed rank safely, even if torch.distributed is not initialized.
Fallback order:
1. torch.distributed.get_rank() (if initialized)
2. RANK environment variable (torchrun/torchelastic)
3. SLURM_PROCID environment variable (SLURM)
4. Default: 0 (with warning)
- Returns:
The rank of the current process.
- Return type:
int
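The fallback order above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the name `safe_get_rank_sketch` is hypothetical, and the warning text and the lazy import of torch are assumptions.

```python
import os
import warnings


def safe_get_rank_sketch() -> int:
    """Illustrative stand-in for safe_get_rank(); not the real implementation."""
    # 1. Prefer torch.distributed when it is importable and initialized.
    try:
        import torch.distributed as dist

        if dist.is_available() and dist.is_initialized():
            return dist.get_rank()
    except ImportError:
        # Assumption: a missing torch install is treated like "not initialized".
        pass

    # 2. RANK (torchrun/torchelastic), then 3. SLURM_PROCID (SLURM).
    for name in ("RANK", "SLURM_PROCID"):
        value = os.environ.get(name)
        if value is not None:
            return int(value)

    # 4. Nothing found: warn and fall back to rank 0.
    warnings.warn("Could not determine the distributed rank; defaulting to 0.")
    return 0
```

Because the environment variables are only consulted when torch.distributed is not initialized, a launcher-provided `RANK` never overrides the rank of an active process group.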
- core._rank_utils.safe_get_world_size() → int
Get the distributed world size safely, even if torch.distributed is not initialized.
Fallback order:
1. torch.distributed.get_world_size() (if initialized)
2. WORLD_SIZE environment variable (torchrun/torchelastic)
3. SLURM_NTASKS environment variable (SLURM)
4. Default: 1 (with warning)
- Returns:
The total number of processes in the distributed job.
- Return type:
int
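One typical use of the rank and world size together is splitting work between processes. The sketch below is only an illustration: `shard_for_rank` is a hypothetical helper that is not part of this module, and in real code the `rank` and `world_size` arguments would presumably come from `safe_get_rank()` and `safe_get_world_size()`.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_for_rank(items: Sequence[T], rank: int, world_size: int) -> List[T]:
    """Return the round-robin slice of `items` assigned to `rank`."""
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError(f"invalid rank {rank} for world size {world_size}")
    # Every world_size-th item, starting at this process's rank.
    return list(items[rank::world_size])


# With the single-process defaults (rank 0, world size 1) nothing is dropped.
print(shard_for_rank(list(range(10)), rank=0, world_size=1))
# Rank 1 of 4 processes gets items 1, 5 and 9.
print(shard_for_rank(list(range(10)), rank=1, world_size=4))  # [1, 5, 9]
```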
- core._rank_utils.log_single_rank(logger: logging.Logger, *args: Any, rank: int = 0, **kwargs: Any)
Log a message only on a single rank.
If torch.distributed is initialized, the message is written only on the process whose rank matches rank.
- Parameters:
logger – The logger used to write the message.
*args – Positional arguments passed to logging.Logger.log.
rank – The rank to write on. Defaults to 0.
**kwargs – Keyword arguments passed to logging.Logger.log.
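A minimal sketch of how such a rank-gated logger might behave is shown below. The names `log_single_rank_sketch` and `_current_rank` are hypothetical, and logging unconditionally when torch.distributed is not initialized is an assumption; the documentation above only describes the initialized case.

```python
import logging
from typing import Any, Optional


def _current_rank() -> Optional[int]:
    """Return the torch.distributed rank, or None when it is unavailable."""
    try:
        import torch.distributed as dist
    except ImportError:
        return None
    if dist.is_available() and dist.is_initialized():
        return dist.get_rank()
    return None


def log_single_rank_sketch(
    logger: logging.Logger, *args: Any, rank: int = 0, **kwargs: Any
) -> None:
    """Illustrative stand-in for log_single_rank(); not the real implementation."""
    current = _current_rank()
    # Assumption: without an initialized process group, every process logs.
    if current is None or current == rank:
        # args follow logging.Logger.log: (level, msg, *format_args).
        logger.log(*args, **kwargs)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_single_rank_sketch(
        logging.getLogger(__name__), logging.INFO, "Starting step %d", 1
    )
```

Since the positional arguments go straight to `logging.Logger.log`, the first one is the log level and the second is the message format string.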