core._rank_utils

Low-level rank utilities with minimal dependencies to avoid circular imports.

Module Contents

Functions

safe_get_rank

Get the distributed rank safely, even if torch.distributed is not initialized.

safe_get_world_size

Get the distributed world size safely, even if torch.distributed is not initialized.

log_single_rank

Log a message only on a single rank.

API

core._rank_utils.safe_get_rank() → int

Get the distributed rank safely, even if torch.distributed is not initialized.

Fallback order:

  1. torch.distributed.get_rank() (if initialized)

  2. RANK environment variable (torchrun/torchelastic)

  3. SLURM_PROCID environment variable (SLURM)

  4. Default: 0 (with warning)

Returns:

The rank of the current process.

Return type:

int
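The fallback order above can be sketched roughly as follows. This is a minimal illustration and not the module's actual source: the function name `rank_with_fallback_sketch` and the warning text are hypothetical, and the sketch assumes the environment variables hold plain integers.

```python
import os
import warnings


def rank_with_fallback_sketch() -> int:
    """Hypothetical stand-in that mirrors the documented fallback order."""
    # 1. Prefer torch.distributed when it is importable and initialized.
    try:
        import torch.distributed as dist

        if dist.is_available() and dist.is_initialized():
            return dist.get_rank()
    except ImportError:
        pass  # torch is not installed; fall through to the environment

    # 2. RANK (torchrun/torchelastic), then 3. SLURM_PROCID (SLURM).
    for var in ("RANK", "SLURM_PROCID"):
        value = os.environ.get(var)
        if value is not None:
            return int(value)

    # 4. Nothing found: warn and assume a single-process run.
    warnings.warn("Could not determine rank; defaulting to 0.")
    return 0
```

Checking the environment only after torch.distributed means an initialized process group always takes precedence over launcher-provided variables.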

core._rank_utils.safe_get_world_size() → int

Get the distributed world size safely, even if torch.distributed is not initialized.

Fallback order:

  1. torch.distributed.get_world_size() (if initialized)

  2. WORLD_SIZE environment variable (torchrun/torchelastic)

  3. SLURM_NTASKS environment variable (SLURM)

  4. Default: 1 (with warning)

Returns:

The total number of processes in the distributed job.

Return type:

int
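One possible use of the rank and world size together is to split work across processes. The sketch below is an illustration only: `shard_for_rank` is a hypothetical helper, not part of this module, and in practice its `rank` and `world_size` arguments would come from `safe_get_rank()` and `safe_get_world_size()`.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_for_rank(items: Sequence[T], rank: int, world_size: int) -> List[T]:
    """Return the strided slice of `items` owned by `rank` (hypothetical helper)."""
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError(f"invalid rank {rank} for world size {world_size}")
    # Rank r takes items r, r + world_size, r + 2 * world_size, ...
    return list(items[rank::world_size])


# With the documented defaults (rank 0, world size 1) a process keeps everything.
print(shard_for_rank(list(range(10)), rank=0, world_size=1))
# In a 4-process job, rank 1 keeps every fourth item starting at index 1.
print(shard_for_rank(list(range(10)), rank=1, world_size=4))  # [1, 5, 9]
```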

core._rank_utils.log_single_rank(
logger: logging.Logger,
*args: Any,
rank: int = 0,
**kwargs: Any,
) → None

Log a message only on a single rank.

If torch.distributed is initialized, the message is logged on only one rank.

Parameters:
  • logger – The logger used to write the message.

  • *args – Positional arguments passed to logging.Logger.log.

  • rank – The rank on which to log. Defaults to 0.

  • **kwargs – Keyword arguments passed to logging.Logger.log.
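Because `*args` and `**kwargs` are `logging.Logger.log` arguments, the first positional argument after the logger is the log level, followed by the message and any format arguments. The sketch below illustrates the rank gating under stated assumptions: `log_single_rank_sketch` and its `current_rank` parameter are hypothetical, introduced so the example runs without a distributed setup. The documented function takes no such parameter and determines the current rank itself.

```python
import logging
from typing import Any


def log_single_rank_sketch(
    logger: logging.Logger,
    *args: Any,
    rank: int = 0,
    current_rank: int = 0,  # hypothetical: stands in for the process's real rank
    **kwargs: Any,
) -> None:
    """Forward the call to logger.log only when current_rank matches rank."""
    if current_rank == rank:
        logger.log(*args, **kwargs)


logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("rank_utils_example")

# Logged: the current rank matches the target rank (0 by default).
log_single_rank_sketch(logger, logging.INFO, "step %d done", 10, current_rank=0)
# Skipped: the current rank (1) is not the target rank (0).
log_single_rank_sketch(logger, logging.INFO, "step %d done", 10, current_rank=1)
```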