nemo_deploy.llm.inference.tron_utils#
Module Contents#
Classes#
| Class | Description |
|---|---|
| `RNGConfig` | Configuration settings for random number generation. |
| `DistributedInitConfig` | Configuration settings for distributed training initialization. |
Functions#
| Function | Description |
|---|---|
| `get_rank_safe` | Get the rank from torch.distributed or environment variable. |
| `get_world_size_safe` | Get the world size from torch.distributed or environment variable. |
| `get_local_rank_preinit` | Get the local rank from the environment variable, intended for use before full init. |
| `print_rank_0` | Print a message only on global rank 0. |
| `torch_distributed_init` | Initialize torch.distributed using a TCP init method and env-provided ranks. |
| `initialize_distributed` | Initialize core model parallel. |
| `_set_random_seed` | Set random seed for reproducibility. |
| `_initialize_tp_communicators` | Initialize communicators with user buffers for high-performance tensor-model-parallel communication overlap. |
| `_get_model_type` | Determine the model type from the model configuration. |
| `get_model_from_config` | Get a model from the given configuration. |
Data#
API#
- nemo_deploy.llm.inference.tron_utils.LOGGER = 'getLogger(...)'#
- class nemo_deploy.llm.inference.tron_utils.RNGConfig#
Configuration settings for random number generation.
- seed: int = 1234#
Random seed used for python, numpy, pytorch, and cuda.
- te_rng_tracker: bool = False#
Use the Transformer Engine version of the random number generator. Required for CUDA graphs support.
- inference_rng_tracker: bool = False#
Use a random number generator configured for inference.
- data_parallel_random_init: bool = False#
Enable random initialization of parameters across data parallel ranks.
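A minimal construction sketch (assuming `RNGConfig` is a dataclass that accepts these fields as keyword arguments):

```python
from nemo_deploy.llm.inference.tron_utils import RNGConfig

# Defaults: seed=1234, standard RNG trackers.
rng_config = RNGConfig()

# Transformer Engine tracker (needed for CUDA graphs support) plus an
# inference-oriented tracker.
rng_config = RNGConfig(seed=4321, te_rng_tracker=True, inference_rng_tracker=True)
```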
- class nemo_deploy.llm.inference.tron_utils.DistributedInitConfig#
Configuration settings for distributed training initialization.
- distributed_backend: Literal[nccl, gloo] = 'nccl'#
Which backend to use for distributed training.
- distributed_timeout_minutes: int = 10#
Timeout minutes for torch.distributed.
- align_grad_reduce: bool = True#
If True, all PP stages launch gradient reduces simultaneously; otherwise, each PP stage launches them independently as needed.
- local_rank: int = 'field(...)'#
Local rank passed from the distributed launcher.
- lazy_mpu_init: bool = False#
If set to True, initialize_megatron() skips DDP initialization and returns a function to complete it instead. Also turns on the `--use-cpu-initialization` flag. This is for an external DDP manager.
- use_torch_fsdp2: bool = False#
Use the torch FSDP2 implementation. FSDP2 does not currently work with pipeline parallelism. It is not yet in a stable release and may therefore contain bugs or other potential issues.
- nccl_communicator_config_path: Optional[str] = None#
Path to the YAML file with NCCL communicator configurations. The number of min/max thread groups and the thread-group cluster size of each communicator can be configured by setting `min_ctas`, `max_ctas`, and `cga_cluster_size`.
- use_tp_pp_dp_mapping: bool = False#
If set, the initialization order of distributed ranks is changed from tp-dp-pp to tp-pp-dp. Make sure EP and CP are not used with this option enabled.
- use_gloo_process_groups: bool = True#
If set, create Gloo process groups for communications.
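A construction sketch for `DistributedInitConfig` (same dataclass assumption as above; the values shown are illustrative):

```python
from nemo_deploy.llm.inference.tron_utils import DistributedInitConfig

dist_config = DistributedInitConfig(
    distributed_backend="nccl",       # or "gloo" on CPU-only machines
    distributed_timeout_minutes=30,
    use_gloo_process_groups=True,
)
```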
- nemo_deploy.llm.inference.tron_utils.get_rank_safe() → int#
Get the rank from torch.distributed or environment variable.
- Returns:
The global rank of the current process.
- Return type:
int
- nemo_deploy.llm.inference.tron_utils.get_world_size_safe() → int#
Get the world size from torch.distributed or environment variable.
- Returns:
The total number of processes in the distributed setup.
- Return type:
int
- nemo_deploy.llm.inference.tron_utils.get_local_rank_preinit() → int#
Get the local rank from the environment variable, intended for use before full init.
- Returns:
The local rank of the current process.
- Return type:
int
- nemo_deploy.llm.inference.tron_utils.print_rank_0(message: str) → None#
Print a message only on global rank 0.
- Parameters:
message (str) – The message string to print.
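An illustrative single-process sketch combining the helpers above; the `RANK`/`WORLD_SIZE`/`LOCAL_RANK` fallbacks are assumptions based on the environment variables documented for `torch_distributed_init` below:

```python
import os

from nemo_deploy.llm.inference.tron_utils import (
    get_local_rank_preinit,
    get_rank_safe,
    get_world_size_safe,
    print_rank_0,
)

# Assumed env-var fallbacks before torch.distributed is initialized.
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")
os.environ.setdefault("LOCAL_RANK", "0")

rank = get_rank_safe()
world_size = get_world_size_safe()
local_rank = get_local_rank_preinit()

# Emitted only on global rank 0.
print_rank_0(f"rank={rank} local_rank={local_rank} world_size={world_size}")
```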
- nemo_deploy.llm.inference.tron_utils.torch_distributed_init(dist_config: nemo_deploy.llm.inference.tron_utils.DistributedInitConfig)#
Initialize torch.distributed using a TCP init method and env-provided ranks.
This function is idempotent: if torch.distributed is already initialized it logs and returns. Otherwise, it sets the current CUDA device based on `LOCAL_RANK` (when GPUs are available), constructs the TCP `init_method` from `MASTER_ADDR` and `MASTER_PORT`, and initializes the process group with the backend and timeout specified in `dist_config`. After init, it issues a barrier scoped to the current device.
- Parameters:
dist_config (DistributedInitConfig) – Configuration including backend and timeout used for the process group initialization.
Environment:
- `MASTER_ADDR`: Master node address (default: “localhost”).
- `MASTER_PORT`: Master node port (default: “6000”).
- `WORLD_SIZE`: Total number of ranks (default: “1”).
- `RANK`: Global rank of this process (default: “0”).
- `LOCAL_RANK`: Local rank on the node/GPU index (default: “0”).
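A single-process initialization sketch using the documented environment defaults (assumes `DistributedInitConfig` can be built with its defaults):

```python
import os

from nemo_deploy.llm.inference.tron_utils import (
    DistributedInitConfig,
    torch_distributed_init,
)

# Documented defaults, set explicitly for a single-process run.
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "6000")
os.environ.setdefault("WORLD_SIZE", "1")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("LOCAL_RANK", "0")

# Use "gloo" instead of "nccl" on a machine without GPUs.
dist_config = DistributedInitConfig(distributed_backend="nccl")
torch_distributed_init(dist_config)
torch_distributed_init(dist_config)  # idempotent: second call logs and returns
```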
- nemo_deploy.llm.inference.tron_utils.initialize_distributed(
- model_config: Union[nemo.collections.llm.gpt.model.base.GPTConfig, nemo.collections.llm.t5.model.t5.T5Config],
- dist_config: nemo_deploy.llm.inference.tron_utils.DistributedInitConfig,
- num_distributed_optimizer_instances: int,
- get_embedding_ranks: Optional[Callable[[List[int], Optional[int]], List[int]]],
- get_position_embedding_ranks: Optional[Callable[[List[int], Optional[int]], List[int]]],
)#
Initialize core model parallel.
- Parameters:
model_config (Union[GPTConfig, T5Config]) – Configuration for the model architecture
dist_config (DistributedInitConfig) – Configuration for distributed initialization
num_distributed_optimizer_instances (int) – Number of optimizer instances for distributed training
get_embedding_ranks (Optional[Callable[[List[int], Optional[int]], List[int]]]) – Function to get the ranks for embedding parallel
get_position_embedding_ranks (Optional[Callable[[List[int], Optional[int]], List[int]]]) – Function to get the ranks for position embedding parallel
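A sketch of the expected call, using a hypothetical small `GPTConfig` (its constructor arguments are assumptions about `TransformerConfig`-style fields, not taken from this page):

```python
from nemo.collections.llm.gpt.model.base import GPTConfig
from nemo_deploy.llm.inference.tron_utils import (
    DistributedInitConfig,
    initialize_distributed,
)

# Hypothetical small decoder-only config; real deployments would load an
# existing GPTConfig from a checkpoint instead.
model_config = GPTConfig(num_layers=2, hidden_size=128, num_attention_heads=4)
dist_config = DistributedInitConfig()

initialize_distributed(
    model_config=model_config,
    dist_config=dist_config,
    num_distributed_optimizer_instances=1,
    get_embedding_ranks=None,           # None: use default rank selection (assumption)
    get_position_embedding_ranks=None,  # None: use default rank selection (assumption)
)
```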
- nemo_deploy.llm.inference.tron_utils._set_random_seed(
- seed_: int,
- data_parallel_random_init: bool = False,
- te_rng_tracker: bool = False,
- inference_rng_tracker: bool = False,
)#
Set random seed for reproducibility.
- Parameters:
seed_ (int) – Base random seed to use
data_parallel_random_init (bool, optional) – Whether to use different seeds for different data parallel ranks. Defaults to False.
te_rng_tracker (bool, optional) – Whether to use Transformer Engine random number generator. Defaults to False.
inference_rng_tracker (bool, optional) – Whether to use a random number generator configured for inference. Defaults to False.
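An illustrative call mirroring the `RNGConfig` fields; note the function is module-private and model-parallel state is assumed to be initialized already:

```python
from nemo_deploy.llm.inference.tron_utils import RNGConfig, _set_random_seed

rng_config = RNGConfig(seed=1234)

# Seeds python, numpy, pytorch, and CUDA. Model-parallel state is assumed to
# exist already (see initialize_distributed above).
_set_random_seed(
    rng_config.seed,
    data_parallel_random_init=rng_config.data_parallel_random_init,
    te_rng_tracker=rng_config.te_rng_tracker,
    inference_rng_tracker=rng_config.inference_rng_tracker,
)
```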
- nemo_deploy.llm.inference.tron_utils._initialize_tp_communicators(
- model_config: Union[nemo.collections.llm.gpt.model.base.GPTConfig, nemo.collections.llm.t5.model.t5.T5Config],
- micro_batch_size: int,
)#
Initialize communicators with user buffers for high-performance tensor-model-parallel communication overlap.
- Parameters:
model_config (Union[GPTConfig, T5Config]) – Configuration for the model architecture
micro_batch_size (int) – Size of the micro batch
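An illustrative call, assuming distributed and tensor-model-parallel state are already initialized and the model config enables communication overlap:

```python
from nemo.collections.llm.gpt.model.base import GPTConfig
from nemo_deploy.llm.inference.tron_utils import _initialize_tp_communicators

# Hypothetical config; in practice this is the same model_config used for
# initialize_distributed, with tensor-parallel communication overlap enabled.
model_config = GPTConfig(num_layers=2, hidden_size=128, num_attention_heads=4)

_initialize_tp_communicators(model_config, micro_batch_size=1)
```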
- nemo_deploy.llm.inference.tron_utils._get_model_type(
- model_config: Union[nemo.collections.llm.gpt.model.base.GPTConfig, nemo.collections.llm.t5.model.t5.T5Config],
)#
Determine the model type from the model configuration.
- Parameters:
model_config (Union[GPTConfig, T5Config]) – The model configuration object
- Returns:
The model type enum value (encoder_and_decoder or encoder_or_decoder)
- Return type:
ModelType
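An illustrative call; the mapping of a decoder-only GPT config to `ModelType.encoder_or_decoder` is an assumption consistent with the enum values above:

```python
from megatron.core.enums import ModelType
from nemo.collections.llm.gpt.model.base import GPTConfig
from nemo_deploy.llm.inference.tron_utils import _get_model_type

# Hypothetical decoder-only config.
gpt_config = GPTConfig(num_layers=2, hidden_size=128, num_attention_heads=4)

model_type = _get_model_type(gpt_config)
# Expected: ModelType.encoder_or_decoder for a GPT config; a T5 config would
# be expected to yield ModelType.encoder_and_decoder.
assert model_type in (ModelType.encoder_or_decoder, ModelType.encoder_and_decoder)
```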
- nemo_deploy.llm.inference.tron_utils.get_model_from_config(
- model_config: Union[nemo.collections.llm.gpt.model.base.GPTConfig, nemo.collections.llm.t5.model.t5.T5Config],
- ddp_config: megatron.core.distributed.DistributedDataParallelConfig,
- overlap_param_gather_with_optimizer_step: bool = False,
- wrap_with_ddp: bool = True,
- data_parallel_random_init: bool = True,
- tokenizer=None,
)#
Get a model from the given configuration.
This method should only be called after `init_distributed()`.
- Parameters:
model_config (Union[GPTConfig, T5Config]) – The model configuration
ddp_config (DistributedDataParallelConfig) – The distributed data parallel configuration
overlap_param_gather_with_optimizer_step (bool, optional) – Whether to overlap parameter gathering with optimizer step. Defaults to False.
wrap_with_ddp (bool, optional) – Whether to wrap the model with DistributedDataParallel. Defaults to True.
data_parallel_random_init (bool, optional) – Whether to initialize data parallel ranks with random seeds. Defaults to True.
tokenizer (optional) – The tokenizer to pass to configure_model. Defaults to None.
- Returns:
List of model modules, potentially wrapped with DistributedDataParallel
- Return type:
List[MegatronModule]
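A sketch of building a model for inference after distributed initialization (config values are illustrative assumptions, and `initialize_distributed` is assumed to perform the distributed init the docstring refers to):

```python
from megatron.core.distributed import DistributedDataParallelConfig
from nemo.collections.llm.gpt.model.base import GPTConfig
from nemo_deploy.llm.inference.tron_utils import (
    DistributedInitConfig,
    get_model_from_config,
    initialize_distributed,
)

# Hypothetical small config; real deployments load this from a checkpoint.
model_config = GPTConfig(num_layers=2, hidden_size=128, num_attention_heads=4)

# Distributed/model-parallel state must exist before building the model.
initialize_distributed(
    model_config=model_config,
    dist_config=DistributedInitConfig(),
    num_distributed_optimizer_instances=1,
    get_embedding_ranks=None,
    get_position_embedding_ranks=None,
)

model_modules = get_model_from_config(
    model_config,
    DistributedDataParallelConfig(),   # DDP settings left at their defaults
    wrap_with_ddp=False,               # skip DDP wrapping for inference
    data_parallel_random_init=False,
)
```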