nemo_automodel.components.utils.flops_utils#
Module Contents#
Functions#
calculate_mfu | Calculate Model FLOPs Utilization (MFU).
gpt3_flops | Model FLOPs for the GPT-3 family; accepts either an AutoConfig or a normalized config.
llama2_flops | Model FLOPs for the Llama 2 family; accepts either an AutoConfig or a normalized config.
llama3_flops | Model FLOPs for the Llama 3 family; accepts either an AutoConfig or a normalized config.
nemotron_flops | Model FLOPs for the Nemotron family; accepts either an AutoConfig or a normalized config.
mixtral_flops | Model FLOPs for the Mixtral family; accepts either an AutoConfig or a normalized config.
qwen3_flops | Model FLOPs for the Qwen3 family; accepts either an AutoConfig or a normalized config.
bert_flops | Model FLOPs for the BERT family; accepts either an AutoConfig or a normalized config.
transformer_flops | Calculate FLOPs for a standard Transformer model; accepts either an AutoConfig or a normalized config. Note: this does not cover encoder-decoder models.
clip_vit_l_flops | Model FLOPs for CLIP ViT.
neva_projection_flops | Model FLOPs for NeVA Projection.
| Model FLOPs for FLUX.
deepseekv3_flops | Model FLOPs for DeepSeek V3; accepts either an AutoConfig or a normalized config.
_nemotronh_mlp_layer_flops | Model FLOPs for an MLP layer; assumes a gated linear unit.
_non_mla_attn_layer_flops | Model FLOPs for an attention layer.
_mamba_layer_flops | Model FLOPs for a Mamba layer; part of the scan FLOPs is ignored because the chunk size cannot be determined from the model config.
_hybrid_model_flops | Model FLOPs for a hybrid model.
nemotronh_flops | Model FLOPs for NemotronH.
attention_flops_calculator | Calculate the FLOPs for the attention part.
moe_mlp_flops_calculator | Calculate the FLOPs for the MLP.
loss_flops_calculator | Calculate the FLOPs for the loss.
gpt_oss_flops_calculator | Calculate the FLOPs for the GPT-OSS model.
gpt_oss_flops | Model FLOPs for GPT-OSS.
get_flops_formula_for_hf_config | Get the appropriate FLOPs formula function for a given HuggingFace config.
API#
- nemo_automodel.components.utils.flops_utils.calculate_mfu(tflops, world_size, time_seconds, reference_mfu=1979.0)[source]#
Calculate Model FLOPs Utilization (MFU).
- Parameters:
tflops – TFLOPs per GPU
world_size – Total number of GPUs
time_seconds – Time taken for the computation, in seconds
reference_mfu – Peak TFLOP/s of the hardware (default: 1979.0, the H100 FP8 dense peak)
- Returns:
MFU as a percentage
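The underlying arithmetic is simple: total achieved TFLOPs are spread over the GPUs and the elapsed time, then compared against the hardware peak. The following is a minimal sketch consistent with the documented parameters, not the library's actual implementation, which may normalize `tflops` differently:

```python
def calculate_mfu_sketch(tflops, world_size, time_seconds, reference_mfu=1979.0):
    """Hedged sketch of MFU; not the library's actual implementation."""
    # Achieved TFLOP/s per GPU: total model TFLOPs divided by GPU count and time.
    achieved_per_gpu = tflops / (world_size * time_seconds)
    # Express as a percentage of the hardware peak
    # (1979 TFLOP/s is the H100 FP8 dense peak).
    return 100.0 * achieved_per_gpu / reference_mfu
```

Under this sketch, a job sustaining 989.5 TFLOP/s per GPU on H100s would report an MFU of 50%.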
- nemo_automodel.components.utils.flops_utils.gpt3_flops(config, gbs=1, seq_len=None)[source]#
Model FLOPs for the GPT-3 family; accepts either an AutoConfig or a normalized config.
- nemo_automodel.components.utils.flops_utils.llama2_flops(config, gbs=1, seq_len=None)[source]#
Model FLOPs for the Llama 2 family; accepts either an AutoConfig or a normalized config.
- nemo_automodel.components.utils.flops_utils.llama3_flops(config, gbs=1, seq_len=None)[source]#
Model FLOPs for the Llama 3 family; accepts either an AutoConfig or a normalized config.
- nemo_automodel.components.utils.flops_utils.nemotron_flops(config, gbs=1, seq_len=None)[source]#
Model FLOPs for the Nemotron family; accepts either an AutoConfig or a normalized config.
- nemo_automodel.components.utils.flops_utils.mixtral_flops(config, gbs=1, seq_len=None)[source]#
Model FLOPs for the Mixtral family; accepts either an AutoConfig or a normalized config.
- nemo_automodel.components.utils.flops_utils.qwen3_flops(config, gbs=1, seq_len=None)[source]#
Model FLOPs for the Qwen3 family; accepts either an AutoConfig or a normalized config.
- nemo_automodel.components.utils.flops_utils.bert_flops(config, gbs=1, seq_len=None)[source]#
Model FLOPs for the BERT family; accepts either an AutoConfig or a normalized config.
- nemo_automodel.components.utils.flops_utils.transformer_flops(config, gbs=1, seq_len=None)[source]#
Calculate FLOPs for a standard Transformer model; accepts either an AutoConfig or a normalized config. Note: this does not cover encoder-decoder models.
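For dense decoder-only models, per-family formulas like the ones above are typically variants of the closed-form estimate popularized by Megatron-LM (Narayanan et al., 2021). A sketch of that estimate follows; the coefficients in this module's formulas may differ (e.g. for grouped-query attention or untied embeddings), so treat this as an illustration rather than the module's exact arithmetic:

```python
def dense_transformer_flops_sketch(gbs, seq_len, num_layers, hidden_size, vocab_size):
    # Forward + backward FLOPs for one training step of a dense GPT-style
    # transformer. The leading 72 = 3 (forward plus roughly 2x for backward)
    # * 24 (matmul FLOPs per layer per token, in units of hidden_size^2);
    # the bracketed terms add the attention-score and output-logit matmuls.
    return (
        72 * gbs * seq_len * num_layers * hidden_size**2
        * (1
           + seq_len / (6 * hidden_size)
           + vocab_size / (12 * num_layers * hidden_size))
    )
```

The estimate is linear in batch size and layer count, and quadratic in both sequence length (via the attention term) and hidden size (via the matmul term).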
- nemo_automodel.components.utils.flops_utils.clip_vit_l_flops(config)[source]#
Model FLOPs for CLIP ViT.
- nemo_automodel.components.utils.flops_utils.neva_projection_flops(config)[source]#
Model FLOPs for NeVA Projection.
- nemo_automodel.components.utils.flops_utils.deepseekv3_flops(config, gbs=1, seq_len=None)[source]#
Model FLOPs for DeepSeek V3; accepts either an AutoConfig or a normalized config.
- nemo_automodel.components.utils.flops_utils._nemotronh_mlp_layer_flops(config, gbs, seq_len)[source]#
Model FLOPs for an MLP layer; assumes a gated linear unit.
- nemo_automodel.components.utils.flops_utils._non_mla_attn_layer_flops(config, gbs, seq_len)[source]#
Model FLOPs for an attention layer.
- nemo_automodel.components.utils.flops_utils._mamba_layer_flops(config, gbs, seq_len)[source]#
Model FLOPs for a Mamba layer; part of the scan FLOPs is ignored because the chunk size cannot be determined from the model config.
- nemo_automodel.components.utils.flops_utils._hybrid_model_flops(config, gbs, seq_len)[source]#
Model FLOPs for a hybrid model.
- nemo_automodel.components.utils.flops_utils.nemotronh_flops(config, gbs=1, seq_len=None)[source]#
Model FLOPs for NemotronH.
- nemo_automodel.components.utils.flops_utils.attention_flops_calculator(seqlen, hidden_size, num_attention_heads, num_query_groups, kv_channels: Optional[int] = None, is_swa: bool = False, swa_window_size: int = 128)[source]#
Calculate the FLOPs for the attention part.
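The parameters suggest grouped-query attention (num_query_groups) with optional sliding-window attention. The sketch below shows how such a forward-pass count might be structured; it is an illustration under those assumptions, not the module's exact formula (causal masking and the backward pass are ignored here):

```python
def attention_flops_sketch(seqlen, hidden_size, num_attention_heads,
                           num_query_groups, kv_channels=None,
                           is_swa=False, swa_window_size=128):
    # Per-head dimension defaults to hidden_size / num_attention_heads.
    kv_channels = kv_channels or hidden_size // num_attention_heads
    q_size = num_attention_heads * kv_channels   # query projection width
    kv_size = num_query_groups * kv_channels     # shared K/V width under GQA
    # Q, K, V and output projections, at 2 FLOPs per multiply-add.
    proj = (2 * seqlen * hidden_size * (q_size + 2 * kv_size)
            + 2 * seqlen * q_size * hidden_size)
    # Score (Q @ K^T) and context (scores @ V) matmuls; sliding-window
    # attention caps the attended positions per query at the window size.
    attended = min(seqlen, swa_window_size) if is_swa else seqlen
    scores = 2 * 2 * seqlen * attended * q_size
    return proj + scores
```

Note that under this accounting, reducing num_query_groups only cheapens the K/V projections, while enabling sliding-window attention cuts the quadratic score term.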
- nemo_automodel.components.utils.flops_utils.moe_mlp_flops_calculator(seqlen, hidden_size, moe_ffn_hidden_size, moe_router_topk, gated_linear_unit: bool = True)[source]#
Calculate the FLOPs for the MLP.
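A plausible sketch of the arithmetic, assuming each token visits its top-k routed experts and that a gated linear unit (e.g. SwiGLU) uses three projections instead of two. This is a forward-pass illustration under those assumptions, not necessarily the module's exact formula:

```python
def moe_mlp_flops_sketch(seqlen, hidden_size, moe_ffn_hidden_size,
                         moe_router_topk, gated_linear_unit=True):
    # A gated linear unit has gate, up, and down projections;
    # a plain MLP has only up and down.
    num_projections = 3 if gated_linear_unit else 2
    # 2 FLOPs per multiply-add, per token, per routed expert.
    return (2 * seqlen * moe_router_topk * num_projections
            * hidden_size * moe_ffn_hidden_size)
```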
- nemo_automodel.components.utils.flops_utils.loss_flops_calculator(seqlen, hidden_size, vocab_size)[source]#
Calculate the FLOPs for the loss.
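The loss cost is dominated by the projection from hidden states to vocabulary logits, which a sketch might count as follows (forward pass only, at 2 FLOPs per multiply-add; the module's own accounting may differ):

```python
def loss_flops_sketch(seqlen, hidden_size, vocab_size):
    # Output projection: [seqlen, hidden_size] @ [hidden_size, vocab_size].
    return 2 * seqlen * hidden_size * vocab_size
```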
- nemo_automodel.components.utils.flops_utils.gpt_oss_flops_calculator(gbs, num_layers, seqlen, hidden_size, num_attention_heads, num_query_groups, moe_ffn_hidden_size, moe_router_topk, vocab_size, kv_channels: Optional[int] = None, swa_window_size: int = 128, window_attn_skip_freq: Optional[int] = 2)[source]#
Calculate the FLOPs for the GPT-OSS model.
- nemo_automodel.components.utils.flops_utils.gpt_oss_flops(config, gbs=1, seq_len=None)[source]#
Model FLOPs for GPT-OSS.
- nemo_automodel.components.utils.flops_utils.get_flops_formula_for_hf_config(config: Any)[source]#
Get the appropriate FLOPs formula function for a given HuggingFace config.
- Parameters:
config – HuggingFace model config object
- Returns:
The appropriate FLOPs formula function, or None if the model type is not supported
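Internally this is presumably a dispatch on the HuggingFace config's model_type string. The sketch below illustrates that pattern; the model-type keys and the exact mapping are hypothetical, not taken from the source:

```python
def get_flops_formula_sketch(model_type):
    # Hypothetical mapping from HF `config.model_type` to a formula name;
    # the real table lives in flops_utils and may differ.
    table = {
        "llama": "llama3_flops",
        "mixtral": "mixtral_flops",
        "qwen3": "qwen3_flops",
        "bert": "bert_flops",
        "deepseek_v3": "deepseekv3_flops",
        "gpt_oss": "gpt_oss_flops",
    }
    # Mirrors the documented contract: None when the model type is unsupported.
    return table.get(model_type)
```

A caller would typically obtain the formula via `get_flops_formula_for_hf_config(AutoConfig.from_pretrained(...))` and, if the result is not None, invoke it with the config plus gbs and seq_len to obtain model FLOPs.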