nemo_automodel.components.utils.flops_utils

Module Contents

Functions

calculate_mfu

Calculate Model FLOPs Utilization (MFU).

gpt3_flops

Model FLOPs for the GPT-3 family; accepts either an AutoConfig or a normalized config.

llama2_flops

Model FLOPs for the Llama 2 family; accepts either an AutoConfig or a normalized config.

llama3_flops

Model FLOPs for the Llama 3 family; accepts either an AutoConfig or a normalized config.

nemotron_flops

Model FLOPs for the Nemotron family; accepts either an AutoConfig or a normalized config.

mixtral_flops

Model FLOPs for the Mixtral family; accepts either an AutoConfig or a normalized config.

qwen3_flops

Model FLOPs for the Qwen3 family; accepts either an AutoConfig or a normalized config.

bert_flops

Model FLOPs for the BERT family; accepts either an AutoConfig or a normalized config.

transformer_flops

Calculate FLOPs for a standard Transformer model; accepts either an AutoConfig or a normalized config. Note: this does not cover encoder-decoder models.

clip_vit_l_flops

Model FLOPs for CLIP ViT

neva_projection_flops

Model FLOPs for NeVA Projection

flux_flops

Model FLOPs for FLUX

deepseekv3_flops

Model FLOPs for DeepSeek V3; accepts either an AutoConfig or a normalized config.

_nemotronh_mlp_layer_flops

Model FLOPs for an MLP layer, assuming a gated linear unit.

_non_mla_attn_layer_flops

Model FLOPs for a (non-MLA) attention layer.

_mamba_layer_flops

Model FLOPs for a Mamba layer. Part of the scan FLOPs is ignored because the chunk size is not available from the model config.

_hybrid_model_flops

Model FLOPs for a hybrid model.

nemotronh_flops

Model FLOPs for NemotronH

attention_flops_calculator

Calculate the FLOPs for the attention block.

moe_mlp_flops_calculator

Calculate the FLOPs for the MoE MLP.

loss_flops_calculator

Calculate the FLOPs for the loss.

gpt_oss_flops_calculator

Calculate the FLOPs for the GPT-OSS model.

gpt_oss_flops

Model FLOPs for GPT-OSS

get_flops_formula_for_hf_config

Get the appropriate FLOPs formula function for a given HuggingFace config.

API

nemo_automodel.components.utils.flops_utils.calculate_mfu(tflops, world_size, time_seconds, reference_mfu=1979.0)

Calculate Model FLOPs Utilization (MFU).

Parameters:
  • tflops – TFLOPs per GPU

  • world_size – Total number of GPUs

  • time_seconds – Time taken for the computation, in seconds

  • reference_mfu – Peak TFLOPs of the hardware (default: H100)

Returns:

MFU as a percentage
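MFU is conventionally defined as the achieved model FLOP throughput divided by the hardware peak. The following is a minimal sketch of that definition, not this module's implementation. It assumes `total_tflops` is the model work for the measured interval summed across all GPUs. The name `mfu_percent` is hypothetical.

```python
def mfu_percent(total_tflops: float, world_size: int, time_seconds: float,
                peak_tflops_per_gpu: float = 1979.0) -> float:
    """Hypothetical sketch: achieved TFLOP/s per GPU over peak TFLOP/s per GPU, in percent."""
    if world_size <= 0 or time_seconds <= 0:
        raise ValueError("world_size and time_seconds must be positive")
    # Spread the total work over all GPUs and the elapsed time.
    achieved_per_gpu = total_tflops / (world_size * time_seconds)  # TFLOP/s per GPU
    return 100.0 * achieved_per_gpu / peak_tflops_per_gpu


# 8 GPUs doing 7916 TFLOPs of model work in 2 s is 494.75 TFLOP/s per GPU.
print(mfu_percent(7916.0, world_size=8, time_seconds=2.0))  # 25.0
```

The peak default mirrors `reference_mfu=1979.0` from the signature above. Pass the figure that matches your GPU and datatype.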

nemo_automodel.components.utils.flops_utils.gpt3_flops(config, gbs=1, seq_len=None)

Model FLOPs for the GPT-3 family; accepts either an AutoConfig or a normalized config.
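The widely used Megatron-LM estimate for a dense GPT-style decoder can be sketched as follows. It counts the forward pass once and the backward pass as twice the forward, with no activation recomputation and causal masking ignored. This sketch illustrates the kind of formula `gpt3_flops` computes and is not its source. The argument names are hypothetical stand-ins for config fields.

```python
def gpt3_style_flops(gbs: int, seq_len: int, num_layers: int,
                     hidden_size: int, vocab_size: int) -> int:
    """Training FLOPs per step for a GPT-3-style dense decoder (hypothetical sketch)."""
    # Forward, per layer: QKV + output projections (8*s*h^2) and a 4h MLP (16*s*h^2),
    # plus attention scores and score-times-values (4*s^2*h), all times the batch.
    per_layer_fwd = 24 * gbs * seq_len * hidden_size**2 + 4 * gbs * seq_len**2 * hidden_size
    # Training = forward + backward, with backward ~= 2x forward, hence the factor 3.
    transformer = 3 * num_layers * per_layer_fwd
    # Output logits projection: 2*s*h*V forward, times 3 for training.
    logits = 6 * gbs * seq_len * hidden_size * vocab_size
    return transformer + logits


# GPT-3 175B-like shape, one 2048-token sequence.
print(f"{gpt3_style_flops(1, 2048, 96, 12288, 51200):.3e}")
```

Expanded, this equals the closed form 72·B·s·l·h²·(1 + s/(6h) + V/(12·l·h)).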

nemo_automodel.components.utils.flops_utils.llama2_flops(config, gbs=1, seq_len=None)

Model FLOPs for the Llama 2 family; accepts either an AutoConfig or a normalized config.

nemo_automodel.components.utils.flops_utils.llama3_flops(config, gbs=1, seq_len=None)

Model FLOPs for the Llama 3 family; accepts either an AutoConfig or a normalized config.
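For Llama-style decoders, such as those covered by `llama2_flops` and `llama3_flops` above, two changes to the dense GPT count matter:

- Grouped-query attention shrinks the K/V projections by `num_query_groups / num_attention_heads`.
- A gated (SwiGLU) MLP uses three `hidden × ffn` matrices instead of two.

The sketch below makes the same assumptions as the dense count: training costs 3× the forward pass, and causal masking is ignored. The argument names are hypothetical, and this is not the library's code.

```python
def llama_style_flops(gbs, seq_len, num_layers, hidden_size, ffn_hidden_size,
                      num_attention_heads, num_query_groups, vocab_size):
    """Training FLOPs per step for a GQA + gated-MLP decoder (hypothetical sketch)."""
    tokens = gbs * seq_len
    h = hidden_size
    kv_ratio = num_query_groups / num_attention_heads  # 1.0 for MHA, < 1 for GQA
    # Forward FLOPs per layer (2 FLOPs per multiply-accumulate):
    q_and_out_proj = 4 * tokens * h * h           # Q projection + output projection
    kv_proj = 4 * tokens * h * h * kv_ratio       # K and V projections, shrunk by GQA
    attn_scores = 4 * tokens * seq_len * h        # QK^T and softmax(QK^T) V
    gated_mlp = 6 * tokens * h * ffn_hidden_size  # gate, up and down projections
    per_layer_fwd = q_and_out_proj + kv_proj + attn_scores + gated_mlp
    logits_fwd = 2 * tokens * h * vocab_size
    # Backward ~= 2x forward, so training is 3x forward.
    return 3 * (num_layers * per_layer_fwd + logits_fwd)


# Llama-3-8B-like shape: 32 layers, h=4096, ffn=14336, 32 heads, 8 KV groups.
print(f"{llama_style_flops(1, 8192, 32, 4096, 14336, 32, 8, 128256):.3e}")
```

Collecting terms gives B·s·l·h²·(12 + 12·g/a + 18·f/h + 12·s/h + 6·V/(l·h)). Here g is the number of query groups, a the number of heads, and f the FFN width.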

nemo_automodel.components.utils.flops_utils.nemotron_flops(config, gbs=1, seq_len=None)

Model FLOPs for the Nemotron family; accepts either an AutoConfig or a normalized config.

nemo_automodel.components.utils.flops_utils.mixtral_flops(config, gbs=1, seq_len=None)

Model FLOPs for the Mixtral family; accepts either an AutoConfig or a normalized config.
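In a top-k mixture-of-experts model such as Mixtral, only `moe_router_topk` experts run for each token. The MLP term therefore scales with `topk × ffn_hidden_size`, not with the total number of experts. The closed-form sketch below assumes a GQA + gated-MLP decoder with training at 3× forward and router FLOPs ignored. The argument names are hypothetical, and this is not the library's code.

```python
def mixtral_style_flops(gbs, seq_len, num_layers, hidden_size, ffn_hidden_size,
                        num_attention_heads, num_query_groups, vocab_size,
                        moe_router_topk):
    """Training FLOPs per step for a top-k MoE decoder with GQA (hypothetical sketch)."""
    h = hidden_size
    coeff = (
        12                                               # Q and output projections
        + 12 * num_query_groups / num_attention_heads    # K/V projections under GQA
        + 18 * moe_router_topk * ffn_hidden_size / h     # top-k gated experts per token
        + 12 * seq_len / h                               # attention scores and values
        + 6 * vocab_size / (num_layers * h)              # logits, amortized per layer
    )
    # Router cost (h x num_experts per token) is small and left out of this sketch.
    return gbs * seq_len * num_layers * h * h * coeff


# Mixtral-8x7B-like shape with 2 of 8 experts active per token.
print(f"{mixtral_style_flops(1, 4096, 32, 4096, 14336, 32, 8, 32000, 2):.3e}")
```

The number of experts never appears in the formula. Going from 8 to 16 experts at fixed top-k changes parameter count, not per-token compute.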

nemo_automodel.components.utils.flops_utils.qwen3_flops(config, gbs=1, seq_len=None)

Model FLOPs for the Qwen3 family; accepts either an AutoConfig or a normalized config.

nemo_automodel.components.utils.flops_utils.bert_flops(config, gbs=1, seq_len=None)

Model FLOPs for the BERT family; accepts either an AutoConfig or a normalized config.

nemo_automodel.components.utils.flops_utils.transformer_flops(config, gbs=1, seq_len=None)

Calculate FLOPs for a standard Transformer model; accepts either an AutoConfig or a normalized config. Note: this does not cover encoder-decoder models.
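A common cross-check for any decoder-only estimate is the rule of thumb that training costs about 6 FLOPs per parameter per token: 2 for the forward pass and 4 for the backward pass. This rule ignores the attention-score term. In the sketch below, both `approx_params` and `six_nd_flops` are hypothetical helpers. `approx_params` counts only the weight matrices of a standard Transformer with a non-gated 4h MLP.

```python
def approx_params(num_layers: int, hidden_size: int, vocab_size: int) -> int:
    """Approximate weights: 4h^2 attention + 8h^2 MLP per layer, plus the embedding (hypothetical)."""
    return 12 * num_layers * hidden_size**2 + vocab_size * hidden_size


def six_nd_flops(num_params: int, gbs: int, seq_len: int) -> int:
    """Rule-of-thumb training FLOPs: 6 * N * D, with D = tokens processed per step."""
    return 6 * num_params * gbs * seq_len


n = approx_params(num_layers=96, hidden_size=12288, vocab_size=50257)
print(f"{n / 1e9:.1f}B parameters")  # 174.6B parameters
print(f"{six_nd_flops(n, gbs=1, seq_len=2048):.3e} FLOPs per 2048-token sequence")
```

If a per-architecture formula disagrees with this rule by much more than the attention term explains, check which config fields it reads.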

nemo_automodel.components.utils.flops_utils.clip_vit_l_flops(config)

Model FLOPs for CLIP ViT

nemo_automodel.components.utils.flops_utils.neva_projection_flops(config)

Model FLOPs for NeVA Projection

nemo_automodel.components.utils.flops_utils.flux_flops(config)

Model FLOPs for FLUX

nemo_automodel.components.utils.flops_utils.deepseekv3_flops(config, gbs=1, seq_len=None)

Model FLOPs for DeepSeek V3; accepts either an AutoConfig or a normalized config.

nemo_automodel.components.utils.flops_utils._nemotronh_mlp_layer_flops(config, gbs, seq_len)

Model FLOPs for an MLP layer, assuming a gated linear unit.

nemo_automodel.components.utils.flops_utils._non_mla_attn_layer_flops(config, gbs, seq_len)

Model FLOPs for a (non-MLA) attention layer.

nemo_automodel.components.utils.flops_utils._mamba_layer_flops(config, gbs, seq_len)

Model FLOPs for a Mamba layer. Part of the scan FLOPs is ignored because the chunk size is not available from the model config.

nemo_automodel.components.utils.flops_utils._hybrid_model_flops(config, gbs, seq_len)

Model FLOPs for a hybrid model.
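A hybrid model mixes layer types, so its FLOPs are a sum of per-type costs over its layers. The minimal sketch below assumes the layer layout is given as a pattern string in the style of NemotronH's `hybrid_override_pattern`: `M` for Mamba, `*` for attention and `-` for MLP. The function `hybrid_flops` and the toy per-type costs are hypothetical placeholders, not the library's code.

```python
from collections import Counter
from typing import Callable, Dict


def hybrid_flops(pattern: str, layer_flops: Dict[str, Callable[[int, int], float]],
                 gbs: int, seq_len: int) -> float:
    """Sum per-layer FLOPs over a hybrid layer pattern (hypothetical sketch)."""
    counts = Counter(pattern)
    unknown = set(counts) - set(layer_flops)
    if unknown:
        raise ValueError(f"no FLOPs function for layer symbols: {sorted(unknown)}")
    # Cost each layer type once, then multiply by how often it appears.
    return sum(n * layer_flops[sym](gbs, seq_len) for sym, n in counts.items())


# Toy per-layer costs, for illustration only.
costs = {
    "M": lambda b, s: 10.0 * b * s,  # Mamba layer
    "*": lambda b, s: 30.0 * b * s,  # attention layer
    "-": lambda b, s: 20.0 * b * s,  # MLP layer
}
print(hybrid_flops("M-M*-M-", costs, gbs=2, seq_len=4))  # 960.0
```

Counting symbols first means each per-type formula is evaluated once, regardless of depth.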

nemo_automodel.components.utils.flops_utils.nemotronh_flops(config, gbs=1, seq_len=None)

Model FLOPs for NemotronH

nemo_automodel.components.utils.flops_utils.attention_flops_calculator(
seqlen,
hidden_size,
num_attention_heads,
num_query_groups,
kv_channels: Optional[int] = None,
is_swa: bool = False,
swa_window_size: int = 128,
)

Calculate the FLOPs for the attention block.
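The sketch below shows how a sliding window changes the attention cost. It reuses the parameter names of `attention_flops_calculator`, but `attention_fwd_flops` itself is hypothetical and the library's exact accounting may differ. It makes these assumptions:

- Only forward FLOPs are counted.
- `kv_channels` is the per-head dimension, defaulting to `hidden_size // num_attention_heads`.
- Causal masking is ignored.
- With `is_swa`, each query sees at most `swa_window_size` keys.

```python
from typing import Optional


def attention_fwd_flops(seqlen: int, hidden_size: int, num_attention_heads: int,
                        num_query_groups: int, kv_channels: Optional[int] = None,
                        is_swa: bool = False, swa_window_size: int = 128) -> int:
    """Forward FLOPs of one attention block for one sequence (hypothetical sketch)."""
    head_dim = kv_channels if kv_channels is not None else hidden_size // num_attention_heads
    q_width = num_attention_heads * head_dim  # width of Q and of the attention output
    kv_width = num_query_groups * head_dim    # width of K and of V under GQA
    # Q, K, V projections in, plus the output projection back to hidden_size.
    proj = 2 * seqlen * hidden_size * (q_width + 2 * kv_width) + 2 * seqlen * q_width * hidden_size
    # Each query attends to every key, or only to a window of keys with SWA.
    keys_per_query = min(seqlen, swa_window_size) if is_swa else seqlen
    # QK^T scores plus probabilities-times-V, summed over heads.
    core = 2 * 2 * seqlen * keys_per_query * q_width
    return proj + core


full = attention_fwd_flops(4096, 1024, 16, 4)
swa = attention_fwd_flops(4096, 1024, 16, 4, is_swa=True, swa_window_size=128)
print(full, swa)
```

Projection cost is unchanged by the window. Only the score and value terms shrink, from growing with seqlen² to growing with seqlen × window.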

nemo_automodel.components.utils.flops_utils.moe_mlp_flops_calculator(
seqlen,
hidden_size,
moe_ffn_hidden_size,
moe_router_topk,
gated_linear_unit: bool = True,
)

Calculate the FLOPs for the MoE MLP.
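The sketch below estimates the routed-expert MLP cost using the parameter names of `moe_mlp_flops_calculator`. Each token passes through `moe_router_topk` experts. Each expert has an up and a down projection, and a gated linear unit adds a third (gate) projection. The function name is hypothetical, only forward FLOPs are counted, and router and shared-expert costs are omitted as an assumption.

```python
def moe_mlp_fwd_flops(seqlen: int, hidden_size: int, moe_ffn_hidden_size: int,
                      moe_router_topk: int, gated_linear_unit: bool = True) -> int:
    """Forward FLOPs of the routed experts for one sequence (hypothetical sketch)."""
    num_matrices = 3 if gated_linear_unit else 2  # (gate,) up, down projections
    per_expert_per_token = 2 * num_matrices * hidden_size * moe_ffn_hidden_size
    return seqlen * moe_router_topk * per_expert_per_token


gated = moe_mlp_fwd_flops(4096, 2880, 2880, moe_router_topk=4)
plain = moe_mlp_fwd_flops(4096, 2880, 2880, moe_router_topk=4, gated_linear_unit=False)
print(gated / plain)  # 1.5
```

The gated/non-gated ratio is always 3/2, because only the matrix count differs.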

nemo_automodel.components.utils.flops_utils.loss_flops_calculator(seqlen, hidden_size, vocab_size)

Calculate the FLOPs for the loss.
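One common way to account for the loss is to count the output projection from `hidden_size` to `vocab_size` logits. The sketch below follows that approach. The function name is hypothetical, only forward FLOPs are counted, and the softmax and cross-entropy arithmetic (order seqlen × vocab_size) is ignored as an assumption.

```python
def loss_fwd_flops(seqlen: int, hidden_size: int, vocab_size: int) -> int:
    """Forward FLOPs of the logits projection for one sequence (hypothetical sketch)."""
    # One multiply-accumulate (2 FLOPs) per (token, hidden unit, vocab entry).
    return 2 * seqlen * hidden_size * vocab_size


# With backward counted as twice the forward, training cost is 3x this value.
print(3 * loss_fwd_flops(seqlen=4096, hidden_size=4096, vocab_size=128256))
```

For large vocabularies this term is not negligible next to a single transformer layer, which is why it is tracked separately.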

nemo_automodel.components.utils.flops_utils.gpt_oss_flops_calculator(
gbs,
num_layers,
seqlen,
hidden_size,
num_attention_heads,
num_query_groups,
moe_ffn_hidden_size,
moe_router_topk,
vocab_size,
kv_channels: Optional[int] = None,
swa_window_size: int = 128,
window_attn_skip_freq: Optional[int] = 2,
)

Calculate the FLOPs for the GPT-OSS model.
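GPT-OSS interleaves sliding-window and full attention layers, and the `window_attn_skip_freq` parameter appears to control that interleaving. The sketch below splits layers by kind and weights per-layer attention costs accordingly. It assumes layer i (1-based) uses full attention when `i % window_attn_skip_freq == 0`, and it accepts only an integer frequency. Both function names are hypothetical.

```python
from typing import Tuple


def split_attention_layers(num_layers: int, window_attn_skip_freq: int = 2) -> Tuple[int, int]:
    """Return (full_attention_layers, sliding_window_layers) (hypothetical sketch)."""
    if window_attn_skip_freq < 1:
        raise ValueError("window_attn_skip_freq must be >= 1")
    # Layers i = freq, 2*freq, ... use full attention; the rest use the window.
    full = num_layers // window_attn_skip_freq
    return full, num_layers - full


def total_attention_flops(num_layers: int, full_layer_flops: float, swa_layer_flops: float,
                          window_attn_skip_freq: int = 2) -> float:
    """Weight per-layer attention costs by how many layers use each kind."""
    full, swa = split_attention_layers(num_layers, window_attn_skip_freq)
    return full * full_layer_flops + swa * swa_layer_flops


print(split_attention_layers(24))     # (12, 12)
print(split_attention_layers(10, 3))  # (3, 7)
```

With a frequency of 2 the two kinds alternate, so half the layers pay the full quadratic attention cost.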

nemo_automodel.components.utils.flops_utils.gpt_oss_flops(config, gbs=1, seq_len=None)

Model FLOPs for GPT-OSS

nemo_automodel.components.utils.flops_utils.get_flops_formula_for_hf_config(
config: Any,
) -> Optional[Callable]

Get the appropriate FLOPs formula function for a given HuggingFace config.

Parameters:

config – HuggingFace model config object

Returns:

The appropriate FLOPs formula function, or None if model type is not supported