nemo_automodel.components.utils.model_utils#

Module Contents#

Functions#

_supports_logits_to_keep

Check if the model supports logits_to_keep.

_supports_seq_lens

Check if the model supports seq_lens.

_get_model_param_stats

Get the number of trainable parameters and the L2 norm of the model.

resolve_trust_remote_code

Whitelist NVIDIA models to allow remote code execution.

print_trainable_parameters

Print the number of trainable parameters in the model.

_freeze_module_by_attribute_and_patterns

Helper function to freeze parameters by attribute name and name patterns.

apply_parameter_freezing

Apply parameter freezing based on configuration.

squeeze_input_for_thd

Squeeze batch dimension and prepare inputs for THD (total tokens, heads, head dimension) format.

init_empty_weights

A context manager under which models are initialized with all parameters on the specified device.

Data#

API#

nemo_automodel.components.utils.model_utils.logger#

‘getLogger(…)’

nemo_automodel.components.utils.model_utils._supports_logits_to_keep(model: torch.nn.Module) bool#

Check if the model supports logits_to_keep.

Parameters:

model (nn.Module) – The model to check.

Returns:

True if the model supports logits_to_keep, False otherwise.

Return type:

bool
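A check of this kind can be sketched by inspecting the signature of the model's forward method. The helper name `supports_kwarg_sketch` and the two toy classes are hypothetical, and this is an assumption about the approach rather than the library's implementation.

```python
import inspect


def supports_kwarg_sketch(model, name: str) -> bool:
    """Return True if model.forward declares a parameter called `name`."""
    forward = getattr(model, "forward", None)
    if forward is None:
        return False
    return name in inspect.signature(forward).parameters


class WithLogits:
    """Toy stand-in for a model whose forward accepts logits_to_keep."""

    def forward(self, input_ids, logits_to_keep=0):
        return input_ids


class WithoutLogits:
    """Toy stand-in for a model whose forward does not accept it."""

    def forward(self, input_ids):
        return input_ids


has_kwarg = supports_kwarg_sketch(WithLogits(), "logits_to_keep")
lacks_kwarg = supports_kwarg_sketch(WithoutLogits(), "logits_to_keep")
```

The same helper covers the seq_lens case by passing `"seq_lens"` as the name.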

nemo_automodel.components.utils.model_utils._supports_seq_lens(model: torch.nn.Module) bool#

Check if the model supports seq_lens.

nemo_automodel.components.utils.model_utils._get_model_param_stats(
model: torch.nn.Module,
) tuple[int, int, float]#

Get the number of trainable parameters and the L2 norm of the model.

Parameters:

model – Model to analyze

Returns:

A tuple containing:
  • total_params (int)

  • trainable_params (int)

  • local_sq_norm (float)

Return type:

tuple[int, int, float]
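The three returned values can be illustrated with a minimal sketch. It assumes that local_sq_norm is the sum of squared parameter values over every parameter held by the current process; whether the library restricts this to trainable parameters is not stated here. The name `param_stats_sketch` is hypothetical.

```python
import torch
import torch.nn as nn


def param_stats_sketch(model: nn.Module) -> tuple[int, int, float]:
    """Count parameters and accumulate the squared L2 norm (illustrative only)."""
    total = 0
    trainable = 0
    sq_norm = 0.0
    for p in model.parameters():
        n = p.numel()
        total += n
        if p.requires_grad:
            trainable += n
        # Assumption: every parameter contributes to the squared norm.
        sq_norm += float(p.detach().float().pow(2).sum())
    return total, trainable, sq_norm


model = nn.Linear(4, 2)            # 4 * 2 weights + 2 biases = 10 parameters
model.bias.requires_grad_(False)   # freeze the 2 bias values
total, trainable, sq_norm = param_stats_sketch(model)
```

Taking the square root of the accumulated value gives the L2 norm itself.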

nemo_automodel.components.utils.model_utils.resolve_trust_remote_code(pretrained_model_name_or_path)#

Whitelist NVIDIA models to allow remote code execution.

Parameters:

pretrained_model_name_or_path (str) – The name or path of the pretrained model.

Returns:

True if the model should be loaded with trust_remote_code, False otherwise.

Return type:

bool
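One way such an allowlist could work is a prefix check on the model identifier. The prefix list, the helper name `resolve_trust_remote_code_sketch`, and the model names below are all hypothetical; the actual matching rule is not described on this page.

```python
# Hypothetical allowlist; the real list of trusted prefixes is an assumption.
TRUSTED_PREFIXES = ("nvidia/",)


def resolve_trust_remote_code_sketch(pretrained_model_name_or_path) -> bool:
    """Return True only for identifiers that start with a trusted prefix."""
    if not isinstance(pretrained_model_name_or_path, str):
        return False
    return pretrained_model_name_or_path.startswith(TRUSTED_PREFIXES)


trusted = resolve_trust_remote_code_sketch("nvidia/some-model")
untrusted = resolve_trust_remote_code_sketch("other-org/some-model")
```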

nemo_automodel.components.utils.model_utils.print_trainable_parameters(model: torch.nn.Module) tuple[int, int]#

Print the number of trainable parameters in the model.

Parameters:

model – Model to analyze

Returns:

int total_params: int

Return type:

trainable_params

nemo_automodel.components.utils.model_utils._freeze_module_by_attribute_and_patterns(
model,
attribute_name,
name_patterns,
)#

Helper function to freeze parameters by attribute name and name patterns.

Parameters:
  • model – The model to apply freezing to.

  • attribute_name – Name of the model attribute to freeze (e.g., ‘vision_tower’).

  • name_patterns – List of patterns to match in module names.
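The following sketch shows one plausible reading of these parameters. It assumes that the named attribute is frozen when present and that any submodule whose name contains one of the patterns (substring match) is frozen as well. The `Toy` model and the name `freeze_by_patterns_sketch` are hypothetical, and the library's matching logic may differ.

```python
import torch.nn as nn


def freeze_by_patterns_sketch(model, attribute_name, name_patterns):
    """Disable gradients on a named attribute and on pattern-matched submodules."""
    # Freeze the attribute directly if the model exposes it.
    if hasattr(model, attribute_name):
        for p in getattr(model, attribute_name).parameters():
            p.requires_grad = False
    # Assumption: patterns are matched as substrings of module names.
    for name, module in model.named_modules():
        if any(pattern in name for pattern in name_patterns):
            for p in module.parameters():
                p.requires_grad = False


class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.vision_tower = nn.Linear(2, 2)
        self.language_model = nn.Linear(2, 2)


toy = Toy()
freeze_by_patterns_sketch(toy, "vision_tower", ["vision"])
```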

nemo_automodel.components.utils.model_utils.apply_parameter_freezing(model, freeze_config)#

Apply parameter freezing based on configuration.

Parameters:
  • model – The model to apply freezing to.

  • freeze_config – Configuration dict specifying what to freeze.

freeze_config can contain:
  • freeze_embeddings: bool (default True)

  • freeze_vision_tower: bool (default False)

  • freeze_language_model: bool (default False)
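A configuration that uses the three documented keys might look like the following. The call itself is left commented out because it needs a loaded model, and `model` is assumed to exist in the caller's scope.

```python
# Freeze the embeddings and the vision tower, keep the language model trainable.
freeze_config = {
    "freeze_embeddings": True,
    "freeze_vision_tower": True,
    "freeze_language_model": False,
}

# from nemo_automodel.components.utils.model_utils import apply_parameter_freezing
# apply_parameter_freezing(model, freeze_config)  # `model` is assumed to be loaded
```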

nemo_automodel.components.utils.model_utils.squeeze_input_for_thd(
input_ids,
position_ids,
padding_mask,
attn_kwargs,
seqlens_padding_value=-1000,
)#

Squeeze batch dimension and prepare inputs for THD (total tokens, heads, head dimension) format.

This function removes the batch dimension from input tensors and processes attention kwargs for use with Transformer Engine’s THD format. It’s typically used when the batch has already been converted to THD format (with batch_size=1 as a placeholder dimension) and that dimension needs to be removed.

The function performs three key operations:

  1. Removes the batch dimension (dim 0) from input tensors

  2. Filters out padding values from cumulative sequence length tensors

  3. Converts max_seqlen from tensor to scalar if needed
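The three operations can be sketched for the attention kwargs alone. This is an illustration under stated assumptions, not the library's implementation: the name `squeeze_attn_kwargs_sketch` is hypothetical, and it returns a new dictionary rather than modifying its input.

```python
import torch


def squeeze_attn_kwargs_sketch(attn_kwargs, seqlens_padding_value=-1000):
    """Apply the three documented steps to a dict of attention kwargs."""
    out = {}
    for key, value in attn_kwargs.items():
        if isinstance(value, torch.Tensor):
            # Step 1: drop the placeholder batch dimension (dim 0).
            value = value.squeeze(0)
            if key in ("cu_seqlens", "cu_seqlens_padded"):
                # Step 2: filter out the sentinel padding entries.
                value = value[value != seqlens_padding_value]
            elif key == "max_seqlen":
                # Step 3: convert the single-element tensor to a Python int.
                value = int(value.item())
        out[key] = value
    return out


squeezed = squeeze_attn_kwargs_sketch(
    {
        "cu_seqlens": torch.tensor([[0, 3, 5, -1000]]),
        "max_seqlen": torch.tensor([3]),
    }
)
```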

Parameters:
  • input_ids (torch.Tensor) – Input token IDs with shape [1, total_tokens] or [1, total_tokens, hidden_dim]. The first dimension will be squeezed.

  • position_ids (torch.Tensor) – Position IDs with shape [1, total_tokens]. The first dimension will be squeezed.

  • padding_mask (torch.Tensor) – Padding mask with shape [1, total_tokens]. The first dimension will be squeezed.

  • attn_kwargs (dict) –

    Dictionary of attention-related tensors. May contain:

    • cu_seqlens: Cumulative sequence lengths [1, num_seqs+1]

    • cu_seqlens_padded: Cumulative padded sequence lengths [1, num_seqs+1]

    • max_seqlen: Maximum sequence length (tensor or int)

    • Other attention parameters (will be squeezed if tensors)

  • seqlens_padding_value (int) – Sentinel value used to indicate padding in cu_seqlens and cu_seqlens_padded tensors. These values will be filtered out. Default: -1000.

Returns:

A tuple containing:
  • input_ids (torch.Tensor): Input IDs with batch dimension removed [total_tokens] or [total_tokens, hidden_dim]

  • position_ids (torch.Tensor): Position IDs with batch dimension removed [total_tokens]

  • padding_mask (torch.Tensor): Padding mask with batch dimension removed [total_tokens]

  • attn_kwargs (dict): Updated attention kwargs with:

    • Batch dimensions removed from all tensor values

    • Padding values filtered from cu_seqlens and cu_seqlens_padded

    • max_seqlen converted to scalar if it was a tensor

Return type:

tuple

Example:

>>> input_ids = torch.tensor([[1, 2, 3, 4, 5]])  # [1, 5]
>>> position_ids = torch.tensor([[0, 1, 2, 3, 4]])  # [1, 5]
>>> padding_mask = torch.tensor([[False, False, False, False, False]])  # [1, 5]
>>> attn_kwargs = {
...     'cu_seqlens': torch.tensor([[0, 3, 5, -1000]]),  # [1, 4] with padding
...     'cu_seqlens_padded': torch.tensor([[0, 3, 5, -1000]]),
...     'max_seqlen': torch.tensor([3])
... }
>>> ids, pos, mask, kwargs = squeeze_input_for_thd(
...     input_ids, position_ids, padding_mask, attn_kwargs
... )
>>> ids.shape
torch.Size([5])
>>> kwargs['cu_seqlens']  # Padding value filtered out
tensor([0, 3, 5])
>>> kwargs['max_seqlen']  # Converted to scalar
3

Note:

This function modifies attn_kwargs in-place. If you need to preserve the original dictionary, pass a copy.

nemo_automodel.components.utils.model_utils.init_empty_weights()#

A context manager under which models are initialized with all parameters on the specified device.

Parameters:

device (torch.device) – Device to initialize all parameters on.

Example:

import torch.nn as nn
from nemo_automodel.components.utils.model_utils import init_empty_weights

with init_empty_weights():
    tst = nn.Linear(100, 100)  # on `cuda` device