# nemo_automodel.components.checkpoint.utils

## Module Contents

### Functions

| Name                                                                                                                       | Description                                                                      |
| -------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------- |
| [`_get_checkpoint_tensor_dtypes`](#nemo_automodel-components-checkpoint-utils-_get_checkpoint_tensor_dtypes)               | Inspect checkpoint tensors and return their exact dtypes by key.                 |
| [`_get_module_by_normalized_name`](#nemo_automodel-components-checkpoint-utils-_get_module_by_normalized_name)             | Return a module by FQN after applying wrapper-prefix normalization.              |
| [`_normalize_param_name`](#nemo_automodel-components-checkpoint-utils-_normalize_param_name)                               | Strip wrapper-specific prefixes from a parameter name.                           |
| [`_same_tensor_storage`](#nemo_automodel-components-checkpoint-utils-_same_tensor_storage)                                 | Return whether two tensors are aliases of the same local storage.                |
| [`ensure_tied_lm_head`](#nemo_automodel-components-checkpoint-utils-ensure_tied_lm_head)                                   | Ensure a local tied LM head actually aliases the input embedding.                |
| [`estimate_state_dict_bytes`](#nemo_automodel-components-checkpoint-utils-estimate_state_dict_bytes)                       | Estimate logical bytes in a state dict without materializing tensors.            |
| [`estimate_tensor_bytes`](#nemo_automodel-components-checkpoint-utils-estimate_tensor_bytes)                               | Estimate logical bytes in a tensor without materializing it.                     |
| [`format_bytes`](#nemo_automodel-components-checkpoint-utils-format_bytes)                                                 | Format bytes as a human-readable GiB value.                                      |
| [`format_output_file_count`](#nemo_automodel-components-checkpoint-utils-format_output_file_count)                         | Format the output shard count for user-facing log messages.                      |
| [`get_input_embeddings_weight_and_name`](#nemo_automodel-components-checkpoint-utils-get_input_embeddings_weight_and_name) | Return the input embedding weight and normalized name if present.                |
| [`get_lm_head_weight_and_name`](#nemo_automodel-components-checkpoint-utils-get_lm_head_weight_and_name)                   | Return the first `lm_head.weight` parameter found on a model.                    |
| [`get_rank_safe`](#nemo_automodel-components-checkpoint-utils-get_rank_safe)                                               | Return the current distributed rank, defaulting to 0 when not initialized.       |
| [`get_safetensors_index_total_size`](#nemo_automodel-components-checkpoint-utils-get_safetensors_index_total_size)         | Return the total checkpoint size recorded in a Hugging Face safetensors index.   |
| [`get_tied_lm_head_source_names`](#nemo_automodel-components-checkpoint-utils-get_tied_lm_head_source_names)               | Return candidate checkpoint keys that can source a tied LM head.                 |
| [`get_world_size_safe`](#nemo_automodel-components-checkpoint-utils-get_world_size_safe)                                   | Return the current distributed world size, defaulting to 1 when not initialized. |
| [`has_local_tied_lm_head`](#nemo_automodel-components-checkpoint-utils-has_local_tied_lm_head)                             | Return whether the current model partition has an actual tied LM head.           |
| [`is_rank_0`](#nemo_automodel-components-checkpoint-utils-is_rank_0)                                                       | Return True on the main rank.                                                    |
| [`is_tied_word_embeddings`](#nemo_automodel-components-checkpoint-utils-is_tied_word_embeddings)                           | Check if the model's word embeddings are tied.                                   |
| [`materialize_missing_tied_lm_head`](#nemo_automodel-components-checkpoint-utils-materialize_missing_tied_lm_head)         | Populate a missing tied `lm_head.weight` from its embedding source.              |
| [`resolve_trust_remote_code`](#nemo_automodel-components-checkpoint-utils-resolve_trust_remote_code)                       | Whitelist NVIDIA models to allow remote code execution.                          |

### API

```python
nemo_automodel.components.checkpoint.utils._get_checkpoint_tensor_dtypes(
    pretrained_model_name_or_path: str,
    hf_config: typing.Any,
    load_kwargs: collections.abc.Mapping[str, object] | None = None
) -> dict[str, torch.dtype]
```

Inspect checkpoint tensors and return their exact dtypes by key.

This reads only checkpoint metadata: tensors are loaded on the `meta`
device, which preserves per-tensor dtype information without
materializing the full checkpoint weights in memory.

```python
nemo_automodel.components.checkpoint.utils._get_module_by_normalized_name(
    model: torch.nn.Module,
    normalized_module_name: str
) -> torch.nn.Module | None
```

Return a module by FQN after applying wrapper-prefix normalization.

```python
nemo_automodel.components.checkpoint.utils._normalize_param_name(
    name: str
) -> str
```

Strip wrapper-specific prefixes from a parameter name.
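
As a rough illustration of what prefix normalization can look like, the sketch below drops wrapper-inserted components from a dotted parameter name. The set of wrapper names (`_orig_mod` from `torch.compile`, `_checkpoint_wrapped_module` from activation checkpointing, `_fsdp_wrapped_module` from FSDP) and the helper name `normalize_param_name_sketch` are assumptions for illustration; the prefixes actually handled by `_normalize_param_name` may differ.

```python
# Hypothetical set of wrapper components; the real function may strip others.
WRAPPER_COMPONENTS = frozenset(
    {"_orig_mod", "_checkpoint_wrapped_module", "_fsdp_wrapped_module"}
)


def normalize_param_name_sketch(name: str) -> str:
    """Drop wrapper-inserted components from a dotted parameter name."""
    parts = [part for part in name.split(".") if part not in WRAPPER_COMPONENTS]
    return ".".join(parts)


wrapped = "_orig_mod.model.layers.0._checkpoint_wrapped_module.self_attn.q_proj.weight"
normalized = normalize_param_name_sketch(wrapped)
print(normalized)
```

Filtering whole dotted components, rather than replacing substrings, avoids accidentally rewriting a legitimate name that merely contains a wrapper prefix.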

```python
nemo_automodel.components.checkpoint.utils._same_tensor_storage(
    left: torch.Tensor,
    right: torch.Tensor
) -> bool
```

Return whether two tensors are aliases of the same local storage.

```python
nemo_automodel.components.checkpoint.utils.ensure_tied_lm_head(
    model: torch.nn.Module
) -> bool
```

Ensure a local tied LM head actually aliases the input embedding.

Hugging Face `tie_weights()` is the first choice because model classes can
have custom tying rules. The direct assignment fallback handles wrapped
models whose generic `tie_weights()` no longer reaches the local
`lm_head`/embedding pair after sharding.

**Parameters:**

`model`: Model or pipeline stage to inspect and update.

**Returns:** `bool`

`True` if the local `lm_head` and input embedding are tied after the call; `False` otherwise.

```python
nemo_automodel.components.checkpoint.utils.estimate_state_dict_bytes(
    state_dict: dict[str, torch.Tensor]
) -> int | None
```

Estimate logical bytes in a state dict without materializing tensors.

```python
nemo_automodel.components.checkpoint.utils.estimate_tensor_bytes(
    tensor: torch.Tensor
) -> int
```

Estimate logical bytes in a tensor without materializing it.
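
One way to estimate a tensor's size without allocating its data is to multiply the element count by the element width, which also works for tensors on the `meta` device. The sketch below assumes that approach; `logical_bytes` is a hypothetical helper, not the library's implementation.

```python
import torch


def logical_bytes(tensor: torch.Tensor) -> int:
    """Element count times bytes per element; no tensor data is read."""
    return tensor.numel() * tensor.element_size()


# A meta tensor carries shape and dtype but allocates no storage.
weight = torch.empty(4096, 4096, dtype=torch.bfloat16, device="meta")
nbytes = logical_bytes(weight)
print(nbytes)
```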

```python
nemo_automodel.components.checkpoint.utils.format_bytes(
    num_bytes: int
) -> str
```

Format bytes as a human-readable GiB value.
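
For reference, a GiB conversion divides by 1024³. The sketch below shows one plausible formatting; the two-decimal precision, the ` GiB` suffix, and the name `format_gib_sketch` are assumptions, and the exact string produced by `format_bytes` may differ.

```python
def format_gib_sketch(num_bytes: int) -> str:
    """Render a byte count in GiB (1 GiB = 1024**3 bytes)."""
    return f"{num_bytes / 1024**3:.2f} GiB"


three_gib = format_gib_sketch(3 * 1024**3)
one_and_a_half = format_gib_sketch(1610612736)
print(three_gib, one_and_a_half)
```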

```python
nemo_automodel.components.checkpoint.utils.format_output_file_count(
    count: int
) -> str
```

Format the output shard count for user-facing log messages.

```python
nemo_automodel.components.checkpoint.utils.get_input_embeddings_weight_and_name(
    model: torch.nn.Module
) -> tuple[torch.Tensor | None, str | None]
```

Return the input embedding weight and normalized name if present.

**Parameters:**

`model`: Model to inspect.

**Returns:** `tuple[torch.Tensor | None, str | None]`

Tuple of the embedding weight tensor and its normalized FQN, or
`(None, None)` if no input embedding weight is found.

```python
nemo_automodel.components.checkpoint.utils.get_lm_head_weight_and_name(
    model: torch.nn.Module
) -> tuple[torch.Tensor | None, str | None]
```

Return the first `lm_head.weight` parameter found on a model.

**Parameters:**

`model`: Model to inspect.

**Returns:** `tuple[torch.Tensor | None, str | None]`

Tuple of the parameter tensor and its normalized FQN, or `(None, None)`
if no `lm_head.weight` parameter is found.

```python
nemo_automodel.components.checkpoint.utils.get_rank_safe() -> int
```

Return the current distributed rank, defaulting to 0 when not initialized.

```python
nemo_automodel.components.checkpoint.utils.get_safetensors_index_total_size(
    index_path: str | None
) -> int | None
```

Return the total checkpoint size recorded in a Hugging Face safetensors index.
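
Sharded Hugging Face checkpoints ship a `model.safetensors.index.json` file whose `metadata.total_size` field records the total size in bytes. The sketch below reads that field with the standard library. Returning `None` for a missing path or missing field is an assumption suggested by the `int | None` return type, and `read_index_total_size` is a hypothetical name.

```python
import json
import tempfile
from pathlib import Path


def read_index_total_size(index_path):
    """Return metadata.total_size from a safetensors index, or None."""
    if index_path is None:
        return None
    path = Path(index_path)
    if not path.is_file():
        return None  # assumed behavior for a missing index file
    with path.open() as handle:
        index = json.load(handle)
    total = index.get("metadata", {}).get("total_size")
    return int(total) if total is not None else None


# Demo with a throwaway index file; the size value is made up.
with tempfile.TemporaryDirectory() as tmp:
    index_file = Path(tmp) / "model.safetensors.index.json"
    index_file.write_text(
        json.dumps({"metadata": {"total_size": 16060522496}, "weight_map": {}})
    )
    total = read_index_total_size(str(index_file))
    missing = read_index_total_size(str(Path(tmp) / "absent.json"))

print(total, missing)
```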

```python
nemo_automodel.components.checkpoint.utils.get_tied_lm_head_source_names(
    model: torch.nn.Module,
    lm_head_param_name: str | None = None
) -> list[str]
```

Return candidate checkpoint keys that can source a tied LM head.

**Parameters:**

`model`: Model or pipeline stage to inspect.

`lm_head_param_name`: Optional normalized LM head FQN.

**Returns:** `list[str]`

Ordered list of possible source FQNs.

```python
nemo_automodel.components.checkpoint.utils.get_world_size_safe() -> int
```

Return the current distributed world size, defaulting to 1 when not initialized.

```python
nemo_automodel.components.checkpoint.utils.has_local_tied_lm_head(
    model: torch.nn.Module
) -> bool
```

Return whether the current model partition has an actual tied LM head.

This is stricter than `is_tied_word_embeddings()`: pipeline stages often
keep the config flag set to `True` even when `lm_head` and
`embed_tokens` live on different partitions. Some custom models can also
declare tied embeddings in config without actually aliasing the parameters.
In that case omitting `lm_head.weight` from a checkpoint loses trained
state, so only treat it as safely tied when the local tensors share storage.

**Parameters:**

`model`: Model or pipeline stage to inspect.

**Returns:** `bool`

`True` when the model is configured with tied word embeddings, both
`lm_head` and the input embedding are present locally, and their weights
share storage.
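
The distinction between "configured as tied" and "actually aliased" can be shown with plain PyTorch modules. In the sketch below, comparing `data_ptr()` values is an assumed criterion for shared storage, and `shares_storage` is a hypothetical helper rather than the library's check.

```python
import torch
from torch import nn


def shares_storage(left: torch.Tensor, right: torch.Tensor) -> bool:
    """Assumed criterion: both tensors start at the same memory address."""
    return left.data_ptr() == right.data_ptr()


embed = nn.Embedding(8, 4)

# Truly tied: the head's weight is the very same Parameter as the embedding's.
tied_head = nn.Linear(4, 8, bias=False)
tied_head.weight = embed.weight

# Equal values but separate storage: "tied" only in appearance.
copied_head = nn.Linear(4, 8, bias=False)
with torch.no_grad():
    copied_head.weight.copy_(embed.weight)

tied = shares_storage(tied_head.weight, embed.weight)
copied = shares_storage(copied_head.weight, embed.weight)
print(tied, copied)
```

Only the first head could be dropped from a checkpoint safely; the copied head holds independent state that would be lost if `lm_head.weight` were omitted.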

```python
nemo_automodel.components.checkpoint.utils.is_rank_0() -> bool
```

Return True on the main rank.
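
The "safe" rank helpers in this module follow a common pattern: query `torch.distributed` only when a process group has been initialized, and otherwise fall back to single-process defaults. The sketch below shows that pattern under assumed names (`rank_or_zero`, `world_size_or_one`, `on_main_rank`); it is not the library's code.

```python
import torch.distributed as dist


def rank_or_zero() -> int:
    """Current rank, or 0 when no process group is initialized."""
    if dist.is_available() and dist.is_initialized():
        return dist.get_rank()
    return 0


def world_size_or_one() -> int:
    """World size, or 1 when no process group is initialized."""
    if dist.is_available() and dist.is_initialized():
        return dist.get_world_size()
    return 1


def on_main_rank() -> bool:
    return rank_or_zero() == 0


# In a plain single-process script these fall back to the defaults.
print(rank_or_zero(), world_size_or_one(), on_main_rank())
```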

```python
nemo_automodel.components.checkpoint.utils.is_tied_word_embeddings(
    model: torch.nn.Module
) -> bool
```

Check if the model's word embeddings are tied.

**Parameters:**

`model`: The model to check.

**Returns:** `bool`

True if the model's word embeddings are tied, False otherwise.

```python
nemo_automodel.components.checkpoint.utils.materialize_missing_tied_lm_head(
    state_dict: dict[str, typing.Any],
    model: torch.nn.Module,
    allow_current_lm_head_fallback: bool = False
) -> bool
```

Populate a missing tied `lm_head.weight` from its embedding source.

Hugging Face checkpoints for tied-embedding models often omit
`lm_head.weight` entirely. That is fine for unsplit models where
`tie_weights()` can restore the alias, but it breaks pipeline-parallel last
stages which own `lm_head` but not `embed_tokens`.

**Parameters:**

`state_dict`: Checkpoint state dict to mutate in place.

`model`: Target model or pipeline stage.

`allow_current_lm_head_fallback`: If `True`, fall back to the current
`lm_head` tensor when the tied source cannot be found in
`state_dict`. This preserves legacy resume behavior for older
checkpoints that were saved without a local `lm_head.weight`.

**Returns:** `bool`

`True` if a missing `lm_head.weight` was materialized, else `False`.
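
The core idea can be sketched on a plain dictionary: if `lm_head.weight` is absent, copy the reference from the first candidate source key that is present. The key names, the stand-in list values, and the helper `materialize_tied_lm_head_sketch` are illustrative assumptions; the real function works with the model and its tensors, and also supports the `allow_current_lm_head_fallback` path, which is not shown here.

```python
def materialize_tied_lm_head_sketch(state_dict, source_names, lm_head_key="lm_head.weight"):
    """Fill a missing LM head entry from the first available tied source key."""
    if lm_head_key in state_dict:
        return False  # nothing missing, nothing to do
    for name in source_names:
        if name in state_dict:
            # Reuse the same object so the entry stays an alias of its source.
            state_dict[lm_head_key] = state_dict[name]
            return True
    return False


# Stand-in checkpoint: a nested list plays the role of the embedding tensor.
checkpoint = {"model.embed_tokens.weight": [[0.1, 0.2], [0.3, 0.4]]}
changed = materialize_tied_lm_head_sketch(checkpoint, ["model.embed_tokens.weight"])
changed_again = materialize_tied_lm_head_sketch(checkpoint, ["model.embed_tokens.weight"])
print(changed, changed_again, sorted(checkpoint))
```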

```python
nemo_automodel.components.checkpoint.utils.resolve_trust_remote_code(
    pretrained_model_name_or_path
)
```

Whitelist NVIDIA models to allow remote code execution.

**Parameters:**

`pretrained_model_name_or_path`: The name or path of the pretrained model.

**Returns:**

`True` if the model should be loaded with `trust_remote_code`, `False` otherwise.