nemo_automodel._transformers.utils#

Module Contents#

Functions#

_should_load_before_shard

Decide whether to load the checkpoint before FSDP/TP/EP sharding.

sliding_window_overwrite

Returns configuration overrides to handle sliding window settings based on model rules.

apply_qwen3_omni_config_patch

Fix Qwen3OmniMoeTalkerCodePredictorConfig accessing use_sliding_window.

_patch_bytes_to_unicode

Re-export bytes_to_unicode on transformers.models.gpt2.tokenization_gpt2.

_patch_special_tokens_pattern

Default special_tokens_pattern to "none" for PreTrainedTokenizer.

apply_cache_compatibility_patches

Apply compatibility patches for transformers cache utilities.

API#

nemo_automodel._transformers.utils._should_load_before_shard(
*,
autopipeline: Optional[object],
tp_size: int,
ep_size: int,
pretrained_model_name_or_path: str,
load_base_model: bool,
peft_config: Optional[object],
) → bool#

Decide whether to load the checkpoint before FSDP/TP/EP sharding.

Load-before-shard is only safe when running single-GPU (no PP, TP, or EP), a checkpoint actually needs loading, and no PEFT adapter is involved. With any model parallelism the post-shard load path must be used to avoid NCCL collective mismatches or key/device inconsistencies.
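The decision described above can be sketched as a pure predicate. This is a minimal illustration of the stated conditions, not the library's implementation; the name `should_load_before_shard_sketch` and the exact way "a checkpoint actually needs loading" is tested are assumptions:

```python
from typing import Optional


def should_load_before_shard_sketch(
    *,
    autopipeline: Optional[object],
    tp_size: int,
    ep_size: int,
    pretrained_model_name_or_path: str,
    load_base_model: bool,
    peft_config: Optional[object],
) -> bool:
    """Sketch of the load-before-shard safety check described above."""
    # Safe only with no pipeline, tensor, or expert parallelism.
    no_parallelism = autopipeline is None and tp_size == 1 and ep_size == 1
    # A checkpoint must actually need loading (assumed condition).
    needs_load = bool(pretrained_model_name_or_path) or load_base_model
    # PEFT adapters force the post-shard load path.
    return no_parallelism and needs_load and peft_config is None
```

With any of `autopipeline`, `tp_size > 1`, `ep_size > 1`, or a PEFT config set, the predicate is false and the post-shard load path applies.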

nemo_automodel._transformers.utils.sliding_window_overwrite(model_name: str) → dict[str, Any]#

Returns configuration overrides to handle sliding window settings based on model rules.

Parameters:

model_name – The HuggingFace model name or path to load configuration from

Returns:

Dictionary with overwrite values, or empty dict if no overwrites needed

Return type:

dict
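One plausible shape of such an override rule, sketched here as an assumption: if a loaded config disables sliding-window attention (`use_sliding_window=False`) but still carries a `sliding_window` length, return an override that nulls it out so attention backends do not enable the window by mistake. The helper name and the specific rule are hypothetical, and the sketch takes a config object directly rather than a model name:

```python
from typing import Any


def sliding_window_overwrite_sketch(config: Any) -> dict[str, Any]:
    """Return config overrides for sliding-window settings (assumed rule).

    If the config says sliding-window attention is disabled but a window
    length is still set, override the length to None; otherwise return
    an empty dict (no overwrites needed).
    """
    overwrite: dict[str, Any] = {}
    if (
        getattr(config, "use_sliding_window", None) is False
        and getattr(config, "sliding_window", None) is not None
    ):
        overwrite["sliding_window"] = None
    return overwrite
```

The returned dict can then be splatted into a `from_pretrained(...)` call as keyword overrides.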

nemo_automodel._transformers.utils.apply_qwen3_omni_config_patch()#

Fix Qwen3OmniMoeTalkerCodePredictorConfig accessing use_sliding_window.

nemo_automodel._transformers.utils._patch_bytes_to_unicode()#

Re-export bytes_to_unicode on transformers.models.gpt2.tokenization_gpt2.

In transformers v5 this helper was removed from the GPT-2 tokenizer module, but some custom tokenizers shipped with model weights (e.g. Kimi) still import it from there via trust_remote_code. Monkey-patching it back avoids an ImportError without modifying the transformers package.
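The patch amounts to restoring the well-known GPT-2 byte-to-unicode table and re-attaching it to the module. The table itself is standard (every byte maps to a distinct printable unicode character); the guarded patching function below is a sketch, and `_patch_bytes_to_unicode_sketch` is a hypothetical name:

```python
def bytes_to_unicode() -> dict[int, str]:
    """Standard GPT-2 helper: map each of the 256 byte values to a
    unique printable unicode character (printable bytes map to
    themselves; the rest are shifted above U+0100)."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(2**8):
        if b not in bs:
            bs.append(b)
            cs.append(2**8 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def _patch_bytes_to_unicode_sketch() -> None:
    """Re-attach the helper to the GPT-2 tokenizer module if it was
    removed (sketch of the monkey-patch described above)."""
    try:
        import transformers.models.gpt2.tokenization_gpt2 as gpt2_tok
    except ImportError:
        return  # transformers not installed; nothing to patch
    if not hasattr(gpt2_tok, "bytes_to_unicode"):
        gpt2_tok.bytes_to_unicode = bytes_to_unicode
```

Custom tokenizers loaded via `trust_remote_code` that do `from transformers.models.gpt2.tokenization_gpt2 import bytes_to_unicode` then import cleanly.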

nemo_automodel._transformers.utils._patch_special_tokens_pattern()#

Default special_tokens_pattern to "none" for PreTrainedTokenizer.

Transformers v5 introduced special_tokens_pattern (default "cls_sep") which makes build_inputs_with_special_tokens prepend cls_token_id and append sep_token_id. Custom tokenizers (e.g. TikToken-based Kimi) that lack CLS/SEP tokens end up with None IDs in the sequence, crashing pad().
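The failure mode can be illustrated with a toy stand-in for the tokenizer (not the real `PreTrainedTokenizer` class; names and mechanics here are simplified assumptions). With the `"cls_sep"` pattern and no CLS/SEP tokens defined, `None` IDs leak into the built sequence; with `"none"`, the input passes through unchanged:

```python
from typing import Optional


class TokenizerSketch:
    """Toy model of the v5 behavior described above."""

    def __init__(
        self,
        cls_token_id: Optional[int],
        sep_token_id: Optional[int],
        special_tokens_pattern: str = "cls_sep",
    ):
        self.cls_token_id = cls_token_id
        self.sep_token_id = sep_token_id
        self.special_tokens_pattern = special_tokens_pattern

    def build_inputs_with_special_tokens(self, ids: list) -> list:
        if self.special_tokens_pattern == "cls_sep":
            # v5 default: prepend CLS and append SEP — None if unset,
            # which later crashes pad().
            return [self.cls_token_id] + ids + [self.sep_token_id]
        # "none": pass the sequence through untouched.
        return list(ids)
```

Defaulting `special_tokens_pattern` to `"none"` for such tokenizers is the safe choice because they never defined CLS/SEP tokens in the first place.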

nemo_automodel._transformers.utils.apply_cache_compatibility_patches()#

Apply compatibility patches for transformers cache utilities.