nemo_automodel._transformers.utils#

Module Contents#

Functions#

_should_load_before_shard

Decide whether to load the checkpoint before FSDP/TP/EP sharding.

sliding_window_overwrite

Returns configuration overrides to handle sliding window settings based on model rules.

apply_qwen3_omni_config_patch

Fix Qwen3OmniMoeTalkerCodePredictorConfig accessing use_sliding_window.

_patch_bytes_to_unicode

Re-export bytes_to_unicode on transformers.models.gpt2.tokenization_gpt2.

_patch_special_tokens_pattern

Default special_tokens_pattern to "none" for PreTrainedTokenizer.

apply_cache_compatibility_patches

Apply compatibility patches for transformers cache utilities.

API#

nemo_automodel._transformers.utils._should_load_before_shard(
*,
autopipeline: Optional[object],
tp_size: int,
ep_size: int,
pretrained_model_name_or_path: str,
load_base_model: bool,
peft_config: Optional[object],
) → bool#

Decide whether to load the checkpoint before FSDP/TP/EP sharding.

Load-before-shard is only safe when running single-GPU (no PP, TP, or EP), a checkpoint actually needs loading, and no PEFT adapter is involved. With any model parallelism the post-shard load path must be used to avoid NCCL collective mismatches or key/device inconsistencies.
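The decision described above can be sketched as a pure predicate. This is a minimal illustration of the stated conditions, not the library's implementation; the name `should_load_before_shard_sketch` and the exact way "a checkpoint actually needs loading" is tested are assumptions:

```python
from typing import Optional


def should_load_before_shard_sketch(
    *,
    autopipeline: Optional[object],
    tp_size: int,
    ep_size: int,
    pretrained_model_name_or_path: str,
    load_base_model: bool,
    peft_config: Optional[object],
) -> bool:
    """Sketch of the load-before-shard safety check described above."""
    # Safe only with no pipeline, tensor, or expert parallelism.
    no_parallelism = autopipeline is None and tp_size == 1 and ep_size == 1
    # A checkpoint must actually need loading (assumed condition).
    needs_load = bool(pretrained_model_name_or_path) or load_base_model
    # PEFT adapters force the post-shard load path.
    return no_parallelism and needs_load and peft_config is None
```

With any of `autopipeline`, `tp_size > 1`, `ep_size > 1`, or a PEFT config set, the predicate is false and the post-shard load path applies.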

nemo_automodel._transformers.utils.sliding_window_overwrite(model_name: str) → dict[str, Any]#

Returns configuration overrides to handle sliding window settings based on model rules.

Parameters:

model_name – The HuggingFace model name or path to load configuration from

Returns:

Dictionary with overwrite values, or empty dict if no overwrites needed

Return type:

dict
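One plausible shape of such an override rule, sketched here as an assumption: if a loaded config disables sliding-window attention (`use_sliding_window=False`) but still carries a `sliding_window` length, return an override that nulls it out so attention backends do not enable the window by mistake. The helper name and the specific rule are hypothetical, and the sketch takes a config object directly rather than a model name:

```python
from typing import Any


def sliding_window_overwrite_sketch(config: Any) -> dict[str, Any]:
    """Return config overrides for sliding-window settings (assumed rule).

    If the config says sliding-window attention is disabled but a window
    length is still set, override the length to None; otherwise return
    an empty dict (no overwrites needed).
    """
    overwrite: dict[str, Any] = {}
    if (
        getattr(config, "use_sliding_window", None) is False
        and getattr(config, "sliding_window", None) is not None
    ):
        overwrite["sliding_window"] = None
    return overwrite
```

The returned dict can then be splatted into a `from_pretrained(...)` call as keyword overrides.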

nemo_automodel._transformers.utils.apply_qwen3_omni_config_patch()#

Fix Qwen3OmniMoeTalkerCodePredictorConfig accessing use_sliding_window.

nemo_automodel._transformers.utils._patch_bytes_to_unicode()#

Re-export bytes_to_unicode on transformers.models.gpt2.tokenization_gpt2.

In transformers v5 this helper was removed from the GPT-2 tokenizer module, but some custom tokenizers shipped with model weights (e.g. Kimi) still import it from there via trust_remote_code. Monkey-patching it back avoids an ImportError without modifying the transformers package.
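The patch amounts to restoring the well-known GPT-2 byte-to-unicode table and re-attaching it to the module. The table itself is standard (every byte maps to a distinct printable unicode character); the guarded patching function below is a sketch, and `_patch_bytes_to_unicode_sketch` is a hypothetical name:

```python
def bytes_to_unicode() -> dict[int, str]:
    """Standard GPT-2 helper: map each of the 256 byte values to a
    unique printable unicode character (printable bytes map to
    themselves; the rest are shifted above U+0100)."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(2**8):
        if b not in bs:
            bs.append(b)
            cs.append(2**8 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def _patch_bytes_to_unicode_sketch() -> None:
    """Re-attach the helper to the GPT-2 tokenizer module if it was
    removed (sketch of the monkey-patch described above)."""
    try:
        import transformers.models.gpt2.tokenization_gpt2 as gpt2_tok
    except ImportError:
        return  # transformers not installed; nothing to patch
    if not hasattr(gpt2_tok, "bytes_to_unicode"):
        gpt2_tok.bytes_to_unicode = bytes_to_unicode
```

Custom tokenizers loaded via `trust_remote_code` that do `from transformers.models.gpt2.tokenization_gpt2 import bytes_to_unicode` then import cleanly.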

nemo_automodel._transformers.utils._patch_special_tokens_pattern()#

Default special_tokens_pattern to "none" for PreTrainedTokenizer.

Transformers v5 introduced special_tokens_pattern (default "cls_sep") which makes build_inputs_with_special_tokens prepend cls_token_id and append sep_token_id. Custom tokenizers (e.g. TikToken-based Kimi) that lack CLS/SEP tokens end up with None IDs in the sequence, crashing pad().
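The failure mode can be illustrated with a toy stand-in for the tokenizer (not the real `PreTrainedTokenizer` class; names and mechanics here are simplified assumptions). With the `"cls_sep"` pattern and no CLS/SEP tokens defined, `None` IDs leak into the built sequence; with `"none"`, the input passes through unchanged:

```python
from typing import Optional


class TokenizerSketch:
    """Toy model of the v5 behavior described above."""

    def __init__(
        self,
        cls_token_id: Optional[int],
        sep_token_id: Optional[int],
        special_tokens_pattern: str = "cls_sep",
    ):
        self.cls_token_id = cls_token_id
        self.sep_token_id = sep_token_id
        self.special_tokens_pattern = special_tokens_pattern

    def build_inputs_with_special_tokens(self, ids: list) -> list:
        if self.special_tokens_pattern == "cls_sep":
            # v5 default: prepend CLS and append SEP — None if unset,
            # which later crashes pad().
            return [self.cls_token_id] + ids + [self.sep_token_id]
        # "none": pass the sequence through untouched.
        return list(ids)
```

Defaulting `special_tokens_pattern` to `"none"` for such tokenizers is the safe choice because they never defined CLS/SEP tokens in the first place.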

nemo_automodel._transformers.utils.apply_cache_compatibility_patches()#

Apply compatibility patches for transformers cache utilities.