nemo_automodel._transformers.utils

Module Contents

Functions

| Function | Summary |
|----------|---------|
| `_should_load_before_shard` | Decide whether to load the checkpoint before FSDP/TP/EP sharding. |
| `sliding_window_overwrite` | Returns configuration overrides to handle sliding window settings based on model rules. |
| `apply_qwen3_omni_config_patch` | Fix `Qwen3OmniMoeTalkerCodePredictorConfig` accessing `use_sliding_window`. |
| `_patch_bytes_to_unicode` | Re-export `bytes_to_unicode` on `transformers.models.gpt2.tokenization_gpt2`. |
| `_patch_special_tokens_pattern` | Default `special_tokens_pattern` to `"none"` for `PreTrainedTokenizer`. |
| `apply_cache_compatibility_patches` | Apply compatibility patches for transformers cache utilities. |

API
- nemo_automodel._transformers.utils._should_load_before_shard(*, autopipeline: Optional[object], tp_size: int, ep_size: int, pretrained_model_name_or_path: str, load_base_model: bool, peft_config: Optional[object])
Decide whether to load the checkpoint before FSDP/TP/EP sharding.
Load-before-shard is only safe when running on a single GPU (no PP, TP, or EP), when a checkpoint actually needs loading, and when no PEFT adapter is involved. With any model parallelism, the post-shard load path must be used instead, to avoid NCCL collective mismatches or key/device inconsistencies.
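The decision described above can be sketched as follows. This is a minimal reimplementation for illustration, not the library's actual code: the parameter names mirror the documented signature, but the "checkpoint actually needs loading" heuristic is an assumption.

```python
from typing import Optional


def should_load_before_shard(
    *,
    autopipeline: Optional[object],
    tp_size: int,
    ep_size: int,
    pretrained_model_name_or_path: str,
    load_base_model: bool,
    peft_config: Optional[object],
) -> bool:
    # No PP (autopipeline), TP, or EP may be active: with any model
    # parallelism the post-shard load path must be used instead.
    single_gpu = autopipeline is None and tp_size == 1 and ep_size == 1
    # A checkpoint must actually need loading (assumed heuristic).
    needs_load = bool(pretrained_model_name_or_path) and load_base_model
    # PEFT adapters force the post-shard path.
    return single_gpu and needs_load and peft_config is None
```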
- nemo_automodel._transformers.utils.sliding_window_overwrite(model_name: str) -> dict[str, Any]
Returns configuration overrides to handle sliding window settings based on model rules.
- Parameters:
model_name – The HuggingFace model name or path to load configuration from
- Returns:
Dictionary with overwrite values, or empty dict if no overwrites needed
- Return type:
dict
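As an illustration of the shape of the return value, here is a sketch of one plausible rule. The actual per-model rules live in the library; the `use_sliding_window` / `sliding_window` keys are an assumption based on common Hugging Face config fields.

```python
from typing import Any


def sliding_window_overwrite_sketch(config: dict) -> dict[str, Any]:
    """Illustrative only: if the config disables sliding-window attention
    but still carries a window size, override the size to None so
    downstream code does not act on the stale value."""
    if config.get("use_sliding_window") is False and config.get("sliding_window") is not None:
        return {"sliding_window": None}
    # No overwrite needed.
    return {}
```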
- nemo_automodel._transformers.utils.apply_qwen3_omni_config_patch()
Fix `Qwen3OmniMoeTalkerCodePredictorConfig` accessing `use_sliding_window`.
- nemo_automodel._transformers.utils._patch_bytes_to_unicode()
Re-export `bytes_to_unicode` on `transformers.models.gpt2.tokenization_gpt2`.
In transformers v5 this helper was removed from the GPT-2 tokenizer module, but some custom tokenizers shipped with model weights (e.g. Kimi) still import it from there via `trust_remote_code`. Monkey-patching it back avoids an ImportError without modifying the transformers package.
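The removed helper is the well-known GPT-2 byte-to-unicode table, so restoring it amounts to defining the function and assigning it onto the module named in the docstring. A sketch (the table-building algorithm is reproduced from the standard GPT-2 tokenizer; the guard logic is an assumption):

```python
def bytes_to_unicode() -> dict[int, str]:
    """Map every byte 0-255 to a printable unicode character, as in the
    original GPT-2 tokenizer."""
    # Printable ranges that map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("\u00a1"), ord("\u00ac") + 1))
        + list(range(ord("\u00ae"), ord("\u00ff") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes are shifted above 255 to stay printable.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Monkey-patch it back only if transformers no longer exports it.
try:
    import transformers.models.gpt2.tokenization_gpt2 as _gpt2

    if not hasattr(_gpt2, "bytes_to_unicode"):
        _gpt2.bytes_to_unicode = bytes_to_unicode
except ImportError:
    pass  # transformers not installed; nothing to patch
```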
- nemo_automodel._transformers.utils._patch_special_tokens_pattern()
Default `special_tokens_pattern` to `"none"` for `PreTrainedTokenizer`.
Transformers v5 introduced `special_tokens_pattern` (default `"cls_sep"`), which makes `build_inputs_with_special_tokens` prepend `cls_token_id` and append `sep_token_id`. Custom tokenizers (e.g. TikToken-based Kimi) that lack CLS/SEP tokens end up with `None` IDs in the sequence, crashing `pad()`.
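The failure mode, and why defaulting the pattern to `"none"` fixes it, can be demonstrated with a toy tokenizer stub (all names here are illustrative; this is not the transformers implementation):

```python
class _TokStub:
    """Toy tokenizer lacking CLS/SEP tokens (e.g. a TikToken-style vocab)."""

    cls_token_id = None
    sep_token_id = None
    special_tokens_pattern = "cls_sep"  # the transformers v5 default

    def build_inputs_with_special_tokens(self, ids: list[int]) -> list[int]:
        if self.special_tokens_pattern == "cls_sep":
            # None leaks into the sequence and later crashes pad().
            return [self.cls_token_id, *ids, self.sep_token_id]
        return list(ids)


tok = _TokStub()
broken = tok.build_inputs_with_special_tokens([1, 2, 3])  # contains None IDs
tok.special_tokens_pattern = "none"  # what the patch defaults to
fixed = tok.build_inputs_with_special_tokens([1, 2, 3])  # ids pass through
```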
- nemo_automodel._transformers.utils.apply_cache_compatibility_patches()
Apply compatibility patches for transformers cache utilities.