# bridge.models.hf_pretrained.safe_config_loader

Thread-safe configuration loading utilities.
This module provides utilities for safely loading HuggingFace model configurations in multi-threaded environments, preventing race conditions that can occur when multiple threads try to download and cache the same model simultaneously.
## Module Contents
### Functions

| Function | Description |
|---|---|
| `safe_load_config_with_retry` | Thread-safe and process-safe configuration loading with retry logic. |
## API
```python
bridge.models.hf_pretrained.safe_config_loader.safe_load_config_with_retry(
    path: Union[str, pathlib.Path],
    trust_remote_code: bool = False,
    max_retries: int = 3,
    base_delay: float = 1.0,
    **kwargs,
) -> PretrainedConfig
```
Thread-safe and process-safe configuration loading with retry logic.
This function prevents race conditions when multiple threads/processes try to download and cache the same model configuration simultaneously. Uses file locking (if filelock is available) to coordinate access across processes.
- Parameters:
  - **path** – HuggingFace model ID or path to a model directory
  - **trust_remote_code** – Whether to trust remote code when loading the config
  - **max_retries** – Maximum number of retry attempts (default: 3)
  - **base_delay** – Base delay in seconds for exponential backoff (default: 1.0)
  - **kwargs** – Additional arguments passed to `AutoConfig.from_pretrained`
- Returns:
  - The loaded model configuration
- Return type:
  - `PretrainedConfig`
- Raises:
  - **ValueError** – If config loading fails after all retries
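Given the `max_retries` and `base_delay` parameters, the retry delays would grow along an exponential schedule. Assuming the common `base_delay * 2 ** attempt` formula (the exact formula used by the library is an assumption here), the defaults produce:

```python
# Hypothetical backoff schedule: delay = base_delay * 2 ** attempt.
# The exact formula used by the library is an assumption.
base_delay = 1.0
max_retries = 3
delays = [base_delay * 2 ** attempt for attempt in range(max_retries)]
print(delays)  # [1.0, 2.0, 4.0]
```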
Environment variables:

- **MEGATRON_CONFIG_LOCK_DIR** – Override the directory where lock files are created (default: `~/.cache/huggingface/`). Useful for multi-node setups where a shared lock directory is needed.
Example:

```python
config = safe_load_config_with_retry("meta-llama/Meta-Llama-3-8B")
print(config.model_type)
```

With custom retry settings:

```python
config = safe_load_config_with_retry(
    "gpt2",
    max_retries=5,
    base_delay=0.5,
    trust_remote_code=True,
)
```

With a shared lock directory:

```python
import os

os.environ["MEGATRON_CONFIG_LOCK_DIR"] = "/shared/locks"
config = safe_load_config_with_retry("meta-llama/Meta-Llama-3-8B")
```