nemo_curator.utils.hf_download_utils


Module Contents

Functions

| Name | Description |
| --- | --- |
| download_model_from_hf | Download a model from Hugging Face. |

API

nemo_curator.utils.hf_download_utils.download_model_from_hf(
model_id: str,
local_dir: str | pathlib.Path,
ignore_patterns: list[str] | None = None,
filename: str | None = None,
revision: str | None = None
) -> None

Download a model from Hugging Face.

This function downloads either a specific file or the entire model repository from Hugging Face Hub to a local directory.
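The choice between the two download modes can be sketched as follows. This is a minimal illustration, not the library's implementation: `plan_download` is a hypothetical helper that only reports which `huggingface_hub` function (`hf_hub_download` for a single file, `snapshot_download` for a whole repository) would plausibly be called and with which keyword arguments. It performs no network access, and the rejection of `filename` combined with `ignore_patterns` is assumed from the Raises entry on this page.

```python
from __future__ import annotations

from pathlib import Path


def plan_download(
    model_id: str,
    local_dir: str | Path,
    ignore_patterns: list[str] | None = None,
    filename: str | None = None,
    revision: str | None = None,
) -> tuple[str, dict]:
    """Hypothetical helper: name the huggingface_hub call and its kwargs."""
    # Assumption taken from the documented Raises entry: the two options
    # are mutually exclusive.
    if filename is not None and ignore_patterns is not None:
        raise ValueError("filename and ignore_patterns cannot be combined")

    common = {
        "repo_id": model_id,
        "local_dir": str(local_dir),
        "revision": revision,
    }
    if filename is not None:
        # Single-file mode: only the named file is fetched.
        return "hf_hub_download", {**common, "filename": filename}
    # Repository mode: everything except files matching ignore_patterns.
    return "snapshot_download", {**common, "ignore_patterns": ignore_patterns}


call_name, call_kwargs = plan_download(
    "gpt2", "./models/gpt2", filename="config.json"
)
print(call_name)  # hf_hub_download
```

Returning a plan instead of downloading keeps the sketch runnable offline; a real wrapper would pass the keyword arguments to the named function.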

Parameters:

model_id
str

The Hugging Face model identifier (e.g., 'gpt2', 'bert-base-uncased').

local_dir
str | Path

Local directory into which the model is downloaded.

ignore_patterns
list[str] | None, defaults to None

List of glob patterns to ignore when downloading. Only used when filename is not provided.

filename
str | None, defaults to None

Specific file to download from the repository. If provided, only this file will be downloaded and ignore_patterns will be ignored.

revision
str | None, defaults to None

Git revision (branch, tag, or commit hash) to download. If None, the latest commit on the main branch is used.
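To make the effect of ignore_patterns concrete, the sketch below filters a list of file names with Python's standard `fnmatch` module. It assumes the patterns follow ordinary shell-style glob matching; `files_kept` is a hypothetical helper, and the file listing is illustrative rather than the real contents of any repository.

```python
from __future__ import annotations

from fnmatch import fnmatch


def files_kept(
    repo_files: list[str], ignore_patterns: list[str] | None
) -> list[str]:
    """Hypothetical helper: drop every file matching any ignore pattern."""
    if not ignore_patterns:
        # No patterns means nothing is skipped.
        return list(repo_files)
    return [
        name
        for name in repo_files
        if not any(fnmatch(name, pattern) for pattern in ignore_patterns)
    ]


# Illustrative listing only; not fetched from the Hub.
repo_files = ["config.json", "pytorch_model.bin", "model.safetensors", "vocab.json"]
kept = files_kept(repo_files, ["*.bin", "*.safetensors"])
print(kept)  # ['config.json', 'vocab.json']
```

With these two patterns the weight files are skipped and only the small configuration and vocabulary files remain, which is a common way to avoid downloading the same weights in several formats.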

Raises:

  • ValueError: If both filename and ignore_patterns are provided (not supported).

Examples:

from nemo_curator.utils.hf_download_utils import download_model_from_hf

# Download the entire model repository
download_model_from_hf('gpt2', './models/gpt2')

# Download a specific file
download_model_from_hf('gpt2', './models/gpt2', filename='config.json')

# Download the repository, skipping files that match the ignore patterns
download_model_from_hf('gpt2', './models/gpt2',
                       ignore_patterns=['*.bin', '*.safetensors'])

# Download a specific revision
download_model_from_hf('gpt2', './models/gpt2', revision='main')