nat.llm.huggingface_llm#

HuggingFace Transformers LLM Provider - Local in-process model execution.

Attributes#

logger

Classes#

ModelCacheEntry

ModelCache

Singleton cache for loaded HuggingFace models.

HuggingFaceConfig

Configuration for HuggingFace LLM - loads model directly for local execution.

Functions#

get_cached_model(→ ModelCacheEntry | None)

Return cached model data (model, tokenizer, torch) or None if not loaded.

_cleanup_model(→ None)

Clean up a loaded model and free GPU memory.

huggingface_provider(...)

HuggingFace model provider - loads models locally for in-process execution.

Module Contents#

logger#
class ModelCacheEntry#
model: Any#
tokenizer: Any#
torch: Any#
class ModelCache#

Singleton cache for loaded HuggingFace models.

Models remain cached for the provider's lifetime (not per query) to enable fast reuse:

- During nat serve: cached while the server runs, cleaned up on shutdown.
- During nat red-team: cached across all evaluation queries, cleaned up when the run completes.
- During nat run: cached for a single workflow execution, cleaned up when done.

_instance: ModelCache | None = None#
_cache: dict[str, ModelCacheEntry]#
get(model_name: str) ModelCacheEntry | None#

Return cached model data or None if not loaded.

set(model_name: str, data: ModelCacheEntry) None#

Cache model data.

remove(model_name: str) None#

Remove model from cache.
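
This reference does not show how the singleton is constructed. Below is a minimal sketch of the pattern this interface implies, assuming the shared instance is created lazily via __new__; the module's actual construction mechanism may differ.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ModelCacheEntry:
    model: Any       # the loaded transformers model
    tokenizer: Any   # its matching tokenizer
    torch: Any       # the torch module, kept so callers need not re-import it


class ModelCache:
    _instance: "ModelCache | None" = None
    _cache: dict[str, ModelCacheEntry]

    def __new__(cls) -> "ModelCache":
        # Lazily create one shared instance; every later ModelCache() call
        # returns the same object, so all callers share one cache.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._cache = {}
        return cls._instance

    def get(self, model_name: str) -> "ModelCacheEntry | None":
        return self._cache.get(model_name)

    def set(self, model_name: str, data: ModelCacheEntry) -> None:
        self._cache[model_name] = data

    def remove(self, model_name: str) -> None:
        self._cache.pop(model_name, None)
```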

class HuggingFaceConfig(/, **data: Any)#

Bases: nat.data_models.llm.LLMBaseConfig

Configuration for HuggingFace LLM - loads model directly for local execution.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

model_name: str#
device: str#
dtype: str | None = None#
max_new_tokens: int#
temperature: float#
trust_remote_code: bool#
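
A hedged construction example. All field values below are illustrative assumptions; this reference does not document the defaults or the accepted device/dtype strings.

```python
from nat.llm.huggingface_llm import HuggingFaceConfig

# All values are illustrative assumptions, not documented defaults.
config = HuggingFaceConfig(
    model_name="microsoft/Phi-3-mini-4k-instruct",  # any Hugging Face model id
    device="cuda",             # assumed "cuda"/"cpu"-style device string
    dtype="bfloat16",          # assumed to name a torch dtype
    max_new_tokens=256,
    temperature=0.7,
    trust_remote_code=False,   # only enable for models you trust
)
```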
get_cached_model(model_name: str) ModelCacheEntry | None#

Return cached model data (model, tokenizer, torch) or None if not loaded.
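
A usage sketch, assuming the entry bundles a standard transformers model/tokenizer pair; the model name is a placeholder.

```python
from nat.llm.huggingface_llm import get_cached_model

entry = get_cached_model("microsoft/Phi-3-mini-4k-instruct")  # placeholder name
if entry is not None:
    # Reuse the already-loaded weights instead of loading them again.
    inputs = entry.tokenizer("Hello!", return_tensors="pt").to(entry.model.device)
    with entry.torch.no_grad():
        output_ids = entry.model.generate(**inputs, max_new_tokens=32)
    print(entry.tokenizer.decode(output_ids[0], skip_special_tokens=True))
```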

async _cleanup_model(model_name: str) None#

Clean up a loaded model and free GPU memory.

Args:

model_name: Name of the model to clean up.
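
_cleanup_model is private and its body is not shown here. The sketch below illustrates the usual synchronous core of such a cleanup (drop all references, collect garbage, release cached GPU blocks), using the ModelCache interface sketched above; the actual coroutine may differ in detail.

```python
import gc

def release_model(cache: ModelCache, model_name: str) -> None:
    entry = cache.get(model_name)
    if entry is None:
        return
    torch = entry.torch            # keep a handle before dropping the entry
    cache.remove(model_name)       # drop the cache's reference to the entry
    del entry                      # drop the last Python reference to the weights
    gc.collect()                   # let the collector reclaim the model objects
    if torch.cuda.is_available():
        torch.cuda.empty_cache()   # hand freed GPU memory back to the driver
```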

async huggingface_provider(
config: HuggingFaceConfig,
builder: nat.builder.builder.Builder,
) collections.abc.AsyncIterator[nat.builder.llm.LLMProviderInfo]#

HuggingFace model provider - loads models locally for in-process execution.

Args:

config: Configuration for the HuggingFace model.

builder: The NAT builder instance.

Yields:

LLMProviderInfo: Provider information for the loaded model.
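
huggingface_provider is an async generator that NAT's builder normally drives for you. The sketch below only illustrates the lifecycle by driving it manually; the builder argument and model name are placeholders, not working values.

```python
import asyncio

from nat.llm.huggingface_llm import HuggingFaceConfig, huggingface_provider


async def main(builder) -> None:  # builder: a NAT Builder instance (placeholder)
    config = HuggingFaceConfig(
        model_name="microsoft/Phi-3-mini-4k-instruct",  # placeholder model
        device="cpu",
        max_new_tokens=64,
        temperature=0.7,
        trust_remote_code=False,
    )
    gen = huggingface_provider(config, builder)
    provider_info = await anext(gen)  # model loads (or is served from cache) here
    try:
        ...  # run queries; the model stays cached while the generator is open
    finally:
        await gen.aclose()            # closing the generator triggers cleanup


# asyncio.run(main(builder=my_builder))  # my_builder comes from NAT workflow setup
```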