nat.llm.huggingface_llm#

HuggingFace Transformers LLM Provider - Local in-process model execution.

Attributes#

logger

Classes#

ModelCacheEntry

ModelCache

Singleton cache for loaded HuggingFace models.

HuggingFaceConfig

Configuration for HuggingFace LLM - loads model directly for local execution.

Functions#

get_cached_model(→ ModelCacheEntry | None)

Return cached model data (model, tokenizer, torch) or None if not loaded.

_cleanup_model(→ None)

Clean up a loaded model and free GPU memory.

huggingface_provider(...)

HuggingFace model provider - loads models locally for in-process execution.

Module Contents#

logger#
class ModelCacheEntry#
model: Any#
tokenizer: Any#
torch: Any#
class ModelCache#

Singleton cache for loaded HuggingFace models.

Models remain cached for the provider's lifetime (not per query) to enable fast reuse:

- During nat serve: cached while the server runs, cleaned up on shutdown.
- During nat red-team: cached across all evaluation queries, cleaned up when the run completes.
- During nat run: cached for a single workflow execution, cleaned up when done.

_instance: ModelCache | None = None#
_cache: dict[str, ModelCacheEntry]#
get(model_name: str) ModelCacheEntry | None#

Return cached model data or None if not loaded.

set(model_name: str, data: ModelCacheEntry) None#

Cache model data.

remove(model_name: str) None#

Remove model from cache.
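
This reference does not show how the singleton is constructed. Below is a minimal sketch of the pattern this interface implies, assuming the shared instance is created lazily via __new__; the module's actual construction mechanism may differ.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ModelCacheEntry:
    model: Any       # the loaded transformers model
    tokenizer: Any   # its matching tokenizer
    torch: Any       # the torch module, kept so callers need not re-import it


class ModelCache:
    _instance: "ModelCache | None" = None
    _cache: dict[str, ModelCacheEntry]

    def __new__(cls) -> "ModelCache":
        # Lazily create one shared instance; every later ModelCache() call
        # returns the same object, so all callers share one cache.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._cache = {}
        return cls._instance

    def get(self, model_name: str) -> "ModelCacheEntry | None":
        return self._cache.get(model_name)

    def set(self, model_name: str, data: ModelCacheEntry) -> None:
        self._cache[model_name] = data

    def remove(self, model_name: str) -> None:
        self._cache.pop(model_name, None)
```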

class HuggingFaceConfig(/, **data: Any)#

Bases: nat.data_models.llm.LLMBaseConfig

Configuration for HuggingFace LLM - loads model directly for local execution.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

model_name: str#
device: str#
dtype: str | None = None#
max_new_tokens: int#
temperature: float#
trust_remote_code: bool#
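
A hedged construction example. All field values below are illustrative assumptions; this reference does not document the defaults or the accepted device/dtype strings.

```python
from nat.llm.huggingface_llm import HuggingFaceConfig

# All values are illustrative assumptions, not documented defaults.
config = HuggingFaceConfig(
    model_name="microsoft/Phi-3-mini-4k-instruct",  # any Hugging Face model id
    device="cuda",             # assumed "cuda"/"cpu"-style device string
    dtype="bfloat16",          # assumed to name a torch dtype
    max_new_tokens=256,
    temperature=0.7,
    trust_remote_code=False,   # only enable for models you trust
)
```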
get_cached_model(model_name: str) ModelCacheEntry | None#

Return cached model data (model, tokenizer, torch) or None if not loaded.
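
A usage sketch, assuming the entry bundles a standard transformers model/tokenizer pair; the model name is a placeholder.

```python
from nat.llm.huggingface_llm import get_cached_model

entry = get_cached_model("microsoft/Phi-3-mini-4k-instruct")  # placeholder name
if entry is not None:
    # Reuse the already-loaded weights instead of loading them again.
    inputs = entry.tokenizer("Hello!", return_tensors="pt").to(entry.model.device)
    with entry.torch.no_grad():
        output_ids = entry.model.generate(**inputs, max_new_tokens=32)
    print(entry.tokenizer.decode(output_ids[0], skip_special_tokens=True))
```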

async _cleanup_model(model_name: str) None#

Clean up a loaded model and free GPU memory.

Args:

model_name: Name of the model to clean up.
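
_cleanup_model is private and its body is not shown here. The sketch below illustrates the usual synchronous core of such a cleanup (drop all references, collect garbage, release cached GPU blocks), using the ModelCache interface sketched above; the actual coroutine may differ in detail.

```python
import gc

def release_model(cache: ModelCache, model_name: str) -> None:
    entry = cache.get(model_name)
    if entry is None:
        return
    torch = entry.torch            # keep a handle before dropping the entry
    cache.remove(model_name)       # drop the cache's reference to the entry
    del entry                      # drop the last Python reference to the weights
    gc.collect()                   # let the collector reclaim the model objects
    if torch.cuda.is_available():
        torch.cuda.empty_cache()   # hand freed GPU memory back to the driver
```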

async huggingface_provider(
config: HuggingFaceConfig,
builder: nat.builder.builder.Builder,
) collections.abc.AsyncIterator[nat.builder.llm.LLMProviderInfo]#

HuggingFace model provider - loads models locally for in-process execution.

Args:

config: Configuration for the HuggingFace model.

builder: The NAT builder instance.

Yields:

LLMProviderInfo: Provider information for the loaded model.
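
huggingface_provider is an async generator that NAT's builder normally drives for you. The sketch below only illustrates the lifecycle by driving it manually; the builder argument and model name are placeholders, not working values.

```python
import asyncio

from nat.llm.huggingface_llm import HuggingFaceConfig, huggingface_provider


async def main(builder) -> None:  # builder: a NAT Builder instance (placeholder)
    config = HuggingFaceConfig(
        model_name="microsoft/Phi-3-mini-4k-instruct",  # placeholder model
        device="cpu",
        max_new_tokens=64,
        temperature=0.7,
        trust_remote_code=False,
    )
    gen = huggingface_provider(config, builder)
    provider_info = await anext(gen)  # model loads (or is served from cache) here
    try:
        ...  # run queries; the model stays cached while the generator is open
    finally:
        await gen.aclose()            # closing the generator triggers cleanup


# asyncio.run(main(builder=my_builder))  # my_builder comes from NAT workflow setup
```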