nat.llm.dynamo_llm#

Dynamo LLM provider with automatic prefix header injection for KV cache optimization.

This module provides a specialized OpenAI-compatible LLM that sends Dynamo prefix headers for optimal KV cache management and request routing. The prefix parameters are optimizable via the NAT optimizer.

The implementation uses httpx event hooks to inject headers at the HTTP transport level, making it framework-agnostic (works with LangChain, LlamaIndex, etc.).

Dynamo Prefix Parameters#

prefix_osl (Output Sequence Length)

Hint for expected response length:

  • LOW: decode_cost=1.0, short responses

  • MEDIUM: decode_cost=2.0, typical responses

  • HIGH: decode_cost=3.0, long responses

prefix_iat (Inter-Arrival Time)

Hint for request pacing:

  • LOW: iat_factor=1.5, rapid bursts -> high worker stickiness

  • MEDIUM: iat_factor=1.0, normal pacing

  • HIGH: iat_factor=0.6, slow requests -> more exploration

prefix_total_requests

Expected requests per conversation:

  • Higher values increase KV cache affinity and worker stickiness

  • Lower values allow more load balancing
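As a quick illustration of the hint tables above, the following sketch maps each level to the documented values. The names `Level`, `DECODE_COST`, `IAT_FACTOR`, and `routing_hints` are hypothetical and are not part of `nat.llm.dynamo_llm`. How the Dynamo router consumes these values is not covered by this page, so the sketch only performs the lookup.

```python
from enum import Enum


class Level(str, Enum):
    """Hypothetical stand-in for the three prefix levels."""

    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"


# Values copied from the prefix_osl and prefix_iat tables above.
DECODE_COST = {Level.LOW: 1.0, Level.MEDIUM: 2.0, Level.HIGH: 3.0}
IAT_FACTOR = {Level.LOW: 1.5, Level.MEDIUM: 1.0, Level.HIGH: 0.6}


def routing_hints(osl: str, iat: str) -> dict[str, float]:
    """Look up the documented decode_cost and iat_factor for a pair of levels."""
    return {
        "decode_cost": DECODE_COST[Level(osl)],
        "iat_factor": IAT_FACTOR[Level(iat)],
    }
```

For example, `routing_hints("HIGH", "LOW")` pairs the long-response decode cost with the rapid-burst inter-arrival factor; an unknown level name raises `ValueError` from the enum lookup.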

Attributes#

Classes#

DynamoPrefixContext

Singleton class for managing Dynamo prefix IDs across LLM calls.

DynamoModelConfig

A Dynamo LLM provider with automatic prefix header injection for KV cache optimization.

Functions#

_create_dynamo_request_hook(...)

Create an httpx event hook that injects Dynamo prefix headers into requests.

create_httpx_client_with_dynamo_hooks(→ httpx.AsyncClient)

Create an httpx.AsyncClient with Dynamo prefix header injection.

dynamo_llm(config, _builder)

Register the Dynamo LLM provider.

Module Contents#

logger#
PrefixLevel#
class DynamoPrefixContext#

Singleton class for managing Dynamo prefix IDs across LLM calls.

This allows evaluation code to set a prefix ID that persists across all LLM calls for a single evaluation question (multi-turn conversation).

Usage:

from nat.llm.dynamo_llm import DynamoPrefixContext

# Set prefix ID at the start of each evaluation question
DynamoPrefixContext.set("eval-q001-abc123")

# ... perform LLM calls ...

# Clear when done
DynamoPrefixContext.clear()

# Or use as a context manager
with DynamoPrefixContext.scope("eval-q001-abc123"):
    ...  # perform LLM calls
_current_prefix_id: contextvars.ContextVar[str | None]#
classmethod set(prefix_id: str) → None#

Set the Dynamo prefix ID for the current context.

Call this at the start of each evaluation question to ensure all LLM calls for that question share the same prefix ID (enabling KV cache reuse).

Args:

prefix_id: The unique prefix ID (e.g., “eval-q001-abc123”)

classmethod clear() → None#

Clear the current Dynamo prefix ID context.

classmethod get() → str | None#

Get the current Dynamo prefix ID from context, if any.

classmethod scope(prefix_id: str) → collections.abc.Iterator[None]#

Context manager for scoped prefix ID usage.

Automatically sets the prefix ID on entry and clears it on exit, ensuring proper cleanup even if exceptions occur.

Args:

prefix_id: The unique prefix ID for this scope

Yields:

None

Usage:

with DynamoPrefixContext.scope("eval-q001"):
    # All LLM calls here will use the "eval-q001" prefix
    await llm.ainvoke(...)
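The set/get/clear/scope behavior described above can be sketched with a `contextvars.ContextVar` and `contextlib.contextmanager`. This is a minimal illustration under assumptions, not the module's actual source: the class name `PrefixContextSketch` is hypothetical, and details such as clearing by setting the variable back to `None` are guesses.

```python
from __future__ import annotations

import contextvars
from collections.abc import Iterator
from contextlib import contextmanager


class PrefixContextSketch:
    """Illustrative stand-in for DynamoPrefixContext (not the real class)."""

    _current_prefix_id: contextvars.ContextVar[str | None] = contextvars.ContextVar(
        "prefix_id_sketch", default=None
    )

    @classmethod
    def set(cls, prefix_id: str) -> None:
        # Store the ID in the current context only.
        cls._current_prefix_id.set(prefix_id)

    @classmethod
    def clear(cls) -> None:
        # Assumption: clearing means resetting the value to None.
        cls._current_prefix_id.set(None)

    @classmethod
    def get(cls) -> str | None:
        return cls._current_prefix_id.get()

    @classmethod
    @contextmanager
    def scope(cls, prefix_id: str) -> Iterator[None]:
        # Set on entry; the finally block clears even if the body raises.
        cls.set(prefix_id)
        try:
            yield
        finally:
            cls.clear()
```

Because a `ContextVar` is scoped to the current context, concurrently running asyncio tasks each see their own value, which is one plausible reason to prefer it over a plain class attribute for per-question prefix IDs.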

class DynamoModelConfig(/, **data: Any)#

Bases: nat.llm.openai_llm.OpenAIModelConfig

A Dynamo LLM provider with automatic prefix header injection for KV cache optimization.

This is a specialized OpenAI-compatible LLM that sends Dynamo prefix headers for optimal KV cache management and request routing. Prefix headers are enabled by default using the template “nat-dynamo-{uuid}”. The prefix routing parameters (prefix_total_requests, prefix_osl, prefix_iat) are optimizable via the NAT optimizer.

To disable prefix headers, set prefix_template to null/None in your config.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

prefix_template: str | None = None#
prefix_total_requests: int = None#
prefix_osl: PrefixLevel = None#
prefix_iat: PrefixLevel = None#
request_timeout: float = None#
static get_dynamo_field_names() → frozenset[str]#

Get the set of Dynamo-specific field names for model_dump exclusion.

Use this when building config dicts for framework clients to exclude Dynamo-specific parameters that should not be passed to the underlying client.

Returns:

A frozenset of Dynamo-specific field names.

Example:

config_dict = config.model_dump(
    exclude={"type", "thinking", *DynamoModelConfig.get_dynamo_field_names()},
    ...
)
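The effect of that exclusion can be sketched on a plain dictionary. The sketch assumes that the Dynamo-specific field names are exactly the five attributes listed on DynamoModelConfig above; the actual return value of `get_dynamo_field_names()` is not shown on this page. `DYNAMO_FIELDS` and `strip_dynamo_fields` are hypothetical names, and the sample keys `model_name` and `temperature` are illustrative only.

```python
# Assumption: mirrors the attributes listed on DynamoModelConfig.
DYNAMO_FIELDS = frozenset(
    {
        "prefix_template",
        "prefix_total_requests",
        "prefix_osl",
        "prefix_iat",
        "request_timeout",
    }
)


def strip_dynamo_fields(
    config_dict: dict,
    extra: frozenset = frozenset({"type", "thinking"}),
) -> dict:
    """Return a copy of config_dict without Dynamo-specific or extra keys."""
    excluded = DYNAMO_FIELDS | extra
    return {key: value for key, value in config_dict.items() if key not in excluded}


# Illustrative input; the non-Dynamo keys are placeholders.
example = {
    "type": "dynamo",
    "model_name": "example-model",
    "temperature": 0.0,
    "prefix_osl": "HIGH",
    "prefix_total_requests": 10,
}
client_kwargs = strip_dynamo_fields(example)
```

Only the keys that a framework client would understand remain in `client_kwargs`.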
_create_dynamo_request_hook(
prefix_template: str | None,
total_requests: int,
osl: str,
iat: str,
) → collections.abc.Callable[[httpx.Request], collections.abc.Coroutine[Any, Any, None]]#

Create an httpx event hook that injects Dynamo prefix headers into requests.

This hook is called before each HTTP request is sent, allowing us to inject headers dynamically. The prefix ID is generated ONCE when the hook is created, ensuring all requests from the same client share the same prefix ID. This enables Dynamo’s KV cache optimization across multi-turn conversations.

The context variable can override this for scenarios where you need different prefix IDs (e.g., per-question in batch evaluation).

Args:

prefix_template: Template string with {uuid} placeholder

total_requests: Expected number of requests for this prefix

osl: Output sequence length hint (LOW/MEDIUM/HIGH)

iat: Inter-arrival time hint (LOW/MEDIUM/HIGH)

Returns:

An async function suitable for use as an httpx event hook.

create_httpx_client_with_dynamo_hooks(
prefix_template: str | None,
total_requests: int,
osl: str,
iat: str,
timeout: float = 600.0,
) → httpx.AsyncClient#

Create an httpx.AsyncClient with Dynamo prefix header injection.

This client can be passed to the OpenAI SDK to inject headers at the HTTP level, making it framework-agnostic.

Args:

prefix_template: Template string with {uuid} placeholder

total_requests: Expected number of requests for this prefix

osl: Output sequence length hint (LOW/MEDIUM/HIGH)

iat: Inter-arrival time hint (LOW/MEDIUM/HIGH)

timeout: HTTP request timeout in seconds

Returns:

An httpx.AsyncClient configured with Dynamo header injection.

async dynamo_llm(
config: DynamoModelConfig,
_builder: nat.builder.builder.Builder,
)#

Register the Dynamo LLM provider.