Caching#
The caching interceptor stores and retrieves responses to improve performance, reduce API costs, and enable reproducible evaluations.
Overview#
The CachingInterceptor implements a caching system that stores responses keyed by request content, enabling faster re-runs of evaluations and reducing costs when using paid APIs.
Configuration#
Interceptor Configuration#
Configure the caching interceptor through the interceptors list in AdapterConfig:
from nemo_evaluator.adapters.adapter_config import AdapterConfig, InterceptorConfig

adapter_config = AdapterConfig(
    interceptors=[
        InterceptorConfig(
            name="caching",
            enabled=True,
            config={
                "cache_dir": "./evaluation_cache",
                "reuse_cached_responses": True,
                "save_requests": True,
                "save_responses": True,
                "max_saved_requests": 1000,
                "max_saved_responses": 1000,
            },
        )
    ]
)
CLI Configuration#
--overrides 'target.api_endpoint.adapter_config.interceptors=[{"name":"caching","enabled":true,"config":{"cache_dir":"./cache","reuse_cached_responses":true}}]'
YAML Configuration#
target:
  api_endpoint:
    adapter_config:
      interceptors:
        - name: "caching"
          enabled: true
          config:
            cache_dir: "./evaluation_cache"
            reuse_cached_responses: true
            save_requests: true
            save_responses: true
            max_saved_requests: 1000
            max_saved_responses: 1000
Configuration Options#
| Parameter | Description | Default | Type |
|---|---|---|---|
| `cache_dir` | Directory to store cache files | | `str` |
| `reuse_cached_responses` | Use cached responses when available | | `bool` |
| `save_requests` | Save requests to cache storage | | `bool` |
| `save_responses` | Save responses to cache storage | | `bool` |
| `max_saved_requests` | Maximum number of requests to save | | `int \| None` |
| `max_saved_responses` | Maximum number of responses to cache | | `int \| None` |
Cache Key Generation#
The interceptor generates the cache key by computing a SHA256 hash of the JSON-serialized request data, using json.dumps() with sort_keys=True for consistent key ordering.
import hashlib
import json

# Request data
request_data = {
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "temperature": 0.0,
    "max_new_tokens": 512,
}

# Generate cache key: SHA256 of the sorted-key JSON serialization
data_str = json.dumps(request_data, sort_keys=True)
cache_key = hashlib.sha256(data_str.encode("utf-8")).hexdigest()
# Result: "abc123def456..." (64-character hex string)
Cache Storage Format#
The caching interceptor stores data in three separate disk-backed key-value stores within the configured cache directory:
- Response Cache (`{cache_dir}/responses/`): stores raw response content (bytes) keyed by cache key
- Headers Cache (`{cache_dir}/headers/`): stores response headers (dictionary) keyed by cache key
- Request Cache (`{cache_dir}/requests/`): stores request data (dictionary) keyed by cache key (when `save_requests=True`)
Each cache uses a SHA256 hash of the request data as the lookup key. When a cache hit occurs, the interceptor retrieves both the response content and headers using the same cache key.
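The three-store layout described above can be sketched with plain files. This is a simplified stand-in for the interceptor's actual disk-backed key-value stores, not its real implementation; the helper names (`cache_paths`, `store_entry`) are illustrative:

```python
import hashlib
import json
from pathlib import Path


def cache_paths(cache_dir: str, cache_key: str) -> dict:
    # Illustrative layout: one file per entry in each of the three stores
    base = Path(cache_dir)
    return {
        "response": base / "responses" / cache_key,
        "headers": base / "headers" / (cache_key + ".json"),
        "request": base / "requests" / (cache_key + ".json"),
    }


def store_entry(cache_dir, request_data, response_bytes, headers, save_requests=True):
    # Same key derivation as above: SHA256 of the sorted-key JSON serialization
    key = hashlib.sha256(
        json.dumps(request_data, sort_keys=True).encode("utf-8")
    ).hexdigest()
    paths = cache_paths(cache_dir, key)
    for p in paths.values():
        p.parent.mkdir(parents=True, exist_ok=True)
    paths["response"].write_bytes(response_bytes)     # raw response bytes
    paths["headers"].write_text(json.dumps(headers))  # headers dict as JSON
    if save_requests:
        paths["request"].write_text(json.dumps(request_data))
    return key
```

Because all three stores share the same key, a lookup that hits the response store can retrieve the matching headers with no extra bookkeeping.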
Cache Behavior#
Cache Hit Process#
1. Request arrives at the caching interceptor
2. Generate cache key from request parameters
3. Check cache for an existing response
4. Return the cached response if found (sets `cache_hit=True`)
5. Skip the API call and continue to the next interceptor
Cache Miss Process#
1. Request continues to the endpoint interceptor
2. Response is received from the model API
3. Store the response in the cache under the generated key
4. Continue processing with response interceptors
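Putting the hit and miss paths together, the lookup-then-call behavior can be sketched as a small wrapper. This is a minimal in-memory sketch, not the interceptor's actual API; the class and method names are illustrative:

```python
import hashlib
import json


class SketchCache:
    """In-memory stand-in for the disk-backed response/header stores."""

    def __init__(self):
        self.responses = {}
        self.headers = {}

    def key_for(self, request_data: dict) -> str:
        # SHA256 of the sorted-key JSON serialization, as described above
        return hashlib.sha256(
            json.dumps(request_data, sort_keys=True).encode("utf-8")
        ).hexdigest()

    def fetch(self, request_data: dict, call_api, reuse_cached_responses=True):
        key = self.key_for(request_data)
        if reuse_cached_responses and key in self.responses:
            # Cache hit: return stored body/headers, skip the API call
            return self.responses[key], self.headers[key], True
        # Cache miss: call the upstream API, then store the result
        body, headers = call_api(request_data)
        self.responses[key] = body
        self.headers[key] = headers
        return body, headers, False
```

On a second identical request, `fetch` returns the stored body and headers without invoking `call_api`, which is what makes re-runs both faster and cheaper.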