Client Mode#
The NeMo Evaluator adapter system supports Client Mode, where adapters run in-process through a custom httpx transport, providing a simpler alternative to the proxy server architecture.
Overview#
| Feature | Server Mode | Client Mode |
|---|---|---|
| Architecture | Separate proxy server process | In-process via httpx transport |
| Setup | Automatic server startup/shutdown | Simple client instantiation |
| Use Case | Framework-driven evaluations | Direct API usage, notebooks |
| Overhead | Network proxy | Direct in-process execution |
| Debugging | Separate process | Same process, easier debugging |
Quick Start#
from nemo_evaluator.client import NeMoEvaluatorClient
from nemo_evaluator.api.api_dataclasses import EndpointModelConfig
from nemo_evaluator.adapters.adapter_config import AdapterConfig, InterceptorConfig
# Configure model and adapters
config = EndpointModelConfig(
model_id="my-model",
url="https://api.example.com/v1/chat/completions",
api_key_name="API_KEY", # Environment variable name
adapter_config=AdapterConfig(
mode="client", # Use client mode (no server)
interceptors=[
InterceptorConfig(name="caching", enabled=True),
InterceptorConfig(name="endpoint", enabled=True),
]
),
is_base_url=False, # True if URL is base, False for complete endpoint
)
# Create client
async with NeMoEvaluatorClient(config, output_dir="./output") as client:
response = await client.chat_completion(
messages=[{"role": "user", "content": "Hello!"}]
)
print(response)
Mode Configuration#
Adapter Mode Field#
The mode field in AdapterConfig controls whether a server process is spawned:
mode="server" (default): Spawns the adapter server process in evaluate() calls
mode="client": Skips server spawning, for use with NeMoEvaluatorClient
When using NeMoEvaluatorClient directly, set mode="client" so that no unnecessary adapter server is spawned if the same config is also passed to evaluate() calls.
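For example, the same interceptor list can be reused for both modes; only the mode field changes (a minimal sketch using interceptor names shown elsewhere on this page):
from nemo_evaluator.adapters.adapter_config import AdapterConfig, InterceptorConfig

interceptors = [
    InterceptorConfig(name="caching"),
    InterceptorConfig(name="endpoint"),
]

# Default: evaluate() spawns a separate adapter server process
server_config = AdapterConfig(mode="server", interceptors=interceptors)

# Client mode: no server process; pair with NeMoEvaluatorClient
client_config = AdapterConfig(mode="client", interceptors=interceptors)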
URL Modes#
Client mode supports two URL configurations via the is_base_url flag:
Base URL Mode (is_base_url=True)#
Use when the URL is a base URL and the client should append paths:
config = EndpointModelConfig(
url="https://api.example.com/v1", # Base URL
is_base_url=True,
...
)
# Requests go to: https://api.example.com/v1/chat/completions
Passthrough Mode (is_base_url=False)#
Use when the URL is the complete endpoint:
config = EndpointModelConfig(
url="https://api.example.com/v1/chat/completions", # Complete endpoint
is_base_url=False, # Default
...
)
# Requests go to: https://api.example.com/v1/chat/completions (as-is)
API Reference#
Initialization#
from nemo_evaluator.client import NeMoEvaluatorClient
from nemo_evaluator.api.api_dataclasses import EndpointModelConfig
client = NeMoEvaluatorClient(
endpoint_model_config=EndpointModelConfig(
model_id="model-name",
url="https://api.example.com/v1/chat/completions",
api_key_name="API_KEY",
adapter_config=adapter_config,
is_base_url=False,
temperature=0.7,
top_p=0.9,
max_new_tokens=100,
request_timeout=60,
max_retries=3,
parallelism=5,
),
output_dir="./eval_output"
)
Methods#
Chat Completion#
# Single request (async)
response = await client.chat_completion(
messages=[{"role": "user", "content": "Hello"}],
seed=42 # Optional
)
# Batch requests (sync wrapper)
responses = client.chat_completions(
messages_list=[
[{"role": "user", "content": "Hello"}],
[{"role": "user", "content": "Hi"}],
],
seeds=[42, 43], # Optional
show_progress=True
)
# Batch requests (async)
responses = await client.batch_chat_completions(
messages_list=[...],
seeds=[...],
show_progress=True
)
Text Completion#
# Single completion
response = await client.completion(
prompt="Once upon a time",
seed=42
)
# Batch completions
responses = client.completions(
prompts=["Prompt 1", "Prompt 2"],
seeds=[42, 43],
show_progress=True
)
Embeddings#
# Single embedding
embedding = await client.embedding(text="Hello world")
# Batch embeddings
embeddings = client.embeddings(
texts=["Text 1", "Text 2"],
show_progress=True
)
Context Manager#
# Recommended: ensures post-eval hooks run
async with NeMoEvaluatorClient(config, output_dir="./output") as client:
response = await client.chat_completion(messages=[...])
# Hooks run automatically on exit
Manual Cleanup#
client = NeMoEvaluatorClient(config, output_dir="./output")
try:
response = await client.chat_completion(messages=[...])
finally:
await client.aclose() # Runs post-eval hooks
Adapter Configuration#
Client mode uses the same AdapterConfig as server mode, but with mode="client" to prevent server spawning:
from nemo_evaluator.adapters.adapter_config import AdapterConfig, InterceptorConfig
adapter_config = AdapterConfig(
mode="client", # Prevents adapter server from spawning
interceptors=[
InterceptorConfig(
name="system_message",
config={"system_message": "You are helpful."}
),
InterceptorConfig(name="request_logging"),
InterceptorConfig(
name="caching",
config={"cache_dir": "./cache"}
),
InterceptorConfig(name="reasoning"),
InterceptorConfig(name="endpoint"), # Required
],
post_eval_hooks=[
{"name": "post_eval_report", "config": {"report_types": ["html"]}}
]
)
Note: When using NeMoEvaluatorClient, the mode is automatically set to "client" if not specified.
Implementation Details#
Architecture#
┌─────────────────────────┐
│ Your Script/Notebook │
└───────────┬─────────────┘
│
↓
┌─────────────────────────┐
│ NeMoEvaluatorClient │
│ (AsyncOpenAI wrapper) │
└───────────┬─────────────┘
│
↓
┌─────────────────────────┐
│ AsyncAdapterTransport │
│ (httpx.AsyncBaseTransport) │
│ │
│ ┌───────────────────┐ │
│ │ Adapter Pipeline │ │
│ │ - Interceptors │ │
│ │ - Post-eval hooks │ │
│ └───────────────────┘ │
└───────────┬─────────────┘
│ HTTP
↓
┌─────────────────────────┐
│ Model Endpoint │
└─────────────────────────┘
Request Flow#
1. User calls client.chat_completion(...)
2. The AsyncOpenAI client constructs an httpx.Request
3. AsyncAdapterTransport intercepts the request
4. The request is wrapped for adapter compatibility (HttpxRequestWrapper)
5. The request passes through the interceptor chain (in a thread pool, for sync interceptors)
6. The endpoint interceptor makes the HTTP call
7. The response passes back through the response interceptors
8. The response is converted back to an httpx.Response
9. The AsyncOpenAI client parses and returns the completion
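To make the interception point concrete, the sketch below shows the general shape of a custom httpx async transport. It illustrates the pattern only and is not NeMo Evaluator's actual AsyncAdapterTransport; the class name and header are hypothetical, while the httpx.AsyncBaseTransport API itself is real:
import httpx

class IllustrativeTransport(httpx.AsyncBaseTransport):
    """Toy transport showing where request/response interception happens."""

    def __init__(self) -> None:
        # Delegate the actual network I/O to the default httpx transport
        self._inner = httpx.AsyncHTTPTransport()

    async def handle_async_request(self, request: httpx.Request) -> httpx.Response:
        # Request interceptors would run here (hypothetical header, for demonstration)
        request.headers["x-intercepted"] = "true"
        # Forward the (possibly modified) request to the model endpoint
        response = await self._inner.handle_async_request(request)
        # Response interceptors would run here, before the OpenAI client parses the body
        return response

# Any httpx-based client built on this transport routes every call through it, e.g.:
# httpx.AsyncClient(transport=IllustrativeTransport())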
Sync/Async Bridging#
Client mode handles the async/sync boundary automatically:
AsyncAdapterTransport is async (implements httpx.AsyncBaseTransport)
The adapter pipeline and interceptors are synchronous
asyncio.to_thread() runs the sync pipeline in a thread pool
Seamless integration with the async OpenAI client
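The bridging itself follows a standard asyncio pattern; a generic sketch (not the library's code) looks roughly like this:
import asyncio

def run_sync_pipeline(payload: dict) -> dict:
    # Stand-in for a synchronous interceptor chain
    return {"processed": payload}

async def handle(payload: dict) -> dict:
    # Run the blocking, synchronous pipeline in a worker thread
    # so the event loop stays responsive
    return await asyncio.to_thread(run_sync_pipeline, payload)

print(asyncio.run(handle({"prompt": "Hello"})))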
When to Use Client Mode#
Use Client Mode When:#
Writing custom evaluation scripts
Working in Jupyter notebooks
Need direct API control
Want simpler setup
Debugging in same process
Single-process evaluations
Use Server Mode When:#
Running framework-driven evaluations with evaluate()
Need shared adapter state across processes
Working with harnesses that don’t support custom clients
Running distributed evaluations
Examples#
Basic Usage#
from nemo_evaluator.client import NeMoEvaluatorClient
from nemo_evaluator.api.api_dataclasses import EndpointModelConfig
from nemo_evaluator.adapters.adapter_config import AdapterConfig, InterceptorConfig
config = EndpointModelConfig(
model_id="llama-3-70b",
url="https://integrate.api.nvidia.com/v1/chat/completions",
api_key_name="NVIDIA_API_KEY",
is_base_url=False,
adapter_config=AdapterConfig(
interceptors=[
InterceptorConfig(name="caching"),
InterceptorConfig(name="endpoint"),
]
),
)
async with NeMoEvaluatorClient(config, "./output") as client:
response = await client.chat_completion(
messages=[{"role": "user", "content": "What is AI?"}]
)
print(response)
Batch Processing#
# Process multiple prompts with progress bar
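# Assumes `client` was created as in the Basic Usage example above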
prompts = [
[{"role": "user", "content": f"Question {i}"}]
for i in range(100)
]
responses = client.chat_completions(
messages_list=prompts,
show_progress=True
)
With All Interceptors#
adapter_config = AdapterConfig(
interceptors=[
InterceptorConfig(
name="system_message",
config={"system_message": "Be concise."}
),
InterceptorConfig(name="request_logging"),
InterceptorConfig(name="response_logging"),
InterceptorConfig(
name="caching",
config={
"cache_dir": "./cache",
"reuse_cached_responses": True,
"save_requests": True,
"save_responses": True,
}
),
InterceptorConfig(
name="reasoning",
config={"start_reasoning_token": "<think>"}
),
InterceptorConfig(name="response_stats"),
InterceptorConfig(name="endpoint"),
],
post_eval_hooks=[
{"name": "post_eval_report", "config": {"report_types": ["html", "json"]}}
]
)
See Also#
Adapters - Conceptual overview of the adapter system
Configuration - Available interceptors and configuration options
Recipes - Common adapter patterns and recipes