Response Stats Interceptor#

Overview#

The ResponseStatsInterceptor collects comprehensive aggregated statistics from API responses for metrics collection and analysis. It tracks detailed metrics about token usage, response patterns, performance characteristics, and API behavior throughout the evaluation process.

This interceptor is essential for understanding API performance, cost analysis, and monitoring evaluation runs. It provides both real-time aggregated statistics and detailed per-request tracking capabilities.

Key Statistics Tracked:

Token usage (prompt, completion, total) with averages and maximums
Response status codes and counts
Finish reasons and stop reasons
Tool calls and function calls counts
Response latency (average and maximum)
Total response count and successful responses
Inference run times and timing analysis

Configuration#

CLI Configuration#

--overrides 'target.api_endpoint.adapter_config.tracking_requests_stats=True,target.api_endpoint.adapter_config.response_stats_cache=/tmp/response_stats_interceptor,target.api_endpoint.adapter_config.logging_aggregated_stats_interval=100'

YAML Configuration#

target:
  api_endpoint:
    adapter_config:
      interceptors:
        - name: "response_stats"
          enabled: true
          config:
            # Default configuration - collect all statistics
            collect_token_stats: true
            collect_finish_reasons: true
            collect_tool_calls: true
            save_individuals: true
            cache_dir: "/tmp/response_stats_interceptor"
            logging_aggregated_stats_interval: 100
        - name: "endpoint"
          enabled: true
          config: {}

target:
  api_endpoint:
    adapter_config:
      interceptors:
        - name: "response_stats"
          enabled: true
          config:
            # Minimal configuration - only basic stats
            collect_token_stats: false
            collect_finish_reasons: false
            collect_tool_calls: false
            save_individuals: false
            logging_aggregated_stats_interval: 50
        - name: "endpoint"
          enabled: true
          config: {}

target:
  api_endpoint:
    adapter_config:
      interceptors:
        - name: "endpoint"
          enabled: true
          config: {}
        - name: "response_stats"
          enabled: true
          config:
            # Custom configuration with periodic saving
            collect_token_stats: true
            collect_finish_reasons: true
            collect_tool_calls: true
            stats_file_saving_interval: 100
            save_individuals: true
            cache_dir: "/custom/stats/cache"
            logging_aggregated_stats_interval: 25

Configuration Options#

For detailed configuration options, please refer to the nemo_evaluator.adapters.interceptors Python API reference.

Behavior#

Statistics Collection#

The interceptor automatically collects statistics from successful API responses (HTTP 200) and tracks basic information for all responses regardless of status code.

For Successful Responses (200):

Parses JSON response body
Extracts token usage from usage field
Collects finish reasons from choices[].finish_reason
Counts tool calls and function calls
Calculates running averages and maximums

For All Responses:

Tracks status code distribution
Measures response latency
Records response timestamps
Maintains response counts

Data Storage#

Aggregated Stats: Continuously updated running statistics stored in cache
Individual Stats: Per-request details stored with request IDs (if enabled)
Metrics File: Final statistics saved to eval_factory_metrics.json
Thread Safety: All operations are thread-safe using locks

Timing Analysis#

Tracks inference run times across multiple evaluation runs
Calculates time from first to last request per run
Estimates time to first request from adapter initialization
Provides detailed timing breakdowns for performance analysis

Statistics Output#

Aggregated Statistics#

{
  "response_stats": {
    "description": "Response statistics saved during processing",
    "avg_prompt_tokens": 150.5,
    "avg_total_tokens": 200.3,
    "avg_completion_tokens": 49.8,
    "avg_latency_ms": 1250.2,
    "max_prompt_tokens": 300,
    "max_total_tokens": 450,
    "max_completion_tokens": 150,
    "max_latency_ms": 3000,
    "count": 1000,
    "successful_count": 995,
    "tool_calls_count": 50,
    "function_calls_count": 25,
    "finish_reason": {
      "stop": 800,
      "length": 150,
      "tool_calls": 45
    },
    "status_codes": {
      "200": 995,
      "429": 3,
      "500": 2
    },
    "inference_time": 45.6,
    "run_id": 0
  }
}

Individual Request Statistics (if enabled)#

{
  "request_id": "req_123",
  "timestamp": 1698765432.123,
  "status_code": 200,
  "prompt_tokens": 150,
  "total_tokens": 200,
  "completion_tokens": 50,
  "finish_reason": "stop",
  "tool_calls_count": 0,
  "function_calls_count": 0,
  "run_id": 0
}

Common Use Cases#

Cost Analysis: Track token usage patterns to estimate API costs
Performance Monitoring: Monitor response times and throughput
Quality Assessment: Analyze finish reasons and response patterns
Tool Usage Analysis: Track function and tool call frequencies
Debugging: Individual request tracking for troubleshooting
Capacity Planning: Understand API usage patterns and limits
A/B Testing: Compare statistics across different configurations
Production Monitoring: Real-time visibility into API behavior

Integration Notes#

Post-Evaluation Hook: Automatically saves final statistics after evaluation completes
Cache Persistence: Statistics survive across runs and can be aggregated
Thread Safety: Safe for concurrent request processing
Memory Efficient: Uses running averages to avoid storing all individual values
Caching Strategy: Handles cache hits by skipping statistics collection to avoid double-counting