LLM Request Router Metrics

The LLM Request Router serves Prometheus metrics from llm-request-router:9090/metrics when llmRequestRouter.metrics.enabled is true.
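
As a quick way to check that the endpoint is live, the following is a minimal Python sketch that fetches the exposition text and keeps only the router's own samples. It assumes plain HTTP, that the llm-request-router hostname resolves from where the script runs (for example, a pod in the same namespace), and that deployed names carry the llm_request_router_ prefix. The helper names fetch_metrics and router_metric_lines are hypothetical.

```python
import urllib.request

# Assumption: plain HTTP and in-cluster DNS resolution for the service name.
METRICS_URL = "http://llm-request-router:9090/metrics"
PREFIX = "llm_request_router_"


def fetch_metrics(url: str = METRICS_URL, timeout: float = 5.0) -> str:
    """Download the Prometheus text exposition from the router."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def router_metric_lines(exposition_text: str, prefix: str = PREFIX) -> list[str]:
    """Return sample lines whose metric name starts with the router prefix.

    Comment lines (# HELP, # TYPE) and metrics from other collectors are
    dropped because they do not start with the prefix.
    """
    return [
        line
        for line in exposition_text.splitlines()
        if line.startswith(prefix)
    ]
```

Calling router_metric_lines(fetch_metrics()) should return one line per exposed series. An empty list may mean that llmRequestRouter.metrics.enabled is false or that no traffic has been recorded yet.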

The self-managed stack maps global.observability.metrics.enabled to this chart value. The request-router chart runs Stargate with --metrics-prefix=llm_request_router_, so deployed metric names use the llm_request_router_ prefix instead of the upstream default stargate_ prefix. The chart also sets the trace service name with --otel-service-name=llm-request-router.
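
When porting dashboards or alerts written against upstream Stargate, the prefix change is the part to account for. The sketch below rewrites a metric name from the upstream prefix to the deployed one. It assumes the rest of the name is unchanged, which follows from the flag description above but is not verified against every upstream metric. The function name to_deployed_name and the example metric name are hypothetical.

```python
UPSTREAM_PREFIX = "stargate_"
DEPLOYED_PREFIX = "llm_request_router_"


def to_deployed_name(metric_name: str) -> str:
    """Swap the upstream stargate_ prefix for the deployed router prefix.

    Names that do not start with the upstream prefix are returned unchanged.
    """
    if metric_name.startswith(UPSTREAM_PREFIX):
        return DEPLOYED_PREFIX + metric_name[len(UPSTREAM_PREFIX):]
    return metric_name


# Hypothetical upstream name, used only to illustrate the rewrite.
print(to_deployed_name("stargate_requests_total"))
```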

Label Boundaries

Use bounded labels only. Keep the values of routing_key, model, inference_server_id, status, result, and reason limited to bounded service dimensions. Do not add request IDs, session IDs, function IDs, organization IDs, project IDs, raw URLs, raw prompts, authorization values, or other unbounded request fields as metric labels.
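
The rule above can be enforced with a small review-time or test-time check. The following Python sketch flags label names outside the documented set and reports labels whose distinct value count exceeds a limit. The function names and the default limit of 100 are illustrative assumptions, not part of the router or its chart.

```python
from collections import defaultdict

# Label names documented for the router metrics.
ALLOWED_LABELS = frozenset(
    {"routing_key", "model", "inference_server_id", "status", "result", "reason"}
)


def unexpected_labels(label_names):
    """Return, sorted, any label names that are not in the documented set."""
    return sorted(set(label_names) - ALLOWED_LABELS)


def labels_over_limit(label_sets, limit=100):
    """Count distinct values per label and return those above the limit.

    label_sets is an iterable of {label_name: value} dicts, one per series.
    The limit is an arbitrary illustrative threshold; tune it per deployment.
    """
    seen = defaultdict(set)
    for labels in label_sets:
        for name, value in labels.items():
            seen[name].add(value)
    return {name: len(values) for name, values in seen.items() if len(values) > limit}
```

For example, unexpected_labels(["model", "request_id"]) returns ["request_id"], which would flag an attempt to attach a request ID as a label.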

Metrics

| Metric name | Type | Source endpoint | Labels | Notes |
| --- | --- | --- | --- | --- |
| llm_request_router_requests_total | Counter | llm-request-router:9090/metrics | routing_key, model, inference_server_id, status | Total proxied requests by selected backend and status. |
| llm_request_router_proxy_attempts_total | Counter | llm-request-router:9090/metrics | routing_key, model, inference_server_id, result | Upstream proxy attempts by selected backend and result. |
| llm_request_router_proxy_retries_total | Counter | llm-request-router:9090/metrics | routing_key, model, reason | Total proxy retries by retry reason. |
| llm_request_router_proxy_retry_exhausted_total | Counter | llm-request-router:9090/metrics | routing_key, model, reason | Total requests that exhausted retry options. |
| llm_request_router_quic_connection_evictions_total | Counter | llm-request-router:9090/metrics | inference_server_id, reason | Total QUIC pool evictions by backend and reason. |
| llm_request_router_quic_hot_path_reconnect_total | Counter | llm-request-router:9090/metrics | inference_server_id, result | Direct QUIC reconnect attempts from the proxy hot path. |
| llm_request_router_proxy_replay_buffer_bytes | Histogram | llm-request-router:9090/metrics | model | Proxied request replay buffer size in bytes. |
| llm_request_router_proxy_duration_seconds | Histogram | llm-request-router:9090/metrics | routing_key, model, inference_server_id | Time to first byte from upstream in seconds. |
| llm_request_router_routing_duration_seconds | Histogram | llm-request-router:9090/metrics | routing_key, model | Load-balancer decision time in seconds. |
| llm_request_router_active_inference_servers | Gauge | llm-request-router:9090/metrics | routing_key, model | Currently routable inference servers for a routing target. |
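
To illustrate how the counters listed above can be consumed outside Prometheus, here is a minimal Python sketch that sums llm_request_router_requests_total by its status label from scraped exposition text. The parser is deliberately simple, handles only plain sample lines, and is not a full implementation of the exposition format. The function name requests_by_status and the label values in the sample (chat, m1, s1, 200, 503) are hypothetical.

```python
import re
from collections import Counter

# Matches "name{labels} value" or "name value"; comment lines do not match.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# Matches key="value" pairs, allowing backslash-escaped characters in values.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

METRIC = "llm_request_router_requests_total"


def requests_by_status(exposition_text):
    """Sum the requests counter across all series, grouped by status label."""
    totals = Counter()
    for line in exposition_text.splitlines():
        match = SAMPLE_RE.match(line)
        if not match or match.group("name") != METRIC:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        totals[labels.get("status", "")] += float(match.group("value"))
    return dict(totals)


# Hypothetical scrape output, for illustration only.
sample = """\
# TYPE llm_request_router_requests_total counter
llm_request_router_requests_total{routing_key="chat",model="m1",inference_server_id="s1",status="200"} 5
llm_request_router_requests_total{routing_key="chat",model="m1",inference_server_id="s2",status="200"} 7
llm_request_router_requests_total{routing_key="chat",model="m1",inference_server_id="s1",status="503"} 2
"""
print(requests_by_status(sample))
```

Counters are cumulative, so a snapshot like this reports totals since process start. Rates over a time window still need two scrapes or a Prometheus query.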