LLM API Gateway Metrics


The LLM API Gateway serves Prometheus metrics from llm-api-gateway:9464/metrics when llmApiGateway.metrics.enabled is true.
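To spot-check the endpoint by hand, a minimal Python sketch along these lines may help. The host, port, and path come from the sentence above; the `http` scheme, the timeout, and the helper names `fetch_metrics` and `sample_lines` are assumptions made for illustration, not part of the gateway.

```python
import urllib.request

# Host, port, and path are taken from this page. The http scheme and the
# default timeout are assumptions for this sketch.
METRICS_URL = "http://llm-api-gateway:9464/metrics"


def fetch_metrics(url: str = METRICS_URL, timeout: float = 5.0) -> str:
    """Return the raw Prometheus text exposition served at `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def sample_lines(exposition: str) -> list[str]:
    """Keep sample lines only, dropping blanks and `# HELP` / `# TYPE` comments."""
    return [
        line
        for line in exposition.splitlines()
        if line.strip() and not line.startswith("#")
    ]
```

Calling `sample_lines(fetch_metrics())` from a pod that can resolve `llm-api-gateway` should return the series listed under Metrics, provided llmApiGateway.metrics.enabled is true.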

The self-managed stack maps global.observability.metrics.enabled to this chart value. The gateway metrics use the llm_api_gateway_ service prefix and must not emit legacy service-prefixed metric names.
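One way to check the prefix rule against a scrape is sketched below. It lists metric names that do not start with `llm_api_gateway_`. The function name and the legacy metric name in the usage note are hypothetical, and any runtime or library metrics the process also exports would appear in the result, so treat the output as a review list rather than a hard failure.

```python
import re

SERVICE_PREFIX = "llm_api_gateway_"

# A sample line starts with the metric name, followed by optional labels and a value.
_NAME_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def names_without_prefix(exposition: str, prefix: str = SERVICE_PREFIX) -> set[str]:
    """Return metric names in the exposition text that lack the service prefix."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _NAME_RE.match(line)
        if match and not match.group(1).startswith(prefix):
            names.add(match.group(1))
    return names
```

For example, a scrape that contained a hypothetical `legacy_gateway_requests_total` series would have that name returned, while every `llm_api_gateway_` series would be ignored.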

Label Boundaries

Use bounded labels only. Do not add request IDs, session IDs, function IDs, organization IDs, project IDs, raw URLs, raw prompts, authorization values, or other unbounded request fields as metric labels.
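The rule above can be approximated with an allowlist check over a scrape, as in the following sketch. The allowlist holds the label names that appear in the Metrics section of this page; `le` is also permitted because Prometheus histogram bucket series carry it. The function name and the `request_id` label in the usage note are hypothetical.

```python
import re

# Label names documented in the Metrics section of this page.
ALLOWED_LABELS = frozenset({
    "method", "route", "status", "upstream", "result",
    "endpoint", "token_type", "stream", "phase", "reason",
})
# `le` is attached to histogram bucket series by the exposition format.
BUILTIN_LABELS = frozenset({"le"})

# Matches name="value" pairs, allowing escaped characters inside the value.
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def unexpected_labels(exposition: str) -> dict[str, set[str]]:
    """Map each metric name to the label names that are not on the allowlist."""
    findings: dict[str, set[str]] = {}
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "{" not in line or "}" not in line:
            continue  # series without labels
        name = line[: line.index("{")]
        # Use the last "}" so templated routes such as /items/{id} stay intact.
        label_block = line[line.index("{") + 1 : line.rindex("}")]
        found = {m.group(1) for m in _LABEL_RE.finditer(label_block)}
        extra = found - ALLOWED_LABELS - BUILTIN_LABELS
        if extra:
            findings.setdefault(name, set()).update(extra)
    return findings
```

A series carrying a hypothetical `request_id` label would be reported under its metric name. An allowlist only checks label names, so it does not catch a permitted label such as `route` being filled with raw URL paths; that still needs a cardinality review.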

Metrics

| Metric name | Type | Source endpoint | Labels | Notes |
| --- | --- | --- | --- | --- |
| `llm_api_gateway_http_requests_total` | Counter | `llm-api-gateway:9464/metrics` | `method`, `route`, `status` | Total inbound HTTP requests. `route` is the templated route, not the raw URL path. |
| `llm_api_gateway_http_request_duration_seconds` | Histogram | `llm-api-gateway:9464/metrics` | `method`, `route`, `status` | Inbound HTTP request duration in seconds. |
| `llm_api_gateway_http_active_requests` | Gauge | `llm-api-gateway:9464/metrics` | `method`, `route` | Current in-flight inbound HTTP requests. |
| `llm_api_gateway_upstream_requests_total` | Counter | `llm-api-gateway:9464/metrics` | `upstream`, `result`, `status` | Total outbound upstream requests. `upstream` is a bounded service name such as `llm-request-router`. |
| `llm_api_gateway_upstream_request_duration_seconds` | Histogram | `llm-api-gateway:9464/metrics` | `upstream`, `result`, `status` | Outbound upstream request duration in seconds. |
| `llm_api_gateway_llm_tokens_total` | Counter | `llm-api-gateway:9464/metrics` | `endpoint`, `token_type`, `stream` | LLM token counts reported by upstream providers. `token_type` is a bounded enum such as `prompt`, `completion`, or `total`. |
| `llm_api_gateway_provider_time_seconds` | Histogram | `llm-api-gateway:9464/metrics` | `endpoint`, `phase`, `stream` | Provider-reported timing phases in seconds. |
| `llm_api_gateway_stream_first_token_seconds` | Histogram | `llm-api-gateway:9464/metrics` | `endpoint` | Time from stream request start to first token in seconds. |
| `llm_api_gateway_stream_duration_seconds` | Histogram | `llm-api-gateway:9464/metrics` | `endpoint`, `status` | Total stream duration in seconds. |
| `llm_api_gateway_pubsub_publish_failures_total` | Counter | `llm-api-gateway:9464/metrics` | None | Number of messages that failed to publish. |
| `llm_api_gateway_pubsub_consume_failures_total` | Counter | `llm-api-gateway:9464/metrics` | None | Number of messages that failed to consume. |
| `llm_api_gateway_pubsub_consume_duration_seconds` | Histogram | `llm-api-gateway:9464/metrics` | None | Time to consume a message in seconds. |
| `llm_api_gateway_rate_limit_event_replication_lag_seconds` | Histogram | `llm-api-gateway:9464/metrics` | None | Lag between rate limit event creation and processing in seconds. |
| `llm_api_gateway_rate_limit_events_received_total` | Counter | `llm-api-gateway:9464/metrics` | None | Number of rate limit events received from the sync transport. |
| `llm_api_gateway_rate_limit_events_dropped_total` | Counter | `llm-api-gateway:9464/metrics` | `reason` | Number of received rate limit events dropped. `reason` is a bounded enum such as `same_cluster`, `old_message`, or `remote_apply_disabled`. |
| `llm_api_gateway_rate_limit_events_applied_total` | Counter | `llm-api-gateway:9464/metrics` | None | Number of rate limit events applied to the local limiter. |
| `llm_api_gateway_rate_limit_events_failed_apply_total` | Counter | `llm-api-gateway:9464/metrics` | None | Number of rate limit events that failed to apply locally. |
| `llm_api_gateway_rate_limit_events_dry_run_would_apply_total` | Counter | `llm-api-gateway:9464/metrics` | None | Number of rate limit events that would apply when remote application is disabled. |
| `llm_api_gateway_rate_limit_synchronizer_publish_duration_seconds` | Histogram | `llm-api-gateway:9464/metrics` | None | Time to publish a rate limit event in seconds. |
| `llm_api_gateway_rate_limit_synchronizer_queue_wait_seconds` | Histogram | `llm-api-gateway:9464/metrics` | None | Time spent queueing a rate limit event in seconds. |
| `llm_api_gateway_rate_limit_synchronizer_queue_length` | Gauge | `llm-api-gateway:9464/metrics` | None | Current rate limit synchronizer queue length. |
| `llm_api_gateway_rate_limit_synchronizer_events_dropped_total` | Counter | `llm-api-gateway:9464/metrics` | `reason` | Number of rate limit events dropped before publishing. `reason` is a bounded enum such as `old_message`. |
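
As a rough illustration of reading one of the Histogram metrics listed above, the sketch below computes a mean duration from a scrape. It relies on the Prometheus text exposition convention that a classic histogram exposes `_sum` and `_count` series next to its buckets; the function name `mean_seconds` is hypothetical. Because both series are cumulative, the result is the mean since the process started, not a recent-window average.

```python
from typing import Optional


def mean_seconds(exposition: str, histogram: str) -> Optional[float]:
    """Mean observed value for `histogram`, summed across all label sets.

    Returns None when no observations have been recorded.
    """
    total_sum = 0.0
    total_count = 0.0
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Split at the last "}" so label values containing braces are safe.
            series, _, rest = line.rpartition("}")
            name = series.split("{", 1)[0]
        else:
            name, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue
        value = float(fields[0])  # an optional timestamp may follow the value
        if name == histogram + "_sum":
            total_sum += value
        elif name == histogram + "_count":
            total_count += value
    return total_sum / total_count if total_count else None
```

For instance, `mean_seconds(text, "llm_api_gateway_http_request_duration_seconds")` gives the average inbound request duration across every `method`, `route`, and `status` combination in the scrape. For windowed rates or percentiles, a Prometheus query over the bucket series is the more usual tool.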