This guide covers metrics, tracing, and visualization for SGLang deployments running through Dynamo.
When running SGLang through Dynamo, SGLang engine metrics are automatically passed through and exposed on Dynamo’s /metrics endpoint (default port 8081). This allows you to access both SGLang engine metrics (prefixed with sglang:) and Dynamo runtime metrics (prefixed with dynamo_*) from a single worker backend endpoint.
For the complete and authoritative list of all SGLang metrics, always refer to the official SGLang Production Metrics documentation.
For Dynamo runtime metrics, see the Dynamo Metrics Guide.
For visualization setup instructions, see the Prometheus and Grafana Setup Guide.
This is a single machine example.
For visualizing metrics with Prometheus and Grafana, start the observability stack. See Observability Getting Started for instructions.
Launch a frontend and SGLang backend to test metrics:
Wait for the SGLang worker to start, then send requests and check metrics:
SGLang exposes metrics in Prometheus Exposition Format text at the /metrics HTTP endpoint. All SGLang engine metrics use the sglang: prefix and include labels (e.g., model_name, engine_type, tp_rank, pp_rank) to identify the source.
Example Prometheus Exposition Format text:
Note: The specific metrics shown above are examples and may vary depending on your SGLang version. Always inspect your actual /metrics endpoint or refer to the official documentation for the current list.
SGLang provides metrics in the following categories (all prefixed with sglang:):
Note: Specific metrics are subject to change between SGLang versions. Always refer to the official documentation or inspect the /metrics endpoint for your SGLang version.
The official SGLang documentation includes complete metric definitions with:
model_name, engine_type, tp_rank, pp_rank)For the complete and authoritative list of all SGLang metrics, see the official SGLang Production Metrics documentation.
prometheus_client.multiprocess.MultiProcessCollectorsglang: prefix before being exposedregister_engine_metrics_callback() functionDynamo propagates W3C Trace Context headers through the SGLang request pipeline, allowing you to correlate traces across the frontend, router, and individual SGLang workers in a disaggregated deployment.
SGLang’s engine-internal tracing requires the opentelemetry packages. These are declared as SGLang’s [tracing] extra. Install them into your Dynamo environment:
Without these packages, Dynamo-side spans (frontend, handler) will still work, but SGLang’s internal engine spans will not be emitted and you will see a warning: "Tracing is disabled because the packages cannot be imported."
Key implementation files:
components/src/dynamo/common/utils/otel_tracing.py - W3C traceparent header buildercomponents/src/dynamo/sglang/request_handlers/handler_base.py:71-84 - Extracts trace context from Dynamo Context objectcomponents/src/dynamo/sglang/request_handlers/llm/decode_handler.py - Passes external_trace_header and rid=trace_id to engine.async_generate()Both flags are required for end-to-end tracing through the SGLang engine. Without --enable-trace, the Dynamo handler still creates spans, but SGLang’s internal engine spans will not be linked.
The disaggregated launch script supports --enable-otel to enable tracing across all components:
Or manually for an aggregated deployment:
With tracing enabled, each inference request produces a single end-to-end trace spanning the full request lifecycle:
http-request span - Root span from the HTTP service, includes method/uri/trace_idkv_router.route_request, kv_router.select_worker, kv_router.compute_block_hashes, kv_router.find_matches, kv_router.compute_seq_hashes, kv_router.schedulehandle_payload span - The Dynamo RPC handler on the worker side, with component/endpoint/namespace labelsReq <id>, Scheduler, Tokenizer, request_process, prefill_forward, decode_loop, Bootstrap Room (for disagg)gen_ai.usage.prompt_tokens, gen_ai.usage.completion_tokens, gen_ai.latency.time_to_first_token, etc.Example trace tree for a KV-routed request:

http://localhost:3000 (username: dynamo, password: dynamo)dynamo-frontend, dynamo-worker-1, sglang)http-request, handle_payload, Req *, decode_loop)rid=<trace-id>, gen_ai.response.model=Qwen/Qwen3-0.6B)Send a request with x-request-id for easy lookup:
For more details on the Tempo/Grafana tracing infrastructure, see the Dynamo Tracing Guide.
Dynamo ships a pre-provisioned Grafana dashboard for SGLang at deploy/observability/grafana_dashboards/sglang.json. It is automatically loaded when the observability stack starts.
The dashboard is organized into five sections:
http://localhost:3000dynamo / dynamoOther available dashboards:
dynamo.json) - Frontend and component metricsdcgm-metrics.json) - GPU utilization, memory, powerkvbm.json) - KV block manager metricsdisagg-dashboard.json) - Disaggregated serving metricsWhen developing on a remote VM (cloud instance, bare metal, etc.), the observability ports are only bound to localhost inside the VM. You have two options to access them.
Forward the relevant ports through your SSH connection. No firewall changes needed, traffic is encrypted.
Then open http://localhost:3000 in your local browser.
For a long-running tunnel in the background:
Open the ports directly. Only use this on trusted networks.
Then access http://<vm-ip>:3000 directly.
For CI pipelines, AI coding agents, or headless workflows where no browser is available, you can query Grafana and Prometheus directly via their APIs:
This is useful for automated benchmarking pipelines where you want to capture metrics programmatically alongside performance results.
dynamo_*) are available at the same /metrics endpoint alongside SGLang metrics
lib/runtime/src/metrics.rs (Rust runtime metrics)lib/runtime/src/metrics/prometheus_names.rs (metric name constants)components/src/dynamo/common/utils/prometheus.py - Prometheus utilities and callback registration