Metrics Overview | NVIDIA Cloud Functions

Per-service metrics reference for the NVCF self-hosted control plane. Each linked page lists metric names, types, sources, descriptions, and the labels and filters that make the metric useful in queries and dashboards.

Control plane services

NVCF API: request rates, response status codes, and log event counts for the NVCF API service.
Invocation Service: HTTP request counts, durations, and invocation error metrics for the invocation path.
ESS: template rendering counters and HTTP client metrics for the Encrypted Secrets Service.
gRPC Proxy: client connection counts, NATS pipe health, gRPC worker session-attach latency, and HTTP RED (rate, errors, duration) metrics for the gRPC proxy.
State Metrics Service: per-function instance count, stage durations, request latency, and function metadata.
SIS/Spot: HTTP client metrics for the Spot Instance Service.
Function Autoscaler: OpenTelemetry metrics emitted by the function autoscaler service.

LLM services

LLM API Gateway: request and routing metrics for the LLM API gateway.
LLM Function Invocation Metrics Report: a report covering metrics along the end-to-end LLM invocation path.
LLM Request Router: request router metrics for LLM traffic.

Per-function containers

Init Container: restart counts and termination reasons for function init containers.
Utils Container: restart counts, termination reasons, and worker service response metrics for function utils containers.

Datastores

Cassandra: client request latency, timeouts, authentication failures, and endpoint connection metrics.
Vault/OpenBao: pointer to upstream OpenBao telemetry documentation.
