Copyright © 2026, NVIDIA Corporation.

Function Autoscaler Observability

The function autoscaler emits three kinds of signal: structured logs, Prometheus metrics that explain dependency health and scaling decisions, and OpenTelemetry spans for outbound calls to its dependencies. The Prometheus exporter serves metrics on the address configured in server.metrics.exporters; the local settings file at crates/server/resources/settings-local.yaml uses 0.0.0.0:41338.
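As a quick way to inspect the exporter during local development, the following minimal sketch fetches and parses the Prometheus text format. It rests on assumptions not stated on this page: that the exporter answers at the `/metrics` path, and that the sample lines look like the ones in the usage example (the `function_id` label and the underscore-separated names are hypothetical). `parse_samples` and `fetch_samples` are illustrative helper names, not part of the autoscaler.

```python
import re
from urllib.request import urlopen

# Assumption: the exporter serves the Prometheus text format at /metrics on the
# local-settings address. The path is not stated on this page.
METRICS_URL = "http://127.0.0.1:41338/metrics"

# One sample line: name, optional {labels}, value. Lines with a trailing
# timestamp are not handled by this sketch and are skipped.
SAMPLE_RE = re.compile(r'^([^\s{]+)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (name, labels_dict, float_value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and HELP/TYPE comments
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_samples(url=METRICS_URL, timeout=5.0):
    """Fetch the exporter output over HTTP and parse it."""
    with urlopen(url, timeout=timeout) as response:
        return parse_samples(response.read().decode("utf-8"))
```

For example, `parse_samples('nvcf_autoscaler_queue_size 3\n')` returns `[("nvcf_autoscaler_queue_size", {}, 3.0)]`. Check the actual exporter output for the exact metric and label names before relying on them.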

Job and namespace labels follow the standard NVCF naming convention for the cluster that runs the function autoscaler.

Metric reference

| Metric name | Metric type | Description |
| --- | --- | --- |
| nvcf_autoscaler.autoscaling.status | Gauge | Scaling status per function, encoded as a reason code. |
| nvcf_autoscaler.scaling.current_instances | Gauge | Current instance count per function as read from the timeseries database. |
| nvcf_autoscaler.scaling.desired_instances | Gauge | Desired instance count computed by the scaling decision. |
| nvcf_autoscaler.scaling.utilization | Gauge | Utilization percentage per function used in the scaling decision. |
| nvcf_autoscaler.requests.queued_total | Counter | Scaling requests queued for processing. |
| nvcf_autoscaler.requests.processed_total | Counter | Scaling requests processed. |
| nvcf_autoscaler.requests.rejected_total | Counter | Scaling requests rejected by the policy or guard rails. |
| nvcf_autoscaler.requests.rate_limited_total | Counter | Scaling requests rate-limited downstream. |
| nvcf_autoscaler.queue.size | Gauge | Current depth of the scaling work queue. |
| nvcf_autoscaler.queue.capacity | Gauge | Configured capacity of the scaling work queue. |
| nvcf_autoscaler.function_table_state | Gauge | State of the active function table entry per function. |
| nvcf_autoscaler.function_discovery_duration_seconds | Histogram | Duration of each discovery loop run. |
| nvcf_autoscaler.timeseries_db.requests_total | Counter | Timeseries database requests, labeled by status. |
| nvcf_autoscaler.timeseries_db.request_duration_milliseconds | Histogram | Timeseries database request latency. |
| nvcf_autoscaler.timeseries_db.auth_failure_total | Counter | Timeseries database authentication failures. |
| nvcf_autoscaler.timeseries_db.server_side_failure_total | Counter | Timeseries database server-side query failures. |
| nvcf_autoscaler.nvcf_api.request_duration_milliseconds | Histogram | NVCF API request latency. |
| nvcf_autoscaler.oauth2_api.request_duration_milliseconds | Histogram | OAuth2 token endpoint request latency. |
| nvcf_autoscaler.oauth2_client.token_refresh_failure_total | Counter | OAuth2 client token refresh failures. |
| nvcf_autoscaler.cassandra.health_status | Gauge | Cassandra client health. 1 indicates healthy, 0 indicates unhealthy. |
| nvcf_autoscaler.health.overall_status | Gauge | Overall service health status. |
| nvcf_autoscaler.health.component_status | Gauge | Per-component health status. |
| nvcf_autoscaler.distributed_lock | Gauge | State of the discovery distributed lock for this replica. |
| nvcf_autoscaler.distributed_lock.acquisition_failures_total | Counter | Discovery lock acquisition failures. |
| nvcf_autoscaler.processing.utilization_data_age_milliseconds | Histogram | Age of the utilization data used in each scaling decision. |
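
Several of these gauges are easier to read in combination. The sketch below shows one possible way to derive three signals from the latest gauge values for a single function: how full the work queue is, how far the desired instance count is from the current one, and whether the Cassandra client reports healthy. The helper names and the input dictionary shape are hypothetical; only the metric names and the 1/0 health encoding come from the table above.

```python
def queue_saturation(size, capacity):
    """Fraction of the scaling work queue in use, or None if capacity is not positive."""
    if capacity <= 0:
        return None
    return size / capacity


def scaling_gap(current_instances, desired_instances):
    """Positive when the scaling decision wants more instances than are running."""
    return desired_instances - current_instances


def summarize(gauges):
    """Derive signals from a dict of documented metric name -> latest value.

    The per-function metrics are assumed to belong to the same function.
    """
    return {
        "queue_saturation": queue_saturation(
            gauges["nvcf_autoscaler.queue.size"],
            gauges["nvcf_autoscaler.queue.capacity"],
        ),
        "scaling_gap": scaling_gap(
            gauges["nvcf_autoscaler.scaling.current_instances"],
            gauges["nvcf_autoscaler.scaling.desired_instances"],
        ),
        # The table documents 1 as healthy and 0 as unhealthy.
        "cassandra_healthy": gauges["nvcf_autoscaler.cassandra.health_status"] == 1,
    }
```

A queue saturation close to 1.0 together with a persistent positive scaling gap would suggest that scaling requests are backing up; which thresholds are worth alerting on depends on the deployment.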

Tracing

The function autoscaler emits OpenTelemetry spans for outbound calls to the timeseries database and the NVCF API, with the OTLP endpoint and span filter configurable under server.tracing.

Logging

The function autoscaler writes structured logs to stdout. Set log filter directives in the server.envfilter_directive configuration field. Directives follow the env filter syntax of the Rust tracing_subscriber crate: a comma-separated list of target=level pairs, plus an optional bare level that acts as the default.

server:
  envfilter_directive: "server=info,rs_autoscaler=debug,rs_autoscaler::cassandra=warn,info"

The same syntax applies to server.tracing.logging_envfilter_directive if you separate logging and tracing filters.

Useful target prefixes:

| Target | Covers |
| --- | --- |
| server | Binary entry point: startup, server lifecycle. |
| rs_autoscaler | Top-level function autoscaler library crate. |
| rs_autoscaler::work | Scaling loop, discovery loop, bucket reshuffles. |
| rs_autoscaler::cassandra | Cassandra client, LWT lock operations. |
| rs_autoscaler::nvcf_api | OAuth2, NVCF API calls. |
| rs_autoscaler::timeseries_db | Timeseries database query traces. |
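
To reason about which level applies to a given target, the following Python sketch models the directive string as a set of target prefixes where the longest matching prefix wins and a bare level acts as the default. This is a simplified illustration, not the tracing_subscriber implementation: it ignores span and field filters and bare target names, and the sub-target `rs_autoscaler::cassandra::lock` in the usage example is hypothetical.

```python
LEVELS = ("trace", "debug", "info", "warn", "error", "off")


def parse_directives(spec):
    """Split 'target=level,...' into ({target: level}, default_level)."""
    targets = {}
    default = None
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "=" in part:
            target, level = part.split("=", 1)
            targets[target.strip()] = level.strip().lower()
        elif part.lower() in LEVELS:
            default = part.lower()  # a bare level sets the default
        # Bare target names and span/field filters are not modeled here.
    return targets, default


def effective_level(spec, target):
    """Return the level of the longest directive prefix matching the target."""
    targets, default = parse_directives(spec)
    matches = [prefix for prefix in targets if target.startswith(prefix)]
    if not matches:
        return default
    return targets[max(matches, key=len)]


SPEC = "server=info,rs_autoscaler=debug,rs_autoscaler::cassandra=warn,info"
print(effective_level(SPEC, "rs_autoscaler::cassandra::lock"))  # warn
print(effective_level(SPEC, "rs_autoscaler::work"))             # debug
print(effective_level(SPEC, "hyper"))                           # info
```

Under this model, the example directive keeps the autoscaler library at debug, quiets the Cassandra client to warn, and leaves every other target at info.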

See also

  • Function Autoscaler Operations for common symptoms tied to these metrics and log lines.
  • Architecture for the components that emit each signal.
  • Configure Autoscaling for setting per-function scaling bounds and policy via the NVCF API.