This page covers operating the function autoscaler after deployment, including health probes, common operational issues, and pointers to the Helm chart values. For log filter syntax, metrics, and traces, see Function Autoscaler Observability.
The function autoscaler exposes three HTTP health endpoints. Their paths differ from those used elsewhere in the NVCF control plane: liveness and readiness are namespaced under /admin/health/.
The liveness probe deliberately does not check Cassandra or the timeseries database. Restarting the pod when those are unreachable does not help, so the function autoscaler stays running and lets readiness flip instead.
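The split between the two probes can be modeled as below. This is a minimal sketch, not the autoscaler's code. DependencyState, liveness_status, and readiness_status are hypothetical names. The sketch also assumes that readiness depends on both Cassandra and the timeseries database.

```python
from dataclasses import dataclass


@dataclass
class DependencyState:
    """Hypothetical snapshot of dependency reachability (illustrative only)."""

    cassandra_reachable: bool
    timeseries_db_reachable: bool


def liveness_status(state: DependencyState) -> int:
    # Liveness deliberately ignores Cassandra and the timeseries database:
    # a restart would not make an unreachable dependency reachable.
    # The parameter is kept only so both probes share a signature.
    return 200


def readiness_status(state: DependencyState) -> int:
    # Readiness reflects the dependencies, so it flips to 503 while either
    # one is unreachable and recovers without a pod restart.
    if state.cassandra_reachable and state.timeseries_db_reachable:
        return 200
    return 503
```

With Cassandra unreachable, liveness_status still returns 200 while readiness_status returns 503. The pod is therefore marked not ready rather than restarted.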
Symptoms: readiness flips to 503, /health reports the cassandra_client component as unhealthy, log lines from rs_autoscaler::cassandra show connection errors.
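To confirm which component /health flags, you can filter the response in a script. The sketch below assumes a JSON body with a components map. That shape, the function name, and the sample payload are all hypothetical; adapt them to the actual response.

```python
import json


def unhealthy_components(body: str) -> list[str]:
    """Return the names of components that /health reports as not healthy.

    ASSUMPTION: the response body is JSON shaped like
    {"components": {"<name>": {"status": "healthy" | "unhealthy"}}}.
    This shape is illustrative; adapt the keys to the real response.
    """
    payload = json.loads(body)
    components = payload.get("components", {})
    return sorted(
        name
        for name, info in components.items()
        if info.get("status") != "healthy"
    )


# Hypothetical sample body for the Cassandra failure described in the symptoms.
SAMPLE = (
    '{"components": {"cassandra_client": {"status": "unhealthy"},'
    ' "example_component": {"status": "healthy"}}}'
)
```

Against SAMPLE, unhealthy_components returns ["cassandra_client"].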
Checks:
cassandra.ssl. The function autoscaler container expects the cert directory to exist; create /etc/app/config if it is missing.
Symptoms: nvcf_autoscaler.timeseries_db.requests_total shows a rising error count, auth_failure_total or server_side_failure_total is non-zero, log lines from rs_autoscaler::timeseries_db show 4xx or 5xx responses.
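Auth failures and server-side failures call for different fixes, so it helps to know which counter is moving. The sketch below compares two samples of the failure counters. It assumes both samples are scraped from the same pod, and the helper names are hypothetical. Looking at increases rather than at whether a total is non-zero is a choice made for this sketch: a counter that went non-zero long ago but is now flat may not reflect the current problem.

```python
def counter_increase(before: float, after: float) -> float:
    """Increase of a monotonic counter between two samples.

    A lower second sample means the counter reset (for example after a pod
    restart), so the second sample alone is taken as the increase.
    """
    return after if after < before else after - before


def timeseries_failure_kinds(
    prev: dict[str, float], curr: dict[str, float]
) -> list[str]:
    """Name the failure counters that grew between two samples.

    ASSUMPTION: prev and curr map counter names (auth_failure_total,
    server_side_failure_total) to values scraped from the same pod.
    """
    kinds = []
    for counter, kind in (
        ("auth_failure_total", "auth"),
        ("server_side_failure_total", "server_side"),
    ):
        if counter_increase(prev.get(counter, 0.0), curr.get(counter, 0.0)) > 0:
            kinds.append(kind)
    return kinds
```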
Checks:
timeseries_db.timeseries_db_url is reachable from the pod.
auth_failure_total spikes.
Symptoms: nvcf_autoscaler.nvcf_api.request_duration_milliseconds shows a sustained rise in 4xx or 5xx responses, scaling decisions stop applying.
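The word "sustained" separates this symptom from a single bad interval. One way to encode that is to require the error fraction to stay above a threshold for several consecutive intervals. The sketch below is hypothetical. It assumes you can obtain per-interval request counts keyed by HTTP status code from your metrics backend, and its threshold and window count are illustrative values, not recommendations.

```python
from collections import deque


class SustainedErrorDetector:
    """Flag a sustained rise in 4xx/5xx responses from the NVCF API.

    ASSUMPTION: per-interval request counts keyed by HTTP status code are
    available from your metrics backend. The threshold and window count
    are illustrative defaults, not recommended values.
    """

    def __init__(self, threshold: float = 0.05, windows: int = 3) -> None:
        self.threshold = threshold
        self.recent = deque(maxlen=windows)  # error fractions, oldest first

    def observe(self, status_counts: dict[int, int]) -> bool:
        """Record one interval and return True if the rise is sustained."""
        total = sum(status_counts.values())
        errors = sum(n for code, n in status_counts.items() if code >= 400)
        self.recent.append(errors / total if total else 0.0)
        # Require every interval in the lookback to exceed the threshold so
        # that a single bad interval does not count as sustained.
        return len(self.recent) == self.recent.maxlen and all(
            rate > self.threshold for rate in self.recent
        )
```

Each call to observe feeds in one interval's counts. The detector reports True only after a full window of elevated error fractions, and it drops back to False as soon as a clean interval enters the window.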
Checks:
nvcf_api.disable_auth is set as intended for the deployment. Leave it false whenever the NVCF API enforces authentication.
Symptoms: the active function set in Cassandra stops growing despite traffic to new functions, nvcf_autoscaler.distributed_lock.acquisition_failures_total is rising across all replicas.
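These symptoms only point at the discovery lock when they occur together. The hypothetical check below combines them over one observation interval. It assumes you already have the active function set size at the start and end of the interval, a flag for whether new functions received traffic, and the per-replica increase of the acquisition failure counter.

```python
def discovery_stalled(
    active_before: int,
    active_after: int,
    saw_new_function_traffic: bool,
    failure_increases: dict[str, float],
) -> bool:
    """Match the stalled-discovery symptoms over one observation interval.

    active_before / active_after: size of the active function set in
    Cassandra at the start and end of the interval.
    failure_increases: per-replica increase in
    nvcf_autoscaler.distributed_lock.acquisition_failures_total over the
    same interval. This helper and its parameters are hypothetical.
    """
    set_stopped_growing = active_after <= active_before
    # The symptom is failures rising on all replicas; a rise on only some
    # replicas does not match it.
    all_replicas_failing = bool(failure_increases) and all(
        inc > 0 for inc in failure_increases.values()
    )
    return saw_new_function_traffic and set_stopped_growing and all_replicas_failing
```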
Checks:
locks table for the discovery lock row and its TTL. If the row never expires, the previous leader may have stopped refreshing without releasing it.
nvcf_autoscaler.distributed_lock gauge reports the leader state.
discovery_lock_duration_seconds.
See Architecture for the lock state machine.
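The TTL check on the locks table can be scripted by sampling the lock row's remaining TTL a few times. For example, you could use CQL's TTL() function on a non-key column. The keyspace and column names are not given on this page, so any query you write against them is an assumption. The classifier below is a hypothetical helper that only interprets the samples; it performs no database access.

```python
from typing import Optional, Sequence


def classify_lock_ttl(readings: Sequence[Optional[int]]) -> str:
    """Interpret remaining-TTL samples for the discovery lock row.

    readings: remaining TTL in seconds, oldest first; None means the row
    exists but has no TTL. An empty sequence means the row was not found.
    """
    if not readings:
        return "absent"  # no lock row, so nothing is holding the lock
    if readings[-1] is None:
        return "no_ttl"  # the row will never expire on its own
    # A TTL that goes back up between samples means something is still
    # extending it.
    for earlier, later in zip(readings, readings[1:]):
        if earlier is not None and later is not None and later > earlier:
            return "refreshed"
    return "expiring"  # counting down; the row will lapse when TTL hits zero
```

A result of "no_ttl" fits the case described in the checks, where the row never expires. A result of "refreshed" means some process is still renewing the lock, so compare it against the leader state the gauge reports.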