Observability
This guide covers the steps to enable observability components such as Prometheus, Grafana, and kube-state-metrics for the operator. By default, these components are disabled in the Helm chart and need to be enabled explicitly.
Purpose of kube-state-metrics
kube-state-metrics is a service that listens to the Kubernetes API server and generates metrics about the state of the objects managed by Kubernetes, such as deployments, nodes, and pods. In the context of our operator, we deploy kube-state-metrics to expose detailed metrics for the custom resource definitions (CRDs) managed by the operator.
Enabling Observability Components
To enable Prometheus, Grafana, and kube-state-metrics, install the Helm charts as described in the Helm Prerequisites guide.
Included Grafana Dashboards
We have preconfigured three Grafana dashboards to provide monitoring and insights into the operator and its controllers:
DOCA Platform Framework State: This dashboard provides a high-level overview of the operator and its controllers, highlighting key metrics such as resource status, condition states, and time to readiness.
Controller Runtime Dashboard: This dashboard provides detailed metrics and visualizations for the controllers, including information on reconciliation times, queue depths, and error rates.
Kubernetes API Server Requests Dashboard: This dashboard monitors the requests made to the Kubernetes API server, helping you to identify any performance bottlenecks or excessive API usage.
These dashboards are automatically deployed when Grafana is enabled. Once enabled, you can access them through the Grafana web UI under the "Dashboards" section.
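One way to reach the Grafana web UI from a workstation is to port-forward its Service. The following is a minimal sketch, not a documented procedure. It assumes the Service is named dpf-operator-grafana in the dpf-operator-system namespace (mirroring the Secret used for the admin password) and listens on port 80. The helper name grafana_port_forward is hypothetical, and this guide confirms none of these values, so check them with kubectl get svc first.

```shell
# Hypothetical helper: forward a local port to the Grafana Service.
# Assumptions (not confirmed by this guide): Service name, namespace,
# and Service port 80.
# The body runs in a subshell "( ... )" so its variables stay local.
grafana_port_forward() (
  namespace="${1:-dpf-operator-system}"
  service="${2:-dpf-operator-grafana}"
  local_port="${3:-3000}"
  # Blocks until interrupted; while it runs, Grafana is reachable at
  # http://localhost:<local_port>.
  kubectl -n "$namespace" port-forward "svc/$service" "${local_port}:80"
)

# Usage:
#   grafana_port_forward                         # defaults, local port 3000
#   grafana_port_forward my-ns my-grafana 8080   # explicit values
```

With the forward running, the dashboards should appear under the "Dashboards" section after you log in as admin.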
Setting the Grafana Admin Password
You can set the Grafana admin password manually by configuring the grafana.adminPassword value in the Grafana values file:
adminPassword: <your-password>
Alternatively, if you prefer Grafana to generate a random password, leave adminPassword unset. After deployment, you can retrieve the autogenerated password using the following command:
kubectl -n dpf-operator-system get secret dpf-operator-grafana -ojsonpath='{.data.admin-password}' | base64 -d
This command fetches the password from the Kubernetes secret created for Grafana.
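Kubernetes stores Secret data base64-encoded, which is why the output is piped through base64 -d. If you retrieve the password often, the command can be wrapped in a small function that fails clearly when the Secret or key is missing. This is a sketch only: the function name grafana_admin_password is hypothetical, and the defaults are the namespace and Secret name used in this guide.

```shell
# Hypothetical helper wrapping the retrieval command from this guide.
# The body runs in a subshell "( ... )" so "exit" only ends the function.
grafana_admin_password() (
  namespace="${1:-dpf-operator-system}"
  secret="${2:-dpf-operator-grafana}"
  # Read the base64-encoded value; stop if kubectl itself fails.
  encoded="$(kubectl -n "$namespace" get secret "$secret" \
    -o jsonpath='{.data.admin-password}')" || exit 1
  # An empty result means the admin-password key is absent.
  [ -n "$encoded" ] || {
    echo "admin-password not found in secret $secret" >&2
    exit 1
  }
  # Decode to the plain-text password.
  printf '%s' "$encoded" | base64 -d
)

# Usage:
#   grafana_admin_password
#   grafana_admin_password my-namespace my-grafana-secret
```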
Note on Storage Solution
By default, Grafana and Prometheus use hostPath for storage. This is not recommended for production environments: the data lives on a single node's filesystem, so it can be lost if that node fails, and it does not scale beyond that node. You should configure a more reliable storage solution.
To change the storage solution, modify the respective storage configurations in the values.yaml file:
Prometheus:
server:
  persistentVolume:
    enabled: true
    storageClass: <your-storage-class>
Grafana:
persistence:
  enabled: true
  storageClassName: <your-storage-class>
Make sure to replace <your-storage-class> with the appropriate storage class for your environment.
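To pick a valid storage class and confirm that the change took effect, you can list the StorageClasses in the cluster and then the PersistentVolumeClaims created for Prometheus and Grafana. The sketch below assumes both components run in the dpf-operator-system namespace, which this guide only shows for the Grafana Secret. The function name storage_check is hypothetical.

```shell
# Hypothetical helper: show available StorageClasses, then the PVCs in
# the namespace where the observability components are assumed to run.
# The body runs in a subshell "( ... )" so its variables stay local.
storage_check() (
  namespace="${1:-dpf-operator-system}"
  echo "Available StorageClasses:"
  kubectl get storageclass || exit 1
  echo "PersistentVolumeClaims in $namespace:"
  kubectl -n "$namespace" get pvc
)

# Usage:
#   storage_check                # assumed default namespace
#   storage_check my-namespace
```

After redeploying with the new values, the PVCs for Prometheus and Grafana should report the storage class you configured and a Bound status.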