Setting up Prometheus

Implementing a Prometheus stack can be complicated, but the Helm package manager together with the Prometheus Operator and kube-prometheus projects makes it manageable. The Operator provides standard configurations and dashboards for Prometheus and Grafana, and the Helm kube-prometheus-stack chart (previously published as prometheus-operator) gives you a full cluster monitoring solution by installing the Prometheus Operator and the rest of the components listed above.

First, add the Helm repository:

$ helm repo add prometheus-community \
   https://prometheus-community.github.io/helm-charts

Now, search for the available Prometheus charts:

$ helm search repo kube-prometheus

Once you’ve decided which version of the chart to use, write the chart’s default values to a file so that you can modify the settings:

$ helm inspect values prometheus-community/kube-prometheus-stack > /tmp/kube-prometheus-stack.values
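The command above reads the values of the latest chart version. If you settled on a specific version in the previous step, the values file can be generated for that version instead. The following is a minimal sketch, assuming a hypothetical helper named dump_chart_values (not part of Helm) that passes Helm's --version flag; the version string itself is a placeholder that you must supply.

```shell
# Sketch: write the default values of one specific chart version to a file.
# dump_chart_values is a hypothetical helper, not a Helm command.
dump_chart_values() {
  # $1: chart version, as listed by `helm search repo`
  # $2: output file (defaults to the path used in this guide)
  dcv_version="$1"
  dcv_out="${2:-/tmp/kube-prometheus-stack.values}"
  helm inspect values prometheus-community/kube-prometheus-stack \
     --version "$dcv_version" > "$dcv_out"
}

# Usage (replace <chart-version> with a real version string):
#   dump_chart_values <chart-version>
```

If you pin the version here, pass the same --version value to helm install later so that the values file and the installed chart match.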

Next, edit the values file to change how the Prometheus server service is exposed. In the prometheus instance section of the values file, change the service type from ClusterIP to NodePort. This makes the Prometheus server reachable on port 30090 of your machine's IP address, at http://<machine-ip>:30090/

From:
 ## Port to expose on each node
 ## Only used if service.type is 'NodePort'
 ##
 nodePort: 30090

 ## Loadbalancer IP
 ## Only use if service.type is "loadbalancer"
 loadBalancerIP: ""
 loadBalancerSourceRanges: []
 ## Service type
 ##
 type: ClusterIP

To:
 ## Port to expose on each node
 ## Only used if service.type is 'NodePort'
 ##
 nodePort: 30090

 ## Loadbalancer IP
 ## Only use if service.type is "loadbalancer"
 loadBalancerIP: ""
 loadBalancerSourceRanges: []
 ## Service type
 ##
 type: NodePort

Also, set prometheusSpec.serviceMonitorSelectorNilUsesHelmValues to false, as shown below:

## If true, a nil or {} value for prometheus.prometheusSpec.serviceMonitorSelector will cause the
## prometheus resource to be created with selectors based on values in the helm deployment,
## which will also match the servicemonitors created
##
serviceMonitorSelectorNilUsesHelmValues: false

Add the following scrape configuration to the additionalScrapeConfigs section of the values file:

## AdditionalScrapeConfigs allows specifying additional Prometheus scrape configurations. Scrape configurations
## are appended to the configurations generated by the Prometheus Operator. Job configurations must have the form
## as specified in the official Prometheus documentation:
## https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config. As scrape configs are
## appended, the user is responsible to make sure it is valid. Note that using this feature may expose the possibility
## to break upgrades of Prometheus. It is advised to review Prometheus release notes to ensure that no incompatible
## scrape configs are going to break Prometheus after the upgrade.
##
## The scrape configuration example below will find master nodes, provided they have the name .*mst.*, relabel the
## port to 2379 and allow etcd scraping provided it is running on all Kubernetes master nodes
##
additionalScrapeConfigs:
- job_name: gpu-metrics
  scrape_interval: 1s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - gpu-operator-resources
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_node_name]
    action: replace
    target_label: kubernetes_node

Finally, we can deploy the Prometheus and Grafana pods using the kube-prometheus-stack via Helm:

$ helm install prometheus-community/kube-prometheus-stack \
   --create-namespace --namespace prometheus \
   --generate-name \
   --values /tmp/kube-prometheus-stack.values

Note

You can also override values in the Prometheus chart directly on the Helm command line:

$ helm install prometheus-community/kube-prometheus-stack \
   --create-namespace --namespace prometheus \
   --generate-name \
   --set prometheus.service.type=NodePort \
   --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false

You should see console output similar to the following:

NAME: kube-prometheus-stack-1603211794
LAST DEPLOYED: Tue Oct 20 16:36:39 2020
NAMESPACE: prometheus
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
kubectl --namespace prometheus get pods -l "release=kube-prometheus-stack-1603211794"

Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
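
Once the pods are running, you can confirm that the Prometheus server answers on the NodePort configured earlier. The following is a minimal sketch, assuming curl is installed and that the node IP you pass in is reachable from where you run it; prometheus_url and check_prometheus are hypothetical helper names, and /-/healthy is the health endpoint served by Prometheus.

```shell
# Sketch: build the Prometheus URL and probe its health endpoint.
# Both functions are hypothetical helpers for illustration.
prometheus_url() {
  # $1: IP address of a cluster node
  # $2: NodePort (defaults to 30090, the value set in the values file)
  printf 'http://%s:%s/' "$1" "${2:-30090}"
}

check_prometheus() {
  # Exits non-zero if the server is unreachable or returns an HTTP error
  curl --silent --fail --max-time 5 "$(prometheus_url "$@")-/healthy"
}

# Usage (replace <machine-ip> with the IP address of your machine):
#   check_prometheus <machine-ip>
```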