Step #2: Install Prometheus

Prometheus is an open-source cluster monitoring service that TMS uses to pull the metrics that drive autoscaling. Before deploying TMS to our cluster, we will install Prometheus.

Note

In production environments, you should follow the latest instructions for installing, configuring, and securing Prometheus provided by its developers.

  1. Open the SSH Console from the left pane, and install Prometheus via Helm:

    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm install -n monitoring --create-namespace prometheus prometheus-community/kube-prometheus-stack

    Verify that the installation succeeded by checking that the Prometheus pods are running and healthy (they may take a little while to start).

    $ kubectl get pods -n monitoring
    NAME                                                     READY   STATUS    RESTARTS   AGE
    alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   0          20s
    prometheus-grafana-5c4cf95bbc-cjxj8                      3/3     Running   0          25s
    prometheus-kube-prometheus-operator-dfc8bb5c5-sd2vz      1/1     Running   0          25s
    prometheus-kube-state-metrics-6df4697c45-2kdpf           1/1     Running   0          25s
    prometheus-prometheus-kube-prometheus-prometheus-0       2/2     Running   0          20s
    prometheus-prometheus-node-exporter-7kvb6                1/1     Running   0          25s


  2. Next, we will install the Prometheus Metrics Adapter, which will allow TMS to use custom metrics for autoscaling:

    helm install -n monitoring prometheus-adapter prometheus-community/prometheus-adapter --set=prometheus.url=http://prometheus-kube-prometheus-prometheus

    If everything is installed successfully, Prometheus should start collecting metrics from the cluster within a few minutes. You can verify this by getting metrics from the Kubernetes custom metrics API.

    kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq | less

    You should see output similar to the example below. The specific entries don’t matter, as long as some entries are listed.

    {
      "kind": "APIResourceList",
      "apiVersion": "v1",
      "groupVersion": "custom.metrics.k8s.io/v1beta1",
      "resources": [
        {
          "name": "services/node_memory_KReclaimable_bytes",
          "singularName": "",
          "namespaced": true,
          "kind": "MetricValueList",
          "verbs": [
            "get"
          ]
        },
        {
          "name": "services/prometheus_remote_storage_string_interner_zero_reference_releases",
          "singularName": "",
          "namespaced": true,
          "kind": "MetricValueList",
          "verbs": [
            "get"
          ]
        }
      ]
    }
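
The pod health check in step 1 can also be scripted instead of read by eye. The following is a minimal sketch, not part of the original guide: the helper name `all_pods_ready` is hypothetical, and it assumes the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin.
# Succeeds only if at least one pod is listed and every pod is in the
# Running state with all of its containers ready (READY column "n/n").
all_pods_ready() {
  awk '
    {
      split($2, ready, "/")   # READY column, e.g. "2/2"
      if ($3 != "Running" || ready[1] != ready[2]) { bad++ }
      total++
    }
    END { exit ((total > 0 && bad == 0) ? 0 : 1) }
  '
}

# Possible usage against the namespace created in step 1:
#   kubectl get pods -n monitoring --no-headers | all_pods_ready && echo "all pods ready"
```

If you only need to block until the pods come up, `kubectl wait --for=condition=Ready pods --all -n monitoring --timeout=300s` is likely a simpler option. The timeout value here is an arbitrary choice.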

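Since step 2 only requires that the custom metrics API returns some entries, counting them is a quick alternative to paging through the JSON. This sketch assumes `jq` is installed (the guide already uses it). The function name `count_custom_metrics` is hypothetical.

```shell
# Hypothetical helper: reads an APIResourceList JSON document on stdin and
# prints the number of entries in its "resources" array
# (0 if the array is empty or missing).
count_custom_metrics() {
  jq '.resources | length'
}

# Possible usage:
#   kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | count_custom_metrics
```

A count greater than zero suggests the adapter is serving metrics. If the request itself fails, `kubectl get apiservice v1beta1.custom.metrics.k8s.io` may help show whether the API is registered and available yet.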

© Copyright 2022-2023, NVIDIA. Last updated on Sep 29, 2023.