Observability#

The Tokkio observability stack helps users collect logs, traces, and metrics across the entire system and provides a single pane of glass for data visualization.

The stack is built from open-source components for log aggregation (Loki), data visualization (Grafana), and distributed tracing (Tempo). It provides a cost-effective and scalable way to monitor, debug, and analyze modern cloud-native applications. Each component serves a specific purpose in the observability pipeline.

Loki (Log Aggregation and Storage)#

  1. Loki is a log aggregation system that efficiently collects, stores, and queries logs from applications.

  2. Loki indexes only log metadata (such as labels, timestamps, and sources) rather than the full log text, which reduces storage costs.
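
Because Loki indexes labels rather than log text, a typical query first selects log streams by label and then filters the matching lines by content. The sketch below builds such a LogQL query and the corresponding URL for Loki's `query_range` HTTP endpoint. The base URL `http://loki:3100` and the label values are assumptions for illustration, not values taken from a Tokkio deployment, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed in-cluster address of the Loki service; adjust for your deployment.
LOKI_BASE_URL = "http://loki:3100"


def build_logql(labels: dict, contains: str = "") -> str:
    """Build a LogQL query: a label selector plus an optional line filter.

    Simplification: label values are not escaped, so they must not
    contain double quotes or backslashes.
    """
    selector = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    query = "{" + selector + "}"
    if contains:
        # Streams are selected by label first; the line filter is then
        # applied to the content of the selected log lines.
        query += f' |= "{contains}"'
    return query


def build_query_range_url(query: str, start_ns: int, end_ns: int, limit: int = 100) -> str:
    """Return the URL for Loki's query_range endpoint (timestamps in nanoseconds)."""
    params = urlencode({"query": query, "start": start_ns, "end": end_ns, "limit": limit})
    return f"{LOKI_BASE_URL}/loki/api/v1/query_range?{params}"


if __name__ == "__main__":
    # Hypothetical labels; use the labels your log shipper attaches.
    logql = build_logql({"namespace": "default", "app": "example-app"}, contains="error")
    print(logql)
    print(build_query_range_url(logql, 1700000000000000000, 1700000060000000000))
```

Keeping the label selector narrow limits how many streams Loki has to scan, since only the labels are indexed.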

Grafana (Visualization and Monitoring)#

  1. Grafana is a visualization and monitoring tool that provides a unified interface to observe logs, metrics, and traces.

  2. It allows users to build dashboards, set up alerts, and analyze system health in real time.

Tempo (Distributed Tracing Backend)#

  1. Tempo is a distributed tracing system that helps track requests across microservices.

  2. It enables developers to identify performance bottlenecks, latency issues, and errors by following request traces.

  3. Unlike other tracing solutions (e.g., Jaeger, Zipkin), Tempo does not index spans, which reduces its storage and operating costs.
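
Following a request across microservices depends on every service passing along the same trace ID. A minimal sketch of that idea, using the W3C Trace Context `traceparent` header format (`version-traceid-spanid-flags`), is shown below. Whether Tokkio services propagate context with this exact header is an assumption, and in practice an OpenTelemetry SDK would normally generate and propagate these values for you. The function names are hypothetical.

```python
import re
import secrets

# W3C traceparent layout: 2 hex version, 32 hex trace ID, 16 hex span ID, 2 hex flags.
TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace at the entry-point service."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def child_traceparent(parent_header: str) -> str:
    """Create the header a downstream call should carry.

    The trace ID is kept so all spans join the same trace;
    only the span ID changes for the new unit of work.
    """
    match = TRACEPARENT_RE.match(parent_header)
    if match is None:
        raise ValueError(f"malformed traceparent header: {parent_header!r}")
    version, trace_id, _parent_span_id, flags = match.groups()
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


if __name__ == "__main__":
    incoming = new_traceparent()
    outgoing = child_traceparent(incoming)
    print("incoming:", incoming)
    print("outgoing:", outgoing)
```

Because the trace ID is shared by every span in a request, a tracing backend can retrieve the whole trace by that ID alone.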

Other Components#

To supplement the LGT (Loki, Grafana, Tempo) stack described in the previous sections, Tokkio also deploys the following components:

  1. Node Exporter: Collects host-level metrics such as CPU, memory, disk I/O, network usage, and system load.

  2. Kube-State-Metrics: Collects Kubernetes API state metrics related to Pods, Deployments, Nodes, and other cluster components.

  3. DCGM Exporter: Collects GPU-related metrics from NVIDIA Data Center GPUs using NVIDIA DCGM (Data Center GPU Manager).

  4. Promtail: Collects, processes, and forwards logs to Loki. It is a lightweight log shipper designed to work seamlessly with Loki.

  5. OpenTelemetry Collector: Collects, processes, and exports traces from applications into Tempo.
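
Node Exporter, Kube-State-Metrics, and DCGM Exporter all expose their metrics in the Prometheus text exposition format. As a rough illustration of what that data looks like, the sketch below parses a small sample into `(name, labels, value)` tuples. The metric names are real names from these exporters, but the sample values and label values are made up for illustration, and the parser is a simplified assumption (it does not unescape label values) rather than a full implementation of the format.

```python
import re

# Illustrative sample only; real exporters emit many more series.
SAMPLE_METRICS = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0"} 87
# TYPE kube_pod_status_phase gauge
kube_pod_status_phase{namespace="default",pod="example-pod",phase="Running"} 1
"""

# metric_name, optional {labels}, value, optional integer timestamp
SAMPLE_LINE_RE = re.compile(
    r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\d+)?$"
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> list:
    """Parse exposition-format text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE_METRICS):
        print(name, labels, value)
```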

Reference#

The following open-source Helm charts are referenced:

  1. opentelemetry collector

  2. tempo

  3. loki

  4. promtail

  5. kube-prometheus-stack