AIStore supports distributed tracing via OpenTelemetry (OTEL), complementing its existing metrics and logging. Distributed tracing tracks client requests across AIStore’s proxy and target daemons, giving better visibility into the request flow and showing where time is spent.
For more details:
WARNING: Enabling distributed tracing introduces slight overhead in AIStore’s critical data path. Enable this feature only after carefully considering its performance impact and ensuring that the benefits of enhanced observability justify the potential trade-offs.
In this section, we use AIStore Local Playground and a local Jaeger instance. This setup is purely for demonstration purposes: it is easy to use and easy to reproduce.
Prerequisites
- Docker
Local Jaeger setup
Optionally, shut down and clean up Local Playground:
Deploy the cluster with distributed tracing enabled:
This will start an AIStore cluster with distributed tracing enabled.
View traces at: http://localhost:16686
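To confirm from a script that traces are arriving, one option is to ask Jaeger which services have reported spans. The following is a minimal sketch, assuming the /api/services endpoint that the Jaeger UI itself calls (it is internal to Jaeger and not a stable public API). The helper names jaegerServices and parseServices are hypothetical and not part of AIStore.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// parseServices extracts service names from a Jaeger /api/services response.
// Assumption: the response is a JSON object with a "data" array of strings.
func parseServices(payload []byte) ([]string, error) {
	var body struct {
		Data []string `json:"data"`
	}
	if err := json.Unmarshal(payload, &body); err != nil {
		return nil, err
	}
	return body.Data, nil
}

// jaegerServices queries the Jaeger query service at baseURL and returns
// the names of all services that have reported at least one span.
func jaegerServices(baseURL string) ([]string, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(baseURL + "/api/services")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	payload, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseServices(payload)
}

func main() {
	// The address matches the local Jaeger UI used in this section.
	services, err := jaegerServices("http://localhost:16686")
	if err != nil {
		fmt.Println("Jaeger is not reachable:", err)
		return
	}
	if len(services) == 0 {
		fmt.Println("no services have reported traces yet")
		return
	}
	for _, name := range services {
		fmt.Println(name)
	}
}
```

An empty list right after deployment is expected: services typically show up only after the cluster has handled at least one sampled request.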
Cluster-wide tracing configuration. For the list of AIStore config options, refer to configuration.md.
Sample aistore cluster configuration:
Distributed tracing is a build-time option controlled by the oteltracing build tag.
When the aisnode binary is built without this build tag, the tracing configuration is ignored and all tracing functionality becomes a no-op.
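The no-op behavior can be pictured with a small Go sketch. It is not AIStore's actual code: the Tracer interface, noopTracer, and newTracer are hypothetical names used only for illustration. In a real layout the choice would be made at compile time by two files carrying `//go:build oteltracing` and `//go:build !oteltracing` constraints. The single file below shows only the side that is compiled when the tag is absent.

```go
package main

import "fmt"

// Tracer is a hypothetical interface, not AIStore's API.
type Tracer interface {
	// Start begins a span and returns a function that ends it.
	Start(name string) (end func())
	// Enabled reports whether spans are actually recorded.
	Enabled() bool
}

// noopTracer stands in for what a file guarded by
// `//go:build !oteltracing` could provide: every call does nothing.
type noopTracer struct{}

func (noopTracer) Start(string) func() { return func() {} }
func (noopTracer) Enabled() bool       { return false }

// newTracer is the constructor both build variants would define.
// In this variant (built without the tag) the configuration value
// is ignored and a no-op tracer is always returned.
func newTracer(configEnabled bool) Tracer {
	_ = configEnabled // deliberately unused: tracing is compiled out
	return noopTracer{}
}

func main() {
	// The configuration asks for tracing, but the binary lacks the tag.
	tr := newTracer(true)
	end := tr.Start("GET object")
	end() // safe to call; nothing is recorded
	fmt.Println("tracing active:", tr.Enabled())
}
```

With the standard Go toolchain, tagged files are selected by passing `-tags oteltracing` to `go build`. How AIStore's own build scripts pass that tag is not covered by this sketch.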