Distributed tracing | NVIDIA AIStore

AIStore supports distributed tracing via OpenTelemetry (OTEL), complementing its existing metrics and logging features. Distributed tracing tracks client requests across AIStore's proxy and target daemons, giving better visibility into the request flow and into where time is spent.


WARNING: Enabling distributed tracing introduces slight overhead in AIStore’s critical data path. Enable this feature only after carefully considering its performance impact and ensuring that the benefits of enhanced observability justify the potential trade-offs.

Getting Started

In this section, we use AIStore Local Playground and a local Jaeger instance. This setup is purely for demonstration purposes: it is easy to use and to reproduce.

Pre-Requisite

Docker

Local Jaeger setup

```
docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest
```

Optionally, shut down and clean up Local Playground:
```
make kill clean
```
Deploy the cluster with tracing enabled:
```
AIS_TRACING_ENDPOINT="localhost:4317" make deploy
```
This starts an AIStore cluster with distributed tracing enabled, exporting traces to the local Jaeger instance at `localhost:4317`.

Example operations

```
ais bucket create ais://nnn
ais put README.md ais://nnn
ais get ais://nnn/README.md /dev/null
```

View traces at: http://localhost:16686
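If you prefer to check from a script that traces are arriving, the following minimal Python sketch queries the Jaeger query service for the list of reported services and keeps those whose name starts with the AIStore prefix. It assumes the local Jaeger container from the setup step, the `/api/services` endpoint that the Jaeger UI itself uses (response shape `{"data": [...]}`), and the default `aistore` service name prefix; the helper names `fetch_services` and `filter_services` are hypothetical.

```python
import json
import urllib.request

# Jaeger UI/query port published by the docker command in the setup step.
JAEGER_QUERY_URL = "http://localhost:16686"


def fetch_services(base_url: str = JAEGER_QUERY_URL, timeout: float = 5.0) -> dict:
    """Fetch the service list from Jaeger's query service.

    Assumption: GET /api/services returns JSON shaped like {"data": [names...]}.
    """
    with urllib.request.urlopen(f"{base_url}/api/services", timeout=timeout) as resp:
        return json.load(resp)


def filter_services(payload: dict, prefix: str = "aistore") -> list:
    """Return the sorted service names that start with the given prefix."""
    names = payload.get("data") or []  # "data" may be null when nothing is reported
    return sorted(name for name in names if name.startswith(prefix))
```

After running the example operations, `filter_services(fetch_services())` should return a non-empty list if the cluster is exporting traces; an empty list suggests that tracing is disabled or that the exporter endpoint is wrong.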

Configuration

Tracing is configured cluster-wide. For the full list of AIStore config options, refer to configuration.md.

| Option name | Default value | Description |
| --- | --- | --- |
| `tracing.enabled` | `false` | If true, enables distributed tracing |
| `tracing.exporter_endpoint` | `''` | OTEL exporter gRPC endpoint |
| `tracing.service_name_prefix` | `aistore` | Prefix added to the OTEL service name reported by the exporter |
| `tracing.attributes` | `{}` | Extra attributes to be added to the traces |
| `tracing.sampler_probability` | `1` (export all traces) | Fraction of traces to sample, in the range [0,1] |
| `tracing.skip_verify` | `false` | Allow an insecure (TLS) exporter gRPC connection |
| `tracing.exporter_auth.token_header` | `''` | Request header used for the exporter auth token |
| `tracing.exporter_auth.token_file` | `''` | Path to the file containing the exporter auth token |
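To make the sampling probability concrete, here is a minimal Python sketch of ratio-based sampling, in which the keep-or-drop decision is derived from the trace ID so that every daemon handling the same trace reaches the same decision. This illustrates the general technique (similar in spirit to OpenTelemetry's trace-ID ratio samplers) and is not taken from AIStore's implementation; the function name `should_sample` is hypothetical.

```python
def should_sample(trace_id: bytes, probability: float) -> bool:
    """Decide whether to keep a trace, given its 16-byte ID and a probability in [0, 1]."""
    if len(trace_id) != 16:
        raise ValueError("trace_id must be 16 bytes")
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be within [0, 1]")
    if probability >= 1.0:
        return True  # the default of 1 exports every trace
    # Treat the low 8 bytes of the ID as a uniformly distributed 63-bit integer
    # and keep the trace when it falls below the scaled threshold.
    value = int.from_bytes(trace_id[8:16], "big") >> 1
    return value < int(probability * (1 << 63))
```

With a probability of `0.1`, roughly one in ten randomly generated trace IDs would be kept, and a probability of `0` drops every trace.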

Sample AIStore cluster configuration:

```
{
    ...
    "tracing": {
        "enabled": true,
        "exporter_endpoint": "localhost:4317",
        "skip_verify": true,
        "service_name_prefix": "aistore",
        "sampler_probability": "1.0"
    },
    ...
}
```
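Before applying a configuration like this one, it can help to sanity-check the `tracing` section. The Python sketch below does so under two assumptions that are not confirmed by AIStore itself: that an enabled exporter needs a non-empty endpoint, and that the sampling probability (stored as a string in the sample) must parse to a number within [0, 1]. The helper name `check_tracing_config` is hypothetical.

```python
import json


def check_tracing_config(tracing: dict) -> list:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    # Assumption: an enabled exporter needs a non-empty gRPC endpoint.
    if tracing.get("enabled", False) and not tracing.get("exporter_endpoint"):
        problems.append("tracing is enabled but exporter_endpoint is empty")
    # The sample config stores the probability as a string; plain numbers are accepted too.
    raw = tracing.get("sampler_probability", "1")
    try:
        probability = float(raw)
    except (TypeError, ValueError):
        problems.append(f"sampler_probability {raw!r} is not a number")
    else:
        if not 0.0 <= probability <= 1.0:
            problems.append(f"sampler_probability {probability} is outside [0, 1]")
    return problems


# The tracing section from the sample configuration, without the elided parts.
sample = json.loads("""
{
    "tracing": {
        "enabled": true,
        "exporter_endpoint": "localhost:4317",
        "skip_verify": true,
        "service_name_prefix": "aistore",
        "sampler_probability": "1.0"
    }
}
""")
print(check_tracing_config(sample["tracing"]))  # prints []
```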

Build AIStore with tracing

Distributed tracing is a build-time option controlled by the `oteltracing` build tag.

When the `aisnode` binary is built without this build tag, the tracing configuration is ignored and the entire tracing functionality becomes a no-op.

```
# build with tracing support
TAGS=oteltracing make node

# build without tracing support
make node
```
