Observing a Workflow with OpenTelemetry Collector

This guide shows how to stream OpenTelemetry (OTel) traces from your NeMo Agent toolkit workflows to the generic OTel Collector. The collector can then export those traces to many destinations, including file stores (such as S3), Datadog, and Dynatrace.

In this guide, you will learn how to:

Deploy the generic OTel collector with a configuration that saves traces to the local file system. The configuration can be modified to export to other systems.
Configure your workflow (YAML) or Python script to send traces to the OTel collector.
Run the workflow and view traces in the local file.

Configure and deploy the OTel Collector

Configure the OTel Collector with an otlp receiver and the exporter of your choice. For this example, create a file named otelcollectorconfig.yaml:

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    send_batch_size: 100
    timeout: 10s

exporters:
  file:
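    # Path inside the collector container; /tmp is the directory mounted from ./otellogs in the docker run command
    path: /tmp/llm_spans.json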
    format: json

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [file]
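
The pipeline at the end of this configuration refers to receivers, processors, and exporters by name, so a misspelled name is an easy mistake to make when swapping in a different exporter. The following is a minimal sketch of a name check, assuming the configuration has already been loaded into a Python dictionary. The function name undefined_pipeline_components is hypothetical and is not part of the OTel Collector or the NeMo Agent toolkit; component settings are omitted because only the names matter here.

```python
# Hypothetical helper for illustration; not part of the OTel Collector or NeMo Agent toolkit.
def undefined_pipeline_components(config: dict) -> list[str]:
    """Return 'pipeline/section/name' for each component a pipeline uses but the config never defines."""
    missing = []
    pipelines = config.get("service", {}).get("pipelines", {})
    for pipeline_name, pipeline in pipelines.items():
        for section in ("receivers", "processors", "exporters"):
            defined = config.get(section) or {}
            for component in pipeline.get(section, []):
                if component not in defined:
                    missing.append(f"{pipeline_name}/{section}/{component}")
    return missing


# Component names from the example configuration, written as a dict so no YAML parser is needed.
collector_config = {
    "receivers": {"otlp": {}},
    "processors": {"batch": {}},
    "exporters": {"file": {}},
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["batch"],
                "exporters": ["file"],
            }
        }
    },
}

# An empty list means every name used in a pipeline is defined.
print(undefined_pipeline_components(collector_config))
```

An empty result only confirms that the names line up; the collector still performs its own validation of each component's settings at startup.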

Install and run the configured OTel Collector, and note its endpoint URL (for example, http://localhost:4318). This example runs the collector with Docker, using the configuration file created above:

mkdir otellogs
docker run -v $(pwd)/otelcollectorconfig.yaml:/etc/otelcol-contrib/config.yaml \
  -p 4318:4318 \
  -v $(pwd)/otellogs:/tmp/ \
  otel/opentelemetry-collector-contrib:0.128.0
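
Before wiring up a workflow, it can be useful to confirm that the collector accepts traces. The sketch below posts a single hand-built span as OTLP/HTTP JSON using only the Python standard library. It assumes the collector is reachable at http://localhost:4318 and that the otlp receiver accepts JSON on /v1/traces; the function names, service name, and span name are made up for illustration.

```python
import json
import secrets
import time
import urllib.request


def build_test_trace(service_name: str = "collector-smoke-test") -> dict:
    """Build an OTLP/JSON payload containing one short span."""
    now_ns = time.time_ns()
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-smoke-test"},
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 16 random bytes as 32 hex characters
                    "spanId": secrets.token_hex(8),    # 8 random bytes as 16 hex characters
                    "name": "smoke-test-span",
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    # OTLP/JSON encodes 64-bit integers as strings
                    "startTimeUnixNano": str(now_ns - 1_000_000),
                    "endTimeUnixNano": str(now_ns),
                }],
            }],
        }]
    }


def send_test_trace(endpoint: str = "http://localhost:4318/v1/traces") -> int:
    """POST the test payload to the collector and return the HTTP status code."""
    body = json.dumps(build_test_trace()).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

With the collector running, calling send_test_trace() should return a success status, and the test span should reach the configured exporter once the batch processor flushes.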

Install the OpenTelemetry Subpackage

uv pip install -e '.[opentelemetry]'

Modify Workflow Configuration

Update your workflow configuration file to include the telemetry settings.

Example configuration:

general:
  telemetry:
    tracing:
      otelcollector:
        _type: otelcollector
        # The endpoint where you deployed the OTel Collector
        endpoint: http://localhost:4318/v1/traces
        project: your_project_name

Run the workflow

# Ensure you have installed nvidia-nat with OpenTelemetry support, e.g. uv pip install -e '.[opentelemetry]'
uv pip install -e <path/to/your/workflow/root>
nat run --config_file <path/to/your/config/file.yml> --input "your notional input"

As the workflow runs, spans are sent to the OTel Collector, which exports them through the exporter you configured. In this example, you can view the exported traces in the local file:

cat otellogs/llm_spans.json
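
The raw file is dense, so a short script can make it easier to scan. The sketch below assumes the file exporter writes one JSON-encoded batch per line, each with the usual OTLP resourceSpans, scopeSpans, and spans nesting. The function name iter_span_summaries is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_span_summaries(lines: Iterable[str]) -> Iterator[tuple[str, str, float]]:
    """Yield (trace_id, span_name, duration_ms) for every span found in the exported lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        batch = json.loads(line)
        for resource_spans in batch.get("resourceSpans", []):
            for scope_spans in resource_spans.get("scopeSpans", []):
                for span in scope_spans.get("spans", []):
                    # Timestamps are nanoseconds encoded as strings
                    start_ns = int(span.get("startTimeUnixNano", 0))
                    end_ns = int(span.get("endTimeUnixNano", 0))
                    yield span.get("traceId", ""), span.get("name", ""), (end_ns - start_ns) / 1e6


# Example usage against the exported file:
#
# with open("otellogs/llm_spans.json") as exported:
#     for trace_id, name, duration_ms in iter_span_summaries(exported):
#         print(f"{trace_id} {name} {duration_ms:.1f} ms")
```

Spans that share a trace ID belong to the same workflow run, so grouping the output by the first field gives a per-run view.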