Observe Workflows#

The NeMo Agent toolkit uses a flexible, plugin-based observability system that provides comprehensive support for configuring logging, tracing, and metrics for workflows. Users can configure multiple telemetry exporters simultaneously from the available options or create custom integrations. The observability system:

  • Uses an event-driven architecture with IntermediateStepManager publishing workflow events to a reactive stream

  • Supports multiple concurrent telemetry exporters processing events asynchronously

  • Provides built-in exporters for popular observability platforms (Phoenix, Langfuse, Weave, etc.)

  • Enables custom telemetry exporter development for any observability service

These features enable developers to test their workflows locally and integrate observability seamlessly with their preferred monitoring stack.

Compatibility with Previous Versions#

As of v1.2, the span exporter prefixes attribute names with nat by default. In prior releases, attribute names were prefixed with aiq; to retain compatibility, set the NAT_SPAN_PREFIX environment variable to aiq:

export NAT_SPAN_PREFIX=aiq

Installation#

The core observability features (console and file logging) are included by default. For advanced telemetry features like OpenTelemetry and Phoenix tracing, you need to install the optional telemetry extras.

If you have already installed the NeMo Agent toolkit from source, you can install package extras with the following commands:

# Install all optional telemetry extras
uv pip install -e '.[telemetry]'

# Install specific telemetry extras
uv pip install -e '.[data-flywheel]'
uv pip install -e '.[opentelemetry]'
uv pip install -e '.[phoenix]'
uv pip install -e '.[weave]'
uv pip install -e '.[ragaai]'

If you have not installed the NeMo Agent toolkit from source, you can install package extras with the following commands:

# Install all optional telemetry extras
uv pip install "nvidia-nat[telemetry]"

# Install specific telemetry extras
uv pip install "nvidia-nat[data-flywheel]"
uv pip install "nvidia-nat[opentelemetry]"
uv pip install "nvidia-nat[phoenix]"
uv pip install "nvidia-nat[weave]"
uv pip install "nvidia-nat[ragaai]"

Available Tracing Exporters#

The following table lists each exporter with its supported features and configuration guide:

| Provider | Integration Documentation | Supported Features |
| --- | --- | --- |
| Catalyst | Observing with Catalyst | Logging, Tracing |
| NVIDIA Data Flywheel Blueprint | Observing with Data Flywheel | Logging, Tracing |
| DBNL | Observing with DBNL | Logging, Tracing |
| Dynatrace | Observing with Dynatrace | Logging, Tracing |
| Galileo | Observing with Galileo | Logging, Tracing |
| Langfuse | Refer to the examples/observability/simple_calculator_observability example for usage details | Logging, Tracing |
| LangSmith | Refer to the examples/observability/simple_calculator_observability example for usage details | Logging, Tracing |
| OpenTelemetry Collector | Observing with OTel Collector | Logging, Tracing |
| Patronus | Refer to the examples/observability/simple_calculator_observability example for usage details | Logging, Tracing |
| Phoenix | Observing with Phoenix | Logging, Tracing |
| W&B Weave | Observing with W&B Weave | Logging, Tracing, W&B Weave Redaction, Evaluation Metrics |

Additional options:

  • File Export - Built-in file-based tracing for local development and debugging (see the example below)

  • Custom Exporters - Refer to Adding Telemetry Exporters for creating custom integrations

For complete configuration examples and setup instructions, check the examples/observability/ directory.
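For quick local debugging, the file exporter can be configured in the tracing section alongside other exporters. The snippet below is a minimal sketch; the path value and field names are illustrative assumptions, so verify the exact configuration fields with nat info components -t tracing.

general:
  telemetry:
    tracing:
      file_trace:
        _type: file
        # Assumed field; verify with `nat info components -t tracing`
        path: ./.tmp/traces.log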

Configurable Components#

The flexible observability system is configured using the general.telemetry section in the workflow configuration file. This section contains two subsections: logging and tracing, and each subsection can contain multiple telemetry exporters running simultaneously.

For a complete list of logging and tracing plugins and their corresponding configuration settings, use the following CLI commands.

# For all registered logging plugins
nat info components -t logging

# For all registered tracing plugins
nat info components -t tracing

Illustrated below is a sample configuration file demonstrating multiple exporters configured to run concurrently.

general:
  telemetry:
    logging:
      console:
        _type: console
        level: WARN
      file:
        _type: file
        path: ./.tmp/workflow.log
        level: DEBUG
    tracing:
      # Multiple exporters can run simultaneously
      phoenix:
        _type: phoenix
        # ... configuration fields
      weave:
        _type: weave
        # ... configuration fields
      file_backup:
        _type: file
        # ... configuration fields

Logging Configuration#

The logging section contains one or more logging providers. Each provider has a _type and optional configuration fields. The following logging providers are supported by default:

  • console: Writes logs to the console.

  • file: Writes logs to a file.

Available log levels:

  • DEBUG: Detailed information for debugging.

  • INFO: General information about the workflow.

  • WARNING: Potential issues that should be addressed.

  • ERROR: Issues that prevent parts of the workflow from running correctly.

  • CRITICAL: Severe issues that prevent the workflow from continuing to run.

If a log level is specified, all logs at or above that level are recorded. For example, if the log level is set to WARNING, then WARNING, ERROR, and CRITICAL messages are logged; if it is set to ERROR, only ERROR and CRITICAL messages are logged.
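The following configuration, adapted from the sample above (and assuming the level names listed here are accepted verbatim), keeps console output quiet while capturing full detail in a file:

general:
  telemetry:
    logging:
      console:
        _type: console
        level: WARNING   # console shows WARNING, ERROR, and CRITICAL only
      file:
        _type: file
        path: ./.tmp/workflow.log
        level: DEBUG     # file captures all levels, including DEBUG and INFO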

Tracing Configuration#

The tracing section contains one or more tracing providers. Each provider has a _type and optional configuration fields. The observability system supports multiple concurrent exporters.

NeMo Agent Toolkit Observability Components#

The NeMo Agent toolkit observability system uses a generic, plugin-based architecture built on the Subject-Observer pattern. The system consists of several key components working together to provide comprehensive workflow monitoring:

Event Stream Architecture#

  • IntermediateStepManager: Publishes workflow events (IntermediateStep objects) to a reactive event stream, tracking function execution boundaries, LLM calls, tool usage, and intermediate operations.

  • Event Stream: A reactive stream that broadcasts IntermediateStep events to all subscribed telemetry exporters, enabling real-time observability.

  • Asynchronous Processing: All telemetry exporters process events asynchronously in background tasks, keeping observability “off the hot path” for optimal performance.

Telemetry Exporter Types#

The system supports multiple exporter types, each optimized for different use cases:

  • Raw Exporters: Process IntermediateStep events directly for simple logging, file output, or custom event processing.

  • Span Exporters: Convert events into spans with lifecycle management, ideal for distributed tracing and span-based observability services.

  • OpenTelemetry Exporters: Specialized exporters for OTLP-compatible services with pre-built integrations for popular observability platforms.

  • Advanced Custom Exporters: Support complex business logic, stateful processing, and enterprise reliability patterns with circuit breakers and dead letter queues.

Processing Pipeline System#

Each exporter can optionally include a processing pipeline that transforms, filters, batches, or aggregates data before export:

  • Processors: Modular components for data transformation, filtering, batching, and format conversion.

  • Pipeline Composition: Chain multiple processors together for complex data processing workflows.

  • Type Safety: Generic type system ensures compile-time safety for data transformations through the pipeline.

Integration Components#

  • nat.profiler.decorators: Decorators that wrap workflow and LLM framework context managers to inject usage-collection callbacks.

  • callbacks: Callback handlers that track usage statistics (tokens, time, inputs/outputs) and push them to the event stream. Supports LangChain/LangGraph, LlamaIndex, CrewAI, Semantic Kernel, and Google ADK frameworks.

Registering a New Telemetry Provider as a Plugin#

For complete information about developing and integrating custom telemetry exporters, including detailed examples, best practices, and advanced configuration options, refer to Adding Telemetry Exporters.

Provider Integration Guides#

Observing a Workflow with Catalyst

This guide provides a step-by-step process to enable observability in a NeMo Agent toolkit workflow using Catalyst for tracing. By the end of this guide, you will have:

  • Configured telemetry in your workflow.

  • Ability to view traces in the Catalyst platform.

Step 1: Sign up for Catalyst

Step 2: Create a Project

After logging in, create a new project.

  • Project Name: Choose any name.

  • Use Case: Agentic Application

Step 3: Generate API Credentials

Go to your profile settings to generate your:

  • Access Key

  • Secret Key

Step 4: Configure Your Environment

Set the following environment variables in your terminal:

export CATALYST_ACCESS_KEY=<your_access_key>
export CATALYST_SECRET_KEY=<your_secret_key>
export CATALYST_ENDPOINT=https://catalyst.raga.ai/api

Step 5: Install the RagAI Subpackage

uv pip install -e '.[ragaai]'

Step 6: Modify Workflow Configuration

Update your workflow configuration file to include the telemetry settings.

Example configuration:

general:
  telemetry:
    tracing:
      catalyst:
        _type: catalyst
        project: catalyst-demo
        dataset: catalyst-dataset
        tracer_type: my-tracer-type
        endpoint: ${CATALYST_ENDPOINT}
        access_key: ${CATALYST_ACCESS_KEY}
        secret_key: ${CATALYST_SECRET_KEY}

Step 7: Run Your Workflow

From the root directory of the NeMo Agent toolkit library, install dependencies and run the pre-configured simple_calculator_observability example.

Example:

# Install the workflow and plugins
uv pip install -e examples/observability/simple_calculator_observability/

# Run the workflow with Catalyst telemetry settings
# Note, you may have to update configuration settings based on your Catalyst account
nat run --config_file examples/observability/simple_calculator_observability/configs/config-catalyst.yml --input "What is 1*2?"

As the workflow runs, telemetry data will start showing up in Catalyst.

Step 8: View Traces Data in Catalyst

  • Open your browser and navigate to https://catalyst.raga.ai/projects.

  • Locate your workflow traces under your configured project name and dataset.

  • Inspect function execution details, latency, total tokens, request timelines and other info under Info and Attributes tabs of an individual trace.

Catalyst Trace View

Debugging

If you encounter issues while installing the Catalyst package, try uninstalling and reinstalling it:

uv pip uninstall ragaai-catalyst

uv pip install ragaai-catalyst

Observing a Workflow with NVIDIA Data Flywheel

This guide provides a step-by-step process to enable observability in an NVIDIA NeMo Agent toolkit workflow by exporting runtime traces to an Elasticsearch instance that is part of the NVIDIA Data Flywheel Blueprint. The Data Flywheel Blueprint can then leverage the traces to fine-tune and evaluate smaller models, which can be deployed in place of the original model to reduce latency.

The Data Flywheel integration supports LangChain/LangGraph-based workflows with nim and openai LLM providers and can be enabled with just a few lines of configuration.

Supported Framework and Provider Combinations

The Data Flywheel integration currently supports LangChain (as used in LangChain pipelines and LangGraphs) with the following LLM providers:

  • _type: openai - OpenAI provider

  • _type: nim - NVIDIA NIM provider

The integration captures LLM_START events for completions and tool calls when using these specific combinations. Other framework and provider combinations are not currently supported.
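For reference, a supported LLM configuration in the workflow file might look like the following sketch; the section keys follow the toolkit's typical examples, and the model names are placeholders you should replace for your deployment:

llms:
  nim_llm:
    _type: nim
    # Placeholder model name; substitute the NIM model you have deployed
    model_name: meta/llama-3.1-70b-instruct
  openai_llm:
    _type: openai
    # Placeholder model name; substitute your OpenAI model
    model_name: gpt-4o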

Step 1: Prerequisites

Before using the Data Flywheel integration, ensure you have:

  • NVIDIA Data Flywheel Blueprint deployed and configured

  • Valid Elasticsearch credentials (username and password)

Step 2: Install the Data Flywheel Plugin

To install the Data Flywheel plugin, run the following:

uv pip install -e '.[data-flywheel]'

Step 3: Modify Workflow Configuration

Update your workflow configuration file to include the Data Flywheel telemetry settings:

general:
  telemetry:
    tracing:
      data_flywheel:
        _type: data_flywheel_elasticsearch
        client_id: my_nat_app
        index: flywheel
        endpoint: ${ELASTICSEARCH_ENDPOINT}
        username: elastic
        password: elastic
        batch_size: 10

This configuration enables exporting trace data to NVIDIA Data Flywheel via Elasticsearch.

Configuration Parameters

The Data Flywheel integration supports the following core configuration parameters:

| Parameter | Description | Required | Example |
| --- | --- | --- | --- |
| client_id | Identifier for your application to distinguish traces between deployments | Yes | "my_nat_app" |
| index | Elasticsearch index name where traces will be stored | Yes | "flywheel" |
| endpoint | Elasticsearch endpoint URL | Yes | "https://elasticsearch.example.com:9200" |
| username | Elasticsearch username for authentication | No | "elastic" |
| password | Elasticsearch password for authentication | No | "elastic" |
| batch_size | Size of batch to accumulate before exporting | No | 10 |
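To avoid hard-coding credentials, the same ${VAR} environment-variable substitution used elsewhere in this guide can also be applied to the username and password fields. The sketch below assumes illustrative variable names:

general:
  telemetry:
    tracing:
      data_flywheel:
        _type: data_flywheel_elasticsearch
        client_id: my_nat_app
        index: flywheel
        endpoint: ${ELASTICSEARCH_ENDPOINT}
        # Illustrative variable names; export these in your environment first
        username: ${ELASTICSEARCH_USERNAME}
        password: ${ELASTICSEARCH_PASSWORD}
        batch_size: 10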

Step 4: Run Your Workflow

Run your workflow using the updated configuration file:

nat run --config_file config-data-flywheel.yml --input "Your workflow input here"

Step 5: Monitor Trace Export

As your workflow runs, traces will be automatically exported to Elasticsearch in batches. You can monitor the export process through the NeMo Agent toolkit logs, which will show information about successful exports and any errors.

Step 6: Access Data in Data Flywheel

Once traces are exported to Elasticsearch, they become available in the NVIDIA Data Flywheel system for:

  • LLM distillation and optimization

  • Performance analysis and monitoring

  • Training smaller, more efficient models

  • Runtime optimization insights

Advanced Configuration

Workload Scoping

The Data Flywheel integration uses workload identifiers to organize traces for targeted model optimization. Understanding how to scope your workloads correctly is crucial for effective LLM distillation.

Default Scoping Behavior

By default, each trace receives a Data Flywheel workload_id that maps to the parent NeMo Agent toolkit registered function. The combination of client_id and workload_id is used by Data Flywheel to select data as the basis for training jobs.

Custom Scoping with @track_unregistered_function

For fine-grained optimization, you can create custom workload scopes using the @track_unregistered_function decorator. This is useful when a single registered function contains multiple LLM invocations that would benefit from separate model optimizations.

from nat.profiler.decorators.function_tracking import track_unregistered_function

# llm_client is a placeholder for your configured LLM client
@track_unregistered_function(name="document_summarizer", metadata={"task_type": "summarization"})
def summarize_document(document: str) -> str:
    return llm_client.complete(f"Summarize: {document}")

@track_unregistered_function(name="question_answerer")
def answer_question(context: str, question: str) -> str:
    return llm_client.complete(f"Context: {context}\nQuestion: {question}")

The decorator supports:

  • name: Custom workload_id (optional, defaults to function name)

  • metadata: Additional context for traces (optional)

Resources

For more information about NVIDIA Data Flywheel:

Observing a Workflow with DBNL

This guide provides a step-by-step process to enable observability in a NeMo Agent toolkit workflow using DBNL for tracing. By the end of this guide, you will have:

  • Configured telemetry in your workflow.

  • Ability to view traces in the DBNL platform.

Step 1: Install DBNL

Visit https://docs.dbnl.com/get-started/quickstart to install DBNL.

Step 2: Create a Project

Create a new Trace Ingestion project in DBNL. To create a new project in DBNL:

  1. Navigate to your DBNL deployment (e.g. http://localhost:8080/)

  2. Go to Projects > + New Project

  3. Name your project nat-calculator

  4. Add a LLM connection to your project

  5. Select Trace Ingestion as the project Data Source

  6. Click on Generate API Token and note down the generated API Token

  7. Note down the Project Id for the project

Step 3: Configure Your Environment

Set the following environment variables in your terminal:

# DBNL_API_URL should point to your deployment API URL (e.g. http://localhost:8080/api)
export DBNL_API_URL=<your_api_url>
export DBNL_API_TOKEN=<your_api_token>
export DBNL_PROJECT_ID=<your_project_id>

Step 4: Install the NeMo Agent toolkit OpenTelemetry Subpackages

# Install specific telemetry extras required for DBNL
uv pip install -e '.[opentelemetry]'

Step 5: Modify NeMo Agent toolkit Workflow Configuration

Update your workflow configuration file to include the telemetry settings.

Example configuration:

general:
  telemetry:
    tracing:
      dbnl:
        _type: dbnl

Step 6: Run the workflow

From the root directory of the NeMo Agent toolkit library, install dependencies and run the pre-configured simple_calculator_observability example.

Example:

# Install the workflow and plugins
uv pip install -e examples/observability/simple_calculator_observability/

# Run the workflow with DBNL telemetry settings
# Note: you may have to update configuration settings based on your DBNL deployment
nat run --config_file examples/observability/simple_calculator_observability/configs/config-dbnl.yml --input "What is 1*2?"

As the workflow runs, telemetry data will start showing up in DBNL.

Step 7: Analyze Traces Data in DBNL

Analyze the traces in DBNL. To analyze traces in DBNL:

  1. Navigate to your DBNL deployment (e.g. http://localhost:8080/)

  2. Go to Projects > nat-calculator

For additional help, see the DBNL docs.

Observing a Workflow with Dynatrace

This guide shows how to stream OpenTelemetry (OTel) traces from your NVIDIA NeMo Agent toolkit workflows to the Dynatrace OpenTelemetry Protocol (OTLP) ingest API, giving you full visibility into the performance of LLMs and agent interactions.

In this guide, you will learn how to:

  • Deploy a Dynatrace OpenTelemetry Collector with a configuration that exports traces into Dynatrace

  • Configure your workflow (YAML) or Python script to send traces to the OTel collector.

  • Run the workflow and view traces within Dynatrace

Step 1: Dynatrace Account

You will need access to your Dynatrace environment. If you don't have one, you can sign up for one at https://www.dynatrace.com/signup/.

Step 2: Dynatrace API Token

Dynatrace APIs use token-based authentication. To generate an access token:

  1. Go to Access tokens.

  2. Select Generate new token.

  3. Enter a name for your token.

  4. Select these required scopes for the OTLP API:

    • openTelemetryTrace.ingest

    • metrics.ingest

    • logs.ingest

  5. Select Generate token.

  6. Copy the generated token to the clipboard. Store the token in a password manager for future use and for the configuration below.

Step 3: Configure OTel Collector

Create an OTel Collector configuration file that uses an otlphttp exporter to send traces to the Dynatrace OTLP API, as shown in the example below. Refer to the Dynatrace documentation as required.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  cumulativetodelta:
exporters:
  otlphttp:
    endpoint: "https://<YOUR-DYNATRACE-ENVIRONMENT>.live.dynatrace.com/api/v2/otlp"
    headers:
      Authorization: "Api-Token <YOUR-DYNATRACE-TOKEN>"
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: []
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      processors: [cumulativetodelta]
      exporters: [otlphttp]
    logs:
      receivers: [otlp]
      processors: []
      exporters: [otlphttp]

Step 4: Install and run your configured OTel Collector

There are many ways to deploy an OTel Collector. For this example, save the configuration from the previous step to a file named otelcollectorconfig.yaml and run the Dynatrace distribution of the OpenTelemetry Collector using Docker:

docker run -d -v "$(pwd)"/otelcollectorconfig.yaml:/etc/otelcol/config.yaml \
-p 4318:4318 \
dynatrace/dynatrace-otel-collector:latest

Once running, the collector endpoint is: http://localhost:4318.

Step 5: Install the NeMo Agent toolkit OpenTelemetry Subpackages

# Install specific telemetry extras required for Dynatrace
uv pip install -e '.[opentelemetry]'

Step 6: Modify NeMo Agent toolkit Workflow Configuration

Update your workflow configuration file to include the telemetry settings.

Example configuration:

general:
  telemetry:
    tracing:
      otelcollector:
        _type: otelcollector
        # The endpoint where you have deployed the otel collector
        endpoint: http://localhost:4318/v1/traces
        project: your_project_name

Step 7: Run the workflow

From the root directory of the NeMo Agent toolkit library, install dependencies and run the pre-configured simple_web_query example.

Example:

# Install the workflow and plugins
uv pip install -e examples/getting_started/simple_web_query

# Run the workflow with OTel+Dynatrace telemetry settings
nat run --config_file examples/getting_started/simple_web_query/configs/config.yml --input "What is LangSmith?" 

As the workflow runs, telemetry data will start showing up in Dynatrace.

Step 8: View spans

View the exported traces within the Dynatrace Distributed Tracing App as shown below.

Dynatrace trace screenshot

Observing a Workflow with Galileo

This guide provides a step-by-step process to enable observability in a NeMo Agent toolkit workflow using Galileo for tracing. By the end of this guide, you will have:

  • Configured telemetry in your workflow.

  • Ability to view traces in the Galileo platform.

Step 1: Sign up for Galileo

Step 2: Create a Project and Log Stream

After logging in:

  • Create a new Logging project (or reuse an existing one).

  • Inside the project create (or locate) the Log Stream you will write to.

Step 3: Generate API Key

Go to Settings → API Keys to generate a new API key and copy it.

You will need the following values:

  • Galileo-API-Key

  • project (project name)

  • logstream (log-stream name)

Step 4: Configure Your Environment

Set the following environment variable in your terminal:

export GALILEO_API_KEY=<your_api_key>

Step 5: Install the OpenTelemetry Subpackage

uv pip install -e '.[opentelemetry]'

Step 6: Modify Workflow Configuration

Update your workflow configuration file to include the telemetry settings.

Example configuration:

general:
  telemetry:
    logging:
      console:
        _type: console
        level: WARN
    tracing:
      galileo:
        _type: galileo
        # Cloud endpoint – change if you are using an on-prem cluster.
        endpoint: https://app.galileo.ai/api/galileo/otel/traces
        project: simple_calculator
        logstream: default
        api_key: ${GALILEO_API_KEY}

Step 7: Run Your Workflow

From the root directory of the NeMo Agent toolkit library, install dependencies and run the pre-configured simple_calculator_observability example.

Example:

# Install the workflow and plugins
uv pip install -e examples/observability/simple_calculator_observability/

# Run the workflow with Galileo telemetry settings
# Note, you may have to update configuration settings based on your Galileo account
nat run --config_file examples/observability/simple_calculator_observability/configs/config-galileo.yml --input "What is 1*2?"

As the workflow runs, telemetry data will start showing up in Galileo.

Step 8: View Traces Data in Galileo

  • Open your browser and navigate to https://app.galileo.ai/.

  • Select your project and navigate to View all logs.

  • Inspect function execution details, latency, total tokens, request timelines and other info within individual traces.

  • New traces should appear within a few seconds.

For additional help, see the Galileo OpenTelemetry integration docs.

Observing a Workflow with OpenTelemetry Collector

This guide shows how to stream OpenTelemetry (OTel) traces from your NeMo Agent toolkit workflows to the generic OTel collector, which in turn provides the ability to export those traces to many different places including file stores (like S3), Datadog, Dynatrace, and others.

In this guide, you will learn how to:

  • Deploy the generic OTel collector with a configuration that saves traces to the local file system. The configuration can be modified to export to other systems.

  • Configure your workflow (YAML) or Python script to send traces to the OTel collector.

  • Run the workflow and view traces in the local file.


Configure and deploy the OTel Collector

  1. Configure the OTel Collector using a otlp receiver and the exporter of your choice. For this example, create a file named otelcollectorconfig.yaml:

    receivers:
      otlp:
        protocols:
          http:
            endpoint: 0.0.0.0:4318
    
    processors:
      batch:
        send_batch_size: 100
        timeout: 10s
    
    exporters:
      file:
        path: /otellogs/llm_spans.json
        format: json
    
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [file]
    
  2. Install and run your configured OTel Collector, noting the endpoint URL (for example, http://localhost:4318). For this example, run the OTel Collector using Docker and the configuration file from step 1:

    mkdir otellogs
    chmod 777 otellogs
    docker run -v $(pwd)/otelcollectorconfig.yaml:/etc/otelcol-contrib/config.yaml \
      -p 4318:4318 \
      -v $(pwd)/otellogs:/otellogs/ \
      otel/opentelemetry-collector-contrib:0.128.0
    

Install the OpenTelemetry Subpackage

If you installed the NeMo Agent toolkit from source, you can install package extras with the following command:

uv pip install -e '.[opentelemetry]'

If you have not installed the NeMo Agent toolkit from source, you can install package extras with the following command:

uv pip install "nvidia-nat[opentelemetry]"

Modify Workflow Configuration

Update your workflow configuration file to include the telemetry settings.

Example configuration:

general:
  telemetry:
    tracing:
      otelcollector:
        _type: otelcollector
        # The endpoint where you have deployed the otel collector
        endpoint: http://0.0.0.0:4318/v1/traces
        project: your_project_name

Run the workflow

nat run --config_file <path/to/your/config/file.yml> --input "your notional input"

As the workflow runs, spans are sent to the OTel Collector which in turn exports them based on the exporter you configured. In this example, you can view the exported traces in the local file:

cat otellogs/llm_spans.json

Observing a Workflow with Phoenix

This guide provides a step-by-step process to enable observability in a NeMo Agent toolkit workflow using Phoenix for tracing and logging. By the end of this guide, you will have:

  • Configured telemetry in your workflow.

  • Started the Phoenix server locally.

  • Ability to view traces in the Phoenix UI.

Step 1: Install the Phoenix Subpackage and Phoenix Server

Install the phoenix dependencies to enable tracing capabilities:

uv pip install -e '.[phoenix]'

Then install the Phoenix server:

uv pip install arize-phoenix

Step 2: Start the Phoenix Server

Run the following command to start Phoenix server locally:

phoenix serve

Phoenix should now be accessible at http://0.0.0.0:6006.

Step 3: Modify Workflow Configuration

Update your workflow configuration file to include the telemetry settings.

Example configuration:

general:
  telemetry:
    tracing:
      phoenix:
        _type: phoenix
        endpoint: http://localhost:6006/v1/traces
        project: simple_calculator

This setup enables tracing through Phoenix at http://localhost:6006/v1/traces, with traces grouped into the simple_calculator project.

Step 4: Run Your Workflow

From the root directory of the NeMo Agent toolkit library, install dependencies and run the pre-configured simple_calculator_observability example.

Example:

# Install the workflow and plugins
uv pip install -e examples/observability/simple_calculator_observability/

# Run the workflow with Phoenix telemetry settings
nat run --config_file examples/observability/simple_calculator_observability/configs/config-phoenix.yml --input "What is 1*2?"

As the workflow runs, telemetry data will start showing up in Phoenix.

Step 5: View Traces Data in Phoenix

  • Open your browser and navigate to http://0.0.0.0:6006.

  • Locate your workflow traces under your project name in projects.

  • Inspect function execution details, latency, total tokens, request timelines and other info under the Info and Attributes tabs of an individual trace.

Debugging

For more Arize-Phoenix details, view the documentation here.

Observing a Workflow with W&B Weave

This guide provides a step-by-step process to enable observability in a NeMo Agent toolkit workflow using Weights and Biases (W&B) Weave for tracing, with just a few lines of configuration in your workflow configuration file.

Weave Tracing Dashboard

Prerequisites

An account on Weights & Biases is required to use Weave.

You can create an account on Weights & Biases by clicking on the “Sign Up” button in the top right corner of the website.

Under the “Account” section, you can find your API key. Click on the “Show” button to reveal the API key. Take note of this API key as you will need it to run the workflow.

export WANDB_API_KEY=<your_api_key>

Step 1: Install the Weave plugin

To install the Weave plugin, run the following:

uv pip install -e '.[weave]'

Step 2: Install the Workflow

Pick an example from the list of available workflows. In this guide, we will use the simple_calculator_observability example.

uv pip install -e examples/observability/simple_calculator_observability

Step 3: Modify Workflow Configuration

Update your workflow configuration file to include the weave telemetry settings. For example, examples/observability/simple_calculator_observability/configs/config-weave.yml has the following weave settings:

general:
  telemetry:
    tracing:
      weave:
        _type: weave
        project: "nat-demo"

This setup enables logging trace data to W&B Weave. The Weave integration only requires the project parameter to be set.

| Parameter | Description | Example |
| --- | --- | --- |
| project | The name of your W&B Weave project | "nat-demo" |
| entity (deprecated) | Your W&B username or team name | "your-wandb-username-or-teamname" |

Step 4: Run Your Workflow

Install simple_calculator example using the instructions in the examples/observability/simple_calculator_observability/README.md guide. Run the workflow using config-weave.yml configuration file:

nat run --config_file examples/observability/simple_calculator_observability/configs/config-weave.yml --input "Is the product of 2 * 4 greater than the current hour of the day?"

If it is your first time running the workflow, you will be prompted to log in to W&B Weave.

Step 5: View Traces Data in Weave Dashboard

As the workflow runs, you will find a Weave URL (starting with a 🍩 emoji). Click on the URL to access your logged trace timeline.

Note how the integration captures not only the nat intermediate steps but also the underlying framework. This is because Weave has integrations with many of your favorite frameworks.

Step 6: Redacting Sensitive Data

When tracing LLM workflows, you may be processing sensitive information like personal identifiers, credit card numbers, or API keys. The NeMo Agent toolkit Weave integration supports automatic redaction of Personally Identifiable Information (PII) and sensitive keys from your traces.

Prerequisites

To enable PII redaction, you need presidio-analyzer and presidio-anonymizer installed. Installing the weave plugin will install these packages for you.

Enabling PII Redaction

Update your workflow configuration to enable PII redaction:

general:
  telemetry:
    tracing:
      weave:
        _type: weave
        project: "nat-demo"
        redact_pii: true                    # Enable PII redaction
        redact_pii_fields:                  # Optional: specify which entity types to redact
          - EMAIL_ADDRESS
          - PHONE_NUMBER
          - CREDIT_CARD
          - US_SSN
          - PERSON
        redact_keys:                        # Optional: specify additional keys to redact
          - custom_secret
          - api_key
          - auth_token

Redaction Options

The Weave integration supports the following redaction options:

| Parameter | Description | Required |
| --- | --- | --- |
| redact_pii | Enable PII redaction (true/false) | No (default: false) |
| redact_pii_fields | List of PII entity types to redact | No (default: all supported entities) |
| redact_keys | List of additional keys to redact beyond the defaults | No |

When redact_pii is enabled, common PII entities like email addresses, phone numbers, credit cards, and more are automatically redacted from your traces before they are sent to Weave. The redact_pii_fields parameter allows you to customize which entity types to redact.

See the Microsoft Presidio documentation for a full list of supported entity types.

Additionally, the redact_keys parameter allows you to specify custom keys that should be redacted beyond the default sensitive keys (api_key, auth_headers, authorization).

User Feedback Integration

When using Weave telemetry with the FastAPI front end, you can enable a /feedback endpoint that allows users to provide thumbs-up and thumbs-down feedback on agent responses. This feedback is linked to specific traces in your Weave project for analysis.

Enabling the Feedback Endpoint

To enable the feedback endpoint, configure your workflow to use the WeaveFastAPIPluginWorker:

general:
  front_end:
    _type: fastapi
    runner_class: nat.plugins.weave.fastapi_plugin_worker.WeaveFastAPIPluginWorker
  telemetry:
    tracing:
      weave:
        _type: weave
        project: "nat-demo"

The WeaveFastAPIPluginWorker registers the /feedback endpoint when Weave telemetry is configured. For more details on the feedback API, see the API Server Endpoints documentation.

Resources

  • Learn more about tracing here.

  • Learn more about how to navigate the logged traces here.

  • Learn more about PII redaction here.