Observing a Workflow with LangSmith

This guide walks through enabling observability in a NeMo Agent Toolkit workflow, using LangSmith for tracing. By the end of this guide, you will have:

  • Configured telemetry to send OpenTelemetry (OTel) traces to LangSmith.

  • Viewed workflow traces in the LangSmith UI.

  • Learned how evaluation and optimization results are tracked as structured experiments.

Prerequisites

You need a LangSmith account. If you do not have one, create an account on the LangSmith website.

Set your API key as an environment variable:

export LANGSMITH_API_KEY=<your-langsmith-api-key>
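If you want to catch a missing key before starting a run, a small shell check can help. The function below is a minimal sketch and not part of the toolkit; its name, check_langsmith_key, is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of NeMo Agent Toolkit):
# succeeds only if LANGSMITH_API_KEY is set to a non-empty value.
check_langsmith_key() {
  if [ -n "${LANGSMITH_API_KEY:-}" ]; then
    echo "LANGSMITH_API_KEY is set"
    return 0
  fi
  # Report the problem on stderr and signal failure to the caller.
  echo "LANGSMITH_API_KEY is not set; export it before running nat" >&2
  return 1
}
```

You can prefix a nat run command with check_langsmith_key && so that the workflow starts only when the key is present.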

Step 1: Install the LangChain Subpackage

From the root directory of the NeMo Agent Toolkit repository, install the LangChain extra (which includes LangSmith) to enable tracing:

uv pip install -e '.[langchain]'

Step 2: Modify Workflow Configuration

Update your workflow configuration file to include the telemetry settings.

Example configuration:

general:
  telemetry:
    tracing:
      langsmith:
        _type: langsmith
        project: default

With this configuration, traces are sent to LangSmith and grouped under the LangSmith project named default.

Step 3: Run Your Workflow

From the root directory of the NeMo Agent Toolkit repository, install the dependencies and run the preconfigured simple_calculator_observability example.

Example:

# Install the workflow and plugins
uv pip install -e examples/observability/simple_calculator_observability/

# Run the workflow with LangSmith telemetry settings
nat run --config_file examples/observability/simple_calculator_observability/configs/config-langsmith.yml --input "What is 2 * 4?"

As the workflow runs, its traces begin to appear in LangSmith.

To override the LangSmith project name from the command line without editing the config file, use the --override flag:

nat run --config_file examples/observability/simple_calculator_observability/configs/config-langsmith.yml \
  --override general.telemetry.tracing.langsmith.project <your_project_name> \
  --input "What is 2 * 4?"

The --override flag accepts a dot-notation path into the YAML config hierarchy followed by the new value. It can be specified multiple times to override multiple fields.
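To see which YAML key a dot-notation path addresses, the hypothetical helper below expands a path and a value into the equivalent nested YAML. It only prints the nesting for illustration; it does not read or merge your config file, and it is not how nat applies overrides internally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the nested YAML that a dot-notation
# override path addresses, for example "a.b.c v" -> a: / b: / c: v.
dotpath_to_yaml() {
  local path="$1" value="$2" indent="" i
  local -a keys
  # Split the path on '.' into an array of keys.
  IFS='.' read -r -a keys <<< "$path"
  for ((i = 0; i < ${#keys[@]}; i++)); do
    if ((i == ${#keys[@]} - 1)); then
      # Last key: emit the key together with its new value.
      printf '%s%s: %s\n' "$indent" "${keys[i]}" "$value"
    else
      # Intermediate key: open a nested mapping.
      printf '%s%s:\n' "$indent" "${keys[i]}"
    fi
    # Each level is indented by two more spaces, as in the config files.
    indent+="  "
  done
}
```

Running dotpath_to_yaml general.telemetry.tracing.langsmith.project my-project prints the same nesting as the project key in Step 2, with my-project as its value.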

Step 4: View Traces in LangSmith

  • Open your browser and navigate to LangSmith.

  • Locate your workflow traces under your project name in the Projects section.

  • Inspect function execution details, latency, token counts, and other information for individual traces.

Structured Evaluation Experiments

Note

The nat eval command is provided by the evaluation package. If the command is not available, install the eval extra first:

uv pip install -e '.[eval]'

Or, if you installed the toolkit as a package rather than from source:

uv pip install "nvidia-nat[eval]"

For more details, see Agent Evaluation Prerequisites.

The LangSmith integration implements the evaluation callback pattern to create structured experiments in the LangSmith Datasets & Experiments UI. When you run nat eval with LangSmith tracing enabled, the following happens automatically:

  • A Dataset is created from your eval questions (named “Benchmark Dataset (<dataset-name>)”). Each dataset entry becomes a LangSmith example with inputs and expected outputs.

  • An Experiment project (named “<project> (Run #N)”) is linked to the dataset. Each evaluation run increments the run number.

  • Per-example runs are linked to their corresponding dataset examples with evaluator scores attached as feedback on each run.

  • OTel span traces capture each LLM call within each workflow run.
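The naming patterns above can be reproduced with two small shell functions, which may help when searching for a dataset or experiment in the LangSmith UI or from your own scripts. Both function names are hypothetical, and the toolkit assigns the run number itself, so N here is simply the run you are looking for.

```shell
#!/usr/bin/env bash
# Hypothetical helpers reproducing the naming patterns described above.

# Dataset name: "Benchmark Dataset (<dataset-name>)"
eval_dataset_name() {
  printf 'Benchmark Dataset (%s)\n' "$1"
}

# Experiment project name: "<project> (Run #N)"
eval_experiment_name() {
  printf '%s (Run #%d)\n' "$1" "$2"
}
```

For example, eval_experiment_name nat-eval-demo 2 prints nat-eval-demo (Run #2).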

Running an Evaluation with LangSmith

Use the pre-configured evaluation example:

nat eval --config_file examples/observability/simple_calculator_observability/configs/config-langsmith-eval.yml

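This configuration includes both the LangSmith telemetry settings and an evaluation section. The relevant sections are shown below; eval_llm refers to an LLM defined elsewhere in the same file: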

general:
  telemetry:
    tracing:
      langsmith:
        _type: langsmith
        project: nat-eval-demo

eval:
  general:
    max_concurrency: 1
    output_dir: .tmp/nat/examples/langsmith_eval
    dataset:
      _type: json
      file_path: examples/getting_started/simple_calculator/src/nat_simple_calculator/data/simple_calculator.json
  evaluators:
    accuracy:
      _type: tunable_rag_evaluator
      llm_name: eval_llm
      default_scoring: true

After running, check your LangSmith project for:

  • A dataset created from the eval questions.

  • Per-example runs with model answers linked to dataset examples.

  • Evaluator scores as feedback on each run.

  • OTel span traces for each LLM call.

Structured Optimization Experiments

The LangSmith integration implements the optimization callback pattern to track each optimization trial as a separate experiment. When you run nat optimize with LangSmith tracing enabled, the following happens automatically:

  • A shared Dataset is created for the entire optimization run.

  • Each trial gets its own Experiment project (named “<base> (Run #N, Trial M)”), all linked to the shared dataset. This enables per-trial comparison in the Datasets & Experiments UI.

  • Parameter configurations are recorded as project metadata on each trial.

  • Evaluator scores are attached as feedback per trial.

  • For prompt optimization, prompt versions are pushed to LangSmith prompt repositories with commit tags for each trial (e.g., trial-1, trial-2). The best trial’s prompt is tagged with best.
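The per-trial experiment names and prompt tags described above can be listed with a short loop. This is an illustrative sketch that assumes trials are numbered from 1, matching the trial-1, trial-2 tags; list_trial_experiments is a hypothetical name, and the tag column applies only when prompt optimization is enabled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for one optimization run, print each trial's
# experiment project name and its prompt commit tag, separated by a tab.
# Arguments: base project name, run number, number of trials.
list_trial_experiments() {
  local base="$1" run="$2" n_trials="$3" m
  for ((m = 1; m <= n_trials; m++)); do
    printf '%s (Run #%d, Trial %d)\ttrial-%d\n' "$base" "$run" "$m" "$m"
  done
}
```

For example, list_trial_experiments nat-optimize-demo 1 3 prints three lines, from nat-optimize-demo (Run #1, Trial 1) with tag trial-1 through nat-optimize-demo (Run #1, Trial 3) with tag trial-3.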

Running an Optimization with LangSmith

Use the pre-configured optimization example:

nat optimize --config_file examples/observability/simple_calculator_observability/configs/config-langsmith-optimize.yml

This configuration includes LangSmith telemetry, an evaluation section, and an optimizer section. The relevant sections are shown below:

general:
  telemetry:
    tracing:
      langsmith:
        _type: langsmith
        project: nat-optimize-demo

eval:
  general:
    max_concurrency: 1
    output_dir: .tmp/nat/examples/langsmith_optimize
    dataset:
      _type: json
      file_path: examples/getting_started/simple_calculator/src/nat_simple_calculator/data/simple_calculator.json
  evaluators:
    accuracy:
      _type: tunable_rag_evaluator
      llm_name: eval_llm
      default_scoring: true

optimizer:
  output_path: .tmp/nat/examples/langsmith_optimize/optimizer
  reps_per_param_set: 1
  eval_metrics:
    accuracy:
      evaluator_name: accuracy
      direction: maximize
  numeric:
    enabled: true
    n_trials: 3
  prompt:
    enabled: false

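Because this example enables only numeric optimization (prompt.enabled is false), no prompt versions are pushed. After running, check your LangSmith project for: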

  • Trial runs with parameter configurations recorded as metadata.

  • Feedback scores per trial for each configured metric.

  • OTel span traces for each LLM call within each trial.
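The eval_metrics entry in the optimization configuration sets direction: maximize for the accuracy metric, so higher scores are better. As an illustration only, the sketch below picks the best trial from whitespace-separated "trial score" lines and averages repeated scores for the same trial, as several repetitions per parameter set (reps_per_param_set greater than 1) would produce. The function name, the input format, and the averaging are assumptions; the toolkit's own selection logic may differ.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: choose the best trial from "trial score" lines on stdin.
# Scores sharing a trial ID are averaged (assumption). $1 is "maximize" or
# "minimize". Prints "<trial> <average score>" with two decimal places.
best_trial() {
  awk -v dir="$1" '
    { sum[$1] += $2; cnt[$1]++ }
    END {
      found = 0
      for (t in sum) {
        avg = sum[t] / cnt[t]
        # Keep this trial if it is the first seen or beats the current best.
        if (!found || (dir == "maximize" && avg > best_avg) ||
            (dir == "minimize" && avg < best_avg)) {
          best = t; best_avg = avg; found = 1
        }
      }
      if (found) printf "%s %.2f\n", best, best_avg
    }'
}
```

For example, piping the lines 1 0.5, 2 0.9, and 3 0.7 into best_trial maximize prints 2 0.90, while best_trial minimize prints 1 0.50.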

Resources

For more information about LangSmith, see the LangSmith documentation.