nat.plugins.langchain.langsmith.langsmith_optimization_callback#

Attributes#

Classes#

LangSmithOptimizationCallback

Per-trial experiment projects with OTEL trace linking and prompt management.

Module Contents#

logger#
class LangSmithOptimizationCallback(
*,
project: str,
experiment_prefix: str = 'NAT',
dataset_name: str | None = None,
)#

Per-trial experiment projects with OTEL trace linking and prompt management.

Each optimizer trial gets its own experiment project linked to a shared dataset. OTEL traces are routed to per-trial projects via get_trial_project_name(), which also pre-creates the project with reference_dataset_id. After evaluation, OTEL runs are retroactively linked to dataset examples with feedback and parameter metadata.

needs_root_span_ids = True#
_client#
_project#
_experiment_prefix = 'NAT'#
_dataset_name_hint = None#
_dataset_id: str | None = None#
_dataset_name: str | None = None#
_run_number: int | None = None#
_example_ids: dict[Any, str]#
_prompt_commit_urls: dict[tuple[str, int], str]#
_prompt_repo_names: dict[str, str]#
_prompt_trial_counter: int = 0#
_prompt_param_names: list[str] = []#
set_prompt_param_names(names: list[str]) None#
_build_base_name() str#

Build the base name used for datasets and run numbering.

Format: Optimization Benchmark (<dataset>) (<project>)
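A minimal sketch of the documented format (the function name and argument order here are illustrative, not the actual implementation):

```python
def build_base_name(dataset: str, project: str) -> str:
    # Mirrors the documented format: Optimization Benchmark (<dataset>) (<project>)
    return f"Optimization Benchmark ({dataset}) ({project})"
```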

_get_run_number() int#

Get the run number for this optimization execution (cached).

get_trial_project_name(trial_number: int) str#

Return the per-trial OTEL project name and pre-create it as an experiment.

Called by the parameter/prompt optimizer BEFORE the eval run starts. Pre-creates the project with reference_dataset_id so OTEL traces land in an experiment project (visible in Datasets & Experiments UI).

_create_dataset_with_examples(
items: list[tuple[str, str, str]],
) None#

Create the LangSmith dataset and populate it with examples.

Args:

items: List of (item_id, question, expected) tuples.

_ensure_dataset(eval_result: Any) None#

Create the dataset for this optimization run (once).

pre_create_experiment(dataset_items: list) None#

Create the dataset upfront (before any trials run).

Must be called BEFORE get_trial_project_name() so the dataset exists when per-trial projects are pre-created with reference_dataset_id. Accepts list[EvalInputItem] from the eval framework.
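The required call ordering can be illustrated with a minimal stand-in. Everything below besides the two documented method names is hypothetical; the real callback talks to the LangSmith API and uses a different naming scheme:

```python
class OrderingSketch:
    """Hypothetical stand-in showing only the documented call order."""

    def __init__(self) -> None:
        self.dataset_created = False
        self.trial_projects: list[str] = []

    def pre_create_experiment(self, dataset_items: list) -> None:
        # Must run first: per-trial projects reference the dataset.
        self.dataset_created = True

    def get_trial_project_name(self, trial_number: int) -> str:
        if not self.dataset_created:
            raise RuntimeError("call pre_create_experiment() before any trial")
        name = f"trial-{trial_number}"  # placeholder; real names differ
        self.trial_projects.append(name)
        return name

cb = OrderingSketch()
cb.pre_create_experiment(dataset_items=[])
names = [cb.get_trial_project_name(i) for i in range(2)]
```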

_LS_SAFETY_MULTIPLIER: float = 3.0#
_LS_MIN_RETRIES: int = 10#
_LS_MAX_RETRIES: int = 60#
_LS_WARN_ITEM_THRESHOLD: int = 5000#
classmethod _estimate_retry_budget(expected_count: int) tuple[int, float]#

Estimate the retry budget for OTEL run linking based on dataset size.

Uses the shared indexing constants from langsmith_evaluation_callback (pipeline latency, throughput, retry delay) with a safety multiplier to scale the retry window proportionally.

Formula:

indexing_time = pipeline_latency + (expected_count / throughput)
total_budget  = indexing_time × safety_multiplier
max_retries   = clamp(total_budget / retry_delay, min=10, max=60)

Items | Indexing Est. | ×3 Safety | Max Retries | Total Budget
------|---------------|-----------|-------------|-------------
5     | 10.5 s        | 31.5 s    | 10 (floor)  | 100 s
150   | 25.0 s        | 75.0 s    | 10 (floor)  | 100 s
600   | 70.0 s        | 210.0 s   | 21          | 210 s
5,000 | 510.0 s       | 1,530.0 s | 60 (cap)    | 600 s

Warning

Datasets above 5,000 items per trial may exceed the maximum retry window (600 s). Some runs may not be linked in the LangSmith UI, although all traces will have been delivered.

Returns:

(max_retries, retry_delay) tuple for _match_and_link_otel_runs.
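The formula can be sketched in plain Python. The constants below (pipeline latency 10 s, throughput 10 items/s, retry delay 10 s) are assumptions inferred from the example table, not the actual values in langsmith_evaluation_callback:

```python
import math

# Assumed constants, inferred from the example table above; the real values
# live in langsmith_evaluation_callback and may differ.
PIPELINE_LATENCY_S = 10.0   # fixed indexing pipeline latency
THROUGHPUT_ITEMS_S = 10.0   # indexing throughput (items/second)
RETRY_DELAY_S = 10.0        # delay between linking retries
SAFETY_MULTIPLIER = 3.0
MIN_RETRIES, MAX_RETRIES = 10, 60

def estimate_retry_budget(expected_count: int) -> tuple[int, float]:
    """Scale the retry window with dataset size, clamped to [10, 60] retries."""
    indexing_time = PIPELINE_LATENCY_S + expected_count / THROUGHPUT_ITEMS_S
    total_budget = indexing_time * SAFETY_MULTIPLIER
    max_retries = math.ceil(total_budget / RETRY_DELAY_S)
    return max(MIN_RETRIES, min(MAX_RETRIES, max_retries)), RETRY_DELAY_S
```

Under these assumed constants the sketch reproduces every row of the table, e.g. 600 items give a 210 s budget and 21 retries.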

_match_and_link_otel_runs(...)#

Link OTEL runs in the trial's project to dataset examples and attach feedback.

static _format_params(parameters: dict[str, Any]) dict[str, Any]#

Sanitize parameter names (dots → underscores) and round floats.
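A minimal sketch of the documented sanitization. The 4-decimal rounding precision is an assumption for illustration; the real method may round differently:

```python
from typing import Any

def format_params(parameters: dict[str, Any]) -> dict[str, Any]:
    """Replace dots in keys with underscores; round float values."""
    sanitized: dict[str, Any] = {}
    for key, value in parameters.items():
        new_key = key.replace(".", "_")
        # Rounding to 4 decimal places is an assumed precision.
        sanitized[new_key] = round(value, 4) if isinstance(value, float) else value
    return sanitized
```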

static _humanize_param_name(param_name: str) str#

Convert ‘functions.email_phishing_analyzer.prompt’ to ‘Email Phishing Analyzer Prompt’.
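One way to reproduce the documented example; dropping a leading `functions.` segment is an assumption inferred from that example, not confirmed behavior:

```python
def humanize_param_name(param_name: str) -> str:
    """Turn a dotted parameter path into a title-cased display name."""
    parts = param_name.split(".")
    if parts and parts[0] == "functions":
        parts = parts[1:]  # assumed: the namespace prefix is not displayed
    words = [word for part in parts for word in part.split("_")]
    return " ".join(word.capitalize() for word in words)
```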

_get_prompt_repo_name(param_name: str) str#

Get or create a unique prompt repo name for this optimization run.

Format: <project>-<param>-run-<N>, e.g. aiq-shallow-researcher-full-optimization-system-prompt-run-1
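A sketch of the documented naming format. The slugification rules (lowercasing, dots and underscores → hyphens) and the example inputs are assumptions:

```python
def prompt_repo_name(project: str, param_name: str, run_number: int) -> str:
    """Build '<project>-<param>-run-<N>' from a project and dotted param name."""
    def slug(value: str) -> str:
        # Assumed slug rules: lowercase, dots/underscores become hyphens.
        return value.replace(".", "-").replace("_", "-").lower()
    return f"{slug(project)}-{slug(param_name)}-run-{run_number}"
```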

VALID_TEMPLATE_FORMATS#
_JINJA2_MARKERS = ('{%', '{#')#
_JINJA2_EXPR_KEYWORDS = ('| ', ' if ', ' else ', ' for ')#
_MUSTACHE_MARKERS = ('{{#', '{{/', '{{>', '{{^')#
classmethod _detect_template_format(text: str) str#

Auto-detect template format from prompt content.

Detection priority (first match wins):
  1. Jinja2 block/comment tags ({%, {#) → "jinja2"

  2. Mustache section markers ({{#, {{/, {{>, {{^) → "mustache"

  3. Jinja2 expression keywords inside {{ }} (pipes, conditionals, loops) → "jinja2"

  4. Plain {{ }} without keywords → "jinja2" (ambiguous with mustache, but Jinja2 is far more common in Python/LangChain prompts)

  5. No curly-brace templating detected → "f-string"

Used as a fallback when SearchSpace.prompt_format is not explicitly set.
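The detection priority above can be sketched with the class's documented marker constants. One subtlety handled here (as an assumption about the real implementation): the mustache marker `{{#` contains the jinja2 comment tag `{#` as a substring, so mustache markers are stripped before the jinja2 tag check to preserve the stated priority:

```python
import re

JINJA2_MARKERS = ("{%", "{#")
MUSTACHE_MARKERS = ("{{#", "{{/", "{{>", "{{^")
JINJA2_EXPR_KEYWORDS = ("| ", " if ", " else ", " for ")

def detect_template_format(text: str) -> str:
    # Strip mustache section markers so '{{#' does not falsely
    # match the jinja2 comment tag '{#'.
    without_mustache = text
    for marker in MUSTACHE_MARKERS:
        without_mustache = without_mustache.replace(marker, "")
    if any(m in without_mustache for m in JINJA2_MARKERS):  # 1. block/comment tags
        return "jinja2"
    if any(m in text for m in MUSTACHE_MARKERS):            # 2. mustache sections
        return "mustache"
    for expr in re.findall(r"\{\{(.*?)\}\}", text):         # 3. jinja2 expression keywords
        if any(k in expr for k in JINJA2_EXPR_KEYWORDS):
            return "jinja2"
    if "{{" in text:                                        # 4. plain {{ }} defaults to jinja2
        return "jinja2"
    return "f-string"                                       # 5. no curly-brace templating
```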

classmethod _validate_template_format(fmt: str) str#

Validate that a template format string is supported.

Raises ValueError with the list of valid options if not.
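A minimal sketch of the validation, assuming VALID_TEMPLATE_FORMATS holds the three supported values documented below for _resolve_template_format:

```python
VALID_TEMPLATE_FORMATS = ("f-string", "jinja2", "mustache")  # assumed contents

def validate_template_format(fmt: str) -> str:
    """Return fmt unchanged if supported; raise ValueError listing valid options."""
    if fmt not in VALID_TEMPLATE_FORMATS:
        raise ValueError(
            f"Unsupported template format {fmt!r}; "
            f"valid options: {', '.join(VALID_TEMPLATE_FORMATS)}"
        )
    return fmt
```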

_resolve_template_format(
param_name: str,
prompt_text: str,
result: Any,
) str#

Resolve the LangChain template_format for a prompt.

Priority:
  1. Explicit prompt_formats from TrialResult (set via SearchSpace.prompt_format)

  2. Auto-detection from prompt content

Supported values: "f-string", "jinja2", "mustache".

_push_prompt(
result: Any,
commit_tags: list[str] | None = None,
) dict[str, str]#

Push a trial’s prompts to LangSmith with full metadata.

on_trial_end(
result: nat.profiler.parameter_optimization.optimizer_callbacks.TrialResult,
) None#
on_study_end(
*,
best_trial: nat.profiler.parameter_optimization.optimizer_callbacks.TrialResult,
total_trials: int,
) None#