nat.plugins.langchain.langsmith.langsmith_optimization_callback#

Attributes#

Classes#

LangSmithOptimizationCallback

Per-trial experiment projects with OTEL trace linking and prompt management.

Module Contents#

logger#
class LangSmithOptimizationCallback(
*,
project: str,
experiment_prefix: str = 'NAT',
dataset_name: str | None = None,
)#

Per-trial experiment projects with OTEL trace linking and prompt management.

Each optimizer trial gets its own experiment project linked to a shared dataset. OTEL traces are routed to per-trial projects via get_trial_project_name(), which also pre-creates the project with reference_dataset_id. After evaluation, OTEL runs are retroactively linked to dataset examples with feedback and parameter metadata.

needs_root_span_ids = True#
_client#
_project#
_experiment_prefix = 'NAT'#
_dataset_name_hint = None#
_dataset_id: str | None = None#
_dataset_name: str | None = None#
_run_number: int | None = None#
_example_ids: dict[Any, str]#
_prompt_commit_urls: dict[tuple[str, int], str]#
_prompt_repo_names: dict[str, str]#
_prompt_trial_counter: int = 0#
_prompt_param_names: list[str] = []#
set_prompt_param_names(names: list[str]) None#
_build_base_name() str#

Build the base name used for datasets and run numbering.

Format: Optimization Benchmark (<dataset>) (<project>)
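A minimal sketch of the documented format (the function name and argument order here are illustrative, not the actual implementation):

```python
def build_base_name(dataset: str, project: str) -> str:
    # Mirrors the documented format: Optimization Benchmark (<dataset>) (<project>)
    return f"Optimization Benchmark ({dataset}) ({project})"
```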

_get_run_number() int#

Get the run number for this optimization execution (cached).

get_trial_project_name(trial_number: int) str#

Return the per-trial OTEL project name and pre-create it as an experiment.

Called by the parameter/prompt optimizer BEFORE the eval run starts. Pre-creates the project with reference_dataset_id so OTEL traces land in an experiment project (visible in Datasets & Experiments UI).

_create_dataset_with_examples(
items: list[tuple[str, str, str]],
) None#

Create the LangSmith dataset and populate it with examples.

Args:

items: List of (item_id, question, expected) tuples.

_ensure_dataset(eval_result: Any) None#

Create the dataset for this optimization run (once).

pre_create_experiment(dataset_items: list) None#

Create the dataset upfront (before any trials run).

Must be called BEFORE get_trial_project_name() so the dataset exists when per-trial projects are pre-created with reference_dataset_id. Accepts list[EvalInputItem] from the eval framework.
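The required call ordering can be illustrated with a minimal stand-in. Everything below besides the two documented method names is hypothetical; the real callback talks to the LangSmith API and uses a different naming scheme:

```python
class OrderingSketch:
    """Hypothetical stand-in showing only the documented call order."""

    def __init__(self) -> None:
        self.dataset_created = False
        self.trial_projects: list[str] = []

    def pre_create_experiment(self, dataset_items: list) -> None:
        # Must run first: per-trial projects reference the dataset.
        self.dataset_created = True

    def get_trial_project_name(self, trial_number: int) -> str:
        if not self.dataset_created:
            raise RuntimeError("call pre_create_experiment() before any trial")
        name = f"trial-{trial_number}"  # placeholder; real names differ
        self.trial_projects.append(name)
        return name

cb = OrderingSketch()
cb.pre_create_experiment(dataset_items=[])
names = [cb.get_trial_project_name(i) for i in range(2)]
```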

_LS_SAFETY_MULTIPLIER: float = 3.0#
_LS_MIN_RETRIES: int = 10#
_LS_MAX_RETRIES: int = 60#
_LS_WARN_ITEM_THRESHOLD: int = 5000#
classmethod _estimate_retry_budget(expected_count: int) tuple[int, float]#

Estimate the retry budget for OTEL run linking based on dataset size.

Uses the shared indexing constants from langsmith_evaluation_callback (pipeline latency, throughput, retry delay) with a safety multiplier to scale the retry window proportionally.

Formula:

indexing_time = pipeline_latency + (expected_count / throughput)
total_budget  = indexing_time × safety_multiplier
max_retries   = clamp(total_budget / retry_delay, min=10, max=60)

Items | Indexing Est. | ×3 Safety | Max Retries | Total Budget
------|---------------|-----------|-------------|-------------
5     | 10.5 s        | 31.5 s    | 10 (floor)  | 100 s
150   | 25.0 s        | 75.0 s    | 10 (floor)  | 100 s
600   | 70.0 s        | 210.0 s   | 21          | 210 s
5,000 | 510.0 s       | 1,530.0 s | 60 (cap)    | 600 s

Warning

Datasets above 5,000 items per trial may exceed the maximum retry window (600 s). Some runs may not be linked in the LangSmith UI, although all traces will have been delivered.

Returns:

(max_retries, retry_delay) tuple for _match_and_link_otel_runs.
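The formula can be sketched in plain Python. The constants below (pipeline latency 10 s, throughput 10 items/s, retry delay 10 s) are assumptions inferred from the example table, not the actual values in langsmith_evaluation_callback:

```python
import math

# Assumed constants, inferred from the example table above; the real values
# live in langsmith_evaluation_callback and may differ.
PIPELINE_LATENCY_S = 10.0   # fixed indexing pipeline latency
THROUGHPUT_ITEMS_S = 10.0   # indexing throughput (items/second)
RETRY_DELAY_S = 10.0        # delay between linking retries
SAFETY_MULTIPLIER = 3.0
MIN_RETRIES, MAX_RETRIES = 10, 60

def estimate_retry_budget(expected_count: int) -> tuple[int, float]:
    """Scale the retry window with dataset size, clamped to [10, 60] retries."""
    indexing_time = PIPELINE_LATENCY_S + expected_count / THROUGHPUT_ITEMS_S
    total_budget = indexing_time * SAFETY_MULTIPLIER
    max_retries = math.ceil(total_budget / RETRY_DELAY_S)
    return max(MIN_RETRIES, min(MAX_RETRIES, max_retries)), RETRY_DELAY_S
```

Under these assumed constants the sketch reproduces every row of the table, e.g. 600 items give a 210 s budget and 21 retries.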

_match_and_link_otel_runs(...)#

Link OTEL runs in the trial's project to dataset examples and attach feedback.

static _format_params(parameters: dict[str, Any]) dict[str, Any]#

Sanitize parameter names (dots → underscores) and round floats.
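A minimal sketch of the documented sanitization. The 4-decimal rounding precision is an assumption for illustration; the real method may round differently:

```python
from typing import Any

def format_params(parameters: dict[str, Any]) -> dict[str, Any]:
    """Replace dots in keys with underscores; round float values."""
    sanitized: dict[str, Any] = {}
    for key, value in parameters.items():
        new_key = key.replace(".", "_")
        # Rounding to 4 decimal places is an assumed precision.
        sanitized[new_key] = round(value, 4) if isinstance(value, float) else value
    return sanitized
```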

static _humanize_param_name(param_name: str) str#

Convert ‘functions.email_phishing_analyzer.prompt’ to ‘Email Phishing Analyzer Prompt’.
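One way to reproduce the documented example; dropping a leading `functions.` segment is an assumption inferred from that example, not confirmed behavior:

```python
def humanize_param_name(param_name: str) -> str:
    """Turn a dotted parameter path into a title-cased display name."""
    parts = param_name.split(".")
    if parts and parts[0] == "functions":
        parts = parts[1:]  # assumed: the namespace prefix is not displayed
    words = [word for part in parts for word in part.split("_")]
    return " ".join(word.capitalize() for word in words)
```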

_get_prompt_repo_name(param_name: str) str#

Get or create a unique prompt repo name for this optimization run.

Format: <project>-<param>-run-<N>, e.g. aiq-shallow-researcher-full-optimization-system-prompt-run-1
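A sketch of the documented naming format. The slugification rules (lowercasing, dots and underscores → hyphens) and the example inputs are assumptions:

```python
def prompt_repo_name(project: str, param_name: str, run_number: int) -> str:
    """Build '<project>-<param>-run-<N>' from a project and dotted param name."""
    def slug(value: str) -> str:
        # Assumed slug rules: lowercase, dots/underscores become hyphens.
        return value.replace(".", "-").replace("_", "-").lower()
    return f"{slug(project)}-{slug(param_name)}-run-{run_number}"
```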

VALID_TEMPLATE_FORMATS#
_JINJA2_MARKERS = ('{%', '{#')#
_JINJA2_EXPR_KEYWORDS = ('| ', ' if ', ' else ', ' for ')#
_MUSTACHE_MARKERS = ('{{#', '{{/', '{{>', '{{^')#
classmethod _detect_template_format(text: str) str#

Auto-detect template format from prompt content.

Detection priority (first match wins):
  1. Jinja2 block/comment tags ({%, {#) → "jinja2"

  2. Mustache section markers ({{#, {{/, {{>, {{^) → "mustache"

  3. Jinja2 expression keywords inside {{ }} (pipes, conditionals, loops) → "jinja2"

  4. Plain {{ }} without keywords → "jinja2" (ambiguous with mustache, but Jinja2 is far more common in Python/LangChain prompts)

  5. No curly-brace templating detected → "f-string"

Used as a fallback when SearchSpace.prompt_format is not explicitly set.
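The detection priority above can be sketched with the class's documented marker constants. One subtlety handled here (as an assumption about the real implementation): the mustache marker `{{#` contains the jinja2 comment tag `{#` as a substring, so mustache markers are stripped before the jinja2 tag check to preserve the stated priority:

```python
import re

JINJA2_MARKERS = ("{%", "{#")
MUSTACHE_MARKERS = ("{{#", "{{/", "{{>", "{{^")
JINJA2_EXPR_KEYWORDS = ("| ", " if ", " else ", " for ")

def detect_template_format(text: str) -> str:
    # Strip mustache section markers so '{{#' does not falsely
    # match the jinja2 comment tag '{#'.
    without_mustache = text
    for marker in MUSTACHE_MARKERS:
        without_mustache = without_mustache.replace(marker, "")
    if any(m in without_mustache for m in JINJA2_MARKERS):  # 1. block/comment tags
        return "jinja2"
    if any(m in text for m in MUSTACHE_MARKERS):            # 2. mustache sections
        return "mustache"
    for expr in re.findall(r"\{\{(.*?)\}\}", text):         # 3. jinja2 expression keywords
        if any(k in expr for k in JINJA2_EXPR_KEYWORDS):
            return "jinja2"
    if "{{" in text:                                        # 4. plain {{ }} defaults to jinja2
        return "jinja2"
    return "f-string"                                       # 5. no curly-brace templating
```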

classmethod _validate_template_format(fmt: str) str#

Validate that a template format string is supported.

Raises ValueError with the list of valid options if not.
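A minimal sketch of the validation, assuming VALID_TEMPLATE_FORMATS holds the three supported values documented below for _resolve_template_format:

```python
VALID_TEMPLATE_FORMATS = ("f-string", "jinja2", "mustache")  # assumed contents

def validate_template_format(fmt: str) -> str:
    """Return fmt unchanged if supported; raise ValueError listing valid options."""
    if fmt not in VALID_TEMPLATE_FORMATS:
        raise ValueError(
            f"Unsupported template format {fmt!r}; "
            f"valid options: {', '.join(VALID_TEMPLATE_FORMATS)}"
        )
    return fmt
```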

_resolve_template_format(
param_name: str,
prompt_text: str,
result: Any,
) str#

Resolve the LangChain template_format for a prompt.

Priority:
  1. Explicit prompt_formats from TrialResult (set via SearchSpace.prompt_format)

  2. Auto-detection from prompt content

Supported values: "f-string", "jinja2", "mustache".

_push_prompt(
result: Any,
commit_tags: list[str] | None = None,
) dict[str, str]#

Push a trial’s prompts to LangSmith with full metadata.

on_trial_end(
result: nat.profiler.parameter_optimization.optimizer_callbacks.TrialResult,
) None#
on_study_end(
*,
best_trial: nat.profiler.parameter_optimization.optimizer_callbacks.TrialResult,
total_trials: int,
) None#