nat.plugins.langchain.eval.langsmith_judge#

Attributes#

Classes#

LangSmithJudgeConfig

LLM-as-judge evaluator powered by openevals.

Functions#

_resolve_prompt(→ str)

Resolve a prompt name to the actual prompt string.

_build_create_kwargs(→ dict[str, Any])

Assemble keyword arguments for openevals.create_async_llm_as_judge.

register_langsmith_judge(config, builder)

Register an LLM-as-judge evaluator with NAT.

Module Contents#

logger#
_resolve_prompt(prompt_value: str) str#

Resolve a prompt name to the actual prompt string.

Prompt names are resolved dynamically by convention: the short name is uppercased and suffixed with _PROMPT to form the constant name in openevals.prompts (e.g., 'correctness' -> CORRECTNESS_PROMPT).

If the name doesn’t match a constant in openevals.prompts, it is treated as a literal prompt template string (e.g., a custom f-string).

Args:

prompt_value: A short prompt name (e.g., 'correctness') or a literal prompt template string.

Returns:

The resolved prompt string.
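The resolution convention described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the real function looks up constants on openevals.prompts, while here a small stand-in namespace (with an assumed, abbreviated prompt value) demonstrates the same uppercase-plus-_PROMPT rule and the literal-string fallback.

```python
from types import SimpleNamespace

# Stand-in for openevals.prompts (constant value abbreviated for illustration).
prompts = SimpleNamespace(CORRECTNESS_PROMPT="You are an expert judge of correctness...")

def resolve_prompt(prompt_value: str) -> str:
    # Convention: 'correctness' -> CORRECTNESS_PROMPT
    constant_name = f"{prompt_value.upper()}_PROMPT"
    # If no matching constant exists, treat the input as a literal template.
    return getattr(prompts, constant_name, prompt_value)

resolve_prompt("correctness")                 # -> the prebuilt prompt text
resolve_prompt("Rate this answer: {outputs}") # -> returned unchanged
```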

class LangSmithJudgeConfig#

Bases: nat.data_models.evaluator.EvaluatorBaseConfig, nat.data_models.retry_mixin.RetryMixin, nat.plugins.langchain.eval.langsmith_evaluator.LangSmithExtraFieldsMixin

LLM-as-judge evaluator powered by openevals.

Uses a prebuilt or custom prompt with a judge LLM to score workflow outputs. Prebuilt prompt names (e.g., 'correctness', 'hallucination') are resolved from openevals automatically.

Common create_async_llm_as_judge parameters are exposed as typed fields for discoverability and validation. Any additional or future parameters can be forwarded via the judge_kwargs pass-through dict.

Important: The judge LLM must support structured output (JSON schema mode via with_structured_output). Models that do not support structured output will produce parsing errors and zero scores. Verify that your chosen model supports this capability before use.

prompt: str = None#
llm_name: nat.data_models.component_ref.LLMRef = None#
feedback_key: str = None#
continuous: bool = None#
choices: list[float] | None = None#
use_reasoning: bool = None#
system: str | None = None#
few_shot_examples: list[dict[str, Any]] | None = None#
output_schema: str | None = None#
score_field: str = None#
judge_kwargs: dict[str, Any] | None = None#
_validate_scoring() LangSmithJudgeConfig#
_build_create_kwargs(
config: LangSmithJudgeConfig,
resolved_prompt: str,
judge_llm: Any,
) dict[str, Any]#

Assemble keyword arguments for openevals.create_async_llm_as_judge.

Typed config fields are added first, then optional fields are merged only when set. Finally, judge_kwargs is merged with overlap detection so that users cannot accidentally shadow typed fields.

Args:

config: The judge evaluator configuration.

resolved_prompt: The prompt string, already resolved from a short name or left as-is for custom templates.

judge_llm: The LLM instance to use as the judge.

Returns:

Dictionary of keyword arguments ready for create_async_llm_as_judge.

Raises:

ValueError: If judge_kwargs keys overlap with typed fields.

async register_langsmith_judge(
config: LangSmithJudgeConfig,
builder: nat.builder.builder.EvalBuilder,
)#

Register an LLM-as-judge evaluator with NAT.