nat.plugins.langchain.eval.langsmith_judge#

Attributes#

Classes#

LangSmithJudgeConfig

LLM-as-judge evaluator powered by openevals.

Functions#

_resolve_prompt(→ str)

Resolve a prompt name to the actual prompt string.

_build_create_kwargs(→ dict[str, Any])

Assemble keyword arguments for openevals.create_async_llm_as_judge.

register_langsmith_judge(config, builder)

Register an LLM-as-judge evaluator with NAT.

Module Contents#

logger#
_resolve_prompt(prompt_value: str) str#

Resolve a prompt name to the actual prompt string.

Prompt names are resolved dynamically by convention: the short name is uppercased and suffixed with _PROMPT to form the constant name in openevals.prompts (e.g., 'correctness' -> CORRECTNESS_PROMPT).

If the name doesn’t match a constant in openevals.prompts, it is treated as a literal prompt template string (e.g., a custom f-string).

Args:

prompt_value: A short prompt name (e.g., 'correctness') or a literal prompt template string.

Returns:

The resolved prompt string.
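The resolution convention described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the real function looks up constants on openevals.prompts, while here a small stand-in namespace (with an assumed, abbreviated prompt value) demonstrates the same uppercase-plus-_PROMPT rule and the literal-string fallback.

```python
from types import SimpleNamespace

# Stand-in for openevals.prompts (constant value abbreviated for illustration).
prompts = SimpleNamespace(CORRECTNESS_PROMPT="You are an expert judge of correctness...")

def resolve_prompt(prompt_value: str) -> str:
    # Convention: 'correctness' -> CORRECTNESS_PROMPT
    constant_name = f"{prompt_value.upper()}_PROMPT"
    # If no matching constant exists, treat the input as a literal template.
    return getattr(prompts, constant_name, prompt_value)

resolve_prompt("correctness")                 # -> the prebuilt prompt text
resolve_prompt("Rate this answer: {outputs}") # -> returned unchanged
```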

class LangSmithJudgeConfig#

Bases: nat.data_models.evaluator.EvaluatorBaseConfig, nat.data_models.retry_mixin.RetryMixin, nat.plugins.langchain.eval.langsmith_evaluator.LangSmithExtraFieldsMixin

LLM-as-judge evaluator powered by openevals.

Uses a prebuilt or custom prompt with a judge LLM to score workflow outputs. Prebuilt prompt names (e.g., 'correctness', 'hallucination') are resolved from openevals automatically.

Common create_async_llm_as_judge parameters are exposed as typed fields for discoverability and validation. Any additional or future parameters can be forwarded via the judge_kwargs pass-through dict.

Important: The judge LLM must support structured output (JSON schema mode via with_structured_output). Models that do not support structured output will produce parsing errors and zero scores. Verify that your chosen model supports this capability before use.

prompt: str = None#
llm_name: nat.data_models.component_ref.LLMRef = None#
feedback_key: str = None#
continuous: bool = None#
choices: list[float] | None = None#
use_reasoning: bool = None#
system: str | None = None#
few_shot_examples: list[dict[str, Any]] | None = None#
output_schema: str | None = None#
score_field: str = None#
judge_kwargs: dict[str, Any] | None = None#
_validate_scoring() LangSmithJudgeConfig#
_build_create_kwargs(
config: LangSmithJudgeConfig,
resolved_prompt: str,
judge_llm: Any,
) dict[str, Any]#

Assemble keyword arguments for openevals.create_async_llm_as_judge.

Typed config fields are added first, then optional fields are merged only when set. Finally, judge_kwargs is merged with overlap detection so that users cannot accidentally shadow typed fields.

Args:

config: The judge evaluator configuration.

resolved_prompt: The prompt string, already resolved from a short name or left as-is for custom templates.

judge_llm: The LLM instance to use as the judge.

Returns:

Dictionary of keyword arguments ready for create_async_llm_as_judge.

Raises:

ValueError: If judge_kwargs keys overlap with typed fields.

async register_langsmith_judge(
config: LangSmithJudgeConfig,
builder: nat.builder.builder.EvalBuilder,
)#

Register an LLM-as-judge evaluator with NAT.