nemo_curator.stages.text.experimental.translation.evaluation.faith
FAITH-based translation quality scoring and optional filtering.
Module Contents
Classes
Functions
Data
API
Bases: ProcessingStage[DocumentBatch, DocumentBatch]
LLM-based translation quality scorer using the FAITH metric.
For each row in the incoming DocumentBatch, this stage:
- Formats a FAITH evaluation prompt with source and translated text.
- Calls the LLM via AsyncLLMClient to obtain a JSON score response.
- Parses the response for 5 FAITH dimension scores.
- Computes faith_avg (mean of the 5 scores).
- Optionally drops rows where faith_avg < threshold (when filter_enabled=True).
Parameters
client : AsyncLLMClient | None
Async LLM client for scoring. Must not be None.
model_name : str
LLM model identifier to use for scoring.
source_lang : str
ISO 639-1 code of the source language (e.g. "en").
target_lang : str
ISO 639-1 code of the target language (e.g. "zh").
source_text_field : str
Column name containing the original source text.
translated_text_field : str
Column name containing the translated text.
threshold : float
Minimum faith_avg score to keep a row. Rows below this are dropped
(only when filter_enabled=True).
filter_enabled : bool
When True (default), rows with faith_avg < threshold are dropped.
When False, all rows are kept with their scores attached, enabling
downstream score analysis before committing to a threshold.
generation_config : GenerationConfig | None
LLM generation parameters. Defaults to temperature=0.0, max_tokens=256.
Write parsed FAITH scores back onto the DataFrame.
Build the chat messages for a single FAITH evaluation request.
Compute faith_avg as the mean of non-zero per-dimension scores.
Follows the “zero means not applicable” convention: dimensions
scored as 0.0 are excluded from the average. If every
dimension is zero, returns 0.0.
Parameters
scores : dict
Dict keyed by FAITH_KEYS (missing keys treated as 0).
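The averaging rule described above can be sketched as follows. This is a minimal illustration, not the module's code: the function name average_nonzero and the placeholder key names in EXAMPLE_KEYS are assumptions, since the actual contents of FAITH_KEYS are not listed here.

```python
from typing import Mapping, Sequence

# Hypothetical dimension names; the real ones are defined by FAITH_KEYS.
EXAMPLE_KEYS = ("dim_a", "dim_b", "dim_c", "dim_d", "dim_e")


def average_nonzero(
    scores: Mapping[str, float], keys: Sequence[str] = EXAMPLE_KEYS
) -> float:
    """Mean of the non-zero scores; 0.0 when every dimension is zero."""
    # Missing keys are treated as 0, i.e. "not applicable".
    values = [float(scores.get(key, 0.0)) for key in keys]
    applicable = [value for value in values if value != 0.0]
    if not applicable:
        return 0.0
    return sum(applicable) / len(applicable)
```

With this rule, a row scored 4 and 2 on two dimensions and 0 on the rest averages to 3.0 rather than 1.2, because the zero entries are not counted in the denominator.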
Find and return the first balanced {...} JSON object in text.
Walks the string counting {/} pairs, respecting string
literals so that braces inside quoted strings do not affect the
balance and do not anchor the scan. For example, in
'message: "{pre}" scores: {"Fluency": 4}' the first { lives
inside a string literal and must be ignored; the real object starts
at the second {.
Supports nested objects (e.g. {"scores": {"Fluency": 4, ...}}).
Returns: str | None
Substring from the first real { to its matching }
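A single-function sketch of this kind of scan is shown below. It is an assumption-laden illustration: the name first_json_object is hypothetical, and the module itself appears to split the work across several helpers. Only double-quoted strings are tracked, as in JSON.

```python
from typing import Optional


def first_json_object(text: str) -> Optional[str]:
    """Return the first balanced {...} found outside a string literal."""
    start = -1          # index of the opening brace of the current object
    depth = 0           # current brace nesting depth
    in_string = False   # True while inside a double-quoted string
    escaped = False     # True right after a backslash inside a string
    for index, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue  # braces inside strings never affect the balance
        if ch == '"':
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = index
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                return text[start : index + 1]
    return None  # no balanced object found
```

Applied to the example above, the brace inside "{pre}" is skipped because the scanner is inside a string at that point, so the returned substring starts at the second {.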
Extract FAITH scores from an LLM JSON response.
Finds the first balanced {...} block in text (with support for
nested objects), parses it as JSON, and normalises the keys to the
five FAITH dimensions. Missing keys default to 0.0.
A score of 0.0 follows the “zero means not applicable” convention
(see _average_scores).
Returns: dict
Tuple of (scores, parse_failed) where scores is a dict
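The parsing step might look roughly like the sketch below. It takes an already-extracted JSON block and returns (scores, parse_failed). The function name parse_scores, the case-insensitive key matching, and the handling of non-numeric values are assumptions for illustration; the source does not spell out how keys are normalised.

```python
import json
from typing import Dict, Optional, Sequence, Tuple


def parse_scores(
    block: Optional[str], keys: Sequence[str]
) -> Tuple[Dict[str, float], bool]:
    """Parse a JSON block into per-dimension scores plus a failure flag."""
    zeros = {key: 0.0 for key in keys}
    if block is None:
        return zeros, True
    try:
        data = json.loads(block)
    except json.JSONDecodeError:
        return zeros, True
    if not isinstance(data, dict):
        return zeros, True

    # Assumed normalisation: match dimension names case-insensitively.
    lowered = {str(name).lower(): value for name, value in data.items()}
    scores: Dict[str, float] = {}
    for key in keys:
        raw = lowered.get(key.lower(), 0.0)  # missing keys default to 0.0
        try:
            scores[key] = float(raw)
        except (TypeError, ValueError):
            scores[key] = 0.0  # assumption: unusable values count as 0.0
    return scores, False
```

Returning the failure flag separately from the scores lets later steps tell "the model said not applicable" (0.0 with parse_failed=False) apart from "no usable response" (all zeros with parse_failed=True).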
Apply threshold filtering while preserving parse-failed rows.
Log aggregate FAITH scores and parse-failure counts.
Score all rows using the async LLM client.
Handles event-loop edge cases (e.g. being called from within an existing async context such as a Ray async actor).
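One common way to handle this edge case is sketched below: use asyncio.run when no loop is running, and otherwise run the coroutine on a fresh loop in a worker thread. The helper name run_coroutine_sync is hypothetical, and the module may resolve the situation differently.

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine


def run_coroutine_sync(coro: Coroutine[Any, Any, Any]) -> Any:
    """Run a coroutine to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No event loop is running in this thread, so asyncio.run is safe.
        return asyncio.run(coro)

    # A loop is already running here (e.g. inside an async actor), and
    # asyncio.run would raise. Run the coroutine on its own loop in a
    # separate thread and block until it finishes.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()
```

The thread-based branch blocks the calling loop while it waits, which is acceptable for a batch-oriented stage but would not suit latency-sensitive async code.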
Issue concurrent LLM requests for every row.
Uses return_exceptions=True so that individual scoring failures
do not abort the entire batch. Failed rows receive an empty string
response, and the error is logged.
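The failure-isolation pattern described here can be sketched with asyncio.gather. The names score_all and call_llm are hypothetical stand-ins; call_llm represents whatever coroutine sends one prompt to the LLM client.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List, Sequence

logger = logging.getLogger(__name__)


async def score_all(
    prompts: Sequence[str],
    call_llm: Callable[[str], Awaitable[str]],
) -> List[str]:
    """Send all prompts concurrently; failed rows get an empty response."""
    results = await asyncio.gather(
        *(call_llm(prompt) for prompt in prompts),
        return_exceptions=True,  # exceptions are returned, not raised
    )
    responses: List[str] = []
    for index, result in enumerate(results):
        if isinstance(result, BaseException):
            logger.error("Scoring failed for row %d: %r", index, result)
            responses.append("")  # empty string marks the failed row
        else:
            responses.append(result)
    return responses
```

Because gather preserves input order, the returned list lines up with the rows of the batch, and an empty string then simply fails JSON parsing downstream.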
Run FAITH scoring for each row in the batch.
Score each translation row and filter rows below threshold.
Initialize the LLM client and load prompt templates.
Prompt YAML loading and default generation config are deferred here
(instead of __post_init__) for Ray compatibility: __post_init__
runs on the driver, while setup() runs on the worker.
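The split between the two hooks can be illustrated with a small dataclass. Everything named here (ExampleScoringStage, the dict-based config, the placeholder prompt) is hypothetical and only mirrors the shape described above; the default values temperature=0.0 and max_tokens=256 are taken from the parameter documentation.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ExampleScoringStage:
    """Hypothetical stage showing driver-side vs worker-side initialisation."""

    client: Optional[Any] = None
    generation_config: Optional[Dict[str, Any]] = None

    def __post_init__(self) -> None:
        # Runs where the object is constructed (the driver). Keep it cheap
        # so the stage stays easy to serialise and ship to workers.
        self._prompts: Optional[Dict[str, str]] = None

    def setup(self) -> None:
        # Runs on the worker: fill in defaults and load resources here.
        if self.generation_config is None:
            self.generation_config = {"temperature": 0.0, "max_tokens": 256}
        # Placeholder for loading prompt templates from YAML.
        self._prompts = {"system": "placeholder prompt"}
```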
Bases: ProcessingStage[DocumentBatch, DocumentBatch]
Filter document rows using precomputed FAITH scores.
Drop rows below the FAITH threshold while preserving parse failures.
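The keep/drop rule can be sketched on plain Python rows. This simplifies the DataFrame-based stage, and the column name faith_parse_failed is an assumption; only faith_avg and the threshold comparison come from the documentation above.

```python
from typing import Any, Dict, List


def keep_rows(
    rows: List[Dict[str, Any]],
    threshold: float,
    parse_failed_key: str = "faith_parse_failed",  # hypothetical column name
) -> List[Dict[str, Any]]:
    """Keep rows at or above the threshold, plus any parse-failed rows."""
    return [
        row
        for row in rows
        # A parse failure means there is no trustworthy score to judge by,
        # so the row is preserved instead of being dropped as low quality.
        if row.get(parse_failed_key, False) or row["faith_avg"] >= threshold
    ]
```

Preserving parse-failed rows keeps an LLM formatting problem from silently removing data; such rows can be re-scored or inspected later.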
Return the balanced object end index starting at start, or -1.
Return the first { outside a JSON string, or -1.
Return a DataFrame safe to mutate in-place for stage-local work.
Return updated JSON string state and whether ch was consumed by it.