Translation Configuration Reference#

This page describes the YAML configuration file that you provide for the translate stage: the keys you set or override, how backtranslation quality scores show up in Parquet output columns, and which rows are omitted from the final benchmark.parquet when quality filtering is enabled.

Required Keys#

Key

Notes

expt_name

Directory name under output_dir for caches and final artifacts.

dataset_path

Existing benchmark.parquet from a generation run.

output_dir

Parent directory for expt_name.

source_language

BCP-47 style tag (for example en-US).

target_language

Target locale tag (for example hi-IN).

translation_model_config

Dictionary with backend_type, params, and optional stage and segment_stage.

backtranslation_quality_metrics

Non-empty list; each element is a dictionary with type and threshold.

Quality Metrics#

Each type must be sacrebleu, chrf, or ter. Each threshold must be nonnegative.

NeMo Curator writes one numeric score column and one boolean pass column per configured metric, for example score_chrf and score_chrf_passed. The column is_quality_metric_passed is true on a row when every per-metric pass column is true for that row.

Each score compares the original benchmark text with the round-trip backtranslation from the target locale, using sentence-level APIs from the sacrebleu library.

type

Measures

Scale and direction

How to interpret scores

Row passes when

sacrebleu

Sentence bilingual evaluation understudy (BLEU)

0 through 100 after sacrebleu tokenization; higher values track closer matches to the reference.

High scores mean the backtranslation preserved wording and order; scores near zero mean little n-gram overlap.

score_sacrebleuthreshold

chrf

Sentence character n-gram F-score (chrF)

0 through 100 in typical sentence outputs; higher values mean closer character-level match.

High scores track spelling and phrasing fidelity; low scores mean the backtranslation diverged on the surface string.

score_chrfthreshold

ter

Sentence translation error rate (TER)

Zero means no edits; larger values report more insertions, deletions, or substitutions relative to the reference.

Values close to zero mean the backtranslation needed minimal editing to match the original; large values signal heavy rewrites or mismatch.

score_terthreshold

Inspect stage_cache/quality_metrics.parquet under your experiment directory to pick thresholds from the score spread you see in data.

Optional Keys#

Key

Default

Notes

remove_low_quality

True unless YAML overrides it

When true, the pipeline omits rows where is_quality_metric_passed is false before export.

FAITH evaluation#

translation_model_config.stage.enable_faith_eval must not be true. The benchmark translation relies on backtranslation metrics instead of FAITH filtering