Translation Configuration Reference#
This page describes the YAML configuration file that you provide for the translate stage: the keys you set or override, how backtranslation quality scores show up in Parquet output columns, and which rows are omitted from the final benchmark.parquet when quality filtering is enabled.
Required Keys#
Key |
Notes |
|---|---|
|
Directory name under |
|
Existing |
|
Parent directory for |
|
BCP-47 style tag (for example |
|
Target locale tag (for example |
|
Dictionary with |
|
Non-empty list; each element is a dictionary with |
Quality Metrics#
Each type must be sacrebleu, chrf, or ter.
Each threshold must be nonnegative.
NeMo Curator writes one numeric score column and one boolean pass column per configured metric, for example score_chrf and score_chrf_passed.
The column is_quality_metric_passed is true on a row when every per-metric pass column is true for that row.
Each score compares the original benchmark text with the round-trip backtranslation from the target locale, using sentence-level APIs from the sacrebleu library.
|
Measures |
Scale and direction |
How to interpret scores |
Row passes when |
|---|---|---|---|---|
|
Sentence bilingual evaluation understudy (BLEU) |
0 through 100 after sacrebleu tokenization; higher values track closer matches to the reference. |
High scores mean the backtranslation preserved wording and order; scores near zero mean little n-gram overlap. |
|
|
Sentence character n-gram F-score (chrF) |
0 through 100 in typical sentence outputs; higher values mean closer character-level match. |
High scores track spelling and phrasing fidelity; low scores mean the backtranslation diverged on the surface string. |
|
|
Sentence translation error rate (TER) |
Zero means no edits; larger values report more insertions, deletions, or substitutions relative to the reference. |
Values close to zero mean the backtranslation needed minimal editing to match the original; large values signal heavy rewrites or mismatch. |
|
Inspect stage_cache/quality_metrics.parquet under your experiment directory to pick thresholds from the score spread you see in data.
Optional Keys#
Key |
Default |
Notes |
|---|---|---|
|
|
When true, the pipeline omits rows where |
FAITH evaluation#
translation_model_config.stage.enable_faith_eval must not be true.
The benchmark translation relies on backtranslation metrics instead of FAITH filtering