Translation#
Learn how to take an existing benchmark.parquet from generation, translate it to a target locale, score quality with backtranslation metrics, and export another benchmark.parquet.
The field names, defaults, and validation rules are listed in Translation Configuration Reference. Artifact paths are summarized in Output Files.
What You Configure#
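| Control | What you set |
|---|---|
| dataset_path | Absolute or workspace path to the source benchmark.parquet |
| output_dir | Where caches and the translated benchmark.parquet are written |
| source_language, target_language | BCP-47 style tags, for example en-US and hi-IN |
| translation_model_config | Curator experimental translation block: backend_type, params, stage, and segment_stage |
| backtranslation_quality_metrics | List of metric entries, each with a type and a threshold |
| remove_low_quality | When true or omitted, rows that fail the quality gate are dropped from the exported benchmark.parquet |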
Do not set translation_model_config.stage.enable_faith_eval to true.
Translation relies on backtranslation metrics instead of FAITH.
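A pre-flight check can catch this setting before a run starts. The sketch below is illustrative only and is not part of the tool: it assumes the YAML has already been loaded into a plain dictionary, and the helper name faith_eval_enabled is hypothetical.

```python
from typing import Any, Mapping


def faith_eval_enabled(config: Mapping[str, Any]) -> bool:
    """Return True if translation_model_config.stage.enable_faith_eval is truthy.

    Hypothetical helper: walks the nested mapping one key at a time and
    treats any missing or non-mapping level as "not enabled".
    """
    node: Any = config
    for key in ("translation_model_config", "stage", "enable_faith_eval"):
        if not isinstance(node, Mapping) or key not in node:
            return False
        node = node[key]
    return bool(node)


# Example: a config that should be rejected before launching the stage.
bad_config = {"translation_model_config": {"stage": {"enable_faith_eval": True}}}
if faith_eval_enabled(bad_config):
    print("enable_faith_eval must not be true for the translate stage")
```

Treating a missing key as disabled matches the guidance above, which only warns against setting the flag to true.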
Running the Translate Stage#
Pass stage=translate unless your YAML sets a top-level stage key.
The CLI requires an explicit stage when that key is absent.
uv run nemotron steps run byob/mcq -c translate stage=translate
Tune Quality Gates#
The backtranslation_quality_metrics field is the only place you define automatic pass or fail rules for backtranslation checks.
Add or remove list entries to change which scores are computed, and adjust threshold values when you want stricter or looser gates.
After a run, open quality_metrics.parquet under <output_dir>/<expt_name>/stage_cache/ and read the per-metric score columns and is_quality_metric_passed before you change the YAML again.
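To reason about how a threshold change would affect a row, it can help to model the gate in a few lines. The sketch below is not the stage's actual implementation. It makes two assumptions that you should confirm against the Translation Configuration Reference: that sacrebleu and chrf pass when the score is at or above the threshold while ter (an error rate) passes when the score is at or below it, and that a row passes the aggregate gate only when every configured metric passes. The function names and the example thresholds (25, 50, 50) are illustrative.

```python
from typing import Dict, List

# Assumption: TER is an error rate, so lower is better; the others are higher-is-better.
LOWER_IS_BETTER = {"ter"}


def metric_passes(metric_type: str, score: float, threshold: float) -> bool:
    """Compare one score with its threshold in the assumed direction."""
    if metric_type in LOWER_IS_BETTER:
        return score <= threshold
    return score >= threshold


def row_passes(scores: Dict[str, float], metrics: List[dict]) -> bool:
    """Assumed aggregate gate: every configured metric must pass."""
    return all(
        metric_passes(m["type"], scores[m["type"]], m["threshold"]) for m in metrics
    )


# Mirrors the shape of a backtranslation_quality_metrics list.
metrics = [
    {"type": "sacrebleu", "threshold": 25},
    {"type": "chrf", "threshold": 50},
    {"type": "ter", "threshold": 50},
]

# Hypothetical scores for one row, as they might be read from quality_metrics.parquet.
example_scores = {"sacrebleu": 31.0, "chrf": 62.5, "ter": 40.0}
print(row_passes(example_scores, metrics))
```

Replaying real scores through a model like this lets you estimate how many rows a stricter or looser threshold would keep before you rerun the stage.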
Final Filtering Control#
remove_low_quality decides whether failing rows disappear from the exported benchmark.parquet.
remove_low_quality: true # omit to get the same default
remove_low_quality: false # keep failing rows; filter manually using Parquet columns
Reference Layout#
The sample src/nemotron/steps/byob/mcq/config/translate.yaml file shows a complete translation_model_config with backend_type: llm, NVIDIA provider parameters, and stage / segment_stage tuning.
Copy that structure, then swap model IDs, concurrency, and language tags for your workload.
The YAML below mirrors the sample configuration. It sets remove_low_quality: false so that rows failing the aggregate quality gate remain in benchmark.parquet, which lets you inspect stage_cache/quality_metrics.parquet while you tune thresholds.
When you omit remove_low_quality or set it to true, failing rows are dropped before export.
expt_name: byob_mcq_translation
dataset_path: /path/to/benchmark.parquet
output_dir: /path/to/outputs
source_language: en-US
target_language: hi-IN
translation_model_config:
  backend_type: llm
  params:
    alias: gpt-oss-120b
    model: openai/gpt-oss-120b
    provider: nvidia
    api_key_env: NGC_API_KEY
    inference_parameters:
      max_tokens: 16000
      max_parallel_requests: 8
      temperature: 0.0
      top_p: 0.95
  stage:
    segmentation_mode: coarse
    min_segment_chars: 0
    output_mode: both
  segment_stage:
    health_check: true
    max_concurrent_requests: 8
backtranslation_quality_metrics:
  - type: sacrebleu
    threshold: 25
  - type: chrf
    threshold: 50
  - type: ter
    threshold: 50
remove_low_quality: false
Directory Structure#
The translation stage writes three intermediate Parquet files to <output_dir>/<expt_name>/stage_cache/: translated_questions.parquet, backtranslated_questions.parquet, and quality_metrics.parquet. It then writes benchmark_raw.parquet and the renamed benchmark.parquet to the experiment root, <output_dir>/<expt_name>/.
Use the intermediate files to debug language mix-ups, threshold misses, or model refusals before you change configuration again.
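When debugging, a quick check of which artifacts a run actually produced can save time. The sketch below builds the expected paths from the layout described above, using only the Python standard library. The helper names expected_artifacts and missing_artifacts are hypothetical, and the layout is taken from this page rather than from the tool's source, so treat it as an assumption.

```python
from pathlib import Path
from typing import Dict, List


def expected_artifacts(output_dir: str, expt_name: str) -> Dict[str, Path]:
    """Map artifact names to the paths described in the layout above."""
    root = Path(output_dir) / expt_name
    cache = root / "stage_cache"
    return {
        "translated_questions": cache / "translated_questions.parquet",
        "backtranslated_questions": cache / "backtranslated_questions.parquet",
        "quality_metrics": cache / "quality_metrics.parquet",
        "benchmark_raw": root / "benchmark_raw.parquet",
        "benchmark": root / "benchmark.parquet",
    }


def missing_artifacts(output_dir: str, expt_name: str) -> List[str]:
    """Return the names of expected files that do not exist on disk."""
    paths = expected_artifacts(output_dir, expt_name)
    return [name for name, path in paths.items() if not path.exists()]


# Placeholder values matching the sample configuration on this page.
for name in missing_artifacts("/path/to/outputs", "byob_mcq_translation"):
    print(f"missing: {name}")
```

Because the files are written in order, the first missing name usually points to the step where the run stopped.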