Configure Fields and Output#
Use this guide when you need to align text_field, output_mode, reconstruction flags, and formats with your dataset schema before or after a first run.
For a guided first invocation, see Getting Started With Translation.
Choosing text_field#
Chat corpora in OpenAI layout typically use
messages.*.contentso every messagecontententry is translated consistently.Plain documents might use a single column such as
article_body. Omit wildcards when the schema is flat.
Outputs#
YAML key |
Behavior |
|---|---|
|
Control column names used when emitting translated strings. The default is |
|
|
|
Keeps FAITH outputs adjacent to translations when scoring runs. |
|
Enable faithful reconstructions of chat arrays. These default to |
Formats#
Set input_format when automatic probing cannot distinguish ambiguous globs. Align output_format with downstream packing expectations; values are jsonl or parquet.
CLI Overrides#
You can override any YAML key with dotlists:
uv run nemotron steps run translate/nemo_curator -c default \
text_field=messages.*.content \
output_mode=both \
reconstruct_messages=true \
input_path=/path/to/chat.jsonl \
output_dir=/path/to/out \
source_language=en \
target_language=fr