Use Fine Segmentation#
Use this guide when segmentation_mode=coarse mishandles long prose and you want to try fine on a sample before scaling.
Conceptual background lives in Segmentation.
Procedure#
Reproduce an issue with
segmentation_mode=coarse, which is the default indefault.yaml.Retry on a slice with
segmentation_mode=fine:
uv run nemotron steps run translate/nemo_curator -c default \
segmentation_mode=fine \
input_path=/path/to/sample.jsonl \
output_dir=/path/to/out-fine \
source_language=en \
target_language=de \
server.model=YOUR_LLM_MODEL_ID
Compare reconstruction fidelity against throughput cost. Fine mode increases segment counts.
Pair With min_segment_chars#
Raise min_segment_chars modestly to skip trivial whitespace-only fragments if fine mode becomes noisy.