About Building Multiple-Choice Question Benchmarks
This section describes how to build a custom multiple-choice question (MCQ) benchmark as Apache Parquet files with the nemotron steps run byob/mcq command.
You supply domain text files under input_dir, and the pipeline samples few-shot exemplars from a Hugging Face benchmark named in your configuration, such as cais/mmlu.
The configuration specifies subject filters such as high_school_mathematics.
The benchmark step prepares seed rows, generates and judges questions, runs optional deduplication and distractor stages, and writes benchmark.parquet.
An optional translation stage reads an existing benchmark and writes another benchmark.parquet with the same column layout.
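Before running the step, it can help to confirm that input_dir actually holds non-empty text files. The following is a minimal sketch, not part of the Nemotron tooling: the helper name summarize_input_dir and the .txt suffix filter are assumptions made for illustration, so adapt them to however your corpus is laid out.

```python
from pathlib import Path


def summarize_input_dir(input_dir, suffix=".txt"):
    """Return (usable, empty) lists of files found under input_dir.

    Hypothetical helper: the suffix filter is an assumption, not a
    requirement stated by the byob/mcq step.
    """
    root = Path(input_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"input_dir does not exist: {root}")

    usable, empty = [], []
    for path in sorted(root.rglob(f"*{suffix}")):
        if not path.is_file():
            continue
        # Treat whitespace-only files as empty.
        text = path.read_text(encoding="utf-8", errors="replace")
        (usable if text.strip() else empty).append(path)
    return usable, empty
```

Calling summarize_input_dir with the path you plan to set as input_dir returns two lists of paths; any entry in the second list is a file you may want to fill in or remove before a run.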
Tip
New to this flow? Follow Getting Started with Building MCQ Benchmarks once, then use the grids and tables below to jump to how-to guides, concepts, or reference pages.
When to Use
The nemotron steps run byob/mcq command enables the following outcomes.
Questions grounded in your own documents, paired with few-shot items from a public benchmark subject you declare in configuration.
A repeatable Parquet artifact, one experiment folder under your configured output_dir, plus intermediate caches when you iterate.
Optional translation with forward passes, backtranslation, and metric thresholds before you export another Parquet benchmark.
Pipeline Summary
At a high level, the benchmark step performs the following work.
Prepare: sample few-shot examples and align them with chunks from your corpus into a seed dataset.
Generate: run the staged MCQ pipeline from generation through filtering into benchmark_raw.parquet and benchmark.parquet.
Translate (optional): translate questions and options, score backtranslation quality, and export a new benchmark.parquet.
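The file names benchmark_raw.parquet and benchmark.parquet come from the summary above, but the exact folder layout under output_dir is not spelled out on this page. A hedged sketch that simply searches recursively, using a hypothetical helper named find_benchmark_files, might look like this:

```python
from pathlib import Path

# File names taken from the pipeline summary; nothing else about the
# layout under output_dir is assumed.
BENCHMARK_NAMES = ("benchmark_raw.parquet", "benchmark.parquet")


def find_benchmark_files(output_dir):
    """Map each known benchmark file name to the paths found under output_dir.

    Hypothetical helper for illustration; it searches recursively because
    the experiment folder structure is treated as unknown.
    """
    root = Path(output_dir)
    found = {}
    for name in BENCHMARK_NAMES:
        # A pattern without wildcards matches that exact file name at any depth.
        found[name] = sorted(p for p in root.rglob(name) if p.is_file())
    return found
```

If pandas and a Parquet engine are installed, each returned path can then be loaded with pandas.read_parquet to inspect the columns.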
Documentation Series
Install the byob extra, run the sample tiny configuration with local paths, and inspect Parquet outputs.
The tiny fixture pairs cais/mmlu high school mathematics few-shots with a one-line input file related to algebra.
Prepare data, tune models in YAML, customize prompts, and resume with skip_until.
How prepare, generate, and translate stages fit together and what each configuration block does.
Supported Hugging Face datasets, Parquet outputs, and YAML fields.
All Documentation
| Guide | What you will do |
|---|---|
| | Run |

| Guide | What you will do |
|---|---|
| | Lay out |
| | Lay out per-target |
| | Point generation, judgement, and filter models at your endpoints |
| | Override prompts with a YAML file |
| | Resume after intermediate Parquet caches |

| Guide | What you will learn |
|---|---|
| | Stage order for prepare, generate, and translate |
| | Seeds, chunking, and the prepare step |
| | |
| | Data Designer batched generation |
| | Judgement, deduplication, distractors, coverage, outliers |
| | Easiness and hallucination filters |
| | Curator translation and backtranslation metrics |

| Guide | What you will find |
|---|---|
| | Paths under |
| | Symptom-to-fix index for BYOB runs |
| | Allowed |
| | Generation YAML keys |
| | Translation YAML keys |
What You Need
A Nemotron clone with dependencies installed, including the byob extra from uv sync --extra byob.
Model credentials and endpoints that match the generation_model_config, judge_model_config, and related blocks in your YAML, as described in Configure Model Endpoints for BYOB.
Network access to download the configured Hugging Face benchmark split unless it is already cached on disk.
Quick Start
Follow Getting Started with Building MCQ Benchmarks if you have not run the step yet.
Read Prepare Your Own Domain Data when you are ready to point the pipeline at your own corpus and mapping.
Open Generation Configuration Reference or Translation Configuration Reference when you need field-level YAML detail.
Limitations and Considerations
Cost: generation, judgement, expansion, validity checks, and filters call remote models whenever you configure them to do so.
Time: full runs depend on corpus size, model latency, and which optional stages stay enabled.
Rate limits: hosted APIs may throttle parallel requests that you set under inference_parameters.
Curator mount: checked-in configurations mount NeMo Curator from Git for translation and deduplication-related paths, so remote profiles must expose that tree the same way your environment expects.