Getting Started with Building MCQ Benchmarks#
What You’ll Build: A small multiple-choice question (MCQ) benchmark for math questions, generated from the sample tiny configuration.
You’ll run the nemotron steps run byob/mcq command, which uses an NVIDIA-hosted model endpoint for inference.
In this tutorial, you will:
Install Python dependencies.
Run the tiny configuration from the repository root.
Locate outputs and scan the main Parquet artifacts.
Confirm benchmark.parquet columns against the output reference.
This tutorial requires between 10 and 15 minutes to complete.
Sample Prompt
Help me create an MCQ benchmark using the tiny configuration from the Nemotron repository clone, write outputs under ./output, then show me which Parquet files to open first.
Start Here#
Run all commands from the repository root so the input_dir path in the procedure resolves.
The sample configuration uses the cais/mmlu dataset from Hugging Face for few-shot examples of multiple-choice questions.
The sample configuration uses the src/nemotron/steps/byob/data/tiny_input/maths/tiny.txt file for input.
Algebra studies symbols and the rules for manipulating them. Linear equations can model simple relationships, such as converting between a starting value and a constant rate of change.
Prerequisites#
You have a host with access to https://integrate.api.nvidia.com.
The uv tool is available in your shell.
NVIDIA_API_KEY is exported in the same shell session before you run the procedure. The default model for the configuration is openai/gpt-oss-120b.
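The two local prerequisites above can be checked from Python before you start. This is a minimal sketch, not part of the Nemotron repository: the helper name check_prerequisites is hypothetical. It only confirms that uv is on PATH and that NVIDIA_API_KEY is set and non-empty. It does not test network access to the endpoint or whether the key is valid.

```python
import os
import shutil


def check_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper for illustration, not part of the Nemotron CLI.
    """
    env = os.environ if env is None else env
    problems = []
    # The procedure invokes `uv sync` and `uv run`, so uv must be on PATH.
    if which("uv") is None:
        problems.append("uv was not found on PATH")
    # The key must be exported in the same session that runs the command.
    if not env.get("NVIDIA_API_KEY", "").strip():
        problems.append("NVIDIA_API_KEY is not set or is empty")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(f"Missing prerequisite: {problem}")
```

The env and which parameters exist only so the check can be exercised without touching the real environment. Called with no arguments, it inspects the current process.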
Procedure#
Clone the repository:
git clone https://github.com/NVIDIA-NeMo/Nemotron && cd Nemotron
From the repository root, add the dependencies for building benchmarks:
uv sync --extra byob
Run generation with host paths. The tiny configuration sets stage: all. This stage setting chains the data preparation and then generation for MCQ.
uv run nemotron steps run byob/mcq \
    -c tiny \
    family=mcq \
    stage=all \
    input_dir="./src/nemotron/steps/byob/data/tiny_input" \
    output_dir=./byob-output
When the -c/--config argument is not a path, the command resolves the config file name in the src/nemotron/steps/byob/mcq/config/ directory.
When the command finishes, list the ./byob-output/byob_mcq_tiny/ directory. The expt_name field in the src/nemotron/steps/byob/mcq/config/tiny.yaml file specifies that directory. Look for the following files:
stage_cache/*.parquet, one file per intermediate stage, described in Output Files
benchmark_raw.parquet, the full row set before optional removals
benchmark.parquet, the final mcq schema for downstream use
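If you prefer a script to a manual directory listing, the check for these three artifacts might look like the sketch below. It uses only the standard library. The helper name summarize_outputs is hypothetical and not part of the Nemotron CLI, and the run directory is the one named in this procedure.

```python
from pathlib import Path


def summarize_outputs(run_dir):
    """Report which of the artifacts named in this tutorial exist in run_dir.

    Hypothetical helper for illustration, not part of the Nemotron CLI.
    """
    run_dir = Path(run_dir)
    return {
        # Intermediate stage files, sorted by name for stable output.
        "stage_cache": sorted(
            p.name for p in (run_dir / "stage_cache").glob("*.parquet")
        ),
        "benchmark_raw.parquet": (run_dir / "benchmark_raw.parquet").is_file(),
        "benchmark.parquet": (run_dir / "benchmark.parquet").is_file(),
    }


if __name__ == "__main__":
    # Path from this procedure; adjust it if you changed output_dir or expt_name.
    for name, value in summarize_outputs("./byob-output/byob_mcq_tiny").items():
        print(f"{name}: {value}")
```

A missing directory is reported as an empty stage_cache list and two False values rather than an error, which makes it easy to tell an unfinished run from a wrong path.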
Open benchmark.parquet with Pandas, Polars, or another Parquet-aware tool and confirm that the columns match Output Files.
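One way to do the column comparison with Pandas is sketched below. The helper names compare_columns and check_benchmark are hypothetical, and the expected column list is deliberately left empty: copy the real names from the Output Files reference rather than relying on this example. Reading Parquet with Pandas also assumes that a Parquet engine such as pyarrow is installed in your environment.

```python
from pathlib import Path


def compare_columns(actual, expected):
    """Return (missing, unexpected) column names, each sorted alphabetically."""
    actual_set, expected_set = set(actual), set(expected)
    return sorted(expected_set - actual_set), sorted(actual_set - expected_set)


def check_benchmark(path, expected_columns):
    """Load a Parquet file with pandas and compare its columns to a reference."""
    import pandas as pd  # imported lazily; also needs pyarrow or fastparquet

    frame = pd.read_parquet(path)
    return compare_columns(frame.columns, expected_columns)


if __name__ == "__main__":
    benchmark = Path("./byob-output/byob_mcq_tiny/benchmark.parquet")
    # Placeholder: fill in the column names from the Output Files reference.
    expected_columns = []
    if benchmark.is_file() and expected_columns:
        missing, unexpected = check_benchmark(benchmark, expected_columns)
        print("Missing columns:", missing)
        print("Unexpected columns:", unexpected)
```

Two empty lists mean the file matches the reference exactly. The comparison ignores column order.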
Next Steps#
For background on how stages connect, read Pipeline Overview. For task-focused changes, continue with:
Swap in your own corpus and mapping: Prepare Your Own Domain Data.
Tune endpoints and keys: Configure Model Endpoints for BYOB.