Getting Started with Building MCQ Benchmarks

What You’ll Build: A small multiple-choice question (MCQ) benchmark for math questions, generated from the sample tiny configuration. You’ll run the nemotron steps run byob/mcq command, which uses an NVIDIA-hosted model endpoint for inference.

In this tutorial, you will:

  1. Install Python dependencies.

  2. Run the tiny configuration from the repository root.

  3. Locate outputs and scan the main Parquet artifacts.

  4. Confirm benchmark.parquet columns against the output reference.

This tutorial takes 10 to 15 minutes to complete.

Sample Prompt

Help me create an MCQ benchmark using the tiny configuration from the Nemotron repository clone, write outputs under ./output, then show me which Parquet files to open first.

Start Here

  • Run all commands from the repository root so the input_dir path in the procedure resolves.

  • The sample configuration uses the cais/mmlu dataset from Hugging Face for few-shot examples of multiple-choice questions.

  • The sample configuration uses the src/nemotron/steps/byob/data/tiny_input/maths/tiny.txt file for input.

    Algebra studies symbols and the rules for manipulating them. Linear equations can model simple relationships, such as converting between a starting value and a constant rate of change.
    

Prerequisites

  • You have a host with access to https://integrate.api.nvidia.com.

  • You have the uv tool available in your shell.

  • You have NVIDIA_API_KEY exported in the same shell session in which you run the procedure. The default model for the configuration is openai/gpt-oss-120b.
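
The last two prerequisites can be checked before you start. The following is a minimal sketch and is not part of the Nemotron repository: check_prerequisites is a hypothetical helper that only confirms uv is on PATH and that NVIDIA_API_KEY is set to a non-empty value. It does not validate the key or test access to the endpoint.

```python
import os
import shutil


def check_prerequisites(env=None, which=shutil.which):
    """Return a list of problems; an empty list means the checks passed.

    Hypothetical helper for this tutorial, not part of the Nemotron repository.
    """
    env = os.environ if env is None else env
    problems = []
    # uv must be discoverable on PATH for `uv sync` and `uv run`.
    if which("uv") is None:
        problems.append("uv was not found on PATH")
    # The key must be exported in the shell session that runs the command.
    if not env.get("NVIDIA_API_KEY", "").strip():
        problems.append("NVIDIA_API_KEY is not set or is empty")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print(f"MISSING: {issue}")
    print("Ready." if not issues else "Fix the items above, then retry.")
```

Run the script in the same shell session that you plan to use for the procedure, because an exported variable is visible only to that session and its child processes.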

Procedure

  1. Clone the repository:

    git clone https://github.com/NVIDIA-NeMo/Nemotron && cd Nemotron
    
  2. From the repository root, add the dependencies for building benchmarks:

    uv sync --extra byob
    
  3. Run generation with host paths. The tiny configuration sets stage: all, which chains data preparation and then MCQ generation.

    uv run nemotron steps run byob/mcq \
      -c tiny \
      family=mcq \
      stage=all \
      input_dir="./src/nemotron/steps/byob/data/tiny_input" \
      output_dir=./byob-output
    

    When the -c / --config argument is not a path, the command looks up the configuration by name in the src/nemotron/steps/byob/mcq/config/ directory.

    When the command finishes, list the ./byob-output/byob_mcq_tiny/ directory. The expt_name field in the src/nemotron/steps/byob/mcq/config/tiny.yaml file sets the name of that directory. Look for the following files:

    • stage_cache/*.parquet, one file per intermediate stage, described in Output Files

    • benchmark_raw.parquet, the full row set before optional removals

    • benchmark.parquet, the final mcq schema for downstream use

  4. Open benchmark.parquet with Pandas, Polars, or another Parquet-aware tool and confirm that the columns match those listed in Output Files.
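
The last two steps can be scripted. The following is a minimal sketch that assumes the output_dir value from the procedure and a Python environment with pandas and a Parquet engine such as pyarrow. The helper names list_parquet_files and compare_columns are hypothetical, and expected_columns is a placeholder left empty on purpose: fill it in from the Output Files reference, because this tutorial does not list the column names.

```python
from pathlib import Path

# Output directory from the procedure; adjust it if you changed output_dir.
RUN_DIR = Path("./byob-output/byob_mcq_tiny")


def list_parquet_files(run_dir):
    """Return Parquet paths under run_dir, relative to it, sorted by name."""
    run_dir = Path(run_dir)
    return sorted(
        path.relative_to(run_dir).as_posix() for path in run_dir.rglob("*.parquet")
    )


def compare_columns(actual, expected):
    """Return (missing, unexpected) column names, each in its original order."""
    missing = [name for name in expected if name not in actual]
    unexpected = [name for name in actual if name not in expected]
    return missing, unexpected


def main():
    benchmark = RUN_DIR / "benchmark.parquet"
    if not benchmark.exists():
        print(f"{benchmark} not found; run the procedure first.")
        return

    import pandas as pd  # needs pandas plus a Parquet engine such as pyarrow

    for name in list_parquet_files(RUN_DIR):
        print(name)

    # Placeholder: copy the column names from the Output Files reference.
    expected_columns = []

    actual_columns = list(pd.read_parquet(benchmark).columns)
    print("columns:", actual_columns)
    if expected_columns:
        missing, unexpected = compare_columns(actual_columns, expected_columns)
        print("missing:", missing)
        print("unexpected:", unexpected)


if __name__ == "__main__":
    main()
```

Run the script from the repository root so that the relative RUN_DIR path resolves. When the columns match the reference, both the missing and unexpected lists are empty.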

Next Steps

For background on how stages connect, read Pipeline Overview. For task-focused changes, continue with: