Bringing Your Own Dataset#

This guide explains how to import your own dataset to be used with NeMo Automodel.

Types of Supported Datasets#

NeMo Automodel supports several types of datasets for different training scenarios:

  • Completion datasets: Single text sequences for language modeling

  • Conversation datasets: Multi-turn chat dialogues

  • Instruction datasets: Question-answer pairs

  • Multi-modal datasets: Text combined with images and audio

TODO: onboard CPT and multi-turn SFT. Add documentation for all of the above.