Important

You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.

Reference

NeMo Curator on Kubernetes

Demonstration of how to run NeMo Curator on a Dask cluster deployed on Kubernetes

NeMo Curator and Apache Spark

Demonstration of how to read and write datasets when using Apache Spark and NeMo Curator

Best Practices

A collection of suggestions on how to best use NeMo Curator to curate your dataset

Next Steps

Now that you’ve curated your data, let’s discuss where to go next in the NeMo Framework to put it to good use.

Tutorials

To get started, explore the NeMo Curator GitHub repository and follow the available tutorials and notebooks. These resources cover data curation for a range of use cases, including training from scratch and Parameter-Efficient Fine-Tuning (PEFT).

API Docs

API Documentation for all the modules in NeMo Curator