Important
You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.
Reference
- NeMo Curator on Kubernetes
Demonstration of how to run NeMo Curator on a Dask cluster deployed on top of Kubernetes
- NeMo Curator and Apache Spark
Demonstration of how to read and write datasets when using Apache Spark and NeMo Curator
- Best Practices
A collection of suggestions on how to best use NeMo Curator to curate your dataset
- Next Steps
Now that you’ve curated your data, let’s discuss where to go next in the NeMo Framework to put it to good use.
- Tutorials
To get started, explore the NeMo Curator GitHub repository and follow the available tutorials and notebooks. These resources cover various aspects of data curation, including curating data for training from scratch and for Parameter-Efficient Fine-Tuning (PEFT).
- API Docs
API Documentation for all the modules in NeMo Curator