Text Curation Tutorials
Hands-on tutorials for text curation workflows are available in the tutorials/text directory of the NeMo Curator GitHub repository.
Key Concepts for Tutorial Success
Before diving into the tutorials, familiarize yourself with these essential NeMo Curator concepts:
Pipeline Architecture
Core processing stages and pipeline concepts for text curation workflows. Tags: data-structures, distributed
Quality Assessment
Scoring and filtering techniques used in the tutorials. Tags: heuristics, classifiers
Data Loading
Loading data from various sources. Tags: common-crawl, custom-data
Distributed Classification
GPU-accelerated classification concepts. Tags: gpu, scalable