NVIDIA NeMo Curator on DGX Cloud

Overview
  Introduction
  Curation Pipeline Overview
  Release Notes
    2.0
    1.0

Getting Started
  Creating a Dataset
    Dataset Guidelines
    Adding a Dataset
    Managing Datasets
  Running the Semantic Deduplication Pipeline
    Deduplication Pipeline Configuration Options
    Running Semantic Deduplication Pipeline Using S3 Input/Output
      Required Arguments
      Threshold Arguments
      Optional Arguments
      Invoking Semantic Deduplication Pipeline
        AWS Credential Parameters
        Pipeline Argument Parameters
    Running Semantic Deduplication Pipeline Using ZIP Upload
    Semantic Deduplication Pipeline Output

Reference
  Curation Parameters
  Curated Dataset Structure
  API Reference
    Prerequisites
    Example Workflows
      Uploading a ZIP File
      Linking S3 Input/Output Buckets
    API Endpoints
      Create a Dataset
      Get a Dataset
      Get All Datasets by Organization
      Initialize Dataset Upload
      Get Presigned URLs
      Finalize Dataset Upload
      Get Dataset Download URL
      Process Dataset
      Delete Dataset
      Process S3 Input/Output
      Get Dataset Captions
      Update Dataset Captions
      Terminate All Jobs