NeMo Curator on DGX Cloud

NVIDIA NeMo Curator on DGX Cloud

Overview

  • Introduction
    • Curation Pipeline Overview
  • Release Notes
    • 2.0
    • 1.0

Getting Started

  • Creating a Dataset
    • Dataset Guidelines
    • Adding a Dataset
    • Managing Datasets
  • Running the Semantic Deduplication Pipeline
    • Deduplication Pipeline Configuration Options
    • Running Semantic Deduplication Pipeline Using S3 Input/Output
      • Required Arguments
      • Threshold Arguments
      • Optional Arguments
      • Invoking Semantic Deduplication Pipeline
        • AWS Credential Parameters
        • Pipeline Argument Parameters
    • Running Semantic Deduplication Pipeline Using ZIP Upload
    • Semantic Deduplication Pipeline Output

Reference

  • Curation Parameters
  • Curated Dataset Structure
  • API Reference
    • Prerequisites
    • Example Workflows
      • Uploading a ZIP File
      • Linking S3 Input/Output Buckets
    • API Endpoints
      • Create a Dataset
      • Get a Dataset
      • Get All Datasets by Organization
      • Initialize Dataset Upload
      • Get Presigned URLs
      • Finalize Dataset Upload
      • Get Dataset Download URL
      • Process Dataset
      • Delete Dataset
      • Process S3 Input/Output
      • Get Dataset Captions
      • Update Dataset Captions
      • Terminate All Jobs


Copyright © 2025, NVIDIA Corporation.

Last updated on Aug 28, 2025.