Datasets

Use datasets in your customization and evaluation jobs for a model.

Prerequisites

Before creating and managing datasets, make sure that you have:

  • The NeMo Entity Store and NeMo Data Store microservices running on your cluster.

  • The base URLs for the NeMo Entity Store and NeMo Data Store microservices. These URLs depend on how your cluster administrator configured ingress for the NeMo microservices in your cluster. For more information, see Beginner Tutorial Prerequisites and Ingress Setup for Production Environment.

  • The Hugging Face CLI or SDK installed.

Dataset Registry Workflow

  1. Prepare datasets.

  2. Create a repo_id in the format {namespace}/{dataset_name}.

  3. Register the dataset in NeMo Entity Store.

  4. Upload dataset files to NeMo Data Store using the Hugging Face CLI or SDK with the repo_id.
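As a rough illustration of steps 2 and 3, the sketch below builds a repo_id and prepares a registration request without sending it. It uses only the Python standard library. The `/v1/datasets` path, the request body fields (`name`, `namespace`, `files_url`), the `hf://datasets/` URL form, and the base URL `http://nemo.test` are assumptions made for this example and are not confirmed by this page; check the Create Dataset guide for the actual API.

```python
import json
from urllib import request

# Hypothetical placeholder: replace with the Entity Store base URL
# provided by your cluster administrator.
ENTITY_STORE_URL = "http://nemo.test"


def make_repo_id(namespace: str, dataset_name: str) -> str:
    """Build a repo_id in the {namespace}/{dataset_name} form (step 2)."""
    for part in (namespace, dataset_name):
        # Each component must be non-empty and must not contain a slash,
        # otherwise the result would not be a two-part repo_id.
        if not part or "/" in part:
            raise ValueError(f"invalid repo_id component: {part!r}")
    return f"{namespace}/{dataset_name}"


def build_registration_request(
    base_url: str, namespace: str, dataset_name: str
) -> request.Request:
    """Prepare, but do not send, a dataset registration request (step 3).

    Assumption: the endpoint path and body fields below are illustrative.
    """
    repo_id = make_repo_id(namespace, dataset_name)
    body = {
        "name": dataset_name,
        "namespace": namespace,
        # Assumed convention for pointing the entry at the uploaded files.
        "files_url": f"hf://datasets/{repo_id}",
    }
    return request.Request(
        url=f"{base_url.rstrip('/')}/v1/datasets",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send the prepared request, pass it to `urllib.request.urlopen`. For step 4, use the Hugging Face CLI or SDK with the same repo_id so that the uploaded files and the registered entry refer to the same dataset.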


Task Guides

Create Dataset

Create a dataset and register it in NeMo Entity Store.

Get Dataset

Get a dataset from your datastore.

List Datasets

List datasets available for use in customization jobs.

Update Dataset

Update a dataset’s files or metadata.

Delete Dataset

Delete a dataset.