Delete Dataset#

If you no longer need a dataset, you can delete it by using the NeMo Data Store and NeMo Entity Store microservices.

Prerequisites#

Before you delete a dataset, make sure that you have:

  • The base URL of your NeMo Entity Store microservice.

  • Permission to access the NeMo Entity Store microservice endpoint.

  • Permission to access the NeMo Data Store service URL.

  • The namespace and dataset_name of the dataset that you want to delete.
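
The prerequisites above can be collected and checked before any request is sent. The following is a minimal sketch, assuming the values are supplied through environment variables named ENTITY_STORE_BASE_URL, DATA_STORE_BASE_URL, NAMESPACE, and DATASET_NAME (the names used by the examples on this page). The load_settings helper is hypothetical and is not part of any NeMo SDK.

```python
import os

# Environment variable names are an assumption taken from the examples on this page.
REQUIRED_VARS = (
    "ENTITY_STORE_BASE_URL",
    "DATA_STORE_BASE_URL",
    "NAMESPACE",
    "DATASET_NAME",
)


def load_settings(env=None):
    """Return the required settings, or raise ValueError if any are missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError(f"Missing required settings: {', '.join(missing)}")
    # Strip trailing slashes from URLs so they can be joined with "/v1/..." paths.
    return {
        name: env[name].rstrip("/") if name.endswith("_URL") else env[name]
        for name in REQUIRED_VARS
    }
```

Calling load_settings() with no arguments reads from the process environment; passing a dictionary instead makes the check easy to exercise without touching real settings.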


How to Delete a Dataset#

Deleting a dataset programmatically requires two steps: deleting the dataset from NeMo Data Store and unregistering it from NeMo Entity Store.

Hugging Face#

Delete dataset#

You can delete the dataset repository with either the Hugging Face API or the Hugging Face SDK. The Hugging Face CLI does not currently support deletion.

# Set the namespace, dataset name, and Data Store URL
NAMESPACE="<your_namespace>"
DATASET_NAME="<your_dataset_name>"
DATA_STORE_BASE_URL="<URL for NeMo Data Store>"

# Delete the dataset
# The JSON body is double-quoted so that the shell expands the variables.
curl -X DELETE "${DATA_STORE_BASE_URL}/v1/hf/api/repos/delete" \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d "{
    \"name\": \"${DATASET_NAME}\",
    \"organization\": \"${NAMESPACE}\",
    \"type\": \"dataset\"
}"

Verify that the dataset is no longer present:

# Set the namespace, dataset name, and Data Store URL
NAMESPACE="<your_namespace>"
DATASET_NAME="<your_dataset_name>"
DATA_STORE_BASE_URL="<URL for NeMo Data Store>"

# Try to retrieve the dataset
curl -X GET \
"${DATA_STORE_BASE_URL}/v1/hf/api/datasets/${NAMESPACE}/${DATASET_NAME}"

To delete the dataset with the Hugging Face SDK instead of the API, use the huggingface_hub Python package:

from huggingface_hub import HfApi
import os

# Configure microservice host URLs
DATA_STORE_BASE_URL = os.getenv("DATA_STORE_BASE_URL")

# Define entity details
NAMESPACE = os.getenv("NAMESPACE", "default")
DATASET_NAME = "<your_dataset_name>"

# Provide HF token
HF_TOKEN = os.getenv("HF_TOKEN")

try:
    # Initialize Hugging Face API client
    # Note: A valid token is required for most operations
    hf_api = HfApi(endpoint=f"{DATA_STORE_BASE_URL}/v1/hf", token=HF_TOKEN)

    # Set the dataset repository details
    repo_id = f"{NAMESPACE}/{DATASET_NAME}"

    # Delete the dataset
    hf_api.delete_repo(
        repo_type="dataset",
        repo_id=repo_id,
    )
    print(f"Successfully deleted dataset {repo_id}")

except Exception as e:
    print(f"Error deleting dataset: {e}")
    raise
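
The verification request shown with curl can also be scripted. The following is a minimal sketch using only the Python standard library. It assumes that NeMo Data Store answers the GET request with HTTP 404 once the dataset is gone; that status code is an assumption, not something this page confirms. The dataset_url and dataset_exists helpers are hypothetical names.

```python
import urllib.error
import urllib.request
from urllib.parse import quote


def dataset_url(base_url, namespace, dataset_name):
    """Build the same Data Store URL that the curl verification request uses."""
    return (
        f"{base_url.rstrip('/')}/v1/hf/api/datasets/"
        f"{quote(namespace, safe='')}/{quote(dataset_name, safe='')}"
    )


def dataset_exists(base_url, namespace, dataset_name, urlopen=urllib.request.urlopen):
    """Return True if the GET succeeds, False on HTTP 404 (assumed to mean deleted)."""
    url = dataset_url(base_url, namespace, dataset_name)
    try:
        with urlopen(url, timeout=10) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        # Other errors (for example 401 or 500) are not evidence of deletion.
        raise
```

Re-raising every status other than 404 is deliberate: an authentication or server error should not be mistaken for a successful deletion.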

Unregister Dataset#

Remove the dataset from NeMo Entity Store so that other microservices can see that it has been deleted.

  1. Make a DELETE request to the /v1/datasets/{namespace}/{dataset_name} endpoint. The following example deletes the documentation-test-dataset dataset.

    export ENTITY_STORE_BASE_URL="<URL for NeMo Entity Store>"
    export NAMESPACE="team-docs"
    export DATASET_NAME="documentation-test-dataset"
    
    curl -X DELETE "${ENTITY_STORE_BASE_URL}/v1/datasets/${NAMESPACE}/${DATASET_NAME}" \
        -H 'Accept: application/json' \
        -H 'Content-Type: application/json' | jq
    
  2. Verify that the dataset was deleted by reviewing the response.

    Example Response
    {
      "message": "Resource deleted successfully.",
      "id": "dataset-81RSQp7FKX3rdBtKvF9Skn",
      "deleted_at": null
    }
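
The unregister request and the response check can be sketched in Python as well. This is a minimal example using only the standard library, assuming the endpoint path and the success message shown in the example response above. The function names are hypothetical and are not part of a NeMo SDK.

```python
import json
import urllib.request


def build_unregister_request(entity_store_base_url, namespace, dataset_name):
    """Build the DELETE request for /v1/datasets/{namespace}/{dataset_name}."""
    url = f"{entity_store_base_url.rstrip('/')}/v1/datasets/{namespace}/{dataset_name}"
    return urllib.request.Request(
        url, method="DELETE", headers={"Accept": "application/json"}
    )


def is_unregistered(response_body):
    """Check the body for the success message shown in the example response."""
    payload = json.loads(response_body)
    return payload.get("message") == "Resource deleted successfully."


def unregister_dataset(entity_store_base_url, namespace, dataset_name):
    """Send the DELETE request and report whether the response confirms deletion."""
    request = build_unregister_request(entity_store_base_url, namespace, dataset_name)
    with urllib.request.urlopen(request, timeout=10) as response:
        return is_unregistered(response.read().decode("utf-8"))
```

For the example on this page, the call would be unregister_dataset(ENTITY_STORE_BASE_URL, "team-docs", "documentation-test-dataset"). Matching on the exact message string is brittle; treating any 2xx status as success may be a reasonable alternative.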