Delete Dataset#

Delete a dataset you don’t need anymore. You need to delete the dataset from NeMo Data Store and unregister it from NeMo Entity Store.

Prerequisites#

Before you can delete a dataset, make sure that you have:

  • Obtained the base URL of your NeMo Entity Store microservice.

  • Permissions to access the NeMo Entity Store microservice endpoint.

  • Permissions to access the NeMo Data Store service URL.

  • Obtained the namespace and dataset_name of the dataset you want to delete.
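
Before running the commands on this page, it can help to confirm that these values are actually available. The following is a minimal sketch, assuming you keep them in the same environment variables that the examples on this page use (DATA_STORE_BASE_URL, ENTITY_STORE_BASE_URL, NAMESPACE, and DATASET_NAME). The load_settings helper is hypothetical and is not part of any NeMo SDK.

```python
import os

# Environment variable names used by the examples on this page.
REQUIRED_VARS = (
    "DATA_STORE_BASE_URL",
    "ENTITY_STORE_BASE_URL",
    "NAMESPACE",
    "DATASET_NAME",
)


def load_settings(env=os.environ):
    """Hypothetical helper: return the required settings or raise if any are missing."""
    # Treat unset and empty values the same way.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling load_settings() with no arguments reads from the current process environment. Passing a plain dictionary instead makes the check easy to exercise without changing your shell.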


To Delete a Dataset#

Deleting a dataset programmatically requires two steps: deleting the dataset repository from NeMo Data Store, and then unregistering the dataset from NeMo Entity Store.

To Delete a Dataset Using the Hugging Face API#

You can use the Hugging Face (HF) API or SDK to delete the dataset repo.

Note

The HF CLI currently does not support deletion.

# Set the namespace and dataset name
NAMESPACE=<your_namespace>
DATASET_NAME=<your_dataset_name>
DATA_STORE_BASE_URL=<URL for NeMo Data Store>

# Delete the dataset
# The JSON body is double-quoted so that the shell expands the variables
curl -X 'DELETE' "${DATA_STORE_BASE_URL}/v1/hf/api/repos/delete" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d "{
    \"name\": \"${DATASET_NAME}\",
    \"organization\": \"${NAMESPACE}\",
    \"type\": \"dataset\"
}"

Verify that the dataset is no longer present:

# Set the namespace and dataset name
NAMESPACE=<your_namespace>
DATASET_NAME=<your_dataset_name>
DATA_STORE_BASE_URL=<URL for NeMo Data Store>

# Try to retrieve the dataset
curl -X 'GET' \
"${DATA_STORE_BASE_URL}/v1/hf/api/datasets/${NAMESPACE}/${DATASET_NAME}"

Alternatively, delete the dataset using the Hugging Face Python SDK:

from huggingface_hub import HfApi
import os

# Configure microservice host URLs
DATA_STORE_BASE_URL = os.getenv("DATA_STORE_BASE_URL")

# Define entity details
NAMESPACE = os.getenv("NAMESPACE", "default")
DATASET_NAME = "<your_dataset_name>"

# Provide HF token
HF_TOKEN = os.getenv("HF_TOKEN")

try:
    # Initialize Hugging Face API client
    # Note: A valid token is required for most operations
    hf_api = HfApi(endpoint=f"{DATA_STORE_BASE_URL}/v1/hf", token=HF_TOKEN)

    # Set the dataset repository details
    repo_id = f"{NAMESPACE}/{DATASET_NAME}"

    # Delete the dataset
    hf_api.delete_repo(
        repo_type="dataset",
        repo_id=repo_id,
    )
    print(f"Successfully deleted dataset {repo_id}")

except Exception as e:
    print(f"Error deleting dataset: {e}")
    raise
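
The verification request shown earlier on this page can also be issued from Python. The following is a minimal sketch that uses only the standard library and the same GET path as the curl example. It assumes that NeMo Data Store answers with HTTP 404 once the dataset is gone and that no authentication header is needed; adjust both if your deployment behaves differently. The dataset_exists helper and its opener parameter are hypothetical names introduced for illustration.

```python
import urllib.error
import urllib.request


def dataset_exists(base_url, namespace, dataset_name, opener=urllib.request.urlopen):
    """Return True if the Data Store still serves the dataset, False on HTTP 404.

    Assumption: a deleted dataset results in a 404 response.
    """
    url = f"{base_url.rstrip('/')}/v1/hf/api/datasets/{namespace}/{dataset_name}"
    try:
        with opener(url) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        # Any other status (for example 401 or 500) is not proof of deletion.
        raise
```

After a successful deletion, dataset_exists(DATA_STORE_BASE_URL, NAMESPACE, DATASET_NAME) would be expected to return False. The opener parameter exists only so that the function can be tried out without a live service.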

To Unregister a Dataset#

Remove the dataset from NeMo Entity Store so that other microservices can discover that the deletion has taken place.

Choose one of the following options to unregister a dataset.

Set up a NeMoMicroservices client instance using the base URL of the NeMo Entity Store microservice and perform the task as follows.

import os

from nemo_microservices import NeMoMicroservices

# Point the client at the NeMo Entity Store microservice
client = NeMoMicroservices(
    base_url=os.environ["ENTITY_STORE_BASE_URL"]
)

# Unregister the dataset from NeMo Entity Store
client.datasets.delete(
    namespace="your-namespace",  # Namespace that you create using NeMo Entity Store
    dataset_name="your-dataset-name"
)

Alternatively, make a DELETE request to the /v1/datasets/{namespace}/{dataset_name} endpoint. The following example deletes the dataset identified by the NAMESPACE and DATASET_NAME variables.

export ENTITY_STORE_BASE_URL=<URL for NeMo Entity Store>
export NAMESPACE="your-namespace" # Namespace that you create using NeMo Entity Store
export DATASET_NAME="your-dataset-name"

curl -X DELETE "${ENTITY_STORE_BASE_URL}/v1/datasets/${NAMESPACE}/${DATASET_NAME}" \
      -H 'Accept: application/json' \
      -H 'Content-Type: application/json' | jq

Example Response

{
   "message": "Resource deleted successfully.",
   "id": "dataset-81RSQp7FKX3rdBtKvF9Skn",
   "deleted_at": null
}
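
The two steps described on this page can be combined into a single script. The following is a minimal sketch that uses only the Python standard library and the same endpoints and JSON body as the curl examples. It assumes that neither service requires an authentication header and that both return a 2xx status on success. The function names and the opener parameter are hypothetical and are not part of any NeMo SDK.

```python
import json
import urllib.request


def build_delete_requests(data_store_url, entity_store_url, namespace, dataset_name):
    """Build the two DELETE requests: repo deletion first, then unregistration."""
    repo_payload = json.dumps(
        {"name": dataset_name, "organization": namespace, "type": "dataset"}
    ).encode("utf-8")
    # Step 1: delete the dataset repository from NeMo Data Store.
    delete_repo = urllib.request.Request(
        f"{data_store_url.rstrip('/')}/v1/hf/api/repos/delete",
        data=repo_payload,
        method="DELETE",
        headers={"Accept": "application/json", "Content-Type": "application/json"},
    )
    # Step 2: unregister the dataset from NeMo Entity Store.
    unregister = urllib.request.Request(
        f"{entity_store_url.rstrip('/')}/v1/datasets/{namespace}/{dataset_name}",
        method="DELETE",
        headers={"Accept": "application/json"},
    )
    return [delete_repo, unregister]


def delete_dataset(data_store_url, entity_store_url, namespace, dataset_name,
                   opener=urllib.request.urlopen):
    """Send both requests in order and return their HTTP status codes.

    urlopen raises urllib.error.HTTPError on a 4xx or 5xx response, so a
    failed repo deletion stops the script before the unregistration step.
    """
    statuses = []
    requests_to_send = build_delete_requests(
        data_store_url, entity_store_url, namespace, dataset_name
    )
    for request in requests_to_send:
        with opener(request) as response:
            statuses.append(response.status)
    return statuses
```

Building the requests separately from sending them keeps the URLs and payload easy to inspect before anything is deleted.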