Delete Dataset#
If you no longer need a dataset, you can delete it through the NeMo Entity Store and NeMo Data Store microservices.
Prerequisites#
Before you can delete a dataset, make sure that you have:
Obtained the base URL of your NeMo Entity Store microservice.
Permissions to access the NeMo Entity Store microservice endpoint.
Permissions to access the NeMo Data Store service URL.
Obtained the namespace and dataset_name of the dataset you want to delete.
How to Delete a Dataset#
Deleting a dataset programmatically requires two steps: deleting the dataset repository from NeMo Data Store, then unregistering the dataset from NeMo Entity Store.
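The two-step flow can be sketched as a small Python helper. This is only an illustration of the sequencing, assuming the steps run in the order this page presents them; `delete_dataset` and both callables are hypothetical names, not part of any NeMo API.

```python
from typing import Callable, List


def delete_dataset(delete_repo: Callable[[], None],
                   unregister: Callable[[], None]) -> List[str]:
    """Run the two steps in order and report which ones completed.

    Both callables are hypothetical placeholders: `delete_repo` would remove
    the repository from NeMo Data Store, and `unregister` would remove the
    entry from NeMo Entity Store.
    """
    completed = []
    delete_repo()  # Step 1: delete the dataset repository
    completed.append("deleted")
    unregister()  # Step 2: reached only if step 1 did not raise
    completed.append("unregistered")
    return completed
```

Because an exception in the first step propagates, the dataset is never unregistered while its repository still exists.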
Hugging Face#
Delete dataset#
You can use either the Hugging Face API or the SDK to delete the dataset repo. (The CLI does not currently support deletion.)
# Set the namespace and dataset name
NAMESPACE="<your_namespace>"
DATASET_NAME="<your_dataset_name>"
DATA_STORE_BASE_URL="<URL for NeMo Data Store>"
# Delete the dataset
# The body is double-quoted so that the shell expands the variables
curl -X DELETE "${DATA_STORE_BASE_URL}/v1/hf/api/repos/delete" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d "{
    \"name\": \"${DATASET_NAME}\",
    \"organization\": \"${NAMESPACE}\",
    \"type\": \"dataset\"
  }"
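The JSON body of the request above can also be assembled programmatically, which sidesteps shell quoting entirely. The following is a minimal sketch using only the Python standard library; it assumes the same endpoint and body fields as the curl command, and `build_repo_delete_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_repo_delete_request(data_store_base_url: str, namespace: str,
                              dataset_name: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for the dataset repo."""
    # Same fields as the curl example: repo name, owning namespace, repo type
    payload = {
        "name": dataset_name,
        "organization": namespace,
        "type": "dataset",
    }
    return urllib.request.Request(
        url=f"{data_store_base_url}/v1/hf/api/repos/delete",
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json",
                 "Content-Type": "application/json"},
        method="DELETE",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen`.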
Verify that the dataset is no longer present:
# Set the namespace and dataset name
NAMESPACE="<your_namespace>"
DATASET_NAME="<your_dataset_name>"
DATA_STORE_BASE_URL="<URL for NeMo Data Store>"
# Try to retrieve the dataset
curl -X 'GET' \
  "${DATA_STORE_BASE_URL}/v1/hf/api/datasets/${NAMESPACE}/${DATASET_NAME}"
To delete the dataset with the Hugging Face SDK in Python instead:
from huggingface_hub import HfApi
import os

# Configure microservice host URLs
DATA_STORE_BASE_URL = os.getenv("DATA_STORE_BASE_URL")

# Define entity details
NAMESPACE = os.getenv("NAMESPACE", "default")
DATASET_NAME = "<your_dataset_name>"

# Provide HF token
HF_TOKEN = os.getenv("HF_TOKEN")

try:
    # Initialize Hugging Face API client
    # Note: A valid token is required for most operations
    hf_api = HfApi(endpoint=f"{DATA_STORE_BASE_URL}/v1/hf", token=HF_TOKEN)

    # Set the dataset repository details
    repo_id = f"{NAMESPACE}/{DATASET_NAME}"

    # Delete the dataset
    hf_api.delete_repo(
        repo_type="dataset",
        repo_id=repo_id,
    )
    print(f"Successfully deleted dataset {repo_id}")
except Exception as e:
    print(f"Error deleting dataset: {e}")
    raise
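The GET check shown in the curl example can be repeated from Python after the SDK call returns. The sketch below uses only the standard library. It assumes that the Data Store answers with HTTP 404 once the dataset is gone, which this page does not state, so confirm the status code against your deployment. `dataset_exists` and its `opener` parameter are hypothetical names.

```python
import urllib.error
import urllib.request


def dataset_exists(data_store_base_url: str, namespace: str,
                   dataset_name: str, opener=urllib.request.urlopen) -> bool:
    """Return True if the GET succeeds, False on HTTP 404."""
    url = (f"{data_store_base_url}/v1/hf/api/datasets/"
           f"{namespace}/{dataset_name}")
    try:
        with opener(url):
            return True
    except urllib.error.HTTPError as err:
        # Assumption: 404 means the dataset is no longer present
        if err.code == 404:
            return False
        raise  # Any other status is surfaced to the caller
```

The `opener` argument exists only so that the function can be exercised without a live service; in normal use, leave it at its default.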
Unregister Dataset#
Remove the dataset from NeMo Entity Store so that other microservices can discover that the deletion has taken place.
Make a DELETE request to the /v1/datasets/{namespace}/{dataset_name} endpoint. The following example deletes the documentation-test-dataset dataset.
export ENTITY_STORE_BASE_URL="<URL for NeMo Entity Store>"
export NAMESPACE="team-docs"
export DATASET_NAME="documentation-test-dataset"
curl -X DELETE "${ENTITY_STORE_BASE_URL}/v1/datasets/${NAMESPACE}/${DATASET_NAME}" \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' | jq
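The same unregistration request can be prepared in Python. This is a minimal standard-library sketch that assumes only the endpoint path given above; `build_unregister_request` is a hypothetical helper name, and percent-encoding the path segments is a defensive choice rather than a documented requirement.

```python
import urllib.parse
import urllib.request


def build_unregister_request(entity_store_base_url: str, namespace: str,
                             dataset_name: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for NeMo Entity Store."""
    # Percent-encode each path segment so unusual characters stay in place
    ns = urllib.parse.quote(namespace, safe="")
    name = urllib.parse.quote(dataset_name, safe="")
    base = entity_store_base_url.rstrip("/")
    return urllib.request.Request(
        url=f"{base}/v1/datasets/{ns}/{name}",
        headers={"Accept": "application/json"},
        method="DELETE",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and read the JSON body from the response.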
Verify that the dataset was deleted by reviewing the response.
Example Response
{
  "message": "Resource deleted successfully.",
  "id": "dataset-81RSQp7FKX3rdBtKvF9Skn",
  "deleted_at": null
}
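Reviewing the response can be automated with a few lines of Python. This sketch assumes that a successful response carries the same `message` text as the example above, which may differ between releases; `parse_delete_response` is a hypothetical helper name.

```python
import json

# Assumption: this text matches the example response on this page
EXPECTED_MESSAGE = "Resource deleted successfully."


def parse_delete_response(body: str) -> dict:
    """Extract the confirmation flag and the deleted resource ID."""
    data = json.loads(body)
    return {
        "confirmed": data.get("message") == EXPECTED_MESSAGE,
        "id": data.get("id"),
    }
```

A caller might log the returned `id` and treat `confirmed` being false as a signal to inspect the raw response.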