Delete Dataset
Delete a dataset that you no longer need. To remove it completely, delete the dataset from NeMo Data Store and unregister it from NeMo Entity Store.
Prerequisites
Before you can delete a dataset, make sure that you have:
Obtained the base URL of your NeMo Entity Store microservice.
Permissions to access the NeMo Entity Store microservice endpoint.
Permissions to access the NeMo Data Store service URL.
Obtained the namespace and dataset_name of the dataset you want to delete.
To Delete a Dataset
Deleting a dataset programmatically requires two steps: deleting and unregistering.
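As a rough orientation, the two steps can be sketched as a pair of HTTP requests. The sketch below only builds the request descriptions and sends nothing. The helper name `plan_dataset_deletion` and the dictionary layout are illustrative assumptions, not part of any NeMo SDK. The endpoint paths are the ones used in the curl examples on this page.

```python
from urllib.parse import quote


def plan_dataset_deletion(data_store_url, entity_store_url, namespace, dataset_name):
    """Return the two requests, in order, as plain dictionaries (hypothetical helper)."""
    # Step 1: delete the dataset repo from NeMo Data Store (HF-compatible API).
    delete_repo = {
        "method": "DELETE",
        "url": f"{data_store_url.rstrip('/')}/v1/hf/api/repos/delete",
        "json": {"name": dataset_name, "organization": namespace, "type": "dataset"},
    }
    # Step 2: unregister the dataset from NeMo Entity Store.
    # Path segments are percent-encoded in case they contain special characters.
    unregister = {
        "method": "DELETE",
        "url": (
            f"{entity_store_url.rstrip('/')}/v1/datasets/"
            f"{quote(namespace, safe='')}/{quote(dataset_name, safe='')}"
        ),
        "json": None,
    }
    return [delete_repo, unregister]
```

Each dictionary can be handed to the HTTP client of your choice. Keeping the plan separate from the transport makes it easy to log or review both requests before anything is deleted.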
To Delete a Dataset Using the Hugging Face API
You can use the Hugging Face (HF) API or SDK to delete the dataset repo.
Note
The HF CLI currently does not support deletion.
# Set the namespace and dataset name
NAMESPACE="<your_namespace>"
DATASET_NAME="<your_dataset_name>"
DATA_STORE_BASE_URL="<URL for NeMo Data Store>"
# Delete the dataset
# The JSON body is double-quoted so that the shell expands the variables
curl -X DELETE "${DATA_STORE_BASE_URL}/v1/hf/api/repos/delete" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d "{
    \"name\": \"${DATASET_NAME}\",
    \"organization\": \"${NAMESPACE}\",
    \"type\": \"dataset\"
  }"
Verify that the dataset is no longer present:
# Set the namespace and dataset name
NAMESPACE="<your_namespace>"
DATASET_NAME="<your_dataset_name>"
DATA_STORE_BASE_URL="<URL for NeMo Data Store>"
# Try to retrieve the dataset
curl -X GET \
  "${DATA_STORE_BASE_URL}/v1/hf/api/datasets/${NAMESPACE}/${DATASET_NAME}"
# Delete the dataset repo with the Hugging Face SDK
from huggingface_hub import HfApi
import os

# Configure microservice host URLs
DATA_STORE_BASE_URL = os.getenv("DATA_STORE_BASE_URL")

# Define entity details
NAMESPACE = os.getenv("NAMESPACE", "default")
DATASET_NAME = "<your_dataset_name>"

# Provide HF token
HF_TOKEN = os.getenv("HF_TOKEN")

try:
    # Initialize Hugging Face API client
    # Note: A valid token is required for most operations
    hf_api = HfApi(endpoint=f"{DATA_STORE_BASE_URL}/v1/hf", token=HF_TOKEN)

    # Set the dataset repository details
    repo_id = f"{NAMESPACE}/{DATASET_NAME}"

    # Delete the dataset
    hf_api.delete_repo(
        repo_type="dataset",
        repo_id=repo_id,
    )
    print(f"Successfully deleted dataset {repo_id}")
except Exception as e:
    print(f"Error deleting dataset: {e}")
    raise
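The verification request shown earlier with curl can also be issued from Python. The following is a minimal sketch using only the standard library. It assumes that NeMo Data Store answers with HTTP 404 once the dataset is gone, which this page does not state explicitly. The function name `dataset_exists` and the `opener` parameter are illustrative, not part of any SDK.

```python
import urllib.error
import urllib.request


def dataset_exists(base_url, namespace, dataset_name, opener=urllib.request.urlopen):
    """Return True if the dataset can be retrieved, False on HTTP 404.

    Assumption: a deleted dataset yields 404. Any other HTTP error is re-raised
    so that problems such as 401 or 500 are not mistaken for a successful deletion.
    """
    url = f"{base_url.rstrip('/')}/v1/hf/api/datasets/{namespace}/{dataset_name}"
    try:
        with opener(url) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
```

For example, `dataset_exists(os.getenv("DATA_STORE_BASE_URL"), "default", "<your_dataset_name>")` should return `False` after a successful deletion, provided the 404 assumption holds. The `opener` argument exists only so that the HTTP call can be replaced in tests.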
To Unregister a Dataset
Remove the dataset from NeMo Entity Store so that other microservices can discover that the deletion has taken place.
Choose one of the following options to unregister a dataset.
Set up a NeMoMicroservices client instance using the base URL of the NeMo Entity Store microservice and perform the task as follows.
import os

from nemo_microservices import NeMoMicroservices

# Create a client that points at the NeMo Entity Store microservice
client = NeMoMicroservices(
    base_url=os.environ["ENTITY_STORE_BASE_URL"]
)

# Unregister the dataset
client.datasets.delete(
    namespace="your-namespace",  # Namespace that you create using NeMo Entity Store
    dataset_name="your-dataset-name"
)
Make a DELETE request to the /v1/datasets/{namespace}/{dataset_name} endpoint. The following example deletes the dataset identified by the NAMESPACE and DATASET_NAME variables.
export ENTITY_STORE_BASE_URL=<URL for NeMo Entity Store>
export NAMESPACE="your-namespace" # Namespace that you create using NeMo Entity Store
export DATASET_NAME="your-dataset-name"
curl -X DELETE "${ENTITY_STORE_BASE_URL}/v1/datasets/${NAMESPACE}/${DATASET_NAME}" \
-H 'Accept: application/json' \
-H 'Content-Type: application/json' | jq
Example Response
{
  "message": "Resource deleted successfully.",
  "id": "dataset-81RSQp7FKX3rdBtKvF9Skn",
  "deleted_at": null
}
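If you script the unregister call, it can help to check the response body before moving on. The sketch below parses a body shaped like the example response and returns a one-line summary. The helper name `summarize_delete_response` is hypothetical, and the field names are taken only from the single example above, so other response shapes are possible.

```python
import json


def summarize_delete_response(body: str) -> str:
    """Return 'id: message' for a delete response, or raise if the ID is missing."""
    payload = json.loads(body)
    if "id" not in payload:
        raise ValueError("response does not contain an 'id' field")
    # "message" is read defensively because only one example response is documented.
    message = payload.get("message", "")
    return f"{payload['id']}: {message}"
```

A summary like this is convenient for log lines in cleanup jobs that remove many datasets in sequence.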