NeMo Data Store Microservice Deployment and Setup Guide#

You deploy NeMo Data Store as part of the NeMo platform deployment using the NeMo Microservices Helm Chart. To make the NeMo Data Store API accessible, set up the NeMo Data Store endpoint after the deployment has completed. This section covers the following topics:

  • Setting up NeMo Data Store with an external PostgreSQL database for production

  • Setting up the NeMo Data Store API endpoint

  • Setting up the NeMo Data Store with Hugging Face

Tip

For an end-to-end deployment of the NeMo platform, see the Install NeMo Microservices Helm Chart.

Set Up Databases and Storage for Production#

Prerequisites#

To install NeMo Data Store in a production environment, make sure that you have met the following prerequisites:

Storage

NeMo Data Store manages data across the following databases and storage systems.

  • PostgreSQL Database: Stores repository, branch, and LFS metadata. It can be externally provisioned or hosted in your cluster.

  • Object Store: Stores large files such as datasets and model weights. It can be an external object store such as Amazon S3 or an internal local volume.

  • Persistent Volume (PVC): Stores Git history for models and datasets.

Kubernetes

Configure with External PostgreSQL#

By default, the NeMo Data Store Helm chart uses the Bitnami PostgreSQL chart to deploy a PostgreSQL database. Refer to the PostgreSQL section for information on how to configure the microservice with an external PostgreSQL database.

Configure with External Object Store#

Refer to the Amazon S3 section for information on how to configure the NeMo Data Store with an external object store.

Configure with NFS-backed PVCs#

Refer to the NFS-backed PVCs section for information on how to configure the NeMo Data Store with NFS-backed Persistent Volumes.

Set Up NeMo Data Store Endpoint#

After the deployment has completed, set up the NeMo Data Store endpoint.

The following sections guide you through setting up the deployed NeMo Data Store microservice as an endpoint. After you set up the endpoint, your users can create Git repositories for training datasets, models, and other files.

Set Up NeMo Data Store Endpoint Environment Variable#

Start with setting up an environment variable for configuring the NeMo Data Store endpoint.

  1. Prepare the NeMo Data Store URL. The URL depends on how you set up NeMo Data Store in your Kubernetes cluster during deployment.

    1. Internal URL: Other NeMo microservices such as NeMo Customizer and NeMo Evaluator hosted in the same Kubernetes cluster use the cluster-internal URL of NeMo Data Store. The internal URL has the following format.

      http://<nemo-datastore-service-name>.<datastore-namespace>.svc.cluster.local:<port>
      

      You can retrieve the service name of NeMo Data Store with the following command.

      kubectl get service -l app=nemo-datastore -n <datastore-namespace> -o jsonpath="{.items[0].metadata.name}"
      
    2. External URL: Identify the external URL of your NeMo Data Store endpoint. This is the URL you configured during the Helm chart deployment.

  2. Set the DATA_STORE_ENDPOINT environment variable to the external URL of your NeMo Data Store endpoint.

    export DATA_STORE_ENDPOINT="<your-data-store-endpoint>"
    

    Ensure that the value begins with http:// or https:// and doesn't end with a trailing slash (/). The following is an example using the external URL.

    export DATA_STORE_ENDPOINT=https://nemo-datastore.example.com
    
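The formatting rules above can be sketched as a small helper that validates and normalizes the endpoint value before you export it. This is an illustrative sketch, not part of the NeMo SDK; the function name is invented here.

```python
import os


def normalize_endpoint(url: str) -> str:
    """Validate and normalize a NeMo Data Store endpoint URL.

    Enforces the DATA_STORE_ENDPOINT rules described above: the value must
    begin with http:// or https:// and must not end with a trailing slash.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError("endpoint must begin with http:// or https://")
    return url.rstrip("/")


# Example: normalize the value before exporting it to child processes.
endpoint = normalize_endpoint("https://nemo-datastore.example.com/")
os.environ["DATA_STORE_ENDPOINT"] = endpoint
```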

Set Up NeMo Data Store Endpoint with Hugging Face#

Using the Hugging Face Python SDK#

  1. Install the Hugging Face Hub.

    pip install huggingface_hub
    

    Also, consider using hf_transfer for faster file transfer. To use it, set the environment variable HF_HUB_ENABLE_HF_TRANSFER to 1.

    pip install hf_transfer
    export HF_HUB_ENABLE_HF_TRANSFER=1
    
  2. Set up an HfApi client object with the NeMo Data Store endpoint.

    import os
    import requests

    from huggingface_hub import HfApi

    datastore_endpoint = os.environ["DATA_STORE_ENDPOINT"]
    hf_api = HfApi(endpoint=f"{datastore_endpoint}/v1/hf", token="")
    
  3. Create a namespace in your NeMo Data Store. You can skip this step if you want to use an existing namespace.

    import os
    import requests

    namespace = "<your-namespace>"
    requests.post(
        f"{os.environ['DATA_STORE_ENDPOINT']}/v1/datastore/namespaces",
        data={"namespace": namespace},
    )
    
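As a follow-up sketch, you could create a dataset repository in the new namespace and upload a file to it with the same HfApi client. The repository name and file path below are hypothetical, and the huggingface_hub import is deferred so the helpers can be defined without the SDK installed.

```python
import os


def dataset_repo_id(namespace: str, repo_name: str) -> str:
    """NeMo Data Store repositories are addressed as <namespace>/<name>."""
    return f"{namespace}/{repo_name}"


def upload_dataset_file(endpoint: str, namespace: str, repo_name: str, local_path: str) -> str:
    """Create (or reuse) a dataset repo and upload one file; returns the repo ID."""
    from huggingface_hub import HfApi  # deferred so this module imports without the SDK

    repo_id = dataset_repo_id(namespace, repo_name)
    hf_api = HfApi(endpoint=f"{endpoint}/v1/hf", token="")
    hf_api.create_repo(repo_id=repo_id, repo_type="dataset", exist_ok=True)
    hf_api.upload_file(
        path_or_fileobj=local_path,
        path_in_repo=os.path.basename(local_path),
        repo_id=repo_id,
        repo_type="dataset",
    )
    return repo_id


# Example usage (requires a live NeMo Data Store endpoint):
#   upload_dataset_file(os.environ["DATA_STORE_ENDPOINT"], "default", "my-dataset", "train.jsonl")
```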

Using the Hugging Face CLI#

  1. Set up the following environment variables. If you haven't installed Git Large File Storage, install the git-lfs package and then run git lfs install to initialize it.

    export HF_ENDPOINT="${DATA_STORE_ENDPOINT}/v1/hf"
    export HF_TOKEN="token"
    

    For faster file transfer, use the Hugging Face Transfer library. Install and enable it with the following commands.

    pip install hf_transfer
    export HF_HUB_ENABLE_HF_TRANSFER=1
    
  2. Verify the environment setting by running huggingface-cli whoami.

    huggingface-cli whoami
    

    The output should be similar to the following.

    default
    Authenticated through private endpoint: <your-data-store-endpoint>/v1/hf
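If the whoami check fails, a quick way to narrow it down is to confirm that HF_ENDPOINT matches the exported DATA_STORE_ENDPOINT plus the /v1/hf path. The following sketch recomputes the expected value and compares it with the environment; the function names are illustrative.

```python
import os


def expected_hf_endpoint(datastore_endpoint: str) -> str:
    """The Hugging Face endpoint is the Data Store endpoint plus the /v1/hf path."""
    return datastore_endpoint.rstrip("/") + "/v1/hf"


def check_hf_env() -> None:
    """Raise if HF_ENDPOINT doesn't match the exported DATA_STORE_ENDPOINT."""
    expected = expected_hf_endpoint(os.environ["DATA_STORE_ENDPOINT"])
    actual = os.environ.get("HF_ENDPOINT")
    if actual != expected:
        raise RuntimeError(f"HF_ENDPOINT is {actual!r}; expected {expected!r}")
```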