MinIO Support for File Ingestion#

This guide explains how to ingest video files and data from MinIO object storage into CDS.

Overview#

CDS uses boto3 for S3 operations, providing native compatibility with MinIO. Since MinIO implements the S3 API, it works seamlessly with CDS for file ingestion. Other S3-compatible storage solutions should work similarly, but this guide focuses on tested MinIO configurations.

Prerequisites#

  • CDS service running

  • MinIO server accessible from CDS

  • MinIO credentials (access key and secret key)

MinIO Setup#

1. Pull the Docker Image#

First, pull the minio/minio image from Docker Hub:

docker pull minio/minio

2. Run the Container#

Run the container, specifying the ports, access keys, and a data volume:

docker run -d --name minio-fileserver \
  -p 9000:9000 \
  -p 9001:9001 \
  -e "MINIO_ROOT_USER=minioadmin" \
  -e "MINIO_ROOT_PASSWORD=minioadmin" \
  -v /mnt/data:/data \
  minio/minio server /data --console-address ":9001"

Parameters explained:

  • -p 9000:9000: Maps the MinIO API port

  • -p 9001:9001: Maps the MinIO Console port

  • -e "MINIO_ROOT_USER=minioadmin": Sets the access username

  • -e "MINIO_ROOT_PASSWORD=minioadmin": Sets the access password

  • -v /mnt/data:/data: Creates a persistent volume (change /mnt/data to your preferred local directory)

  • --console-address ":9001": Specifies the console address

3. Access MinIO Console#

After the container is running, access the MinIO Console at http://localhost:9001 using the credentials minioadmin:minioadmin.

4. Create Bucket and Upload Files#

Note: For detailed instructions on uploading videos to MinIO, please refer to the MinIO documentation or use the AWS CLI S3 commands with the MinIO endpoint.

Collection Creation with Storage Secrets#

Before ingesting files, you need to create a collection with proper storage configuration. This example shows how to create a collection with MinIO/S3 storage secrets using the API directly.

Kubernetes Deployment#

1. Create Kubernetes Secret for MinIO Credentials#

# Set your MinIO credentials
MINIO_ACCESS_KEY="minioadmin"
MINIO_SECRET_KEY="minioadmin"
MINIO_REGION="us-east-1"
BUCKET_NAME="video-dataset"
SECRETS_NAME="${BUCKET_NAME}-secrets"

# Create Kubernetes secret with MinIO credentials
kubectl create secret generic $SECRETS_NAME \
  --from-literal=aws_access_key_id=$MINIO_ACCESS_KEY \
  --from-literal=aws_secret_access_key=$MINIO_SECRET_KEY \
  --from-literal=aws_region=$MINIO_REGION \
  --from-literal=endpoint_url=http://localhost:9000

2. Create Collection with Storage Configuration#

# Get your API endpoint
VS_API="your-cds-api-endpoint"  # e.g., localhost:8888 or your ingress hostname

# Create collection with storage secrets using curl
COLLECTION=$(curl -s -X POST "https://$VS_API/v1/collections" \
  -H "Content-Type: application/json" \
  -d '{
    "pipeline": "cosmos_video_search_milvus",
    "name": "MinIO Video Collection",
    "tags": {
      "storage-template": "s3://'$BUCKET_NAME'/videos/{{filename}}",
      "storage-secrets": "'$SECRETS_NAME'"
    },
    "collection_config": {},
    "index_config": {
      "index_type": "GPU_CAGRA",
      "params": {
        "intermediate_graph_degree": 64,
        "graph_degree": 32,
        "build_algo": "IVF_PQ",
        "cache_dataset_on_device": "true",
        "adapt_for_cpu": "true"
      },
      "metric_type": "IP"
    },
    "metadata_config": {
      "allow_dynamic_schema": true,
      "fields": []
    }
  }')

# Extract collection ID from response
COLLECTION_ID=$(echo "$COLLECTION" | jq -r '.collection.id')
echo "Created collection with ID: $COLLECTION_ID"

Key Configuration Elements:

  • storage-template: S3 URL template with {{filename}} placeholder for dynamic file paths

  • storage-secrets: Name of the Kubernetes secret containing MinIO credentials

  • tags: Collection metadata that enables asset URL generation during search

Docker Compose Deployment#

Important: MinIO requires a custom endpoint URL, so you cannot use simple environment variables like with AWS S3.

Single File Ingestion#

#!/bin/bash
MINIO_ENDPOINT="http://localhost:9000"
BUCKET="video-dataset"
COLLECTION_ID="your-collection-id"
FILENAME="sample.mp4"

curl -X POST "http://localhost:8888/v1/collections/${COLLECTION_ID}/documents" \
  -H 'Content-Type: application/json' \
  -d "[{
    \"url\": \"${MINIO_ENDPOINT}/${BUCKET}/videos/${FILENAME}\",
    \"mime_type\": \"video/mp4\",
    \"metadata\": {\"filename\": \"${FILENAME}\", \"source\": \"minio\"}
  }]"

echo "Ingested: $FILENAME"