CDS Command Line Interface User Guide#

The CDS CLI provides comprehensive command-line tools for managing collections, ingesting data, and performing searches.

Installation#

Prerequisites#

[Prerequisites for CLI installation]

Installing the CLI#

Use the following command to install the CDS CLI:

# Install CDS CLI with all required dependencies
make install-cds-cli

The CDS CLI will be available when the virtual environment is activated:

source .venv/bin/activate
cds --help

Configuration#

Setting Up API Endpoint#

Configure the CLI to connect to your CDS API endpoint. You can set up multiple profiles for different environments.

Configure Default Profile#

Use the following command to set up the default profile:

cds config set

Configure Named Profile#

The following commands demonstrate setting two different named profiles:

# Configure a local deployment profile
cds config set --profile local

# Configure a production profile
cds config set --profile production

Configuration File#

The CDS CLI stores configuration at ~/.config/cds/config:

[default]
api_endpoint = http://production-endpoint.example.com

[local]
api_endpoint = http://localhost:8888

[production]
api_endpoint = https://production.example.com
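
When scripting around the CLI, it can help to check which endpoint a profile points to before running a command. The following is a minimal sketch, assuming the file keeps the INI layout shown above. The function name cds_endpoint_for_profile is hypothetical and not part of the CDS CLI, and the demo reads a throwaway copy of the sample rather than the real ~/.config/cds/config.

```shell
# Hypothetical helper (not part of the CDS CLI): print the api_endpoint
# configured for a profile in an INI-style file like the one shown above.
cds_endpoint_for_profile() {
  local config_file="$1" profile="$2"
  awk -v section="[$profile]" '
    /^\[/ { in_section = ($0 == section); next }   # track the current [section]
    in_section && /^[ \t]*api_endpoint[ \t]*=/ {
      sub(/^[^=]*=[ \t]*/, "")                     # drop the key and the "="
      print
      exit
    }
  ' "$config_file"
}

# Demo against a throwaway copy of the sample config, not the real file.
sample_config="$(mktemp)"
cat > "$sample_config" << 'EOF'
[default]
api_endpoint = http://production-endpoint.example.com

[local]
api_endpoint = http://localhost:8888
EOF

local_endpoint="$(cds_endpoint_for_profile "$sample_config" local)"
echo "$local_endpoint"
rm -f "$sample_config"
```

To inspect a real setup, pass ~/.config/cds/config as the first argument instead of the sample file.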

Using Profiles#

Use the --profile flag to select which endpoint to use:

cds collections list --profile local

S3 Configuration for Data Ingestion#

CDS ingests data from S3-compatible storage (LocalStack, MinIO, AWS S3). Configure S3 access using one of the following methods:

Option 1: Environment Variables (Quick Setup)#

The following command configures S3 credentials for ingestion using environment variables:

export AWS_ACCESS_KEY_ID=test
export AWS_SECRET_ACCESS_KEY=test
export AWS_ENDPOINT_URL=http://localhost:4566  # Required for LocalStack/MinIO
export AWS_DEFAULT_REGION=us-east-1

cds ingest files s3://bucket/videos/ --collection-id <ID> --extensions mp4

Option 2: AWS Profile (Multiple S3 Endpoints)#

This method requires two configuration files:

~/.aws/credentials: Sets access keys

[cds-s3]
aws_access_key_id = test
aws_secret_access_key = test

~/.aws/config: Sets the endpoint URL and region. The endpoint URL is required for non-AWS S3 services (e.g. LocalStack or MinIO).
```
[profile cds-s3]
endpoint_url = http://localstack:4566
region = us-east-1
```

For ingestion from AWS S3, your environment variables should look like the following:

export AWS_ACCESS_KEY_ID=xxxxxxxxxxxxxxxx
export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxx
export AWS_DEFAULT_REGION=us-east-2
# Do not set AWS_ENDPOINT_URL when using AWS S3 (only needed for LocalStack/MinIO)

Alternatively, you can set up a profile in your AWS credentials/config files:

~/.aws/credentials: Sets access keys

[cds-s3-aws]
aws_access_key_id = xxxxxxxxxxxxxx
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxx
# aws_session_token is only needed for temporary credentials; omit it for long-term keys
aws_session_token = xxxxxxxxxxxxxxxxxxxxxxxxxxxxx

~/.aws/config: Sets the region

[profile cds-s3-aws]
region = us-east-2

Then, run ingestion with the following command:

cds ingest files s3://<bucket>/videos/ \
  --collection-id <ID> \
  --extensions mp4 \
  --s3-profile cds-s3-aws

Note

Do not set AWS_ENDPOINT_URL when using AWS S3.

To ingest from LocalStack or MinIO instead, pass the cds-s3 profile defined at the start of this option with --s3-profile:

cds ingest files s3://bucket/videos/ \
  --collection-id <ID> \
  --extensions mp4 \
  --s3-profile cds-s3

Key Notes:

Always use s3:// URIs (not presigned URLs).
For LocalStack, the default endpoint is http://localhost:4566.
The profile name format differs between ~/.aws/credentials and ~/.aws/config: [cds-s3] in credentials, [profile cds-s3] in config.
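
The section-name difference noted above is easy to get wrong, so a quick check before ingestion can save a failed run. The following is a minimal sketch: aws_profile_sections_ok is a hypothetical helper, not a feature of the CDS or AWS CLI, and the demo writes temporary files rather than touching ~/.aws.

```shell
# Hypothetical check (not a CDS or AWS CLI feature): confirm that a profile
# is declared with the expected section header in both AWS files.
aws_profile_sections_ok() {
  local profile="$1" credentials_file="$2" config_file="$3"
  # credentials uses [name]; config uses [profile name]
  grep -qxF "[$profile]" "$credentials_file" &&
    grep -qxF "[profile $profile]" "$config_file"
}

# Demo with temporary copies of the cds-s3 examples from this section.
demo_dir="$(mktemp -d)"
printf '[cds-s3]\naws_access_key_id = test\naws_secret_access_key = test\n' > "$demo_dir/credentials"
printf '[profile cds-s3]\nendpoint_url = http://localstack:4566\nregion = us-east-1\n' > "$demo_dir/config"

if aws_profile_sections_ok cds-s3 "$demo_dir/credentials" "$demo_dir/config"; then
  profile_check="ok"
else
  profile_check="mismatch"
fi
echo "cds-s3: $profile_check"
rm -rf "$demo_dir"
```

Against a real setup, the call would be aws_profile_sections_ok cds-s3 ~/.aws/credentials ~/.aws/config.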

Managing Pipelines#

List Available Pipelines#

List all pipelines available for creating collections:

cds pipelines list

The following is an example output:

{
  "pipelines": [
    {
      "id": "cosmos_video_search_milvus",
      "enabled": true,
      "missing": []
    }
  ]
}

To list all pipelines with verbose output, use the --verbose flag:

cds pipelines list --verbose

This will output the complete pipeline configuration, including all components, connections, and initialization parameters.

Using Different Profiles#

# List pipelines from a different endpoint
cds pipelines list --profile production

Managing Collections#

Create a Collection#

Use the collections create command to create a new collection for storing and searching videos:

cds collections create --pipeline cosmos_video_search_milvus --name "My Video Collection"

Example output:

{
  "collection": {
    "pipeline": "cosmos_video_search_milvus",
    "name": "My Video Collection",
    "tags": {
      "default_index": "GPU_CAGRA"
    },
    "init_params": null,
    "cameras": "camera_front_wide_120fov",
    "id": "a7a5f9d38_078e_49ec_872e_a97a3277db69",
    "created_at": "2025-10-17T21:26:53.748150"
  }
}

Take note of the collection ID (e.g. a7a5f9d38_078e_49ec_872e_a97a3277db69). You will need this for ingestion and search.
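
In a script, the ID can be captured into a variable instead of copied by hand. The following is a minimal sketch that parses a trimmed copy of the example output above with sed. It assumes the CLI prints pretty-printed JSON with one "id" field per collection; in real use, create_output would hold the output of cds collections create.

```shell
# Trimmed copy of the example `cds collections create` output shown above.
create_output='{
  "collection": {
    "pipeline": "cosmos_video_search_milvus",
    "name": "My Video Collection",
    "id": "a7a5f9d38_078e_49ec_872e_a97a3277db69",
    "created_at": "2025-10-17T21:26:53.748150"
  }
}'

# Pull the first "id" value out of the pretty-printed JSON.
collection_id="$(printf '%s\n' "$create_output" |
  sed -n 's/.*"id": "\([^"]*\)".*/\1/p' | head -n 1)"

echo "COLLECTION_ID=$collection_id"
```

If jq is installed, a jq filter on .collection.id would be a more robust way to do the same thing.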

Advanced: Override index type:

cds collections create \
  --pipeline cosmos_video_search_milvus \
  --name "High Performance Collection" \
  --index-type GPU_CAGRA

Advanced: Use a custom configuration:

# Create a config file with custom settings
cat > my-collection-config.yaml << EOF
tags:
  storage-template: "s3://my-bucket/videos/{{filename}}"
  storage-secrets: "my-s3-credentials"
index_config:
  index_type: GPU_CAGRA
  params:
    intermediate_graph_degree: 64
    graph_degree: 32
EOF

cds collections create \
  --pipeline cosmos_video_search_milvus \
  --name "Custom Collection" \
  --config-yaml my-collection-config.yaml

List Collections#

Use the collections list command to list all collections in your deployment:

cds collections list

Example output:

{
  "collections": [
    {
      "pipeline": "cosmos_video_search_milvus",
      "name": "My Video Collection",
      "tags": {
        "default_index": "GPU_CAGRA"
      },
      "init_params": null,
      "cameras": "camera_front_wide_120fov",
      "id": "a87235cc0_7a76_493a_8610_72080629baeb",
      "created_at": "2025-10-17T20:00:38.827842"
    }
  ]
}

Get Collection Details#

Use the collections get command to get detailed information about a specific collection:

cds collections get a87235cc0_7a76_493a_8610_72080629baeb

Example output:

{
  "collection": {
    "pipeline": "cosmos_video_search_milvus",
    "name": "My Video Collection",
    "tags": {
      "storage-template": "s3://cds-test-vp-905418373856/msrvtt-videos/{{filename}}",
      "storage-secrets": "cds-test-vp-905418373856-secrets-videos"
    },
    "init_params": null,
    "cameras": "camera_front_wide_120fov",
    "id": "a87235cc0_7a76_493a_8610_72080629baeb",
    "created_at": "2025-10-17T20:00:38.827842"
  },
  "total_documents_count": 5
}

The total_documents_count value shows how many videos have been ingested.

Delete a Collection#

Use the collections delete command to delete a collection.

Warning

This action is irreversible! All videos and embeddings will be permanently deleted.

cds collections delete a7a5f9d38_078e_49ec_872e_a97a3277db69

The following is an example output:

{
  "message": "Collection a7a5f9d38_078e_49ec_872e_a97a3277db69 deleted successfully.",
  "id": "a7a5f9d38_078e_49ec_872e_a97a3277db69",
  "deleted_at": "2025-10-17T21:27:46.499448"
}

Ingesting Data#

Ingest Videos from S3#

Use the ingest files command to ingest video files from an S3 bucket into a collection:

cds ingest files s3://my-bucket/videos/ \
  --collection-id a87235cc0_7a76_493a_8610_72080629baeb \
  --extensions mp4 \
  --num-workers 3 \
  --limit 10

Parameters:

s3://my-bucket/videos/: The S3 path containing videos
--collection-id: The collection UUID (from cds collections create)
--extensions: The file extensions to ingest (use mp4 for videos)
--num-workers: The number of parallel workers (default: 1)
--limit: The maximum number of files to ingest (optional)

The following is example output:

INFO:root:Loading profile default
2025-10-17 13:00:45,409 INFO worker.py:1951 -- Started a local Ray instance.
[13:00:45] 🧠 Spawned 3 file batch processors.

╭─────────────────────────────────── File ingestion ───────────────────────────────────╮╭───────── Responses ──────────╮
│                                                                                      ││                              │
│   Processed files: 5/5 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:04 0:00:00 ││    Status code 200: 5/5 100% │
│                                                                                      ││                              │
╰──────────────────────────────────────────────────────────────────────────────────────╯╰──────────────────────────────╯

[13:00:52] 🚀 Finished processing job queue!
           Processed 5 files successfully

A Status code 200 response indicates that ingestion was successful.

Using S3 profiles (an alternative to environment variables for AWS S3 access):

# Configure AWS profile with your S3 credentials
aws configure set aws_access_key_id YOUR_KEY --profile cds-s3-aws
aws configure set aws_secret_access_key YOUR_SECRET --profile cds-s3-aws
aws configure set region us-east-2 --profile cds-s3-aws

# Ingest videos from S3
cds ingest files s3://cds-test-vp-905418373856/msrvtt-videos/ \
  --collection-id d5aa2e3d_7421_4f42_911d_1a681c43d760 \
  --extensions mp4 \
  --s3-profile cds-s3-aws \
  --limit 3 \
  --num-workers 2

Verified output (from actual test):

INFO:root:Loading profile default
2025-10-17 19:23:49,612 INFO worker.py:1951 -- Started a local Ray instance.
[19:23:50] 🧠 Spawned 2 file batch processors.

╭─────────────────────────────────── File ingestion ───────────────────────────────────╮╭───────── Responses ──────────╮
│                                                                                      ││                              │
│   Processed files: 3/3 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02 0:00:00 ││    Status code 200: 3/3 100% │
│                                                                                      ││                              │
╰──────────────────────────────────────────────────────────────────────────────────────╯╰──────────────────────────────╯

[19:23:53] 🚀 Finished processing job queue!
           Processed 3 files successfully
200: 3

A Status code 200 response indicates that ingestion was successful.

Ingest Precomputed Embeddings#

If you have precomputed embeddings in Parquet format, you can bulk insert them directly.

Parquet format has the following requirements:

An id field (string), as a document identifier
An embedding field (list of 256 floats), for vector embeddings
Optional additional metadata fields

Configure AWS profile:

aws configure set aws_access_key_id YOUR_KEY --profile cds-s3-aws
aws configure set aws_secret_access_key YOUR_SECRET --profile cds-s3-aws
aws configure set region us-east-2 --profile cds-s3-aws

Bulk insert adds the following constraints on the Parquet file:

The Parquet file must be in the same S3 bucket that Milvus is configured to use (check your milvus-values.yaml for externalS3.bucketName).
The Parquet file should only contain columns defined in the collection schema (typically id and embedding).
Metadata fields must be formatted using the Milvus $meta column format (they are not supported via the --metadata-cols parameter).

# Assumes the AWS CLI is configured and you know which bucket Milvus uses.
# First, make sure the Parquet file is in that bucket.
# Example: Milvus is configured to use the bucket 'my-milvus-bucket'.
aws s3 cp s3://your-source-bucket/embeddings.parquet s3://my-milvus-bucket/embeddings.parquet

# Then ingest (without metadata columns)
cds ingest embeddings \
  --parquet-dataset s3://cds-test-vp-905418373856/milvus_embeddings.parquet \
  --collection-id a68495826_0c1d_4de4_8cdd_9e309d876ad7 \
  --id-cols id \
  --embeddings-col embedding \
  --s3-profile cds-s3-aws

The ingest embeddings command has the following parameters:

--parquet-dataset: The S3 path to parquet file(s) (must be in the Milvus configured bucket)
--id-cols: The columns to generate document IDs from (required)
--embeddings-col: The column containing embedding vectors (default: "embeddings")
--s3-profile: The AWS profile with S3 credentials

The following is an example output:

INFO:root:Loading profile default
2025-10-19 00:56:17,410 INFO worker.py:1951 -- Started a local Ray instance.
[00:56:17] 🧠 Spawned 1 parquet batch processors.

╭─────────────────────────────────── File ingestion ───────────────────────────────────╮╭───────── Responses ──────────╮
│                                                                                      ││                              │
│   Processed files: 1/1 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01 0:00:00 ││    Status code 202: 1/1 100% │
│                                                                                      ││                              │
╰──────────────────────────────────────────────────────────────────────────────────────╯╰──────────────────────────────╯

[00:56:20] 🚀 Finished processing job queue!
           1 files returned status code 202
202: 1

A Status code 202 response indicates that the bulk insert job was queued successfully and will be processed asynchronously.

Verify Job Completion#

Because a 202 response means the job was accepted, not completed, use the following commands to verify that the bulk insert job finished successfully:

# Get the API endpoint
VS_API=$(kubectl get ingress simple-ingress -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

# Check job status for your collection
curl -k "https://$VS_API/api/v1/jobs?collection_name=<your-collection-id>" | jq

# Verify documents were inserted
cds collections get <your-collection-id>

Look for "status": "completed" and "progress": 100 in the job status, and verify total_documents_count > 0 in the collection info.
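
The status check can be automated in a script. The following is a minimal sketch: job_is_completed is a hypothetical helper, and the two canned responses are made up for illustration (the real response shape, and any status value other than "completed", are assumptions). Only the "status": "completed" marker comes from this guide.

```shell
# Hypothetical helper: succeed only if the JSON on stdin contains
# "status": "completed". The surrounding response shape is an assumption.
job_is_completed() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"completed"'
}

# Canned responses for illustration only; a real run would pipe in
#   curl -sk "https://$VS_API/api/v1/jobs?collection_name=<your-collection-id>"
pending_response='{"status": "pending", "progress": 40}'
finished_response='{"status": "completed", "progress": 100}'

for response in "$pending_response" "$finished_response"; do
  if printf '%s\n' "$response" | job_is_completed; then
    echo "completed"
  else
    echo "not completed yet"
  fi
done
```

Wrapping the curl call and this check in a loop with a short sleep gives a simple way to wait for the job before querying the collection.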

Monitoring Ingestion Progress#

The CLI provides real-time progress monitoring with the following:

Progress bar for files processed
Status codes breakdown (200 = success, 500 = error)
Time elapsed and remaining
Final summary

Searching#

Text-to-Video Search#

Use the search command to search for videos using natural language queries:

cds search \
  --collection-ids d5aa2e3d_7421_4f42_911d_1a681c43d760 \
  --text-query "a person walking on the street" \
  --top-k 3

The following is example output:

{
  "retrievals": [
    {
      "id": "290c3a15d90ccf7fd3ffe5d921150bd7e4da6ae555eea37e2e199a83400b22c7",
      "metadata": {
        "filename": "video7020.mp4",
        "source_id": "04bccb4352ab07cb7a1589c1e578fd464d36b53aff84ffed2868f6ce5ba8a5eb",
        "indexed_at": "2025-10-18T02:23:51.088094",
        "source_url": "https://s3.us-east-2.amazonaws.com/cds-test-vp-905418373856/msrvtt-videos/video7020.mp4?..."
      },
      "collection_id": "d5aa2e3d_7421_4f42_911d_1a681c43d760",
      "asset_url": null,
      "score": -0.0023960545659065247,
      "content": "",
      "mime_type": "video/mp4",
      "embedding": null
    },
    {
      "id": "5bc56113b11afadbea853e0010dbf593c63290451beab5669bbd76ccbeb39d7a",
      "metadata": {
        "filename": "video7024.mp4",
        "indexed_at": "2025-10-18T02:23:51.887275",
        ...
      },
      "score": -0.024522194638848305,
      ...
    },
    {
      "id": "b746f10ca6800c1f1e004354dfe089cae50bec438e7757323c1478324ce22792",
      "metadata": {
        "filename": "video7021.mp4",
        ...
      },
      "score": -0.03927513211965561,
      ...
    }
  ]
}

The results are sorted by score (most relevant first) and include the following:

score: The similarity score (higher is better)
asset_url: The presigned S3 URL to download/view the video
metadata: The associated metadata (filename, timestamps, etc.)
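
For a quick overview of a result set, the filename and score of each hit can be pulled out of the JSON. The following is a minimal sketch using awk on a trimmed copy of the example output. It assumes pretty-printed output where each filename line comes before its score line, as in the example; jq would be more robust where it is available.

```shell
# Trimmed copy of the example search output shown above.
search_output='{
  "retrievals": [
    {
      "metadata": {
        "filename": "video7020.mp4"
      },
      "score": -0.0023960545659065247
    },
    {
      "metadata": {
        "filename": "video7024.mp4"
      },
      "score": -0.024522194638848305
    }
  ]
}'

# Remember the latest filename, then print it next to the following score.
ranked="$(printf '%s\n' "$search_output" | awk -F'"' '
  /"filename":/ { name = $4 }
  /"score":/    { gsub(/[:, ]/, "", $3); print name "\t" $3 }
')"
echo "$ranked"
```

Each output line holds a filename and its score separated by a tab, in the order the CLI returned them.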

Search Multiple Collections#

Pass multiple collection IDs to the search command to search across multiple collections at once:

cds search \
  --collection-ids "d5aa2e3d_7421_4f42_911d_1a681c43d760,a87235cc0_7a76_493a_8610_72080629baeb" \
  --text-query "person walking" \
  --top-k 5

The results are merged from all specified collections and ranked by score. Each result includes the collection_id field indicating which collection it came from.

Search Options#

Disable Asset URL Generation#

Set the --generate-asset-url flag to false to speed up search for large result sets:

cds search \
  --collection-ids <collection-id> \
  --text-query "query text" \
  --generate-asset-url false

Use a Different Profile#

To search against a different endpoint, first configure a profile for it:

# Configure production profile
cds config set --profile production
# Enter API endpoint: https://your-production-hostname/api

Then use it for commands:

cds search \
  --collection-ids d5aa2e3d_7421_4f42_911d_1a681c43d760 \
  --text-query "person walking" \
  --top-k 3 \
  --profile production

Note

The output will show INFO:root:Loading profile production confirming the correct profile is being used.

Managing Secrets#

Note

The secrets API endpoint is not implemented in this version. Use Kubernetes secrets directly for S3 credentials instead.

Create Kubernetes Secret for S3 Access#

For collections that need S3 access, create a Kubernetes secret:

# Create secret with AWS credentials
docker exec cds-deployment kubectl create secret generic my-s3-creds \
  --from-literal=aws_access_key_id=AKIA... \
  --from-literal=aws_secret_access_key=secret... \
  --from-literal=aws_region=us-east-2

List Kubernetes Secrets#

Use kubectl get secrets to list all Kubernetes secrets:

docker exec cds-deployment kubectl get secrets

Use Secret in Collection#

Use the collections create command to create a collection that references S3 videos:

cds collections create --pipeline cosmos_video_search_milvus \
  --name "My Collection" \
  --config-yaml <(echo "
tags:
  storage-template: 's3://my-bucket/videos/{{filename}}'
  storage-secrets: 'my-s3-creds'
")

Advanced Usage#

Batch Ingestion#

For ingesting large numbers of videos, increase workers for parallel processing:

cds ingest files s3://cds-test-vp-905418373856/msrvtt-videos/ \
  --collection-id a9fab0958_1079_412e_b7b8_d863fcecccae \
  --extensions mp4 \
  --num-workers 10 \
  --batch-size 5 \
  --limit 30 \
  --s3-profile cds-s3-aws

This is example output from ingesting 30 videos in 4 seconds:

INFO:root:Loading profile default
[19:53:23] 🧠 Spawned 10 file batch processors.

╭──────────────────────────────────── File ingestion ────────────────────────────────────╮╭────────── Responses ───────────╮
│                                                                                        ││                                │
│   Processed files: 30/30 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:04 0:00:00 ││    Status code 200: 30/30 100% │
│                                                                                        ││                                │
╰────────────────────────────────────────────────────────────────────────────────────────╯╰────────────────────────────────╯

[19:53:29] 🚀 Finished processing job queue!
           Processed 30 files successfully
200: 30

Performance tips:

More workers generally increase ingestion speed (in the run above, 10 workers processed 30 videos in 4 seconds).
A reasonable starting range is 3-10 workers, depending on your deployment size.
Use the --limit parameter for testing before full ingestion.
Batch size affects API request grouping (default: 1).

Output Logging#

Use the --output-log parameter to log ingestion results to a CSV file for analysis:

cds ingest files s3://cds-test-vp-905418373856/msrvtt-videos/ \
  --collection-id a9fab0958_1079_412e_b7b8_d863fcecccae \
  --extensions mp4 \
  --limit 10 \
  --num-workers 3 \
  --s3-profile cds-s3-aws \
  --output-log ~/ingestion-results.csv

The console output will report where the log file is being written:

📖 Logging responses in /home/user/ingestion-results.csv

The following is an example of the CSV file contents:

file,status
s3://cds-test-vp-905418373856/msrvtt-videos/video7020.mp4,200
s3://cds-test-vp-905418373856/msrvtt-videos/video7024.mp4,200
s3://cds-test-vp-905418373856/msrvtt-videos/video7021.mp4,200
...

Each row contains the file path and HTTP status code (a 200 value indicates success).
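
After a large run, the log can be filtered for files that need attention. The following is a minimal sketch, assuming the two-column file,status layout shown above and file paths without commas. The second data row is a made-up failure added for illustration.

```shell
# Throwaway log in the same layout as the example above.
log_file="$(mktemp)"
cat > "$log_file" << 'EOF'
file,status
s3://cds-test-vp-905418373856/msrvtt-videos/video7020.mp4,200
s3://example-bucket/videos/broken.mp4,500
EOF

# Skip the header row, then print every file whose status is not 200.
failed_files="$(awk -F, 'NR > 1 && $2 != 200 { print $1 }' "$log_file")"
echo "$failed_files"
rm -f "$log_file"
```

Pointing awk at the file passed to --output-log (for example ~/ingestion-results.csv) lists the files worth retrying.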

CLI Command Reference#

Quick Command Summary#

# Configuration
cds config set [--profile PROFILE]

# Pipelines
cds pipelines list [--verbose] [--profile PROFILE]

# Collections
cds collections create --pipeline PIPELINE --name NAME [options]
cds collections list [--profile PROFILE]
cds collections get COLLECTION_ID [--profile PROFILE]
cds collections delete COLLECTION_ID [--profile PROFILE]

# Ingestion
cds ingest files S3_PATH --collection-id ID --extensions mp4 [options]
cds ingest embeddings --parquet-dataset S3_PATH --collection-id ID --id-cols COLS [options]

# Search
cds search --collection-ids ID --text-query "TEXT" --top-k K [options]

# Secrets (use kubectl directly)
docker exec cds-deployment kubectl create secret generic NAME --from-literal=key=value
docker exec cds-deployment kubectl get secrets
docker exec cds-deployment kubectl delete secret NAME

Troubleshooting#

Common CLI Issues#

Issue: Profile 'xyz' is not available

Solution: Configure the profile first:

cds config set --profile xyz

Issue: Collection not found

Solution: List the collections to verify the ID:

cds collections list

Issue: S3 access denied

Solution: Check your S3 credentials or profile configuration:

aws s3 ls s3://your-bucket/  # Test AWS credentials work

Issue: Status code 500 during ingestion

Solution: Check the following:

The video format is supported (MP4 recommended).
The videos are accessible from S3.
The CDS service logs show the underlying error: docker exec cds-deployment kubectl logs deployment/visual-search

Getting Help#

Run cds --help for command overview or cds <command> --help for detailed command information:

cds --help
cds collections create --help
cds ingest files --help
