CDS Command Line Interface User Guide#
The CDS CLI provides comprehensive command-line tools for managing collections, ingesting data, and performing searches.
Installation#
Prerequisites#
[Prerequisites for CLI installation]
Installing the CLI#
Use the following command to install the CDS CLI:
# Install CDS CLI with all required dependencies
make install-cds-cli
The CDS CLI will be available when the virtual environment is activated:
source .venv/bin/activate
cds --help
Configuration#
Setting Up API Endpoint#
Configure the CLI to connect to your CDS API endpoint. You can set up multiple profiles for different environments.
Configure Default Profile#
Use the following command to set up the default profile:
cds config set
Configure Named Profile#
The following commands demonstrate setting two different named profiles:
# Configure a local deployment profile
cds config set --profile local
# Configure a production profile
cds config set --profile production
Configuration File#
The CDS CLI stores configuration at ~/.config/cds/config:
[default]
api_endpoint = http://production-endpoint.example.com
[local]
api_endpoint = http://localhost:8888
[production]
api_endpoint = https://production.example.com
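When scripting around the CLI, it can help to check which endpoint a profile resolves to. The following is a minimal sketch that parses a file in the layout shown above with awk. The helper name endpoint_for_profile and the temporary sample file are illustrative assumptions, not part of the CDS CLI.

```shell
#!/bin/sh
# Hypothetical helper (not part of the CDS CLI): print the api_endpoint
# configured for a profile in an INI-style file like the one shown above.
endpoint_for_profile() {
  config_file=$1
  profile=$2
  awk -v section="[$profile]" '
    /^\[/ { in_section = ($0 == section) }   # track which [section] we are in
    in_section && /^api_endpoint[ \t]*=/ {
      sub(/^[^=]*=[ \t]*/, "")               # drop the "api_endpoint = " prefix
      print
      exit
    }
  ' "$config_file"
}

# Sample file mirroring the layout above (an assumption for illustration);
# a real configuration lives at ~/.config/cds/config.
sample_config=$(mktemp)
cat > "$sample_config" <<'EOF'
[default]
api_endpoint = http://production-endpoint.example.com
[local]
api_endpoint = http://localhost:8888
[production]
api_endpoint = https://production.example.com
EOF

endpoint_for_profile "$sample_config" local
```

With the sample file, this prints http://localhost:8888. Pointing the helper at ~/.config/cds/config instead should report the endpoint of a real profile, provided the file follows the same layout.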
Using Profiles#
Use the --profile flag to select which endpoint to use:
cds collections list --profile local
S3 Configuration for Data Ingestion#
CDS ingests data from S3-compatible storage (LocalStack, MinIO, AWS S3). Configure S3 access using one of the following methods:
Option 1: Environment Variables (Quick Setup)#
The following commands configure S3 credentials for ingestion using environment variables, and then run an ingestion:
export AWS_ACCESS_KEY_ID=test
export AWS_SECRET_ACCESS_KEY=test
export AWS_ENDPOINT_URL=http://localhost:4566 # Required for LocalStack/MinIO
export AWS_DEFAULT_REGION=us-east-1
cds ingest files s3://bucket/videos/ --collection-id <ID> --extensions .mp4
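A missing variable is an easy mistake to make with this method, so a small preflight check can save a failed run. The sketch below reports which of the variables used above are unset or empty. The helper name missing_s3_env is a hypothetical illustration, not part of the CDS CLI, and AWS_ENDPOINT_URL is deliberately not checked because it is only needed for LocalStack/MinIO.

```shell
#!/bin/sh
# Hypothetical preflight helper (not part of the CDS CLI): report which of the
# S3 environment variables used above are unset or empty.
missing_s3_env() {
  missing=""
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION; do
    eval "value=\${$name:-}"          # look up the variable named by $name
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  # Print the names without the leading space (an empty line if none are missing).
  echo "${missing# }"
}

if [ -n "$(missing_s3_env)" ]; then
  echo "Missing S3 settings: $(missing_s3_env)" >&2
else
  echo "S3 environment looks complete"
fi
```

Running the check before cds ingest files gives an early warning; it only tests that the variables are set, not that the credentials are valid.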
Option 2: AWS Profile (Multiple S3 Endpoints)#
This method requires two configuration files:
~/.aws/credentials: Sets the access keys.
[cds-s3]
aws_access_key_id = test
aws_secret_access_key = test
~/.aws/config: Sets the endpoint URL. This is required for non-AWS S3 configurations (e.g. LocalStack or MinIO).
[profile cds-s3]
endpoint_url = http://localstack:4566
region = us-east-1
For ingestion from AWS S3, your environment variables should look like the following:
export AWS_ACCESS_KEY_ID=xxxxxxxxxxxxxxxx
export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxx
export AWS_DEFAULT_REGION=us-east-2
# Do not set AWS_ENDPOINT_URL when using AWS S3 (only needed for LocalStack/MinIO)
Alternatively, you can set up a profile in your AWS credentials/config files:
~/.aws/credentials: Sets the access keys.
[cds-s3-aws]
aws_access_key_id = xxxxxxxxxxxxxx
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxx
# aws_session_token is not necessary for permanent keys
aws_session_token = xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
~/.aws/config: Sets the region.
[profile cds-s3-aws]
region = us-east-2
Then, run ingestion with the following command:
cds ingest files s3://<bucket>/videos/ \
--collection-id <ID> \
--extensions .mp4 \
--s3-profile cds-s3-aws
Note
Do not set AWS_ENDPOINT_URL when using AWS S3.
To ingest with the LocalStack/MinIO profile (cds-s3) defined at the start of this option, pass it with --s3-profile as follows:
cds ingest files s3://bucket/videos/ \
--collection-id <ID> \
--extensions .mp4 \
--s3-profile cds-s3
Key Notes:
Always use s3:// URIs (not presigned URLs).
For LocalStack, the default endpoint is http://localhost:4566.
The profile name format differs between ~/.aws/credentials and ~/.aws/config: [cds-s3] in credentials, [profile cds-s3] in config.
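The header difference between the two files is easiest to see side by side. The sketch below writes a LocalStack-style profile into a scratch directory and prints the first line of each file. The scratch directory is an assumption made to avoid touching a real ~/.aws setup. AWS_SHARED_CREDENTIALS_FILE and AWS_CONFIG_FILE are standard AWS variables for alternate file locations, but whether the CDS CLI honors them is also an assumption.

```shell
#!/bin/sh
# Sketch: write a LocalStack/MinIO profile into a scratch directory so the
# header difference between the two files is easy to see.
aws_dir=$(mktemp -d)
profile=cds-s3

# credentials file: the section header is the bare profile name
cat > "$aws_dir/credentials" <<EOF
[$profile]
aws_access_key_id = test
aws_secret_access_key = test
EOF

# config file: the section header carries the "profile " prefix
cat > "$aws_dir/config" <<EOF
[profile $profile]
endpoint_url = http://localhost:4566
region = us-east-1
EOF

# Standard AWS variables that point tools at alternate file locations.
export AWS_SHARED_CREDENTIALS_FILE="$aws_dir/credentials"
export AWS_CONFIG_FILE="$aws_dir/config"

# Show the first line (the section header) of each file.
head -n 1 "$AWS_SHARED_CREDENTIALS_FILE" "$AWS_CONFIG_FILE"
```

To make the profile permanent, the same two sections would go into ~/.aws/credentials and ~/.aws/config.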
Managing Pipelines#
List Available Pipelines#
List all pipelines available for creating collections:
cds pipelines list
The following is an example output:
{
"pipelines": [
{
"id": "cosmos_video_search_milvus",
"enabled": true,
"missing": []
}
]
}
To list all pipelines with verbose output, use the --verbose flag:
cds pipelines list --verbose
This will output the complete pipeline configuration, including all components, connections, and initialization parameters.
Using Different Profiles#
# List pipelines from a different endpoint
cds pipelines list --profile production
Managing Collections#
Create a Collection#
Use the collections create command to create a new collection for storing and searching videos:
cds collections create --pipeline cosmos_video_search_milvus --name "My Video Collection"
Example output:
{
"collection": {
"pipeline": "cosmos_video_search_milvus",
"name": "My Video Collection",
"tags": {
"default_index": "GPU_CAGRA"
},
"init_params": null,
"cameras": "camera_front_wide_120fov",
"id": "a7a5f9d38_078e_49ec_872e_a97a3277db69",
"created_at": "2025-10-17T21:26:53.748150"
}
}
Take note of the collection ID (e.g. a7a5f9d38_078e_49ec_872e_a97a3277db69). You will need this for ingestion and search.
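Rather than copying the ID by hand, a script can capture it in a variable. The sketch below extracts the first "id" value from JSON shaped like the example output. The helper name extract_collection_id is hypothetical, and the sample text stands in for real CLI output.

```shell
#!/bin/sh
# Hypothetical helper (not part of the CDS CLI): pull the first "id" value out
# of JSON shaped like the example output above, read from standard input.
extract_collection_id() {
  sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Stand-in for the output of `cds collections create`, copied from the example.
sample_output='{
  "collection": {
    "pipeline": "cosmos_video_search_milvus",
    "name": "My Video Collection",
    "id": "a7a5f9d38_078e_49ec_872e_a97a3277db69",
    "created_at": "2025-10-17T21:26:53.748150"
  }
}'

COLLECTION_ID=$(printf '%s\n' "$sample_output" | extract_collection_id)
echo "Collection ID: $COLLECTION_ID"
```

In practice, the command output could be piped into the helper directly, assuming the CLI writes only the JSON document to standard output. The sed pattern relies on the pretty-printed, one-key-per-line layout; a JSON-aware tool such as jq (for example, jq -r '.collection.id') is more robust where it is installed.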
Advanced: Override index type:
cds collections create \
--pipeline cosmos_video_search_milvus \
--name "High Performance Collection" \
--index-type GPU_CAGRA
Advanced: Use a custom configuration:
# Create a config file with custom settings
cat > my-collection-config.yaml << EOF
tags:
storage-template: "s3://my-bucket/videos/{{filename}}"
storage-secrets: "my-s3-credentials"
index_config:
index_type: GPU_CAGRA
params:
intermediate_graph_degree: 64
graph_degree: 32
EOF
cds collections create \
--pipeline cosmos_video_search_milvus \
--name "Custom Collection" \
--config-yaml my-collection-config.yaml
List Collections#
Use the collections list command to list all collections in your deployment:
cds collections list
Example output:
{
"collections": [
{
"pipeline": "cosmos_video_search_milvus",
"name": "My Video Collection",
"tags": {
"default_index": "GPU_CAGRA"
},
"init_params": null,
"cameras": "camera_front_wide_120fov",
"id": "a87235cc0_7a76_493a_8610_72080629baeb",
"created_at": "2025-10-17T20:00:38.827842"
}
]
}
Get Collection Details#
Use the collections get command to get detailed information about a specific collection:
cds collections get a87235cc0_7a76_493a_8610_72080629baeb
Example output:
{
"collection": {
"pipeline": "cosmos_video_search_milvus",
"name": "My Video Collection",
"tags": {
"storage-template": "s3://cds-test-vp-905418373856/msrvtt-videos/{{filename}}",
"storage-secrets": "cds-test-vp-905418373856-secrets-videos"
},
"init_params": null,
"cameras": "camera_front_wide_120fov",
"id": "a87235cc0_7a76_493a_8610_72080629baeb",
"created_at": "2025-10-17T20:00:38.827842"
},
"total_documents_count": 5
}
The total_documents_count value shows how many videos have been ingested.
Delete a Collection#
Use the collections delete command to delete a collection.
Warning
This action is irreversible! All videos and embeddings will be permanently deleted.
cds collections delete a7a5f9d38_078e_49ec_872e_a97a3277db69
The following is an example output:
{
"message": "Collection a7a5f9d38_078e_49ec_872e_a97a3277db69 deleted successfully.",
"id": "a7a5f9d38_078e_49ec_872e_a97a3277db69",
"deleted_at": "2025-10-17T21:27:46.499448"
}
Ingesting Data#
Ingest Videos from S3#
Use the ingest files command to ingest video files from an S3 bucket into a collection:
cds ingest files s3://my-bucket/videos/ \
--collection-id a87235cc0_7a76_493a_8610_72080629baeb \
--extensions mp4 \
--num-workers 3 \
--limit 10
Parameters:
s3://my-bucket/videos/: The S3 path containing videos
--collection-id: The collection UUID (from cds collections create)
--extensions: The file extensions to ingest (use mp4 for videos)
--num-workers: The number of parallel workers (default: 1)
--limit: The maximum number of files to ingest (optional)
The following is example output:
INFO:root:Loading profile default
2025-10-17 13:00:45,409 INFO worker.py:1951 -- Started a local Ray instance.
[13:00:45] Spawned 3 file batch processors.
╭────────────────────────────────── File ingestion ──────────────────────────────────╮╭──────── Responses ────────╮
│                                                                                    ││                           │
│ Processed files: 5/5 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:04 0:00:00 ││ Status code 200: 5/5 100% │
│                                                                                    ││                           │
╰────────────────────────────────────────────────────────────────────────────────────╯╰───────────────────────────╯
[13:00:52] Finished processing job queue!
Processed 5 files successfully
A Status code 200 response indicates that ingestion was successful.
Using S3 profiles (required for AWS S3 access):
# Configure AWS profile with your S3 credentials
aws configure set aws_access_key_id YOUR_KEY --profile cds-s3-aws
aws configure set aws_secret_access_key YOUR_SECRET --profile cds-s3-aws
aws configure set region us-east-2 --profile cds-s3-aws
# Ingest videos from S3
cds ingest files s3://cds-test-vp-905418373856/msrvtt-videos/ \
--collection-id d5aa2e3d_7421_4f42_911d_1a681c43d760 \
--extensions mp4 \
--s3-profile cds-s3-aws \
--limit 3 \
--num-workers 2
Verified output (from actual test):
INFO:root:Loading profile default
2025-10-17 19:23:49,612 INFO worker.py:1951 -- Started a local Ray instance.
[19:23:50] Spawned 2 file batch processors.
╭────────────────────────────────── File ingestion ──────────────────────────────────╮╭──────── Responses ────────╮
│                                                                                    ││                           │
│ Processed files: 3/3 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02 0:00:00 ││ Status code 200: 3/3 100% │
│                                                                                    ││                           │
╰────────────────────────────────────────────────────────────────────────────────────╯╰───────────────────────────╯
[19:23:53] Finished processing job queue!
Processed 3 files successfully
200: 3
A Status code 200 response indicates that ingestion was successful.
Ingest Precomputed Embeddings#
If you have precomputed embeddings in Parquet format, you can bulk insert them directly.
Parquet format has the following requirements:
An id field (string), as a document identifier
An embedding field (list of 256 floats), for vector embeddings
Can include additional metadata fields
Configure AWS profile:
aws configure set aws_access_key_id YOUR_KEY --profile cds-s3-aws
aws configure set aws_secret_access_key YOUR_SECRET --profile cds-s3-aws
aws configure set region us-east-2 --profile cds-s3-aws
These are the requirements for Parquet embeddings:
The parquet file must be in the same S3 bucket that Milvus is configured to use (check your milvus-values.yaml for externalS3.bucketName).
The parquet file should only contain columns defined in the collection schema (typically id and embedding).
Metadata fields must be formatted using the Milvus $meta column format (not supported via the --metadata-cols parameter).
# Assumes that the AWS CLI is configured and that you know which bucket Milvus is configured to use
# First, ensure your parquet file is in the correct bucket
# Example: if Milvus is configured to use bucket 'my-milvus-bucket'
aws s3 cp s3://your-source-bucket/embeddings.parquet s3://my-milvus-bucket/embeddings.parquet
# Then ingest (without metadata columns)
cds ingest embeddings \
--parquet-dataset s3://cds-test-vp-905418373856/milvus_embeddings.parquet \
--collection-id a68495826_0c1d_4de4_8cdd_9e309d876ad7 \
--id-cols id \
--embeddings-col embedding \
--s3-profile cds-s3-aws
The ingest embeddings command has the following parameters:
--parquet-dataset: The S3 path to parquet file(s) (must be in the Milvus configured bucket)
--id-cols: The columns to generate document IDs from (required)
--embeddings-col: The column containing embedding vectors (default: "embeddings")
--s3-profile: The AWS profile with S3 credentials
The following is an example output:
INFO:root:Loading profile default
2025-10-19 00:56:17,410 INFO worker.py:1951 -- Started a local Ray instance.
[00:56:17] Spawned 1 parquet batch processors.
╭────────────────────────────────── File ingestion ──────────────────────────────────╮╭──────── Responses ────────╮
│                                                                                    ││                           │
│ Processed files: 1/1 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01 0:00:00 ││ Status code 202: 1/1 100% │
│                                                                                    ││                           │
╰────────────────────────────────────────────────────────────────────────────────────╯╰───────────────────────────╯
[00:56:20] Finished processing job queue!
1 files returned status code 202
202: 1
A Status code 202 response indicates that the bulk insert job was queued successfully and will be processed asynchronously.
Verify Job Completion#
Because a 202 response means the job was accepted (not completed), use the following commands to verify that the bulk insert job completed successfully:
# Get the API endpoint
VS_API=$(kubectl get ingress simple-ingress -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
# Check job status for your collection
curl -k "https://$VS_API/api/v1/jobs?collection_name=<your-collection-id>" | jq
# Verify documents were inserted
cds collections get <your-collection-id>
Look for "status": "completed" and "progress": 100 in the job status, and verify total_documents_count > 0 in the collection info.
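Because the job runs asynchronously, a single status check may come too early. The sketch below wraps the check in a simple polling loop that stops once the status output contains "status": "completed". The helper names wait_for_completed and fetch_job_status are hypothetical, and the attempt count and delay are arbitrary example values.

```shell
#!/bin/sh
# Hypothetical helper (not part of the CDS CLI): run a status command
# repeatedly until its output contains "status": "completed".
#   $1 = command or function that prints the job status JSON
#   $2 = maximum number of attempts
#   $3 = seconds to wait between attempts
wait_for_completed() {
  fetch_status=$1
  max_attempts=$2
  delay_seconds=$3
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$fetch_status" | grep -q '"status"[[:space:]]*:[[:space:]]*"completed"'; then
      echo "Job completed after $attempt attempt(s)"
      return 0
    fi
    sleep "$delay_seconds"
    attempt=$((attempt + 1))
  done
  echo "Job not completed after $max_attempts attempt(s)" >&2
  return 1
}

# Example status command, assuming VS_API is set as shown above and that
# <your-collection-id> has been replaced with a real ID.
fetch_job_status() {
  curl -sk "https://$VS_API/api/v1/jobs?collection_name=<your-collection-id>"
}

# Usage (commented out because it needs a live deployment):
# wait_for_completed fetch_job_status 30 10
```

The loop only matches the "completed" status text; it does not detect a failed job, so a timeout should be followed by a manual look at the job status output.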
Monitoring Ingestion Progress#
The CLI provides real-time progress monitoring with the following:
Progress bar for files processed
Status codes breakdown (200 = success, 500 = error)
Time elapsed and remaining
Final summary
Searching#
Text-to-Video Search#
Use the search command to search for videos using natural language queries:
cds search \
--collection-ids d5aa2e3d_7421_4f42_911d_1a681c43d760 \
--text-query "a person walking on the street" \
--top-k 3
The following is example output:
{
"retrievals": [
{
"id": "290c3a15d90ccf7fd3ffe5d921150bd7e4da6ae555eea37e2e199a83400b22c7",
"metadata": {
"filename": "video7020.mp4",
"source_id": "04bccb4352ab07cb7a1589c1e578fd464d36b53aff84ffed2868f6ce5ba8a5eb",
"indexed_at": "2025-10-18T02:23:51.088094",
"source_url": "https://s3.us-east-2.amazonaws.com/cds-test-vp-905418373856/msrvtt-videos/video7020.mp4?..."
},
"collection_id": "d5aa2e3d_7421_4f42_911d_1a681c43d760",
"asset_url": null,
"score": -0.0023960545659065247,
"content": "",
"mime_type": "video/mp4",
"embedding": null
},
{
"id": "5bc56113b11afadbea853e0010dbf593c63290451beab5669bbd76ccbeb39d7a",
"metadata": {
"filename": "video7024.mp4",
"indexed_at": "2025-10-18T02:23:51.887275",
...
},
"score": -0.024522194638848305,
...
},
{
"id": "b746f10ca6800c1f1e004354dfe089cae50bec438e7757323c1478324ce22792",
"metadata": {
"filename": "video7021.mp4",
...
},
"score": -0.03927513211965561,
...
}
]
}
The results are sorted by score (most relevant first) and include the following:
score: The similarity score (higher is better)
asset_url: The presigned S3 URL to download/view the video
metadata: The associated metadata (filename, timestamps, etc.)
Search Multiple Collections#
Pass multiple collection IDs to the search command to search across multiple collections at once:
cds search \
--collection-ids "d5aa2e3d_7421_4f42_911d_1a681c43d760,a87235cc0_7a76_493a_8610_72080629baeb" \
--text-query "person walking" \
--top-k 5
The results are merged from all specified collections and ranked by score. Each result includes the collection_id field indicating which collection it came from.
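When the collection IDs come from a script rather than being typed by hand, they need to be joined into the comma-separated form shown above. The following is a minimal sketch; the helper name join_collection_ids is hypothetical and not part of the CDS CLI.

```shell
#!/bin/sh
# Hypothetical helper (not part of the CDS CLI): join its arguments with commas,
# matching the format that --collection-ids takes in the example above.
join_collection_ids() {
  (
    IFS=,                  # set only inside this subshell
    printf '%s\n' "$*"     # "$*" joins arguments with the first character of IFS
  )
}

ids=$(join_collection_ids \
  d5aa2e3d_7421_4f42_911d_1a681c43d760 \
  a87235cc0_7a76_493a_8610_72080629baeb)
echo "$ids"

# Usage (commented out because it needs a live deployment):
# cds search --collection-ids "$ids" --text-query "person walking" --top-k 5
```

Running the join in a subshell keeps the IFS change from leaking into the rest of the script.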
Search Options#
Disable Asset URL Generation#
Set the --generate-asset-url flag to false to speed up search for large result sets:
cds search \
--collection-ids <collection-id> \
--text-query "query text" \
--generate-asset-url false
Use a Different Profile#
Use the --profile flag to specify a different profile:
# Configure production profile
cds config set --profile production
# Enter API endpoint: https://your-production-hostname/api
Then use it for commands:
cds search \
--collection-ids d5aa2e3d_7421_4f42_911d_1a681c43d760 \
--text-query "person walking" \
--top-k 3 \
--profile production
Note
The output will show INFO:root:Loading profile production confirming the correct profile is being used.
Managing Secrets#
Note
The secrets API endpoint is not implemented in this version. Use Kubernetes secrets directly for S3 credentials instead.
Create Kubernetes Secret for S3 Access#
For collections that need S3 access, create a Kubernetes secret:
# Create secret with AWS credentials
docker exec cds-deployment kubectl create secret generic my-s3-creds \
--from-literal=aws_access_key_id=AKIA... \
--from-literal=aws_secret_access_key=secret... \
--from-literal=aws_region=us-east-2
List Kubernetes Secrets#
Use the kubectl get secrets command to list all Kubernetes secrets:
docker exec cds-deployment kubectl get secrets
Use Secret in Collection#
Use the collections create command to create a collection that references S3 videos:
cds collections create --pipeline cosmos_video_search_milvus \
--name "My Collection" \
--config-yaml <(echo "
tags:
storage-template: 's3://my-bucket/videos/{{filename}}'
storage-secrets: 'my-s3-creds'
")
Advanced Usage#
Batch Ingestion#
For ingesting large numbers of videos, increase workers for parallel processing:
cds ingest files s3://cds-test-vp-905418373856/msrvtt-videos/ \
--collection-id a9fab0958_1079_412e_b7b8_d863fcecccae \
--extensions mp4 \
--num-workers 10 \
--batch-size 5 \
--limit 30 \
--s3-profile cds-s3-aws
This is example output from ingesting 30 videos in 4 seconds:
INFO:root:Loading profile default
[19:53:23] Spawned 10 file batch processors.
╭─────────────────────────────────── File ingestion ───────────────────────────────────╮╭───────── Responses ─────────╮
│                                                                                      ││                             │
│ Processed files: 30/30 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:04 0:00:00 ││ Status code 200: 30/30 100% │
│                                                                                      ││                             │
╰──────────────────────────────────────────────────────────────────────────────────────╯╰─────────────────────────────╯
[19:53:29] Finished processing job queue!
Processed 30 files successfully
200: 30
Performance tips:
More workers generally increase ingestion speed (in the run above, 10 workers processed 30 videos in 4 seconds).
A reasonable number of workers is typically 3-10, depending on your deployment size.
Use the --limit parameter for testing before full ingestion.
Batch size affects API request grouping (default: 1).
Output Logging#
Use the --output-log parameter to log ingestion results to a CSV file for analysis:
cds ingest files s3://cds-test-vp-905418373856/msrvtt-videos/ \
--collection-id a9fab0958_1079_412e_b7b8_d863fcecccae \
--extensions mp4 \
--limit 10 \
--num-workers 3 \
--s3-profile cds-s3-aws \
--output-log ~/ingestion-results.csv
The console output will report where the log file is being written:
Logging responses in /home/user/ingestion-results.csv
The following is an example of the CSV file contents:
file,status
s3://cds-test-vp-905418373856/msrvtt-videos/video7020.mp4,200
s3://cds-test-vp-905418373856/msrvtt-videos/video7024.mp4,200
s3://cds-test-vp-905418373856/msrvtt-videos/video7021.mp4,200
...
Each row contains the file path and HTTP status code (a 200 value indicates success).
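For large runs, a per-status summary of the log is quicker to read than the raw rows. The sketch below counts rows per status code and lists the files that did not return 200. The helper names and the sample log are hypothetical, and the 500 row is invented purely to illustrate a failure.

```shell
#!/bin/sh
# Hypothetical helper (not part of the CDS CLI): count rows per HTTP status in
# an ingestion log with the "file,status" layout shown above.
summarize_ingest_log() {
  awk -F, '
    NR == 1 { next }                  # skip the header row
    NF >= 2 { count[$NF]++ }          # the status is the last field
    END { for (code in count) print code, count[code] }
  ' "$1" | sort
}

# Hypothetical helper: list files whose status is not 200, e.g. to retry them.
failed_files() {
  awk -F, 'NR > 1 && NF >= 2 && $NF != "200" { print $1 }' "$1"
}

# Sample log for illustration; the 500 row is made up.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
file,status
s3://my-bucket/videos/video7020.mp4,200
s3://my-bucket/videos/video7024.mp4,200
s3://my-bucket/videos/video7021.mp4,500
EOF

summarize_ingest_log "$sample_log"
failed_files "$sample_log"
```

The sketch assumes file paths contain no commas. Pass the path given to --output-log in place of the sample file to summarize a real run.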
CLI Command Reference#
Quick Command Summary#
# Configuration
cds config set [--profile PROFILE]
# Pipelines
cds pipelines list [--verbose] [--profile PROFILE]
# Collections
cds collections create --pipeline PIPELINE --name NAME [options]
cds collections list [--profile PROFILE]
cds collections get COLLECTION_ID [--profile PROFILE]
cds collections delete COLLECTION_ID [--profile PROFILE]
# Ingestion
cds ingest files S3_PATH --collection-id ID --extensions mp4 [options]
cds ingest embeddings --parquet-dataset S3_PATH --collection-id ID --id-cols COLS [options]
# Search
cds search --collection-ids ID --text-query "TEXT" --top-k K [options]
# Secrets (use kubectl directly)
docker exec cds-deployment kubectl create secret generic NAME --from-literal=key=value
docker exec cds-deployment kubectl get secrets
docker exec cds-deployment kubectl delete secret NAME
Troubleshooting#
Common CLI Issues#
Issue: Profile 'xyz' is not available
Solution: Configure the profile first:
cds config set --profile xyz
Issue: Collection not found
Solution: List the collections to verify the ID:
cds collections list
Issue: S3 access denied
Solution: Check your S3 credentials or profile configuration:
aws s3 ls s3://your-bucket/ # Test AWS credentials work
Issue: Status code 500 during ingestion
Solution: Check the following:
The video format is supported (MP4 recommended).
The videos are accessible from S3.
Check the CDS service logs:
docker exec cds-deployment kubectl logs deployment/visual-search
Getting Help#
Run cds --help for command overview or cds <command> --help for detailed command information:
cds --help
cds collections create --help
cds ingest files --help