API Reference#
This page describes the API endpoints and provides example API calls for Cosmos Curator on DGX Cloud. Note: This API documentation is primarily for Linux and macOS environments.
Prerequisites#
You will need the following to interact with the API endpoints:
A valid NGC API key for your Cosmos Curator on DGX Cloud account
The curl command-line tool
The jq tool (for JSON parsing)
The base64 command-line tool (required for S3 Input/Output processing)
A dataset ZIP file or access to S3 buckets
For S3 processing: Properly configured AWS credentials
For the example workflows: An OS that supports Bash scripts, such as Linux or macOS
API Endpoints#
Create a Dataset#
POST /v1/cosmos/datasets
Creates a dataset and returns a dataset ID.
Request Headers
Authorization: Bearer ${COSMOS_KEY}
Content-Type: application/json
Request Body
The request body is a JSON object containing curation parameters.
Response Body
The response body is a JSON object containing the id parameter, which specifies the ID of the new dataset. If the returned id is empty or null, dataset creation failed.
Example
curl -s "https://api.ngc.nvidia.com/v1/cosmos/datasets" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d '{"name": "My Dataset", "description": "Test dataset", "jobSpec": {
"pipeline": "split",
"args": {
"generate_embeddings": true,
"generate_previews": true,
"generate_captions": true,
"splitting_algorithm": "transnetv2",
"captioning_prompt_variant": "default",
"captioning_prompt_text": "actual prompt"
}
}}'
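To capture the new dataset ID for the calls that follow, you can pipe the response through jq (a minimal sketch; the id field is described in the response body above):
DATASET_ID=$(curl -s "https://api.ngc.nvidia.com/v1/cosmos/datasets" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d '{"name": "My Dataset", "description": "Test dataset", "jobSpec": {"pipeline": "split", "args": {"generate_embeddings": true, "generate_previews": true, "generate_captions": true, "splitting_algorithm": "transnetv2", "captioning_prompt_variant": "default", "captioning_prompt_text": "actual prompt"}}}' | jq -r '.id')
echo "Created dataset: ${DATASET_ID}"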
Get a Dataset#
GET /v1/cosmos/datasets/{dataset_id}
Retrieves the details of the dataset with the specified ID.
Request Headers
Authorization: Bearer ${COSMOS_KEY}
Response Body
The response body is a JSON object containing dataset details.
Example
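A request like the following returns the dataset details shown below (same authorization header as the other endpoints):
curl -s -X GET "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}" \
-H "Authorization: Bearer ${COSMOS_KEY}"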
{
"name": "test dataset",
"userDescription": "my dataset",
"humanAttributesEnabled": false,
"org": "0461422519424422",
"ncaId": "Rz8I0e_JP2ptQU0rFDVP9ZfqPjhhRHhEELcNSj2i1yE",
"team": "no_team",
"owner": "33nxR5rY_tl2FWDNSh7Tok5MeB9NzutZhFS9Tfyi-er",
"type": "ZipFile",
"url": "",
"inputS3Prefix": "",
"outputS3Prefix": "",
"s3Config": "",
"videoType": "",
"videoPrompt": "",
"jobSpec": {
"pipeline": "split",
"args": {
"generate_embeddings": true,
"generate_previews": true,
"generate_captions": true,
"splitting_algorithm": "transnetv2",
"captioning_prompt_variant": "default",
"captioning_prompt_text": "actual prompt"
}
},
"dateCreated": "2025-03-13T23:52:43.195Z",
"lastModifiedTimestamp": "2025-03-13T23:52:43.195Z",
"lastStatus": {
"status": "CREATING",
"message": "Waiting for user to upload files and proceed to processing",
"details": "0%"
},
"user": {
"email": "example_user@nvidia.com",
"name": "example_user"
},
"id": "168166c7-6e32-4d3a-b2ea-c748b1f3cbf2"
}
Get All Datasets by Organization#
GET /v1/cosmos/datasets?filter=by:org
Retrieves the details of all datasets belonging to your organization.
Request Headers
Authorization: Bearer ${COSMOS_KEY}
Request Body
None
Response Body
The response body is a JSON object containing details for each dataset.
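Example (a sketch following the same request pattern as the other endpoints; jq pretty-prints the result):
curl -s "https://api.ngc.nvidia.com/v1/cosmos/datasets?filter=by:org" \
-H "Authorization: Bearer ${COSMOS_KEY}" | jq .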
Initialize Dataset Upload#
POST /v1/cosmos/datasets/{dataset_id}/upload/initialize
Initializes uploading a file to the dataset with the specified ID.
Request Headers
Authorization: Bearer ${COSMOS_KEY}
Content-Type: application/json
Request Body
None
Response Body
The response body is a JSON object containing the “fileId” and “fileKey” values to use when uploading the file.
Example
curl -s -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/upload/initialize" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d '{}'
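To store the returned identifiers for the next steps (field names per the response body above):
INIT_JSON=$(curl -s -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/upload/initialize" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d '{}')
FILE_ID=$(echo "$INIT_JSON" | jq -r '.fileId')
FILE_KEY=$(echo "$INIT_JSON" | jq -r '.fileKey')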
Get Presigned URLs#
POST /v1/cosmos/datasets/{dataset_id}/upload/getPreSignedUrls
Retrieves an array of presigned URLs for uploading the dataset ZIP file as a multipart upload.
Request Headers
Authorization: Bearer ${COSMOS_KEY}
Content-Type: application/json
Request Body
The request body is a JSON object with the following parameters:
fileId (string): The ID of the file returned during the initialization step
fileKey (string): The key of the file returned during the initialization step
parts (int): The number of parts to upload
expires (int): The expiration time (in seconds) for the presigned URLs
Response Body
The response body is a JSON object containing an array of presigned URLs for uploading the dataset ZIP file.
Example
curl -s -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/upload/getPreSignedUrls" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d "{\"fileId\": \"${FILE_ID}\", \"fileKey\": \"${FILE_KEY}\", \"parts\": 5, \"expires\": 3600}"
Finalize Dataset Upload#
POST /v1/cosmos/datasets/{dataset_id}/upload/finalize
Completes a multipart upload by providing ETags (Entity Tags) for all uploaded parts.
Request Headers
Authorization: Bearer ${COSMOS_KEY}
Content-Type: application/json
Request Body
The request body is a JSON object with the following parameters:
fileId (string): The ID of the file returned during the initialization step
fileKey (string): The key of the file returned during the initialization step
parts (list): A list of parts that have been uploaded. Each part should be an object with the following parameters:
PartNumber (int): The part number
ETag (string): The ETag for the part
Response Body
The response body is a JSON confirmation object.
Example
curl -s -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/upload/finalize" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d "{\"fileId\": \"${FILE_ID}\", \"fileKey\": \"${FILE_KEY}\", \"parts\": [{\"ETag\":\"\\\"abc123\\\"\",\"PartNumber\":1}]}"
Get Dataset Download URL#
GET /v1/cosmos/datasets/{dataset_id}/download/getPreSignedUrls
Retrieves a presigned URL for downloading the dataset ZIP file.
Request Headers
Authorization: Bearer ${COSMOS_KEY}
Request Body
None
Response Body
The response body is a JSON object containing the presigned URL for downloading the dataset ZIP file.
Example
curl -s -X GET "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/download/getPreSignedUrls?processed=false" \
-H "Authorization: Bearer ${COSMOS_KEY}"
Process Dataset#
POST /v1/cosmos/datasets/{dataset_id}/process
Processes the dataset with the specified ID. Once this call is received, the server will begin generating captions for the video files and process the results for storage/retrieval.
Request Headers
Authorization: Bearer ${COSMOS_KEY}
Content-Type: application/json
Request Body
The request body is a JSON object with the “url” parameter, which is a string specifying the presigned download URL of the dataset (refer to the Get Dataset Download URL specification for more details).
Response Body
The response body is a JSON confirmation object.
Example
curl -s -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/process" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d "{\"url\": \"${PRESIGNED_URL}\"}"
Delete Dataset#
DELETE /v1/cosmos/datasets/{dataset_id}
Deletes the dataset with the specified ID.
Request Headers
Authorization: Bearer ${COSMOS_KEY}
Request Body
None
Response Body
The response body is a JSON confirmation object.
Example
curl -s -X DELETE "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}" \
-H "Authorization: Bearer ${COSMOS_KEY}"
Process S3 Input/Output#
POST /v1/cosmos/datasets
Creates a dataset that reads input data directly from S3 buckets and writes output data directly to S3 buckets.
Request Headers
Authorization: Bearer ${COSMOS_KEY}
Content-Type: application/json
Request Body
The request body is a JSON object with the following parameters:
{
"name": "Your Dataset Name",
"description": "Your dataset description",
"s3InputPrefix": "s3://your-input-bucket/path/",
"s3OutputPrefix": "s3://your-output-bucket/path/",
"s3Config": "BASE64_ENCODED_AWS_CREDENTIALS",
"jobSpec": {
"pipeline": "split",
"args": {
"generate_embeddings": true,
"generate_previews": true,
"generate_captions": true,
"splitting_algorithm": "transnetv2",
"captioning_prompt_variant": "default",
"captioning_prompt_text": "your prompt text"
}
}
}
Refer to the Curation Parameters page for a description of all available “jobSpec” parameters.
Response Body
The response body is a JSON object containing the dataset details, including the id parameter specifying the ID of the new dataset.
Example
# Base64 encode AWS credentials
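# Note: -w 0 disables line wrapping and is GNU-specific; on macOS, use `base64 -i ~/.aws/credentials` (BSD base64 does not wrap by default)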
S3_CONFIG=$(base64 -w 0 ~/.aws/credentials)
curl -s "https://api.ngc.nvidia.com/v1/cosmos/datasets" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d "{
\"name\": \"s3-test-dataset\",
\"description\": \"S3 input/output test\",
\"s3InputPrefix\": \"s3://pre-signed-test-curator/dtzeng/testinput1/\",
\"s3OutputPrefix\": \"s3://pre-signed-test-curator/dtzeng/dtbucket15/\",
\"s3Config\": \"${S3_CONFIG}\",
\"jobSpec\": {
\"pipeline\": \"split\",
\"args\": {
\"generate_embeddings\": true,
\"generate_previews\": true,
\"generate_captions\": true,
\"splitting_algorithm\": \"transnetv2\",
\"captioning_prompt_variant\": \"default\",
\"captioning_prompt_text\": \"actual prompt\"
}
}
}"
Get Dataset Captions#
GET /v1/cosmos/datasets/{dataset_id}/captions
Retrieves the text captions for each video in the dataset with the specified ID.
Request Headers
Authorization: Bearer ${COSMOS_KEY}
Request Body
None
Response Body
The response body is a JSON object containing captions for videos as a list of key/value pairs. Each key corresponds to the file_id of a video file, and each value contains the captions for that video.
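Example (a sketch following the same request pattern as the other endpoints; jq pretty-prints the returned caption map):
curl -s "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/captions" \
-H "Authorization: Bearer ${COSMOS_KEY}" | jq .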
Update Dataset Captions#
PATCH /v1/cosmos/datasets/{dataset_id}/captions/{caption_id}
Updates the text of the specified caption.
Request Headers
Authorization: Bearer ${COSMOS_KEY}
Content-Type: application/json
Request Body
The request body is a JSON object containing a single “caption” parameter, which is a string containing the updated caption.
Response Body
The response body is a JSON confirmation object.
Example
curl -X PATCH "https://api.ngc.nvidia.com/v1/cosmos/datasets/50b48bee-ecb1-4a22-afe9-ae90bd4864ae/captions/0" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d '{
"caption": "The video begins with a black car driving down a road, surrounded by trees and buildings. The car is moving at a moderate speed, and the camera captures its movements from a distance. As the car continues to drive, it passes by several other vehicles, including a white truck and a blue car. The road is well-paved, and there are no visible pedestrians or cyclists in the scene.\nAs the car approaches an intersection, the camera zooms in on the vehicle, providing a closer view of its features. The car has a sleek design, with a black exterior and green accents on the doors and hood. The front grille is prominent, and the headlights are sharp and angular. The car also has a roof-mounted sensor array, indicating that it may be equipped for autonomous driving or other advanced technologies.\nOverall, the video provides a detailed look at the car\'s design and features as it moves through an urban environment. The camera work is smooth and steady, capturing the car\'s movements and surroundings effectively."
}'
Terminate All Jobs#
DELETE /v1/cosmos/jobs
Terminates all in-progress jobs for your organization.
Request Headers
Authorization: Bearer ${COSMOS_KEY}
Request Body
None
Response Body
The response body is a JSON confirmation object.
Example
curl -X DELETE "https://api.ngc.nvidia.com/v1/cosmos/jobs" \
-H "Authorization: Bearer ${COSMOS_KEY}"
Example Workflows#
There are two ways to create a dataset for Cosmos Curator on DGX Cloud: by uploading a ZIP file as a multipart upload, or by linking to S3 input and output buckets for direct processing. The following example Bash scripts cover these workflows.
Uploading a ZIP File#
#!/bin/bash
# Set your variables
export COSMOS_KEY='your-api-key'
ZIP_FILE="your-dataset.zip"
NUM_PARTS=5
DATASET_NAME="My Test Dataset"
DATASET_DESCRIPTION="This is a test dataset for COSMOS"
# Create a new dataset
DATASET_RESPONSE=$(curl -s -w "\n%{http_code}" "https://api.ngc.nvidia.com/v1/cosmos/datasets" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d "{\"name\": \"${DATASET_NAME}\", \"description\": \"${DATASET_DESCRIPTION}\", \"jobSpec\": {
\"pipeline\": \"split\",
\"args\": {
\"generate_embeddings\": true,
\"generate_previews\": true,
\"generate_captions\": true,
\"splitting_algorithm\": \"transnetv2\",
\"captioning_prompt_variant\": \"default\",
\"captioning_prompt_text\": \"actual prompt\"
}
}
}")
HTTP_STATUS=$(echo "$DATASET_RESPONSE" | tail -n 1)
DATASET_JSON=$(echo "$DATASET_RESPONSE" | sed '$d')
DATASET_ID=$(echo "$DATASET_JSON" | jq -r '.id')
# Initialize upload
INIT_RESPONSE=$(curl -s -w "\n%{http_code}" -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/upload/initialize" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d '{}')
INIT_JSON=$(echo "$INIT_RESPONSE" | sed '$d')
FILE_ID=$(echo "$INIT_JSON" | jq -r '.fileId')
FILE_KEY=$(echo "$INIT_JSON" | jq -r '.fileKey')
# Get presigned URLs for upload
URLS_RESPONSE=$(curl -s -w "\n%{http_code}" -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/upload/getPreSignedUrls" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d "{\"fileId\": \"${FILE_ID}\", \"fileKey\": \"${FILE_KEY}\", \"parts\": ${NUM_PARTS}, \"expires\": 3600}")
URLS_JSON=$(echo "$URLS_RESPONSE" | sed '$d')
# Split file and upload parts
split -n "${NUM_PARTS}" "${ZIP_FILE}" "${ZIP_FILE}.part_"
# Upload parts and collect ETags
declare -a ETAGS
declare -a PART_NUMBERS
PART_NUM=1
for part in "${ZIP_FILE}".part_*; do
echo "Uploading part ${PART_NUM}..."
SIGNED_URL=$(echo "$URLS_JSON" | jq -r ".parts[] | select(.PartNumber==${PART_NUM}) | .signedUrl")
UPLOAD_RESPONSE=$(curl -s -v -X PUT -T "${part}" \
-H "Content-Type: application/zip" \
"${SIGNED_URL}" 2>&1)
ETAG=$(echo "$UPLOAD_RESPONSE" | grep -i "< etag:" | sed 's/< etag: //I' | tr -d '"' | tr -d '\r')
if [ -n "$ETAG" ]; then
ETAGS+=("$ETAG")
PART_NUMBERS+=("$PART_NUM")
echo "Part ${PART_NUM} uploaded with ETag: ${ETAG}"
else
echo "Error: Failed to get ETag for part ${PART_NUM}"
exit 1
fi
PART_NUM=$((PART_NUM + 1))
done
# Check if all parts were uploaded and ETags collected
if [ ${#ETAGS[@]} -ne ${NUM_PARTS} ]; then
echo "Error: Not all parts were uploaded or ETags collected."
exit 1
fi
# Construct parts JSON for finalization
PARTS_JSON="["
for i in "${!ETAGS[@]}"; do
if [ "$i" -gt 0 ]; then
PARTS_JSON="${PARTS_JSON},"
fi
PARTS_JSON="${PARTS_JSON}{\"ETag\":\"${ETAGS[$i]}\",\"PartNumber\":${PART_NUMBERS[$i]}}"
done
PARTS_JSON="${PARTS_JSON}]"
# Finalize upload
FINALIZE_RESPONSE=$(curl -s -w "\n%{http_code}" -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/upload/finalize" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d "{\"fileId\": \"${FILE_ID}\", \"fileKey\": \"${FILE_KEY}\", \"parts\": ${PARTS_JSON}}")
# Get download URL
DOWNLOAD_RESPONSE=$(curl -s -w "\n%{http_code}" -X GET "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/download/getPreSignedUrls?processed=false" \
-H "Authorization: Bearer ${COSMOS_KEY}")
DOWNLOAD_JSON=$(echo "$DOWNLOAD_RESPONSE" | sed '$d')
PRESIGNED_URL=$(echo "$DOWNLOAD_JSON" | jq -r '.url')
# Process dataset
PROCESS_RESPONSE=$(curl -s -w "\n%{http_code}" -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/process" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d "{\"url\": \"${PRESIGNED_URL}\"}")
Linking S3 Input/Output Buckets#
Important
When using an AWS Access Key, ensure you provide only the minimum S3 permissions required for Cosmos Curator on DGX Cloud operations. The curator service should have read-only permissions for the input data bucket and read/write permissions for the output data bucket. The AWS Access Key should not provide read/write permissions to any buckets except those associated with dataset input/output operations.
#!/bin/bash
# Set your variables
export COSMOS_KEY='your-api-key'
# Check if AWS credentials file exists
if [ ! -f ~/.aws/credentials ]; then
echo "Error: AWS credentials file not found at ~/.aws/credentials"
echo "Please make sure your AWS credentials are properly configured"
exit 1
fi
# Base64 encode the AWS credentials file for s3Config
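# Note: -w 0 is GNU-specific; on macOS, use `base64 -i ~/.aws/credentials` (BSD base64 does not wrap by default)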
S3_CONFIG=$(base64 -w 0 ~/.aws/credentials)
# Create dataset with S3 input and output
S3_DATASET_RESPONSE=$(curl -s -w "\n%{http_code}" "https://api.ngc.nvidia.com/v1/cosmos/datasets" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d "{
\"name\": \"s3-test-dataset\",
\"description\": \"S3 input/output test\",
\"s3InputPrefix\": \"s3://your-input-bucket/path/\",
\"s3OutputPrefix\": \"s3://your-output-bucket/path/\",
\"s3Config\": \"${S3_CONFIG}\",
\"jobSpec\": {
\"pipeline\": \"split\",
\"args\": {
\"generate_embeddings\": true,
\"generate_previews\": true,
\"generate_captions\": true,
\"splitting_algorithm\": \"transnetv2\",
\"captioning_prompt_variant\": \"default\",
\"captioning_prompt_text\": \"your caption prompt\"
}
}
}")
HTTP_STATUS=$(echo "$S3_DATASET_RESPONSE" | tail -n 1)
S3_DATASET_JSON=$(echo "$S3_DATASET_RESPONSE" | sed '$d')
S3_DATASET_ID=$(echo "$S3_DATASET_JSON" | jq -r '.id')
echo "S3 dataset created with ID: ${S3_DATASET_ID}"
echo "S3 processing will happen asynchronously."
Troubleshooting#
Common Issues and Their Solutions#
Authentication errors (HTTP 401): Ensure your COSMOS_KEY is valid and correctly formatted.
Failed to get ETag: When uploading parts, ensure you’re correctly capturing the ETag from the response headers after each part upload. The ETag is returned in the etag header of the PUT request response. Inspect the response headers to extract the ETag. The provided Upload ZIP File Workflow script demonstrates how to capture the ETag using curl and grep.
Processing taking too long: Processing large datasets may take considerable time. Consider implementing a polling mechanism to check processing status.
Invalid dataset format: Ensure your dataset follows COSMOS guidelines for format and structure.
Timeout errors: For large files, consider increasing the expires parameter when getting presigned URLs.
S3 access issues: For S3 input/output processing, ensure your AWS credentials in ~/.aws/credentials have proper permissions for the specified S3 buckets, verify that your S3 bucket paths are correctly formatted, and make sure the base64 encoding of your credentials is done correctly (use base64 -w 0 to prevent line breaks).
Job specification errors: Ensure the jobSpec parameters match the expected format for the pipeline you’re using. Different pipelines may require different arguments.
Environment issues: Double-check that you are using the correct API endpoint URL (Production, Staging, or Canary) for your intended environment. Mismatched environments can lead to unexpected errors.
AWS Credentials Issues: When using S3 as the dataset input/output source, ensure that aws_access_key_id, aws_secret_access_key, and region are provided and configured under the [default] profile (unless you're using the input_s3_profile_name/output_s3_profile_name arguments in your jobSpec).
For additional support, contact NVIDIA Developer Support or refer to the official COSMOS documentation.