API Reference#

This page describes the API endpoints and provides example API calls for NeMo Curator on DGX Cloud.

Prerequisites#

You will need the following to interact with the API endpoints:

A valid NGC API key for your NeMo Curator on DGX Cloud account
The curl command-line tool
The jq tool (for JSON parsing)
The base64 command-line tool (required for S3 Input/Output processing)
A dataset ZIP file or access to S3 buckets
For S3 processing: Properly configured AWS credentials
For the example workflows: An OS that supports Bash scripts, such as Linux or macOS
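
Before running the example workflows, it can help to confirm that the command-line tools listed above are installed. The following is a minimal sketch; `missing_tools` is a hypothetical helper name and is not part of NeMo Curator.

```bash
#!/bin/bash

# missing_tools is a hypothetical helper (not part of the NeMo Curator API).
# It prints the name of every listed command that cannot be found on the PATH.
missing_tools() {
   local tool
   for tool in "$@"; do
      if ! command -v "$tool" >/dev/null 2>&1; then
         echo "$tool"
      fi
   done
}

# Report any prerequisite tools that are not installed.
MISSING=$(missing_tools curl jq base64)
if [ -n "$MISSING" ]; then
   echo "Missing required tools:" $MISSING >&2
fi
```

If nothing is printed, all three tools were found.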

Example Workflows#

There are two ways to create a dataset for NeMo Curator on DGX Cloud: uploading a ZIP file as a multipart upload, or linking S3 input and output buckets for direct processing. The following example Bash scripts cover both workflows.

Uploading a ZIP File#

#!/bin/bash

# Set your variables
export COSMOS_KEY='your-api-key'
ZIP_FILE="your-dataset.zip"
NUM_PARTS=5
DATASET_NAME="My Test Dataset"
DATASET_DESCRIPTION="This is a test dataset for COSMOS"

# Create a new dataset
DATASET_RESPONSE=$(curl -s -w "\n%{http_code}" "https://api.ngc.nvidia.com/v1/cosmos/datasets" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d "{\"name\": \"${DATASET_NAME}\", \"description\": \"${DATASET_DESCRIPTION}\", \"jobSpec\": {
      \"pipeline\": \"split\",
      \"args\": {
         \"generate_embeddings\": true,
         \"generate_previews\": true,
         \"generate_captions\": true,
         \"splitting_algorithm\": \"transnetv2\",
         \"captioning_prompt_variant\": \"default\",
         \"captioning_prompt_text\": \"actual prompt\"
      }
   }
}")

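HTTP_STATUS=$(echo "$DATASET_RESPONSE" | tail -n 1)
DATASET_JSON=$(echo "$DATASET_RESPONSE" | sed '$d')
DATASET_ID=$(echo "$DATASET_JSON" | jq -r '.id')

# Stop if the request failed or no dataset ID was returned
if [[ "$HTTP_STATUS" != 2* ]] || [ -z "$DATASET_ID" ] || [ "$DATASET_ID" = "null" ]; then
   echo "Error: Dataset creation failed (HTTP status: ${HTTP_STATUS})"
   exit 1
fi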

# Initialize upload
INIT_RESPONSE=$(curl -s -w "\n%{http_code}" -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/upload/initialize" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d '{}')

INIT_JSON=$(echo "$INIT_RESPONSE" | sed '$d')
FILE_ID=$(echo "$INIT_JSON" | jq -r '.fileId')
FILE_KEY=$(echo "$INIT_JSON" | jq -r '.fileKey')

# Get presigned URLs for upload
URLS_RESPONSE=$(curl -s -w "\n%{http_code}" -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/upload/getPreSignedUrls" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d "{\"fileId\": \"${FILE_ID}\", \"fileKey\": \"${FILE_KEY}\", \"parts\": ${NUM_PARTS}, \"expires\": 3600}")

URLS_JSON=$(echo "$URLS_RESPONSE" | sed '$d')

# Split file and upload parts
split -n "${NUM_PARTS}" "${ZIP_FILE}" "${ZIP_FILE}.part_"

# Upload parts and collect ETags
declare -a ETAGS
declare -a PART_NUMBERS
PART_NUM=1
for part in "${ZIP_FILE}".part_*; do
   echo "Uploading part ${PART_NUM}..."
   SIGNED_URL=$(echo "$URLS_JSON" | jq -r ".parts[] | select(.PartNumber==${PART_NUM}) | .signedUrl")
   UPLOAD_RESPONSE=$(curl -s -v -X PUT -T "${part}" \
      -H "Content-Type: application/zip" \
      "${SIGNED_URL}" 2>&1)
   # Extract the ETag response header (pattern works with both GNU and BSD sed)
   ETAG=$(echo "$UPLOAD_RESPONSE" | grep -i "^< etag:" | sed 's/^< [^:]*: *//' | tr -d '"\r')
   if [ -n "$ETAG" ]; then
      ETAGS+=("$ETAG")
      PART_NUMBERS+=("$PART_NUM")
      echo "Part ${PART_NUM} uploaded with ETag: ${ETAG}"
   else
      echo "Error: Failed to get ETag for part ${PART_NUM}"
      exit 1
   fi
   PART_NUM=$((PART_NUM + 1))
done

# Check if all parts were uploaded and ETags collected
if [ ${#ETAGS[@]} -ne ${NUM_PARTS} ]; then
   echo "Error: Not all parts were uploaded or ETags collected."
   exit 1
fi

# Construct parts JSON for finalization
PARTS_JSON="["
for i in "${!ETAGS[@]}"; do
   if [ "$i" -gt 0 ]; then
      PARTS_JSON="${PARTS_JSON},"
   fi
   PARTS_JSON="${PARTS_JSON}{\"ETag\":\"${ETAGS[$i]}\",\"PartNumber\":${PART_NUMBERS[$i]}}"
done
PARTS_JSON="${PARTS_JSON}]"

# Finalize upload
FINALIZE_RESPONSE=$(curl -s -w "\n%{http_code}" -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/upload/finalize" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d "{\"fileId\": \"${FILE_ID}\", \"fileKey\": \"${FILE_KEY}\", \"parts\": ${PARTS_JSON}}")

# Get download URL
DOWNLOAD_RESPONSE=$(curl -s -w "\n%{http_code}" -X GET "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/download/getPreSignedUrls?processed=false" \
-H "Authorization: Bearer ${COSMOS_KEY}")

DOWNLOAD_JSON=$(echo "$DOWNLOAD_RESPONSE" | sed '$d')
PRESIGNED_URL=$(echo "$DOWNLOAD_JSON" | jq -r '.url')

# Process dataset
PROCESS_RESPONSE=$(curl -s -w "\n%{http_code}" -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/process" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d "{\"url\": \"${PRESIGNED_URL}\"}")

Linking S3 Input/Output Buckets#

Important

When using an AWS Access Key, ensure you provide only the minimum S3 permissions required for NeMo Curator on DGX Cloud operations. The curator service should have read-only permissions for the input data bucket and read/write permissions for the output data bucket. The AWS Access Key should not provide read/write permissions to any buckets except those associated with dataset input/output operations.
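
One way to express the permission split described above is an IAM policy with one statement per bucket. The sketch below is illustrative only: `print_s3_policy` is a hypothetical helper, the bucket names are placeholders, and the exact set of S3 actions the service needs is not stated on this page, so treat the action lists as assumptions to verify.

```bash
#!/bin/bash

# print_s3_policy is a hypothetical helper. It prints an IAM policy document
# that grants read access to the input bucket and read/write access to the
# output bucket. The action lists are assumptions, not documented requirements.
print_s3_policy() {
   local input_bucket="$1"
   local output_bucket="$2"
   cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadInput",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::${input_bucket}",
        "arn:aws:s3:::${input_bucket}/*"
      ]
    },
    {
      "Sid": "ReadWriteOutput",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::${output_bucket}",
        "arn:aws:s3:::${output_bucket}/*"
      ]
    }
  ]
}
EOF
}

print_s3_policy "your-input-bucket" "your-output-bucket"
```

Keeping the two buckets in separate statements makes it easy to see that write access is granted only on the output bucket.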

#!/bin/bash

# Set your variables
export COSMOS_KEY='your-api-key'

# Check if AWS credentials file exists
if [ ! -f ~/.aws/credentials ]; then
   echo "Error: AWS credentials file not found at ~/.aws/credentials"
   echo "Please make sure your AWS credentials are properly configured"
   exit 1
fi

# Base64 encode the AWS credentials file for s3Config.
# Piping through tr removes line wrapping on both GNU and BSD (macOS) base64.
S3_CONFIG=$(base64 < ~/.aws/credentials | tr -d '\n')

# Create dataset with S3 input and output
S3_DATASET_RESPONSE=$(curl -s -w "\n%{http_code}" "https://api.ngc.nvidia.com/v1/cosmos/datasets" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d "{
   \"name\": \"s3-test-dataset\",
   \"description\": \"S3 input/output test\",
   \"s3InputPrefix\": \"s3://your-input-bucket/path/\",
   \"s3OutputPrefix\": \"s3://your-output-bucket/path/\",
   \"s3Config\": \"${S3_CONFIG}\",
   \"jobSpec\": {
      \"pipeline\": \"split\",
      \"args\": {
         \"generate_embeddings\": true,
         \"generate_previews\": true,
         \"generate_captions\": true,
         \"splitting_algorithm\": \"transnetv2\",
         \"captioning_prompt_variant\": \"default\",
         \"captioning_prompt_text\": \"your caption prompt\"
      }
   }
}")

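HTTP_STATUS=$(echo "$S3_DATASET_RESPONSE" | tail -n 1)
S3_DATASET_JSON=$(echo "$S3_DATASET_RESPONSE" | sed '$d')
S3_DATASET_ID=$(echo "$S3_DATASET_JSON" | jq -r '.id')

# Stop if the request failed or no dataset ID was returned
if [[ "$HTTP_STATUS" != 2* ]] || [ -z "$S3_DATASET_ID" ] || [ "$S3_DATASET_ID" = "null" ]; then
   echo "Error: S3 dataset creation failed (HTTP status: ${HTTP_STATUS})"
   exit 1
fi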

echo "S3 dataset created with ID: ${S3_DATASET_ID}"
echo "S3 processing will happen asynchronously."

API Endpoints#

Create a Dataset#

POST /v1/cosmos/datasets

Creates a dataset and returns a dataset ID.

Request Headers

Authorization: Bearer ${COSMOS_KEY}
Content-Type: application/json

Request Body

The request body is a JSON object containing curation parameters.

Response Body

The response body is a JSON object containing the id parameter specifying the ID of the new dataset. If the returned id is empty or a null value, then dataset creation failed.
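
The failure check described above can be sketched as a small shell function. `dataset_id_ok` is a hypothetical helper name; the check against the literal string `null` reflects how `jq -r '.id'` prints a missing or null field.

```bash
#!/bin/bash

# dataset_id_ok is a hypothetical helper. It succeeds only when the id
# extracted from the response is neither empty nor the literal string "null".
dataset_id_ok() {
   local id="$1"
   [ -n "$id" ] && [ "$id" != "null" ]
}

# Example with a hard-coded value. In a real script the id would come from
# the response, for example: DATASET_ID=$(echo "$DATASET_JSON" | jq -r '.id')
if dataset_id_ok "168166c7-6e32-4d3a-b2ea-c748b1f3cbf2"; then
   echo "Dataset created"
else
   echo "Error: Dataset creation failed" >&2
fi
```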

Example

curl -s "https://api.ngc.nvidia.com/v1/cosmos/datasets" \
 -H "Authorization: Bearer ${COSMOS_KEY}" \
 -H "Content-Type: application/json" \
 -d '{"name": "My Dataset", "description": "Test dataset", "jobSpec": {
    "pipeline": "split",
    "args": {
       "generate_embeddings": true,
       "generate_previews": true,
       "generate_captions": true,
       "splitting_algorithm": "transnetv2",
       "captioning_prompt_variant": "default",
       "captioning_prompt_text": "actual prompt"
    }
 }}'

Get a Dataset#

GET /v1/cosmos/datasets/{dataset_id}

Retrieves the details of the dataset with the specified ID.

Request Headers

Authorization: Bearer ${COSMOS_KEY}

Response Body

The response body is a JSON object containing dataset details.

Example

{
"name": "test dataset",
"userDescription": "my dataset",
"humanAttributesEnabled": false,
"org": "0461422519424422",
"ncaId": "Rz8I0e_JP2ptQU0rFDVP9ZfqPjhhRHhEELcNSj2i1yE",
"team": "no_team",
"owner": "33nxR5rY_tl2FWDNSh7Tok5MeB9NzutZhFS9Tfyi-er",
"type": "ZipFile",
"url": "",
"inputS3Prefix": "",
"outputS3Prefix": "",
"s3Config": "",
"videoType": "",
"videoPrompt": "",
"jobSpec": {
   "pipeline": "split",
   "args": {
      "generate_embeddings": true,
      "generate_previews": true,
      "generate_captions": true,
      "splitting_algorithm": "transnetv2",
      "captioning_prompt_variant": "default",
      "captioning_prompt_text": "actual prompt"
   }
},
"dateCreated": "2025-03-13T23:52:43.195Z",
"lastModifiedTimestamp": "2025-03-13T23:52:43.195Z",
"lastStatus": {
   "status": "CREATING",
   "message": "Waiting for user to upload files and proceed to processing",
   "details": "0%"
},
"user": {
   "email": "example_user@nvidia.com",
   "name": "example_user"
},
"id": "168166c7-6e32-4d3a-b2ea-c748b1f3cbf2"
}
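
The `lastStatus.status` field in the response above can be read with `jq` and polled from a script. The following is a minimal sketch under stated assumptions: `get_dataset_status` and `wait_while_status` are hypothetical helper names, and because this page only shows the `CREATING` status, the status to wait out is left as a parameter rather than hard-coded.

```bash
#!/bin/bash

# get_dataset_status prints the lastStatus.status field of a dataset.
# Requires curl, jq, and COSMOS_KEY in the environment.
get_dataset_status() {
   local dataset_id="$1"
   curl -s -X GET "https://api.ngc.nvidia.com/v1/cosmos/datasets/${dataset_id}" \
      -H "Authorization: Bearer ${COSMOS_KEY}" | jq -r '.lastStatus.status'
}

# wait_while_status is a hypothetical helper. It runs COMMAND repeatedly while
# its output equals STATUS, sleeping INTERVAL_SECONDS between attempts.
# It prints the last status seen and returns 1 if MAX_TRIES is exhausted.
# Usage: wait_while_status STATUS INTERVAL_SECONDS MAX_TRIES COMMAND [ARGS...]
wait_while_status() {
   local status="$1" interval="$2" max_tries="$3"
   shift 3
   local current try
   for ((try = 1; try <= max_tries; try++)); do
      current=$("$@")
      if [ "$current" != "$status" ]; then
         echo "$current"
         return 0
      fi
      sleep "$interval"
   done
   echo "$current"
   return 1
}
```

For example, `wait_while_status "CREATING" 30 20 get_dataset_status "${DATASET_ID}"` checks every 30 seconds, up to 20 times, and returns as soon as the reported status is no longer `CREATING`.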

Get All Datasets by Organization#

GET /v1/cosmos/datasets?filter=by:org

Retrieves the details of all datasets that belong to the caller's organization.

Request Headers

Authorization: Bearer ${COSMOS_KEY}

Request Body

None

Response Body

The response body is a JSON object containing details for each dataset.
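
A call to this endpoint might look like the following sketch. `list_org_datasets` is a hypothetical wrapper name; the URL and header are taken from the endpoint description above.

```bash
#!/bin/bash

# list_org_datasets is a hypothetical wrapper around the endpoint above.
# Requires curl and COSMOS_KEY in the environment.
list_org_datasets() {
   curl -s -X GET "https://api.ngc.nvidia.com/v1/cosmos/datasets?filter=by:org" \
      -H "Authorization: Bearer ${COSMOS_KEY}"
}
```

Piping the result through `jq` (for example, `list_org_datasets | jq .`) pretty-prints the returned JSON.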

Initialize Dataset Upload#

POST /v1/cosmos/datasets/{dataset_id}/upload/initialize

Initializes uploading a file to the dataset with the specified ID.

Request Headers

Authorization: Bearer ${COSMOS_KEY}
Content-Type: application/json

Request Body

None

Response Body

The response body is a JSON object containing the “fileId” and “fileKey” to use when uploading the file.

Example

curl -s -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/upload/initialize" \
  -H "Authorization: Bearer ${COSMOS_KEY}" \
  -H "Content-Type: application/json" \
  -d '{}'

Get Presigned URLs#

POST /v1/cosmos/datasets/{dataset_id}/upload/getPreSignedUrls

Retrieves an array of presigned URLs for uploading the dataset ZIP file as a multipart upload.

Request Headers

Authorization: Bearer ${COSMOS_KEY}
Content-Type: application/json

Request Body

The request body is a JSON object with the following parameters:

fileId (string): The ID of the file returned during the initialization step
fileKey (string): The key of the file returned during the initialization step
parts (int): The number of parts to upload
expires (int): The expiration time (in seconds) for the presigned URLs

Response Body

The response body is a JSON object containing an array of presigned URLs for uploading the dataset ZIP file.

Example

curl -s -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/upload/getPreSignedUrls" \
  -H "Authorization: Bearer ${COSMOS_KEY}" \
  -H "Content-Type: application/json" \
  -d "{\"fileId\": \"${FILE_ID}\", \"fileKey\": \"${FILE_KEY}\", \"parts\": 5, \"expires\": 3600}"

Finalize Dataset Upload#

POST /v1/cosmos/datasets/{dataset_id}/upload/finalize

Completes a multipart upload by providing ETags (Entity Tags) for all uploaded parts.

Request Headers

Authorization: Bearer ${COSMOS_KEY}
Content-Type: application/json

Request Body

The request body is a JSON object with the following parameters:

fileId (string): The ID of the file returned during the initialization step
fileKey (string): The key of the file returned during the initialization step
parts (list): A list of the parts that have been uploaded. Each part is an object with the following parameters:
   PartNumber (int): The part number
   ETag (string): The ETag returned when the part was uploaded
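
The parts list described above can be assembled from the collected ETags before calling this endpoint. The following is a minimal sketch; `build_parts_json` is a hypothetical helper name, and it assumes the ETag values contain no double quotes or backslashes (the upload script strips the surrounding quotes from each ETag).

```bash
#!/bin/bash

# build_parts_json is a hypothetical helper. It prints a JSON array with one
# object per ETag argument. Part numbers follow argument order, starting at 1.
# Assumes ETags contain no double quotes or backslashes.
build_parts_json() {
   local json="[" part_num=1 etag
   for etag in "$@"; do
      if [ "$part_num" -gt 1 ]; then
         json="${json},"
      fi
      json="${json}{\"ETag\":\"${etag}\",\"PartNumber\":${part_num}}"
      part_num=$((part_num + 1))
   done
   echo "${json}]"
}

build_parts_json "abc123" "def456"
# prints [{"ETag":"abc123","PartNumber":1},{"ETag":"def456","PartNumber":2}]
```

The output can be used directly as the value of the parts parameter in the request body.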

Response Body

The response body is a JSON confirmation object.

Example

curl -s -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/upload/finalize" \
 -H "Authorization: Bearer ${COSMOS_KEY}" \
 -H "Content-Type: application/json" \
 -d "{\"fileId\": \"${FILE_ID}\", \"fileKey\": \"${FILE_KEY}\", \"parts\": [{\"ETag\":\"\\\"abc123\\\"\",\"PartNumber\":1}]}"

Get Dataset Download URL#

GET /v1/cosmos/datasets/{dataset_id}/download/getPreSignedUrls

Retrieves a presigned URL for downloading the dataset ZIP file.

Request Headers

Authorization: Bearer ${COSMOS_KEY}

Request Body

None

Response Body

The response body is a JSON object containing the presigned URL for downloading the dataset ZIP file.

Example

curl -s -X GET "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/download/getPreSignedUrls?processed=false" \
 -H "Authorization: Bearer ${COSMOS_KEY}"

Process Dataset#

POST /v1/cosmos/datasets/{dataset_id}/process

Processes the dataset with the specified ID. Once this call is received, the server begins generating captions for the video files and processes the results for storage and retrieval.

Request Headers

Authorization: Bearer ${COSMOS_KEY}
Content-Type: application/json

Request Body

The request body is a JSON object with the “url” parameter, which is a string specifying the presigned download URL of the dataset (refer to the Get Dataset Download URL specification for more details).

Response Body

The response body is a JSON confirmation object.

Example

curl -s -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/process" \
 -H "Authorization: Bearer ${COSMOS_KEY}" \
 -H "Content-Type: application/json" \
 -d "{\"url\": \"${PRESIGNED_URL}\"}"

Delete Dataset#

DELETE /v1/cosmos/datasets/{dataset_id}

Deletes the dataset with the specified ID.

Request Headers

Authorization: Bearer ${COSMOS_KEY}

Request Body

None

Response Body

The response body is a JSON confirmation object.

Example

curl -s -X DELETE "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}" \
  -H "Authorization: Bearer ${COSMOS_KEY}"

Process S3 Input/Output#

POST /v1/cosmos/datasets

Creates a dataset that reads its input directly from, and writes its output directly to, S3 buckets.

Request Headers

Authorization: Bearer ${COSMOS_KEY}
Content-Type: application/json

Request Body

The request body is a JSON object with the following parameters:

{
"name": "Your Dataset Name",
"description": "Your dataset description",
"s3InputPrefix": "s3://your-input-bucket/path/",
"s3OutputPrefix": "s3://your-output-bucket/path/",
"s3Config": "BASE64_ENCODED_AWS_CREDENTIALS",
"jobSpec": {
   "pipeline": "split",
   "args": {
      "generate_embeddings": true,
      "generate_previews": true,
      "generate_captions": true,
      "splitting_algorithm": "transnetv2",
      "captioning_prompt_variant": "default",
      "captioning_prompt_text": "your prompt text"
   }
}
}

Refer to the Curation Parameters page for a description of all available “jobSpec” parameters.

Response Body

The response body is a JSON object containing the dataset details, including the id parameter specifying the ID of the new dataset.

Example

# Base64 encode AWS credentials (tr removes line wrapping on both GNU and BSD base64)
S3_CONFIG=$(base64 < ~/.aws/credentials | tr -d '\n')

curl -s "https://api.ngc.nvidia.com/v1/cosmos/datasets" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d "{
   \"name\": \"s3-test-dataset\",
   \"description\": \"S3 input/output test\",
   \"s3InputPrefix\": \"s3://pre-signed-test-curator/dtzeng/testinput1/\",
   \"s3OutputPrefix\": \"s3://pre-signed-test-curator/dtzeng/dtbucket15/\",
   \"s3Config\": \"${S3_CONFIG}\",
   \"jobSpec\": {
      \"pipeline\": \"split\",
      \"args\": {
         \"generate_embeddings\": true,
         \"generate_previews\": true,
         \"generate_captions\": true,
         \"splitting_algorithm\": \"transnetv2\",
         \"captioning_prompt_variant\": \"default\",
         \"captioning_prompt_text\": \"actual prompt\"
      }
   }
}"

Get Dataset Captions#

GET /v1/cosmos/datasets/{dataset_id}/captions

Retrieves the text captions for each video in the dataset with the specified ID.

Request Headers

Authorization: Bearer ${COSMOS_KEY}

Request Body

None

Response Body

The response body is a JSON object containing captions for videos as a list of key/value pairs. Each key corresponds to the file_id of a video file, with the value corresponding to the captions for the video.
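
A call to this endpoint might look like the following sketch. `get_dataset_captions` is a hypothetical wrapper name; the URL and header are taken from the endpoint description above.

```bash
#!/bin/bash

# get_dataset_captions is a hypothetical wrapper around the endpoint above.
# Requires curl and COSMOS_KEY in the environment.
# Usage: get_dataset_captions DATASET_ID
get_dataset_captions() {
   local dataset_id="$1"
   if [ -z "$dataset_id" ]; then
      echo "Usage: get_dataset_captions DATASET_ID" >&2
      return 1
   fi
   curl -s -X GET "https://api.ngc.nvidia.com/v1/cosmos/datasets/${dataset_id}/captions" \
      -H "Authorization: Bearer ${COSMOS_KEY}"
}
```

Redirecting the output to a file (for example, `get_dataset_captions "${DATASET_ID}" > captions.json`) keeps a local copy of the captions for review.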

Update Dataset Captions#

PATCH /v1/cosmos/datasets/{dataset_id}/captions/{caption_id}

Updates the text of the specified caption.

Request Headers

Authorization: Bearer ${COSMOS_KEY}
Content-Type: application/json

Request Body

The request body is a JSON object containing a single “caption” parameter, which is a string containing the updated caption.
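
Captions often contain apostrophes, quotation marks, and line breaks, which are awkward to quote on the command line. One option is to store the request body in a file and let curl read it. The following is a minimal sketch; `update_caption` is a hypothetical wrapper name, and the body file is assumed to already contain valid JSON such as `{"caption": "Updated text"}`.

```bash
#!/bin/bash

# update_caption is a hypothetical wrapper around the PATCH endpoint above.
# It sends the JSON stored in BODY_FILE unchanged, so the caption text never
# passes through shell quoting. Requires curl and COSMOS_KEY in the environment.
# Usage: update_caption DATASET_ID CAPTION_ID BODY_FILE
update_caption() {
   local dataset_id="$1" caption_id="$2" body_file="$3"
   if [ ! -f "$body_file" ]; then
      echo "Error: body file not found: ${body_file}" >&2
      return 1
   fi
   curl -s -X PATCH "https://api.ngc.nvidia.com/v1/cosmos/datasets/${dataset_id}/captions/${caption_id}" \
      -H "Authorization: Bearer ${COSMOS_KEY}" \
      -H "Content-Type: application/json" \
      --data-binary "@${body_file}"
}
```

For example, `update_caption "${DATASET_ID}" 0 caption.json` sends the contents of `caption.json` as the request body.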

Response Body

The response body is a JSON confirmation object.

Example

curl -X PATCH "https://api.ngc.nvidia.com/v1/cosmos/datasets/50b48bee-ecb1-4a22-afe9-ae90bd4864ae/captions/0" \
  -H "Authorization: Bearer ${COSMOS_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
     "caption": "The video begins with a black car driving down a road, surrounded by trees and buildings. The car is moving at a moderate speed, and the camera captures its movements from a distance. As the car continues to drive, it passes by several other vehicles, including a white truck and a blue car. The road is well-paved, and there are no visible pedestrians or cyclists in the scene.\nAs the car approaches an intersection, the camera zooms in on the vehicle, providing a closer view of its features. The car has a sleek design, with a black exterior and green accents on the doors and hood. The front grille is prominent, and the headlights are sharp and angular. The car also has a roof-mounted sensor array, indicating that it may be equipped for autonomous driving or other advanced technologies.\nOverall, the video provides a detailed look at the car'\''s design and features as it moves through an urban environment. The camera work is smooth and steady, capturing the car'\''s movements and surroundings effectively."
     }'

Terminate All Jobs#

DELETE /v1/cosmos/jobs

Terminates all in-progress jobs for the organization.

Request Headers

Authorization: Bearer ${COSMOS_KEY}

Request Body

None

Response Body

The response body is a JSON confirmation object.

Example

curl -X DELETE "https://api.ngc.nvidia.com/v1/cosmos/jobs" \
  -H "Authorization: Bearer ${COSMOS_KEY}"