API Reference#
This page describes the API endpoints and provides example API calls for NeMo Curator on DGX Cloud.
Prerequisites#
You will need the following to interact with the API endpoints:
A valid NGC API key for your NeMo Curator on DGX Cloud account
The curl command-line tool
The jq tool (for JSON parsing)
The base64 command-line tool (required for S3 Input/Output processing)
A dataset ZIP file or access to S3 buckets
For S3 processing: Properly configured AWS credentials
For the Example Workflows: An OS that supports Bash scripts, such as Linux or macOS
Example Workflows#
There are two ways to create a dataset for NeMo Curator on DGX Cloud: by uploading a ZIP file as a multipart upload, or by linking S3 input and output buckets for direct processing. The following example Bash scripts cover both workflows.
Uploading a ZIP File#
#!/bin/bash
# Set your variables
export COSMOS_KEY='your-api-key'
ZIP_FILE="your-dataset.zip"
NUM_PARTS=5
DATASET_NAME="My Test Dataset"
DATASET_DESCRIPTION="This is a test dataset for COSMOS"
# Create a new dataset
DATASET_RESPONSE=$(curl -s -w "\n%{http_code}" "https://api.ngc.nvidia.com/v1/cosmos/datasets" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d "{\"name\": \"${DATASET_NAME}\", \"description\": \"${DATASET_DESCRIPTION}\", \"jobSpec\": {
\"pipeline\": \"split\",
\"args\": {
\"generate_embeddings\": true,
\"generate_previews\": true,
\"generate_captions\": true,
\"splitting_algorithm\": \"transnetv2\",
\"captioning_prompt_variant\": \"default\",
\"captioning_prompt_text\": \"actual prompt\"
}
}
}")
HTTP_STATUS=$(echo "$DATASET_RESPONSE" | tail -n 1)
DATASET_JSON=$(echo "$DATASET_RESPONSE" | sed '$d')
DATASET_ID=$(echo "$DATASET_JSON" | jq -r '.id')
# Initialize upload
INIT_RESPONSE=$(curl -s -w "\n%{http_code}" -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/upload/initialize" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d '{}')
INIT_JSON=$(echo "$INIT_RESPONSE" | sed '$d')
FILE_ID=$(echo "$INIT_JSON" | jq -r '.fileId')
FILE_KEY=$(echo "$INIT_JSON" | jq -r '.fileKey')
# Get presigned URLs for upload
URLS_RESPONSE=$(curl -s -w "\n%{http_code}" -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/upload/getPreSignedUrls" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d "{\"fileId\": \"${FILE_ID}\", \"fileKey\": \"${FILE_KEY}\", \"parts\": ${NUM_PARTS}, \"expires\": 3600}")
URLS_JSON=$(echo "$URLS_RESPONSE" | sed '$d')
# Split file and upload parts
split -n "${NUM_PARTS}" "${ZIP_FILE}" "${ZIP_FILE}.part_"
# Upload parts and collect ETags
declare -a ETAGS
declare -a PART_NUMBERS
PART_NUM=1
for part in "${ZIP_FILE}".part_*; do
echo "Uploading part ${PART_NUM}..."
SIGNED_URL=$(echo "$URLS_JSON" | jq -r ".parts[] | select(.PartNumber==${PART_NUM}) | .signedUrl")
UPLOAD_RESPONSE=$(curl -s -v -X PUT -T "${part}" \
-H "Content-Type: application/zip" \
"${SIGNED_URL}" 2>&1)
ETAG=$(echo "$UPLOAD_RESPONSE" | grep -i "< etag:" | sed 's/< etag: //I' | tr -d '"' | tr -d '\r')
if [ -n "$ETAG" ]; then
ETAGS+=("$ETAG")
PART_NUMBERS+=("$PART_NUM")
echo "Part ${PART_NUM} uploaded with ETag: ${ETAG}"
else
echo "Error: Failed to get ETag for part ${PART_NUM}"
exit 1
fi
PART_NUM=$((PART_NUM + 1))
done
# Check if all parts were uploaded and ETags collected
if [ ${#ETAGS[@]} -ne ${NUM_PARTS} ]; then
echo "Error: Not all parts were uploaded or ETags collected."
exit 1
fi
# Construct parts JSON for finalization
PARTS_JSON="["
for i in "${!ETAGS[@]}"; do
if [ "$i" -gt 0 ]; then
PARTS_JSON="${PARTS_JSON},"
fi
PARTS_JSON="${PARTS_JSON}{\"ETag\":\"${ETAGS[$i]}\",\"PartNumber\":${PART_NUMBERS[$i]}}"
done
PARTS_JSON="${PARTS_JSON}]"
# Finalize upload
FINALIZE_RESPONSE=$(curl -s -w "\n%{http_code}" -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/upload/finalize" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d "{\"fileId\": \"${FILE_ID}\", \"fileKey\": \"${FILE_KEY}\", \"parts\": ${PARTS_JSON}}")
# Get download URL
DOWNLOAD_RESPONSE=$(curl -s -w "\n%{http_code}" -X GET "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/download/getPreSignedUrls?processed=false" \
-H "Authorization: Bearer ${COSMOS_KEY}")
DOWNLOAD_JSON=$(echo "$DOWNLOAD_RESPONSE" | sed '$d')
PRESIGNED_URL=$(echo "$DOWNLOAD_JSON" | jq -r '.url')
# Process dataset
PROCESS_RESPONSE=$(curl -s -w "\n%{http_code}" -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/process" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d "{\"url\": \"${PRESIGNED_URL}\"}")
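The ETag-parsing step in the script above can be factored into a small helper, which makes it easier to check against captured output before running a real upload. The following is a minimal sketch, not part of the API: the function name extract_etag is hypothetical, and it assumes the input is the verbose output of curl -v, where response header lines are prefixed with "< ".

```shell
# Hypothetical helper: read `curl -v` output on stdin and print the value of
# the first ETag response header, with double quotes and any carriage return
# removed. Prints nothing if no ETag header is present.
extract_etag() {
  awk '$1 == "<" && tolower($2) == "etag:" { print $3; exit }' | tr -d '"\r'
}

# Example with captured header lines (no network access needed):
printf '< HTTP/1.1 200 OK\r\n< ETag: "abc123"\r\n' | extract_etag
```

An empty result can then be handled the same way the script handles a missing ETag.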
Linking S3 Input/Output Buckets#
Important
When using an AWS Access Key, ensure you provide only the minimum S3 permissions required for NeMo Curator on DGX Cloud operations. The curator service should have read-only permissions for the input data bucket and read/write permissions for the output data bucket. The AWS Access Key should not provide read/write permissions to any buckets except those associated with dataset input/output operations.
#!/bin/bash
# Set your variables
export COSMOS_KEY='your-api-key'
# Check if AWS credentials file exists
if [ ! -f ~/.aws/credentials ]; then
echo "Error: AWS credentials file not found at ~/.aws/credentials"
echo "Please make sure your AWS credentials are properly configured"
exit 1
fi
# Base64 encode the AWS credentials file for s3Config
S3_CONFIG=$(base64 -w 0 ~/.aws/credentials)
# Create dataset with S3 input and output
S3_DATASET_RESPONSE=$(curl -s -w "\n%{http_code}" "https://api.ngc.nvidia.com/v1/cosmos/datasets" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d "{
\"name\": \"s3-test-dataset\",
\"description\": \"S3 input/output test\",
\"s3InputPrefix\": \"s3://your-input-bucket/path/\",
\"s3OutputPrefix\": \"s3://your-output-bucket/path/\",
\"s3Config\": \"${S3_CONFIG}\",
\"jobSpec\": {
\"pipeline\": \"split\",
\"args\": {
\"generate_embeddings\": true,
\"generate_previews\": true,
\"generate_captions\": true,
\"splitting_algorithm\": \"transnetv2\",
\"captioning_prompt_variant\": \"default\",
\"captioning_prompt_text\": \"your caption prompt\"
}
}
}")
HTTP_STATUS=$(echo "$S3_DATASET_RESPONSE" | tail -n 1)
S3_DATASET_JSON=$(echo "$S3_DATASET_RESPONSE" | sed '$d')
S3_DATASET_ID=$(echo "$S3_DATASET_JSON" | jq -r '.id')
echo "S3 dataset created with ID: ${S3_DATASET_ID}"
echo "S3 processing will happen asynchronously."
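The -w 0 option used above to disable line wrapping is a GNU coreutils option and may not be accepted by every base64 implementation. The following sketch avoids the flag by reading from standard input and stripping newlines instead. The function name encode_credentials is hypothetical.

```shell
# Hypothetical helper: print the base64 encoding of a file as a single line.
# Reading from stdin and deleting newlines avoids implementation-specific
# flags for input files and line wrapping.
encode_credentials() {
  base64 < "$1" | tr -d '\n'
}

# Usage, mirroring the script above:
# S3_CONFIG=$(encode_credentials ~/.aws/credentials)
```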
API Endpoints#
Create a Dataset#
POST /v1/cosmos/datasets
Creates a dataset and returns a dataset ID.
Request Headers
Authorization
:Bearer ${COSMOS_KEY}
Content-Type
:application/json
Request Body
The request body is a JSON object containing the dataset name, a description, and a jobSpec object that holds the curation parameters.
Response Body
The response body is a JSON object containing the id parameter, which specifies the ID of the new dataset. If the returned id is empty or null, dataset creation failed.
Example
curl -s "https://api.ngc.nvidia.com/v1/cosmos/datasets" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d '{"name": "My Dataset", "description": "Test dataset", "jobSpec": {
"pipeline": "split",
"args": {
"generate_embeddings": true,
"generate_previews": true,
"generate_captions": true,
"splitting_algorithm": "transnetv2",
"captioning_prompt_variant": "default",
"captioning_prompt_text": "actual prompt"
}
}}'
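Because an empty or null id signals failure, a script can check the value before continuing. A minimal sketch: the helper name is_valid_dataset_id is hypothetical, and it assumes the ID was extracted with jq -r, which prints the literal string null when the field is missing or null.

```shell
# Hypothetical helper: succeed only if the extracted dataset ID is usable.
# `jq -r '.id'` prints the literal string "null" when the field is missing
# or null, so both the empty and the "null" cases are treated as failure.
is_valid_dataset_id() {
  [ -n "$1" ] && [ "$1" != "null" ]
}

# Usage with the ID extracted from the create-dataset response:
# DATASET_ID=$(echo "$DATASET_JSON" | jq -r '.id')
# is_valid_dataset_id "$DATASET_ID" || { echo "Error: dataset creation failed" >&2; exit 1; }
```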
Get a Dataset#
GET /v1/cosmos/datasets/{dataset_id}
Retrieves the details of the dataset with the specified ID.
Request Headers
Authorization
:Bearer ${COSMOS_KEY}
Response Body
The response body is a JSON object containing dataset details.
Example
{
"name": "test dataset",
"userDescription": "my dataset",
"humanAttributesEnabled": false,
"org": "0461422519424422",
"ncaId": "Rz8I0e_JP2ptQU0rFDVP9ZfqPjhhRHhEELcNSj2i1yE",
"team": "no_team",
"owner": "33nxR5rY_tl2FWDNSh7Tok5MeB9NzutZhFS9Tfyi-er",
"type": "ZipFile",
"url": "",
"inputS3Prefix": "",
"outputS3Prefix": "",
"s3Config": "",
"videoType": "",
"videoPrompt": "",
"jobSpec": {
"pipeline": "split",
"args": {
"generate_embeddings": true,
"generate_previews": true,
"generate_captions": true,
"splitting_algorithm": "transnetv2",
"captioning_prompt_variant": "default",
"captioning_prompt_text": "actual prompt"
}
},
"dateCreated": "2025-03-13T23:52:43.195Z",
"lastModifiedTimestamp": "2025-03-13T23:52:43.195Z",
"lastStatus": {
"status": "CREATING",
"message": "Waiting for user to upload files and proceed to processing",
"details": "0%"
},
"user": {
"email": "example_user@nvidia.com",
"name": "example_user"
},
"id": "168166c7-6e32-4d3a-b2ea-c748b1f3cbf2"
}
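To check progress from a script, the lastStatus fields in this response can be reduced to a single line. The following sketch assumes the response shape shown above and requires jq. The function name dataset_status is hypothetical.

```shell
# Hypothetical helper: read a dataset JSON object on stdin and print
# "STATUS (DETAILS): MESSAGE" from its lastStatus field.
dataset_status() {
  jq -r '.lastStatus | "\(.status) (\(.details)): \(.message)"'
}

# Usage:
# curl -s "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}" \
#   -H "Authorization: Bearer ${COSMOS_KEY}" | dataset_status
```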
Get All Datasets by Organization#
GET /v1/cosmos/datasets?filter=by:org
Retrieves the details of all datasets in your organization.
Request Headers
Authorization
:Bearer ${COSMOS_KEY}
Request Body
None
Response Body
The response body is a JSON object containing details for each dataset.
Initialize Dataset Upload#
POST /v1/cosmos/datasets/{dataset_id}/upload/initialize
Initializes uploading a file to the dataset with the specified ID.
Request Headers
Authorization
:Bearer ${COSMOS_KEY}
Content-Type
:application/json
Request Body
None
Response Body
The response body is a JSON object containing the fileId and fileKey to use when uploading the file.
Example
curl -s -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/upload/initialize" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d '{}'
Get Presigned URLs#
POST /v1/cosmos/datasets/{dataset_id}/upload/getPreSignedUrls
Retrieves an array of presigned URLs for uploading the dataset ZIP file as a multipart upload.
Request Headers
Authorization
:Bearer ${COSMOS_KEY}
Content-Type
:application/json
Request Body
The request body is a JSON object with the following parameters:
fileId (string): The ID of the file returned during the initialization step
fileKey (string): The key of the file returned during the initialization step
parts (int): The number of parts to upload
expires (int): The expiration time (in seconds) for the presigned URLs
Response Body
The response body is a JSON object containing an array of presigned URLs for uploading the dataset ZIP file.
Example
curl -s -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/upload/getPreSignedUrls" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d "{\"fileId\": \"${FILE_ID}\", \"fileKey\": \"${FILE_KEY}\", \"parts\": 5, \"expires\": 3600}"
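Building the request body with jq instead of hand-escaped quotes keeps the JSON valid even if fileKey contains characters that need escaping. A sketch, assuming jq is installed: the function name presign_request_body is hypothetical, the field names come from the parameter list above, and the 3600-second default simply mirrors the example.

```shell
# Hypothetical helper: print the JSON body for the getPreSignedUrls call.
# Arguments: fileId, fileKey, number of parts, expiry in seconds (default 3600).
presign_request_body() {
  local file_id="$1"
  local file_key="$2"
  local parts="$3"
  local expires="${4:-3600}"
  # --arg passes strings (escaped by jq); --argjson passes the integers as numbers.
  jq -nc --arg fileId "$file_id" --arg fileKey "$file_key" \
    --argjson parts "$parts" --argjson expires "$expires" \
    '{fileId: $fileId, fileKey: $fileKey, parts: $parts, expires: $expires}'
}

# Usage:
# curl -s -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/upload/getPreSignedUrls" \
#   -H "Authorization: Bearer ${COSMOS_KEY}" \
#   -H "Content-Type: application/json" \
#   -d "$(presign_request_body "$FILE_ID" "$FILE_KEY" 5)"
```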
Finalize Dataset Upload#
POST /v1/cosmos/datasets/{dataset_id}/upload/finalize
Completes a multipart upload by providing ETags (Entity Tags) for all uploaded parts.
Request Headers
Authorization
:Bearer ${COSMOS_KEY}
Content-Type
:application/json
Request Body
The request body is a JSON object with the following parameters:
fileId (string): The ID of the file returned during the initialization step
fileKey (string): The key of the file returned during the initialization step
parts (list): A list of the parts that have been uploaded. Each part is an object with the following parameters:
* PartNumber (int): The part number
* ETag (string): The ETag for the part
Response Body
The response body is a JSON confirmation object.
Example
curl -s -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/upload/finalize" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d "{\"fileId\": \"${FILE_ID}\", \"fileKey\": \"${FILE_KEY}\", \"parts\": [{\"ETag\":\"\\\"abc123\\\"\",\"PartNumber\":1}]}"
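When there are several parts, the parts array is easier to generate than to type. The sketch below builds it from a list of ETags. The function name build_parts_json is hypothetical, and it assumes the ETags are passed in upload order so that part numbers run from 1 upward. ETag values are inserted as given, with any double quotes escaped, because this page does not state whether the service expects the surrounding quotes.

```shell
# Hypothetical helper: print a JSON array of {"ETag", "PartNumber"} objects.
# Arguments: one ETag per uploaded part, in part order (part 1 first).
build_parts_json() {
  local json="["
  local n=1
  local etag
  for etag in "$@"; do
    if [ "$n" -gt 1 ]; then json="${json},"; fi
    # Escape any double quotes so the value stays a valid JSON string.
    etag=$(printf '%s' "$etag" | sed 's/"/\\"/g')
    json="${json}{\"ETag\":\"${etag}\",\"PartNumber\":${n}}"
    n=$((n + 1))
  done
  printf '%s]\n' "$json"
}

# Usage:
# PARTS_JSON=$(build_parts_json "${ETAGS[@]}")
```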
Get Dataset Download URL#
GET /v1/cosmos/datasets/{dataset_id}/download/getPreSignedUrls
Retrieves a presigned URL for downloading the dataset ZIP file.
Request Headers
Authorization
:Bearer ${COSMOS_KEY}
Request Body
None
Response Body
The response body is a JSON object containing the presigned URL for downloading the dataset ZIP file.
Example
curl -s -X GET "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/download/getPreSignedUrls?processed=false" \
-H "Authorization: Bearer ${COSMOS_KEY}"
Process Dataset#
POST /v1/cosmos/datasets/{dataset_id}/process
Processes the dataset with the specified ID. After receiving this call, the server begins generating captions for the video files and processes the results for storage and retrieval.
Request Headers
Authorization
:Bearer ${COSMOS_KEY}
Content-Type
:application/json
Request Body
The request body is a JSON object with the “url” parameter, which is a string specifying the presigned download URL of the dataset (refer to the Get Dataset Download URL specification for more details).
Response Body
The response body is a JSON confirmation object.
Example
curl -s -X POST "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/process" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d "{\"url\": \"${PRESIGNED_URL}\"}"
Delete Dataset#
DELETE /v1/cosmos/datasets/{dataset_id}
Deletes the dataset with the specified ID.
Request Headers
Authorization
:Bearer ${COSMOS_KEY}
Request Body
None
Response Body
The response body is a JSON confirmation object.
Example
curl -s -X DELETE "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}" \
-H "Authorization: Bearer ${COSMOS_KEY}"
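The fields of the confirmation object are not described here, so a script may prefer to check the HTTP status code as well. The workflow scripts already append the status with -w "\n%{http_code}". The sketch below splits that combined output and treats any 2xx code as success. The function names are hypothetical, and the 2xx rule is an assumption based on general HTTP conventions rather than on this API's documentation.

```shell
# Hypothetical helpers for responses captured with: curl -s -w "\n%{http_code}"
# The status code is the last line; everything before it is the body.
response_status() { printf '%s\n' "$1" | tail -n 1; }
response_body() { printf '%s\n' "$1" | sed '$d'; }

# Assumption: any 2xx status code indicates success.
is_success() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
# RESPONSE=$(curl -s -w "\n%{http_code}" -X DELETE \
#   "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}" \
#   -H "Authorization: Bearer ${COSMOS_KEY}")
# is_success "$(response_status "$RESPONSE")" || echo "Delete failed: $(response_body "$RESPONSE")" >&2
```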
Process S3 Input/Output#
POST /v1/cosmos/datasets
Creates a dataset that reads its input directly from, and writes its output directly to, S3 buckets.
Request Headers
Authorization
:Bearer ${COSMOS_KEY}
Content-Type
:application/json
Request Body
The request body is a JSON object with the following parameters:
{
"name": "Your Dataset Name",
"description": "Your dataset description",
"s3InputPrefix": "s3://your-input-bucket/path/",
"s3OutputPrefix": "s3://your-output-bucket/path/",
"s3Config": "BASE64_ENCODED_AWS_CREDENTIALS",
"jobSpec": {
"pipeline": "split",
"args": {
"generate_embeddings": true,
"generate_previews": true,
"generate_captions": true,
"splitting_algorithm": "transnetv2",
"captioning_prompt_variant": "default",
"captioning_prompt_text": "your prompt text"
}
}
}
Refer to the Curation Parameters page for a description of all available “jobSpec” parameters.
Response Body
The response body is a JSON object containing the dataset details, including the id parameter, which specifies the ID of the new dataset.
Example
# Base64 encode AWS credentials
S3_CONFIG=$(base64 -w 0 ~/.aws/credentials)
curl -s "https://api.ngc.nvidia.com/v1/cosmos/datasets" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d "{
\"name\": \"s3-test-dataset\",
\"description\": \"S3 input/output test\",
\"s3InputPrefix\": \"s3://pre-signed-test-curator/dtzeng/testinput1/\",
\"s3OutputPrefix\": \"s3://pre-signed-test-curator/dtzeng/dtbucket15/\",
\"s3Config\": \"${S3_CONFIG}\",
\"jobSpec\": {
\"pipeline\": \"split\",
\"args\": {
\"generate_embeddings\": true,
\"generate_previews\": true,
\"generate_captions\": true,
\"splitting_algorithm\": \"transnetv2\",
\"captioning_prompt_variant\": \"default\",
\"captioning_prompt_text\": \"actual prompt\"
}
}
}"
Get Dataset Captions#
GET /v1/cosmos/datasets/{dataset_id}/captions
Retrieves the text captions for each video in the dataset with the specified ID.
Request Headers
Authorization
:Bearer ${COSMOS_KEY}
Request Body
None
Response Body
The response body is a JSON object containing captions for videos as a list of key/value pairs. Each key corresponds to the file_id of a video file, and each value contains the captions for that video.
Update Dataset Captions#
PATCH /v1/cosmos/datasets/{dataset_id}/captions/{caption_id}
Updates the text of the specified caption.
Request Headers
Authorization
:Bearer ${COSMOS_KEY}
Content-Type
:application/json
Request Body
The request body is a JSON object containing a single “caption” parameter, which is a string containing the updated caption.
Response Body
The response body is a JSON confirmation object.
Example
curl -X PATCH "https://api.ngc.nvidia.com/v1/cosmos/datasets/50b48bee-ecb1-4a22-afe9-ae90bd4864ae/captions/0" \
-H "Authorization: Bearer ${COSMOS_KEY}" \
-H "Content-Type: application/json" \
-d '{
"caption": "The video begins with a black car driving down a road, surrounded by trees and buildings. The car is moving at a moderate speed, and the camera captures its movements from a distance. As the car continues to drive, it passes by several other vehicles, including a white truck and a blue car. The road is well-paved, and there are no visible pedestrians or cyclists in the scene.\nAs the car approaches an intersection, the camera zooms in on the vehicle, providing a closer view of its features. The car has a sleek design, with a black exterior and green accents on the doors and hood. The front grille is prominent, and the headlights are sharp and angular. The car also has a roof-mounted sensor array, indicating that it may be equipped for autonomous driving or other advanced technologies.\nOverall, the video provides a detailed look at the car'\''s design and features as it moves through an urban environment. The camera work is smooth and steady, capturing the car'\''s movements and surroundings effectively."
}'
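Captions often contain apostrophes and line breaks, which are awkward to quote by hand inside a shell string. The following sketch lets jq do the JSON escaping. The function name caption_payload and the CAPTION_ID and NEW_CAPTION variables are hypothetical.

```shell
# Hypothetical helper: print a JSON body of the form {"caption": "..."}.
# jq escapes quotes, backslashes, and newlines in the caption text.
caption_payload() {
  jq -nc --arg caption "$1" '{caption: $caption}'
}

# Usage:
# curl -X PATCH "https://api.ngc.nvidia.com/v1/cosmos/datasets/${DATASET_ID}/captions/${CAPTION_ID}" \
#   -H "Authorization: Bearer ${COSMOS_KEY}" \
#   -H "Content-Type: application/json" \
#   -d "$(caption_payload "$NEW_CAPTION")"
```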
Terminate All Jobs#
DELETE /v1/cosmos/jobs
Terminates all in-progress jobs for the organization.
Request Headers
Authorization
:Bearer ${COSMOS_KEY}
Request Body
None
Response Body
The response body is a JSON confirmation object.
Example
curl -X DELETE "https://api.ngc.nvidia.com/v1/cosmos/jobs" \
-H "Authorization: Bearer ${COSMOS_KEY}"