Quickstart Guide#

This guide walks you through launching the Cosmos Embed1 NIM and generating your first text and video embeddings.

Downloading the NIM#

Pull the Cosmos Embed1 NIM container from NGC:

docker pull nvcr.io/nim/nvidia/cosmos-embed1:1.0.0

Launch the NIM#

The following Docker command will launch the NIM with default parameters for the host hardware.

docker run -it --rm --runtime=nvidia --name=cosmos-embed1 --gpus device=0 \
  -p 8000:8000 \
  -e NGC_API_KEY=$NGC_API_KEY \
  nvcr.io/nim/nvidia/cosmos-embed1:1.0.0

Upon successful startup, the NIM will print the following to stdout:

===============================================================================

NIM Cosmos Embed1 Service Ready!

FastAPI server is now accepting requests

API documentation available at: http://localhost:8000/docs

Health check endpoint: http://localhost:8000/v1/health/ready

===============================================================================

Check the NIM Health#

Invoking with cURL#

To check the health of the NIM using cURL, execute the following command:

curl -X GET http://localhost:8000/v1/health/ready -H 'accept: application/json'

Upon a successful health check, you should receive a JSON response similar to this:

{
  "object": "health.response",
  "message": "NIM Service is ready"
}

Invoking with Python#

You can also check the NIM health programmatically using Python:

import requests

r = requests.get("http://localhost:8000/v1/health/ready")
if r.status_code == 200:
    print("NIM is ready!")
else:
    print("NIM not ready:", r.text)
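Model loading can take several minutes after the container starts, so early health checks may fail or be refused. The one-shot check above can be extended into a polling loop. The sketch below is illustrative, not part of the NIM API: the `wait_until_ready` helper, its timeout defaults, and the injectable `probe` argument (useful for testing without a running server) are all assumptions.

```python
import time

import requests


def wait_until_ready(url, timeout=600.0, interval=5.0, probe=None):
    """Poll a readiness endpoint until it returns HTTP 200 or the timeout expires.

    `probe` defaults to requests.get; it is injectable so the loop can be
    exercised without a live server.
    """
    probe = probe or requests.get
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if probe(url, timeout=5).status_code == 200:
                return True
        except requests.RequestException:
            pass  # server not accepting connections yet
        time.sleep(interval)
    return False


if __name__ == "__main__":
    if wait_until_ready("http://localhost:8000/v1/health/ready"):
        print("NIM is ready!")
    else:
        print("Timed out waiting for the NIM to become ready")
```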

Submit an Embedding Request#

Invoking with cURL#

To submit an embedding request using cURL, send a POST request to the /v1/embeddings endpoint with your input data:

curl -X POST http://localhost:8000/v1/embeddings \
  -H 'Content-Type: application/json' \
  -d '{
    "input": [
      "The quick brown fox jumps over the lazy dog"
    ],
    "request_type": "query",
    "encoding_format": "float",
    "model": "nvidia/cosmos-embed1"
  }'
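The same text query can be sent from Python. This is a sketch of the equivalent request; the `build_text_query` helper name is an assumption introduced here for illustration, while the payload fields mirror the cURL example above.

```python
import requests

EMBEDDINGS_URL = "http://localhost:8000/v1/embeddings"


def build_text_query(texts):
    """Build a query-type embedding request payload for a list of strings."""
    return {
        "input": list(texts),
        "request_type": "query",
        "encoding_format": "float",
        "model": "nvidia/cosmos-embed1",
    }


if __name__ == "__main__":
    payload = build_text_query(["The quick brown fox jumps over the lazy dog"])
    r = requests.post(EMBEDDINGS_URL, json=payload)
    print(r.json())
```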

Invoking a video query with Python and base64#

import base64
import requests

# Example sample video:
# https://upload.wikimedia.org/wikipedia/commons/3/3d/Branko_Paukovic%2C_javelin_throw.webm
# If needed, from a shell you can download it with one line:
# wget -O javelin_throw.webm "https://upload.wikimedia.org/wikipedia/commons/3/3d/Branko_Paukovic%2C_javelin_throw.webm"
video_file = "./javelin_throw.webm"
# Read the video file in binary mode
with open(video_file, 'rb') as f:
    video_bytes = f.read()
video_b64 = base64.b64encode(video_bytes).decode('utf-8')

payload = {
    "input": [
        f"data:video/webm;base64,{video_b64}"
    ],
    "request_type": "query",
    "encoding_format": "float",
    "model": "nvidia/cosmos-embed1"
}

r = requests.post("http://localhost:8000/v1/embeddings", json=payload)
print(r.json())
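Once you have embeddings back, a common next step is to compare a text-query embedding against video embeddings by cosine similarity. The sketch below operates on plain lists of floats, as returned with `"encoding_format": "float"`; the helper name and the toy vectors are illustrative, not part of the NIM API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Toy vectors standing in for real embedding outputs.
query_vec = [0.1, 0.3, 0.5]
video_vec = [0.1, 0.3, 0.5]
print(cosine_similarity(query_vec, video_vec))  # identical vectors give ~1.0
```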

Invoking bulk_video request with Python#

Here’s an example of submitting an embedding request with Python using the bulk_video request type. This request type accepts only video URLs, and the URLs must be reachable from the NIM host system. You can submit up to 64 video URLs in a single request while maintaining optimal performance:

import requests

# Correct example for bulk_video (only video URLs)
payload = {
    "input": [
        "data:video/webm;presigned_url,<video_url_1>",
        "data:video/webm;presigned_url,<video_url_2>"
    ],
    "request_type": "bulk_video",
    "encoding_format": "float",
    "model": "nvidia/cosmos-embed1"
}

r = requests.post("http://localhost:8000/v1/embeddings", json=payload)
print(r.json())
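Because a single bulk_video request performs best with at most 64 URLs, a larger collection can be split into chunks and sent as one request per chunk. The sketch below assumes the payload shape shown above; the `chunked` and `submit_bulk_video` helper names and the default endpoint are illustrative.

```python
import requests

MAX_URLS_PER_REQUEST = 64


def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def submit_bulk_video(urls, endpoint="http://localhost:8000/v1/embeddings"):
    """Submit bulk_video requests in chunks of up to 64 URLs each."""
    results = []
    for batch in chunked(urls, MAX_URLS_PER_REQUEST):
        payload = {
            "input": [f"data:video/webm;presigned_url,{u}" for u in batch],
            "request_type": "bulk_video",
            "encoding_format": "float",
            "model": "nvidia/cosmos-embed1",
        }
        r = requests.post(endpoint, json=payload)
        r.raise_for_status()
        results.append(r.json())
    return results
```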

Stopping the Container#

Use the following commands to stop and remove the running container:

docker stop cosmos-embed1
docker rm cosmos-embed1

Caching the Model#

  1. Create a model cache directory on the host machine:

export LOCAL_NIM_CACHE="$HOME/.cache/cosmos-embed1"
mkdir -p "$LOCAL_NIM_CACHE"
chmod 777 "$LOCAL_NIM_CACHE"
  2. Run the container with the cache directory as a volume mount:

docker run -it --rm --runtime=nvidia --name=cosmos-embed1 --gpus device=0 \
  -p 8000:8000 \
  -e NGC_API_KEY=$NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  --shm-size=16g \
  --tmpfs /tmp/ram:rw,size=2g \
  nvcr.io/nim/nvidia/cosmos-embed1:1.0.0

Note

This example includes --shm-size and --tmpfs flags to improve performance. Refer to the Configuring the NIM section for details on these parameters.