Quickstart Guide#
This guide walks you through launching the Cosmos Embed1 NIM and generating your first text and video embeddings.
Downloading the NIM#
Pull the Cosmos Embed1 NIM container from NGC:
docker pull nvcr.io/nim/nvidia/cosmos-embed1:1.0.0
Launch the NIM#
The following Docker command will launch the NIM with default parameters for the host hardware.
docker run -it --rm --runtime=nvidia --name=cosmos-embed1 --gpus device=0 \
  -p 8000:8000 \
  -e NGC_API_KEY=$NGC_API_KEY \
  nvcr.io/nim/nvidia/cosmos-embed1:1.0.0
Upon successful startup, the NIM will print the following to stdout:
===============================================================================
NIM Cosmos Embed1 Service Ready!
FastAPI server is now accepting requests
API documentation available at: http://localhost:8000/docs
Health check endpoint: http://localhost:8000/v1/health/ready
===============================================================================
Check the NIM Health#
Invoking with cURL#
To check the health of the NIM using cURL, execute the following command:
curl -X GET http://localhost:8000/v1/health/ready -H 'accept: application/json'
Upon a successful health check, you should receive a JSON response similar to this:
{
  "object": "health.response",
  "message": "NIM Service is ready"
}
Invoking with Python#
You can also check the NIM health programmatically using Python:
import requests

r = requests.get("http://localhost:8000/v1/health/ready")
if r.status_code == 200:
    print("NIM is ready!")
else:
    print("NIM not ready:", r.text)
Submit an Embedding Request#
Invoking with cURL#
To submit an embedding request using cURL, send a POST request to the /v1/embeddings endpoint with your input data:
curl -X POST http://localhost:8000/v1/embeddings \
  -H 'Content-Type: application/json' \
  -d '{
    "input": [
      "The quick brown fox jumps over the lazy dog"
    ],
    "request_type": "query",
    "encoding_format": "float",
    "model": "nvidia/cosmos-embed1"
  }'
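The same text query can also be sent from Python using the requests library. This is a minimal sketch that mirrors the cURL payload above.
import requests

# Same payload as the cURL example above
payload = {
    "input": [
        "The quick brown fox jumps over the lazy dog"
    ],
    "request_type": "query",
    "encoding_format": "float",
    "model": "nvidia/cosmos-embed1"
}

r = requests.post("http://localhost:8000/v1/embeddings", json=payload)
print(r.json())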
Invoking a video query with Python and base64#
import base64
import requests

# Example sample video:
# https://upload.wikimedia.org/wikipedia/commons/3/3d/Branko_Paukovic%2C_javelin_throw.webm
# If needed, from a shell you can download it with one line:
# wget -O javelin_throw.webm "https://upload.wikimedia.org/wikipedia/commons/3/3d/Branko_Paukovic%2C_javelin_throw.webm"
video_file = "./javelin_throw.webm"

# Read the video file in binary mode and base64-encode it
with open(video_file, 'rb') as f:
    video_bytes = f.read()
video_b64 = base64.b64encode(video_bytes).decode('utf-8')

payload = {
    "input": [
        f"data:video/webm;base64,{video_b64}"
    ],
    "request_type": "query",
    "encoding_format": "float",
    "model": "nvidia/cosmos-embed1"
}

r = requests.post("http://localhost:8000/v1/embeddings", json=payload)
print(r.json())
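Printing r.json() shows the full response body. If you only need the vector itself, the following sketch extracts it under the assumption of an OpenAI-style embeddings response, where each result exposes an embedding list inside a data array; verify the exact field names against the API documentation at http://localhost:8000/docs before relying on them.
# Hypothetical extraction, assuming an OpenAI-style response shape such as
#   {"data": [{"embedding": [...], "index": 0}, ...]}
# Verify the actual schema at http://localhost:8000/docs.
response = r.json()
embedding = response["data"][0]["embedding"]
print("Embedding length:", len(embedding))
print("First few values:", embedding[:5])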
Invoking a bulk_video request with Python#
Here’s an example of submitting an embedding request using Python, specifically for the bulk_video request type. This request type exclusively accepts video URLs, and the URLs must be accessible from the NIM host system. Up to 64 video URLs can be submitted in a single request while maintaining optimal performance:
import requests

# Correct example for bulk_video (only video URLs)
payload = {
    "input": [
        "data:video/webm;presigned_url,<video_url_1>",
        "data:video/webm;presigned_url,<video_url_2>"
    ],
    "request_type": "bulk_video",
    "encoding_format": "float",
    "model": "nvidia/cosmos-embed1"
}

r = requests.post("http://localhost:8000/v1/embeddings", json=payload)
print(r.json())
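Because a single bulk_video request performs best with at most 64 URLs, larger collections can be split into batches on the client side. The helper below is a minimal sketch under that assumption; the embed_videos_in_batches name and the error handling are illustrative, not part of the NIM API, and the URLs must still be reachable from the NIM host.
import requests

ENDPOINT = "http://localhost:8000/v1/embeddings"
MAX_URLS_PER_REQUEST = 64  # per-request limit noted above for optimal performance

def embed_videos_in_batches(video_urls):
    """Submit bulk_video requests in chunks of up to 64 URLs and collect the responses."""
    results = []
    for start in range(0, len(video_urls), MAX_URLS_PER_REQUEST):
        batch = video_urls[start:start + MAX_URLS_PER_REQUEST]
        payload = {
            "input": [f"data:video/webm;presigned_url,{url}" for url in batch],
            "request_type": "bulk_video",
            "encoding_format": "float",
            "model": "nvidia/cosmos-embed1",
        }
        r = requests.post(ENDPOINT, json=payload)
        r.raise_for_status()
        results.append(r.json())
    return results

# Usage: responses = embed_videos_in_batches(["<video_url_1>", "<video_url_2>"])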
Stopping the Container#
Use the following commands to stop and remove the running container:
docker stop cosmos-embed1
docker rm cosmos-embed1
Caching the Model#
Create a model cache directory on the host machine:
export LOCAL_NIM_CACHE="$HOME/.cache/cosmos-embed1"
mkdir -p "$LOCAL_NIM_CACHE"
chmod 777 "$LOCAL_NIM_CACHE"
Run the container with the cache directory as a volume mount:
docker run -it --rm --runtime=nvidia --name=cosmos-embed1 --gpus device=0 \
-p 8000:8000 \
-e NGC_API_KEY=$NGC_API_KEY \
-v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
--shm-size=16g \
--tmpfs /tmp/ram:rw,size=2g \
nvcr.io/nim/nvidia/cosmos-embed1:1.0.0
Note
This example includes the --shm-size and --tmpfs flags to improve performance. Refer to the Configuring the NIM section for details on these parameters.