# API Reference
For the full OpenAPI 3.1 schema, you can access the interactive documentation or the raw JSON file:
- Interactive Docs: `http://<host>:8000/docs`
- OpenAPI JSON: `http://<host>:8000/openapi.json`
## Endpoints
### POST /v1/embeddings
Generates embedding vectors for text or video inputs. This is the primary inference endpoint.
#### Request Body

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `input` | string or array[string] | Yes | | The text or video input(s) to embed. See Input String Formats below. |
| `request_type` | string | Yes | | Specifies the processing mode, which can be `query`, `bulk_text`, or `bulk_video`. |
| `model` | string | Yes | | The ID of the embedding model to use, which currently must be `nvidia/cosmos-embed1`. |
| `encoding_format` | string | No | `float` | The format for the returned embeddings (for example, `float`). |
#### Input String Formats

The `input` field accepts the following formats:

- **Plain Text:** `"A sentence to be embedded."`
- **Base64-Encoded Video:** `"data:video/mp4;base64,<base64-encoded-video-data>"` (`query` mode only)
- **Presigned URL for Video:** `"data:video/mp4;presigned_url,https://your-url/video.mp4"` (for `bulk_video` and `query` modes)
- **Video Frames (Presigned URLs):** `"data:video_frames/jpg;presigned_url,{<frame_url_0>,<frame_url_1>,<frame_url_2>,<frame_url_3>,<frame_url_4>,<frame_url_5>,<frame_url_6>,<frame_url_7>}"` (for `bulk_video` and `query` modes)
- **Video Frames (Base64):** `"data:video_frames/jpg;base64,{<frame_0_b64>,<frame_1_b64>,<frame_2_b64>,<frame_3_b64>,<frame_4_b64>,<frame_5_b64>,<frame_6_b64>,<frame_7_b64>}"` (for `bulk_video` and `query` modes)
#### Notes and constraints

- `query` supports a single input item (text, video, or video_frames).
- `bulk_text` supports up to 64 plain-text strings per request.
- `bulk_video` supports up to 64 items, where each item must be one of the following: `data:video/<type>;presigned_url,<url>`, `data:video_frames/<type>;presigned_url,{...}`, or `data:video_frames/<type>;base64,{...}`.
- `video_frames` requires exactly 8 frames per item.
- Maximum inputs per request: 64 items for `bulk_text` and `bulk_video` modes.
- Recommended video duration: 15 seconds.
- Maximum recommended video duration: 1-2 minutes; no strict maximum is enforced by the NIM.
- Supported codecs depend on the runtime decode stack (PyNvVideoCodec 2.0.3 / NVDEC) and the host GPU/driver; common codecs include H.264, HEVC, AV1, VP8, VP9, VC1, MPEG4, MPEG2, and MPEG1.
- Base64-encoded full videos are accepted only in `query` mode.
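Building the longer input strings by hand is error-prone, so it can help to assemble them programmatically. The sketch below uses only the Python standard library; the helper names (`video_b64_input`, `video_frames_input`) are illustrative and not part of the API:

```python
import base64


def video_b64_input(video_bytes: bytes, video_type: str = "mp4") -> str:
    """Build a base64 full-video input string (accepted in query mode only)."""
    encoded = base64.b64encode(video_bytes).decode("ascii")
    return f"data:video/{video_type};base64,{encoded}"


def video_frames_input(frame_urls: list[str], image_type: str = "jpg") -> str:
    """Build a presigned-URL video_frames input string."""
    # The service requires exactly 8 frames per video_frames item.
    if len(frame_urls) != 8:
        raise ValueError("video_frames requires exactly 8 frames per item")
    joined = ",".join(frame_urls)
    return f"data:video_frames/{image_type};presigned_url,{{{joined}}}"
```

Either helper's output can be placed directly in the `input` field of a `query` or `bulk_video` request.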
#### Text Query Payload

```json
{
  "input": "A fluffy white cat basking in the sun.",
  "model": "nvidia/cosmos-embed1",
  "request_type": "query"
}
```
#### Bulk Text Payload

```json
{
  "input": [
    "This is the first sentence.",
    "Here is a second one for batch processing."
  ],
  "model": "nvidia/cosmos-embed1",
  "request_type": "bulk_text"
}
```
#### Bulk Video Payload

```json
{
  "input": [
    "data:video/webm;presigned_url,https://upload.wikimedia.org/wikipedia/commons/3/3d/Branko_Paukovic%2C_javelin_throw.webm",
    "data:video/webm;presigned_url,https://upload.wikimedia.org/wikipedia/commons/3/3d/Branko_Paukovic%2C_javelin_throw.webm"
  ],
  "model": "nvidia/cosmos-embed1",
  "request_type": "bulk_video"
}
```
#### Video Frames Payload (Query)

```json
{
  "input": "data:video_frames/jpg;presigned_url,{<frame_url_0>,<frame_url_1>,<frame_url_2>,<frame_url_3>,<frame_url_4>,<frame_url_5>,<frame_url_6>,<frame_url_7>}",
  "model": "nvidia/cosmos-embed1",
  "request_type": "query"
}
```
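Because the payloads above are plain JSON, any HTTP client can send them. As a minimal sketch using only the Python standard library, assuming the NIM is reachable at `http://localhost:8000` (substitute your host and port); the `embed` helper name is illustrative:

```python
import json
import urllib.request


def embed(payload: dict, base_url: str = "http://localhost:8000") -> dict:
    """POST a payload to /v1/embeddings and return the parsed JSON response."""
    request = urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = {
    "input": "A fluffy white cat basking in the sun.",
    "model": "nvidia/cosmos-embed1",
    "request_type": "query",
}
# result = embed(payload)  # requires a running NIM at the base URL
```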
#### Response Body

The response is a JSON object containing the generated embeddings and usage statistics.

| Parameter | Type | Description |
|---|---|---|
| `object` | string | The type of object, always `list`. |
| `data` | array[object] | An array of embedding objects. |
| `model` | string | The model used for the request (e.g. `nvidia/cosmos-embed1`). |
| `usage` | object | An object containing token and video counts. |
#### Embedding Object

| Parameter | Type | Description |
|---|---|---|
| `object` | string | The type of object, always `embedding`. |
| `index` | integer | The index of this embedding in the `data` array. |
| `embedding` | array[float] | The embedding vector. |
#### Usage Object

| Parameter | Type | Description |
|---|---|---|
| `num_videos` | integer | Number of videos processed in the request. |
| `prompt_tokens` | integer | Number of text tokens in the prompt. |
| `total_tokens` | integer | Total tokens processed. |
#### Example Response

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        0.0123456789,
        -0.0987654321
      ]
    }
  ],
  "model": "nvidia/cosmos-embed1",
  "usage": {
    "num_videos": 0,
    "prompt_tokens": 10,
    "total_tokens": 10
  }
}
```
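Once a response is parsed, a common first step is comparing the returned vectors, for example scoring a query embedding against video embeddings with cosine similarity. A short sketch (the helper names are illustrative, not part of the API):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def extract_embeddings(response: dict) -> list[list[float]]:
    """Pull embedding vectors out of a /v1/embeddings response, ordered by index."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```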
### GET /v1/health/ready
Checks if the service is ready to accept inference requests.
#### Example Response

```json
{
  "object": "health.response",
  "message": "NIM Service is ready"
}
```
### GET /v1/health/live
Checks if the service is running (live). It may not yet be ready for inference.
#### Example Response

```json
{
  "object": "health.response",
  "message": "NIM Service is live"
}
```
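Model load can take a while, so clients typically poll the readiness endpoint before sending inference traffic. A minimal polling sketch, assuming the NIM is at `http://localhost:8000` and that readiness is signaled by an HTTP 200; the `wait_until_ready` name is illustrative:

```python
import time
import urllib.request


def wait_until_ready(base_url: str = "http://localhost:8000",
                     timeout_s: float = 300.0,
                     interval_s: float = 5.0) -> bool:
    """Poll /v1/health/ready until the service is ready or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/health/ready",
                                        timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            pass  # service not reachable yet; keep polling
        time.sleep(interval_s)
    return False
```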
### GET /health/metrics
Returns a JSON snapshot of service metrics (requests, latency percentiles, throughput, errors, and business counters). Latency values are in seconds.
This endpoint complements the Prometheus-compatible metrics at `GET /v1/metrics`.
#### Example Response

```json
{
  "service": {
    "uptime_seconds": 12.3,
    "start_time": 1730000000.0
  },
  "requests": {
    "total": 10,
    "success": 9,
    "error": 1,
    "success_rate_percent": 90.0,
    "error_rate_percent": 10.0,
    "in_flight": 0,
    "in_flight_by_type": {
      "query": 0,
      "bulk_text": 0,
      "bulk_video": 0
    }
  },
  "requests_by_type": {
    "success": {
      "query": 8,
      "bulk_video": 1
    },
    "error": {
      "bulk_video": 1
    }
  },
  "requests_by_input_type": {
    "total": {
      "text": 8,
      "video_presigned_url": 2
    },
    "success": {
      "text": 8,
      "video_presigned_url": 1
    },
    "error": {
      "video_presigned_url": 1
    }
  },
  "encoding_format_distribution": {
    "float": 10
  },
  "status_codes": {
    "200": 9,
    "400": 1
  },
  "errors_by_classification": {
    "download_error": 1
  },
  "latency": {
    "p50": 0.12,
    "p95": 0.30,
    "p99": 0.35,
    "p99.9": 0.35,
    "min": 0.05,
    "max": 0.40,
    "avg": 0.15
  },
  "latency_by_type": {
    "query": {
      "p50": 0.10,
      "p95": 0.20,
      "p99": 0.25,
      "avg": 0.12,
      "count": 8
    },
    "bulk_video": {
      "p50": 0.40,
      "p95": 0.40,
      "p99": 0.40,
      "avg": 0.40,
      "count": 1
    }
  },
  "latency_by_input_type": {
    "text": {
      "p50": 0.10,
      "p95": 0.20,
      "p99": 0.25,
      "avg": 0.12,
      "count": 8
    },
    "video_presigned_url": {
      "p50": 0.40,
      "p95": 0.40,
      "p99": 0.40,
      "avg": 0.40,
      "count": 1
    }
  },
  "error_latency": {
    "p50": 0.08,
    "p95": 0.08,
    "p99": 0.08,
    "min": 0.08,
    "max": 0.08,
    "avg": 0.08,
    "count": 1
  },
  "error_latency_by_type": {
    "bulk_video": {
      "p50": 0.08,
      "p95": 0.08,
      "p99": 0.08,
      "avg": 0.08,
      "count": 1
    }
  },
  "error_latency_by_input_type": {
    "video_presigned_url": {
      "p50": 0.08,
      "p95": 0.08,
      "p99": 0.08,
      "avg": 0.08,
      "count": 1
    }
  },
  "throughput": {
    "requests_per_minute": {
      "1min": 60.0,
      "5min": 12.0,
      "15min": 4.0
    }
  },
  "business_metrics": {
    "total_embeddings": 10,
    "total_tokens": 100,
    "total_videos": 2,
    "total_video_frames": 16,
    "failed_inputs_total": 0
  },
  "retries": {
    "total": 1,
    "success": 0,
    "failure": 1,
    "success_rate_percent": 0.0,
    "last_retry_time": 1730000005.0
  }
}
```
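Because the snapshot is plain JSON, it is easy to condense into a few headline numbers for dashboards or smoke tests. A small sketch over the fields shown above; the `summarize_metrics` name is illustrative:

```python
def summarize_metrics(snapshot: dict) -> dict:
    """Condense a /health/metrics snapshot into a few headline numbers."""
    requests = snapshot.get("requests", {})
    latency = snapshot.get("latency", {})
    return {
        "total_requests": requests.get("total", 0),
        "error_rate_percent": requests.get("error_rate_percent", 0.0),
        "p95_latency_s": latency.get("p95"),  # latency values are in seconds
        "in_flight": requests.get("in_flight", 0),
    }


# Trimmed-down version of the example snapshot above.
snapshot = {
    "requests": {"total": 10, "error_rate_percent": 10.0, "in_flight": 0},
    "latency": {"p95": 0.30},
}
summary = summarize_metrics(snapshot)
```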
### GET /v1/metadata
Provides metadata about the NIM container, including version and model information.
### GET /v1/manifest
Returns the NIM manifest file content.
### GET /v1/license
Returns the license information for the NIM.
### GET /v1/metrics
Returns Prometheus-compatible metrics for monitoring.