API Reference

For the full OpenAPI 3.1 schema, you can access the interactive documentation or the raw JSON file:

  • Interactive Docs: http://<host>:8000/docs

  • OpenAPI JSON: http://<host>:8000/openapi.json
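As a quick way to see which operations a running instance exposes, the raw schema can be read programmatically. The sketch below relies only on the standard OpenAPI layout (a top-level `paths` object keyed by path, then by HTTP method). The function names `list_operations` and `fetch_schema` and the `localhost` host are illustrative, not part of the NIM.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(schema: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs from an OpenAPI schema dict, sorted by path."""
    ops = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            # Path items may also hold non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                ops.append((key.upper(), path))
    return sorted(ops, key=lambda op: (op[1], op[0]))


def fetch_schema(base_url: str) -> dict:
    """Download openapi.json from a running service (performs a network call)."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=10) as resp:
        return json.load(resp)


# Usage against a live service (host is an assumption):
#   for method, path in list_operations(fetch_schema("http://localhost:8000")):
#       print(method, path)
```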

Endpoints

POST /v1/embeddings

Generates embedding vectors for text or video inputs. This is the primary inference endpoint.

Request Body

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| input | string or array[string] | Yes | - | The text or video input(s) to embed; see Input String Formats. |
| request_type | string | Yes | - | Specifies the processing mode, which can be query, bulk_text, or bulk_video. |
| model | string | Yes | - | The ID of the embedding model to use, which currently must be nvidia/cosmos-embed1. |
| encoding_format | string | No | float | The format for the returned embeddings, which can be float or base64. |

Input String Formats

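The input field accepts plain text and, for video, data-URI-style strings such as the data:video/webm;presigned_url,<URL> entries in the bulk video payload below.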

Notes and constraints

  • bulk_video requests must contain only presigned video URLs.

  • Maximum inputs per request: 64 items for bulk_text and bulk_video modes.

  • Recommended video duration: 15 seconds.

  • Maximum recommended video duration: 1–2 minutes; no strict maximum is enforced by the NIM.

  • Supported codecs: H.264, HEVC, AV1, VP8, VP9, VC1, MPEG4, MPEG2, MPEG1; codec use is not enforced by the NIM.

  • Base64 format is accepted only in query mode.
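Some of the constraints above can be checked on the client before a request is sent. The following is a minimal sketch, not the NIM's own validation: `build_request` is a hypothetical helper, and the presigned-URL check assumes the `data:video/<subtype>;presigned_url,<URL>` form seen in the bulk video payload in this section.

```python
MAX_BULK_INPUTS = 64  # per-request limit for bulk_text and bulk_video
MODEL_ID = "nvidia/cosmos-embed1"
REQUEST_TYPES = {"query", "bulk_text", "bulk_video"}


def build_request(inputs, request_type, encoding_format="float"):
    """Check the documented constraints and return a request body dict."""
    if request_type not in REQUEST_TYPES:
        raise ValueError(f"unknown request_type: {request_type!r}")
    if encoding_format not in ("float", "base64"):
        raise ValueError(f"unknown encoding_format: {encoding_format!r}")

    items = [inputs] if isinstance(inputs, str) else list(inputs)
    is_bulk = request_type in ("bulk_text", "bulk_video")
    if is_bulk and len(items) > MAX_BULK_INPUTS:
        raise ValueError(f"{request_type} accepts at most {MAX_BULK_INPUTS} inputs")

    if request_type == "bulk_video":
        for item in items:
            # Assumption: presigned URLs are wrapped as in the example payload.
            if not (item.startswith("data:video/") and ";presigned_url," in item):
                raise ValueError("bulk_video inputs must be presigned video URLs")

    return {
        "input": inputs if isinstance(inputs, str) else items,
        "model": MODEL_ID,
        "request_type": request_type,
        "encoding_format": encoding_format,
    }
```

Failing early on the client avoids spending a round trip on a request the service would reject anyway; the service remains the authority on what is valid.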

Text Query Payload

{
  "input": "A fluffy white cat basking in the sun.",
  "model": "nvidia/cosmos-embed1",
  "request_type": "query"
}

Bulk Text Payload

{
  "input": [
    "This is the first sentence.",
    "Here is a second one for batch processing."
  ],
  "model": "nvidia/cosmos-embed1",
  "request_type": "bulk_text"
}

Bulk Video Payload

{
  "input": [
    "data:video/webm;presigned_url,https://upload.wikimedia.org/wikipedia/commons/3/3d/Branko_Paukovic%2C_javelin_throw.webm",
    "data:video/webm;presigned_url,https://upload.wikimedia.org/wikipedia/commons/3/3d/Branko_Paukovic%2C_javelin_throw.webm"
  ],
  "model": "nvidia/cosmos-embed1",
  "request_type": "bulk_video"
}
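Any of the payloads above can be sent with an ordinary HTTP client. The sketch below uses only the Python standard library; the helper names, the `localhost` host, and the 60-second timeout are assumptions for illustration. Request construction is kept separate from the network call so it can be inspected without a running service.

```python
import json
import urllib.request


def make_embeddings_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/embeddings request."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def post_embeddings(base_url: str, payload: dict, timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON response (network call)."""
    request = make_embeddings_request(base_url, payload)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


payload = {
    "input": "A fluffy white cat basking in the sun.",
    "model": "nvidia/cosmos-embed1",
    "request_type": "query",
}
request = make_embeddings_request("http://localhost:8000", payload)

# With a service running on the assumed host:
#   response = post_embeddings("http://localhost:8000", payload)
```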

Response Body

The response is a JSON object containing the generated embeddings and usage statistics.

| Parameter | Type | Description |
|---|---|---|
| object | string | The type of object, always list. |
| data | array[object] | An array of embedding objects. |
| model | string | The model used for the request (e.g. nvidia/cosmos-embed1). |
| usage | object | An object containing token and video counts. |

Embedding Object

| Parameter | Type | Description |
|---|---|---|
| object | string | The type of object, always embedding. |
| index | integer | The index of this embedding in the data array. |
| embedding | array[float] | The embedding vector. |

Usage Object

| Parameter | Type | Description |
|---|---|---|
| num_videos | integer | Number of videos processed in the request. |
| prompt_tokens | integer | Number of text tokens in the prompt. |
| total_tokens | integer | Total tokens processed. |

Example Response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        0.0123456789,
        -0.0987654321
      ]
    }
  ],
  "model": "nvidia/cosmos-embed1",
  "usage": {
    "num_videos": 0,
    "prompt_tokens": 10,
    "total_tokens": 10
  }
}
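A response shaped like the example above can be unpacked as follows. This is a small sketch that assumes encoding_format was float, so each embedding is already a list of numbers; `extract_embeddings` is an illustrative name. Sorting by index keeps the vectors aligned with the order of the request inputs.

```python
import json


def extract_embeddings(response: dict) -> list[list[float]]:
    """Return the embedding vectors ordered by their index field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


example = json.loads("""
{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 0, "embedding": [0.0123456789, -0.0987654321]}
  ],
  "model": "nvidia/cosmos-embed1",
  "usage": {"num_videos": 0, "prompt_tokens": 10, "total_tokens": 10}
}
""")

vectors = extract_embeddings(example)
usage = example["usage"]
```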

GET /v1/health/ready

Checks if the service is ready to accept inference requests.

Example Response

{
  "object": "health.response",
  "message": "NIM Service is ready"
}

GET /v1/health/live

Checks if the service is running (live). It may not yet be ready for inference.

Example Response

{
  "object": "health.response",
  "message": "NIM Service is live"
}
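Because a live service may not yet be ready, a client typically polls the readiness endpoint before sending inference requests. The sketch below is one way to do that. It assumes /v1/health/ready answers with HTTP 200 once the service is ready and with a non-200 status or a connection error before that, which this reference does not state explicitly. The function names, timeout, and polling interval are illustrative.

```python
import time
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET, or 0 if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except OSError:  # includes URLError and socket timeouts
        return 0


def wait_until_ready(base_url, timeout_s=300.0, interval_s=5.0,
                     probe=http_status, sleep=time.sleep) -> bool:
    """Poll /v1/health/ready until it returns 200 or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(f"{base_url}/v1/health/ready") == 200:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Usage against a live service (host is an assumption):
#   if wait_until_ready("http://localhost:8000"):
#       ...send embedding requests...
```

The `probe` and `sleep` parameters exist so the loop can be exercised without a network or real delays.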

GET /v1/metadata

Provides metadata about the NIM container, including version and model information.

GET /v1/manifest

Returns the NIM manifest file content.

GET /v1/license

Returns the license information for the NIM.

GET /v1/metrics

Returns Prometheus-compatible metrics for monitoring.
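Prometheus-compatible output is plain text with one sample per line, so it can be inspected without a Prometheus server. The parser below is a simplified sketch of the text exposition format: it skips comment lines and ignores optional timestamps. The metric names in the sample are made up for illustration and are not the metrics this NIM exports.

```python
def parse_prometheus_text(text: str) -> dict[str, float]:
    """Map each sample (name plus any label set) to its float value."""
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # blank, HELP, and TYPE lines
            continue
        if "}" in line:
            # Split at the last brace so spaces inside label values are kept.
            head, _, tail = line.rpartition("}")
            name, fields = head + "}", tail.split()
        else:
            name, *fields = line.split()
        if fields:
            samples[name] = float(fields[0])
    return samples


sample = """# HELP example_requests_total Hypothetical request counter.
# TYPE example_requests_total counter
example_requests_total{route="/v1/embeddings"} 12
example_up 1
"""
metrics = parse_prometheus_text(sample)
```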