API Reference#
For the full OpenAPI 3.1 schema, you can access the interactive documentation or the raw JSON file:
Interactive Docs: http://<host>:8000/docs
OpenAPI JSON: http://<host>:8000/openapi.json
Endpoints#
POST /v1/embeddings#
Generates embedding vectors for text or video inputs. This is the primary inference endpoint.
Request Body
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `input` | string or array[string] | Yes | | The text or video input(s) to generate embeddings for. See Input String Formats below. |
| `request_type` | string | Yes | | Specifies the processing mode, which can be `query`, `bulk_text`, or `bulk_video`. |
| `model` | string | Yes | | The ID of the embedding model to use, which currently must be `nvidia/cosmos-embed1`. |
| `encoding_format` | string | No | | The format for the returned embeddings, which can be `float` or `base64`. |
Input String Formats
The `input` field accepts the following formats:
- Plain Text: `"A sentence to be embedded."`
- Base64-Encoded Video: `data:video/mp4;base64,<base64-encoded-video-data>`
- Presigned URL for Video: `data:video/mp4;presigned_url,https://your-url/video.mp4` (for `bulk_video` and `query` modes). For `bulk_video`, only presigned URLs are allowed.

Notes and constraints
- `bulk_video` requests must contain only presigned video URLs.
- Maximum inputs per request: 64 items for `bulk_text` and `bulk_video` modes.
- Recommended video duration: 15 seconds.
- Maximum recommended video duration: 1–2 minutes; no strict maximum is enforced by the NIM.
- Supported codecs: H.264, HEVC, AV1, VP8, VP9, VC1, MPEG4, MPEG2, MPEG1; codec use is not enforced by the NIM.
- Base64 format is accepted only in `query` mode.
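The constraints above can be checked client-side before a request is sent. The sketch below is illustrative (the helper name and the substring checks on the `data:` format strings are assumptions, not part of the NIM API):

```python
MAX_BULK_INPUTS = 64  # maximum items per bulk_text / bulk_video request


def validate_inputs(inputs, request_type):
    """Raise ValueError if inputs violate the documented constraints."""
    if request_type in ("bulk_text", "bulk_video") and len(inputs) > MAX_BULK_INPUTS:
        raise ValueError(
            f"At most {MAX_BULK_INPUTS} inputs per {request_type} request"
        )
    if request_type == "bulk_video":
        # bulk_video accepts only presigned video URLs.
        for item in inputs:
            if ";presigned_url," not in item:
                raise ValueError("bulk_video requests accept only presigned URLs")
    else:
        # Base64-encoded video is accepted only in query mode.
        if request_type != "query":
            for item in inputs:
                if ";base64," in item:
                    raise ValueError("Base64 video is accepted only in query mode")
```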
Text Query Payload
{
"input": "A fluffy white cat basking in the sun.",
"model": "nvidia/cosmos-embed1",
"request_type": "query"
}
Bulk Text Payload
{
"input": [
  "This is the first sentence.",
  "Here is a second one for batch processing."
],
"model": "nvidia/cosmos-embed1",
"request_type": "bulk_text"
}
Bulk Video Payload
{
"input": [
  "data:video/webm;presigned_url,https://upload.wikimedia.org/wikipedia/commons/3/3d/Branko_Paukovic%2C_javelin_throw.webm",
  "data:video/webm;presigned_url,https://upload.wikimedia.org/wikipedia/commons/3/3d/Branko_Paukovic%2C_javelin_throw.webm"
],
"model": "nvidia/cosmos-embed1",
"request_type": "bulk_video"
}
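The payloads above can be sent with any HTTP client. A minimal Python sketch using only the standard library (the base URL `http://localhost:8000` is an assumption; adjust it to your deployment):

```python
import json
import urllib.request


def build_payload(inputs, request_type, model="nvidia/cosmos-embed1"):
    """Assemble a /v1/embeddings request body from the fields documented above."""
    return {"input": inputs, "model": model, "request_type": request_type}


def post_embeddings(payload, base_url="http://localhost:8000"):
    """POST the payload to /v1/embeddings and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `post_embeddings(build_payload("A fluffy white cat basking in the sun.", "query"))` returns a response object of the shape documented under Response Body.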
Response Body
The response is a JSON object containing the generated embeddings and usage statistics.
| Parameter | Type | Description |
|---|---|---|
| object | string | The type of object, always `list`. |
| data | array[object] | An array of embedding objects. |
| model | string | The model used for the request (e.g. `nvidia/cosmos-embed1`). |
| usage | object | An object containing token and video counts. |
Embedding Object
| Parameter | Type | Description |
|---|---|---|
| object | string | The type of object, always `embedding`. |
| index | integer | The index of this embedding in the `data` array. |
| embedding | array[float] | The embedding vector. |
Usage Object
| Parameter | Type | Description |
|---|---|---|
| num_videos | integer | Number of videos processed in the request. |
| prompt_tokens | integer | Number of text tokens in the prompt. |
| total_tokens | integer | Total tokens processed. |
Example Response
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        0.0123456789,
        -0.0987654321
      ]
    }
  ],
  "model": "nvidia/cosmos-embed1",
  "usage": {
    "num_videos": 0,
    "prompt_tokens": 10,
    "total_tokens": 10
  }
}
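Downstream code typically sorts the `data` array by `index` so that vector positions match the order of the request inputs, then compares vectors. A minimal sketch using the example response above (pure standard library; the cosine-similarity helper is illustrative, not part of the API):

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# The example response from above, as a parsed JSON object.
response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 0,
         "embedding": [0.0123456789, -0.0987654321]},
    ],
    "model": "nvidia/cosmos-embed1",
    "usage": {"num_videos": 0, "prompt_tokens": 10, "total_tokens": 10},
}

# Sort by index so positions match the order of the request inputs.
vectors = [d["embedding"]
           for d in sorted(response["data"], key=lambda d: d["index"])]
```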
GET /v1/health/ready#
Checks if the service is ready to accept inference requests.
Example Response
{
"object": "health.response",
"message": "NIM Service is ready"
}
GET /v1/health/live#
Checks if the service is running (live). It may not yet be ready for inference.
Example Response
{
"object": "health.response",
"message": "NIM Service is live"
}
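In deployment scripts, the readiness endpoint is typically polled until it returns HTTP 200 before traffic is sent. A minimal stdlib sketch (the base URL, timeout, and polling interval are assumptions):

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url="http://localhost:8000", timeout_s=300, interval_s=5):
    """Poll GET /v1/health/ready until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/health/ready") as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service not up yet; keep polling
        time.sleep(interval_s)
    return False
```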
GET /v1/metadata#
Provides metadata about the NIM container, including version and model information.
GET /v1/manifest#
Returns the NIM manifest file content.
GET /v1/license#
Returns the license information for the NIM.
GET /v1/metrics#
Returns Prometheus-compatible metrics for monitoring.