API Reference
For the full OpenAPI 3.1 schema, you can access the interactive documentation or the raw JSON file:
Interactive Docs:
http://<host>:8000/docs
OpenAPI JSON:
http://<host>:8000/openapi.json
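The raw schema can also be inspected programmatically. Below is a minimal sketch, using only Python's standard library, that downloads `/openapi.json` and lists each documented operation as `METHOD /path`. The function names `fetch_openapi` and `list_operations` are hypothetical helpers, and the host passed in is an assumption you must replace with the machine running the NIM.

```python
import json
import urllib.request

# HTTP methods that can appear as keys of an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def fetch_openapi(host: str, port: int = 8000) -> dict:
    """Download the raw OpenAPI schema from a running NIM (performs a network call)."""
    with urllib.request.urlopen(f"http://{host}:{port}/openapi.json", timeout=10) as resp:
        return json.load(resp)


def list_operations(schema: dict) -> list[str]:
    """Return sorted 'METHOD /path' strings for every operation in the schema."""
    ops = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            # Path items may also hold non-operation keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                ops.append(f"{key.upper()} {path}")
    return sorted(ops)
```

For example, `list_operations(fetch_openapi("localhost"))` prints nothing by itself; iterate over the returned list to display the operations exposed by a local deployment.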
Endpoints
POST /v1/embeddings
Generates embedding vectors for text or video inputs. This is the primary inference endpoint.
Request Body
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `input` | string or array[string] | Yes | | The text or video input(s) to embed; see Input String Formats. |
| `request_type` | string | Yes | | Specifies the processing mode, which can be `query`, `bulk_text`, or `bulk_video`. |
| `model` | string | Yes | | The ID of the embedding model to use, which currently must be `nvidia/cosmos-embed1`. |
| | string | No | | The format for the returned embeddings. |
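The request table maps naturally onto a small client-side type. The sketch below is a minimal Python illustration covering only the three required fields; the optional format field is omitted. `EmbeddingRequest` and `VALID_REQUEST_TYPES` are hypothetical names, not part of any NVIDIA client library.

```python
import json
from dataclasses import asdict, dataclass
from typing import Union

# Processing modes described for the request_type field.
VALID_REQUEST_TYPES = ("query", "bulk_text", "bulk_video")


@dataclass
class EmbeddingRequest:
    """Hypothetical client-side model of the POST /v1/embeddings body."""

    input: Union[str, list[str]]
    request_type: str
    model: str = "nvidia/cosmos-embed1"

    def __post_init__(self) -> None:
        # Reject modes other than the three documented ones before sending anything.
        if self.request_type not in VALID_REQUEST_TYPES:
            raise ValueError(f"unknown request_type: {self.request_type!r}")

    def to_json(self) -> str:
        # Field names match the JSON keys, so asdict() yields the body directly.
        return json.dumps(asdict(self))
```

Usage: `EmbeddingRequest(input="A fluffy white cat basking in the sun.", request_type="query").to_json()` produces the same body as the text query payload example.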
Input String Formats
The `input` field accepts the following formats:
Plain Text: `A sentence to be embedded.`
Base64-Encoded Video: `data:video/mp4;base64,<base64-encoded-video-data>`
Presigned URL for Video: `data:video/mp4;presigned_url,https://your-url/video.mp4` (for `bulk_video` and `query` modes). For `bulk_video`, only presigned URLs are allowed.
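Building these strings by hand is error-prone, so here is a minimal sketch of two helpers that produce the video formats listed above. The function names are hypothetical, and the default MIME type `video/mp4` is an assumption taken from the format examples; pass another type (for example `video/webm`) to match your file.

```python
import base64


def base64_video_input(video_bytes: bytes, mime_type: str = "video/mp4") -> str:
    """Wrap raw video bytes in the base64 data-URI form (query mode only)."""
    encoded = base64.b64encode(video_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def presigned_video_input(url: str, mime_type: str = "video/mp4") -> str:
    """Wrap a presigned URL in the form accepted by bulk_video and query modes."""
    return f"data:{mime_type};presigned_url,{url}"
```

To embed a local file in `query` mode, pass its contents, for example `base64_video_input(pathlib.Path("clip.mp4").read_bytes())`, where `clip.mp4` is a placeholder file name.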
Notes and constraints
`bulk_video` requests must contain only presigned video URLs.
Maximum inputs per request: 64 items for `bulk_text` and `bulk_video` modes.
Recommended video duration: 15 seconds.
Maximum recommended video duration: 1–2 minutes; no strict maximum is enforced by the NIM.
Supported codecs: H.264, HEVC, AV1, VP8, VP9, VC1, MPEG4, MPEG2, MPEG1; codec use is not enforced by the NIM.
Base64 format is accepted only in `query` mode.
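A client can catch the enforceable constraints above before sending a request. The following is a minimal sketch of such a pre-flight check; `check_request` is a hypothetical helper, and it only inspects the input strings, so it does not check video duration or codec (which the NIM itself does not enforce either).

```python
MAX_BULK_ITEMS = 64  # limit stated for bulk_text and bulk_video modes
PRESIGNED_MARKER = ";presigned_url,"
BASE64_MARKER = ";base64,"


def check_request(request_type: str, inputs) -> list[str]:
    """Return a list of constraint violations (empty if none were found)."""
    items = [inputs] if isinstance(inputs, str) else list(inputs)
    problems = []
    if request_type in ("bulk_text", "bulk_video") and len(items) > MAX_BULK_ITEMS:
        problems.append(f"{len(items)} inputs exceed the {MAX_BULK_ITEMS}-item limit")
    for i, item in enumerate(items):
        is_video = item.startswith("data:video/")
        # bulk_video accepts presigned video URLs only.
        if request_type == "bulk_video" and not (is_video and PRESIGNED_MARKER in item):
            problems.append(f"input {i}: bulk_video accepts only presigned video URLs")
        # Base64-encoded video is accepted only in query mode.
        if is_video and BASE64_MARKER in item and request_type != "query":
            problems.append(f"input {i}: base64 video is accepted only in query mode")
    return problems
```

An empty list means the request passed these client-side checks; the server remains the authority on whether it is accepted.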
Text Query Payload
```json
{
  "input": "A fluffy white cat basking in the sun.",
  "model": "nvidia/cosmos-embed1",
  "request_type": "query"
}
```
Bulk Text Payload
```json
{
  "input": [
    "This is the first sentence.",
    "Here is a second one for batch processing."
  ],
  "model": "nvidia/cosmos-embed1",
  "request_type": "bulk_text"
}
```
Bulk Video Payload
```json
{
  "input": [
    "data:video/webm;presigned_url,https://upload.wikimedia.org/wikipedia/commons/3/3d/Branko_Paukovic%2C_javelin_throw.webm",
    "data:video/webm;presigned_url,https://upload.wikimedia.org/wikipedia/commons/3/3d/Branko_Paukovic%2C_javelin_throw.webm"
  ],
  "model": "nvidia/cosmos-embed1",
  "request_type": "bulk_video"
}
```
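Any of the payloads above can be sent with an ordinary HTTP POST. This is a minimal sketch using Python's standard `urllib.request`; the default host `localhost` and the 60-second timeout are assumptions, and `build_embeddings_request` and `post_embeddings` are hypothetical helper names. Building the request separately from sending it keeps the construction step easy to inspect.

```python
import json
import urllib.request


def build_embeddings_request(payload: dict, host: str = "localhost", port: int = 8000) -> urllib.request.Request:
    """Prepare (but do not send) a POST /v1/embeddings request."""
    return urllib.request.Request(
        f"http://{host}:{port}/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def post_embeddings(payload: dict, host: str = "localhost", port: int = 8000, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    req = build_embeddings_request(payload, host, port)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Non-2xx responses surface as `urllib.error.HTTPError`, so callers may want to catch that and read the error body.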
Response Body
The response is a JSON object containing the generated embeddings and usage statistics.
| Parameter | Type | Description |
|---|---|---|
| `object` | string | The type of object, always `list`. |
| `data` | array[object] | An array of embedding objects. |
| `model` | string | The model used for the request (e.g. `nvidia/cosmos-embed1`). |
| `usage` | object | An object containing token and video counts. |
Embedding Object
| Parameter | Type | Description |
|---|---|---|
| `object` | string | The type of object, always `embedding`. |
| `index` | integer | The index of this embedding in the `data` array. |
| `embedding` | array[float] | The embedding vector. |
Usage Object
| Parameter | Type | Description |
|---|---|---|
| `num_videos` | integer | Number of videos processed in the request. |
| `prompt_tokens` | integer | Number of text tokens in the prompt. |
| `total_tokens` | integer | Total tokens processed. |
Example Response
```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        0.0123456789,
        -0.0987654321
      ]
    }
  ],
  "model": "nvidia/cosmos-embed1",
  "usage": {
    "num_videos": 0,
    "prompt_tokens": 10,
    "total_tokens": 10
  }
}
```
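Because each embedding object carries its own `index`, a client can restore input order explicitly rather than relying on the position of items in `data`. A minimal sketch, with `parse_embeddings` as a hypothetical helper name:

```python
def parse_embeddings(response: dict) -> list[list[float]]:
    """Return embedding vectors ordered by their 'index' field."""
    if response.get("object") != "list":
        raise ValueError("expected a response object of type 'list'")
    # Sort by index so vector i corresponds to input i.
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

Applied to the example response, it returns a single two-element vector; the `usage` counts can be read directly from `response["usage"]`.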
GET /v1/health/ready
Checks if the service is ready to accept inference requests.
Example Response
```json
{
  "object": "health.response",
  "message": "NIM Service is ready"
}
```
GET /v1/health/live
Checks whether the service is running (live). A live service may not yet be ready to accept inference requests.
Example Response
```json
{
  "object": "health.response",
  "message": "NIM Service is live"
}
```
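Because a live service may still be loading, clients typically poll the readiness endpoint before sending inference traffic. This minimal sketch assumes that readiness is signaled by HTTP status 200 (only the JSON body is shown above); the helper names, the retry count, and the delay are hypothetical choices. The status check is passed in as a function so the polling logic can be exercised without a server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 503 while the model is still loading
    except OSError:
        return 0  # connection refused, DNS failure, timeout, etc.


def wait_until_ready(check: Callable[[], int], attempts: int = 60, delay_s: float = 5.0) -> bool:
    """Poll check() until it returns 200; give up after `attempts` tries."""
    for attempt in range(attempts):
        if check() == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

Usage: `wait_until_ready(lambda: http_status("http://localhost:8000/v1/health/ready"))`, with `localhost` replaced by your host.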
GET /v1/metadata
Provides metadata about the NIM container, including version and model information.
GET /v1/manifest
Returns the NIM manifest file content.
GET /v1/license
Returns the license information for the NIM.
GET /v1/metrics
Returns Prometheus-compatible metrics for monitoring.
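The metrics endpoint returns the Prometheus text exposition format, which is easy to inspect ad hoc. Below is a minimal sketch of a parser that maps each sample line to its value; it ignores comments and timestamps and does not group samples into families. The metric names in the usage sample are hypothetical, not the names this NIM actually exports. For anything beyond quick inspection, `prometheus_client.parser.text_string_to_metric_families` from the `prometheus_client` package is a more complete option.

```python
def parse_prometheus_text(text: str) -> dict[str, float]:
    """Map each sample (metric name plus any label set) to its value."""
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        if "}" in line:
            # The value follows the closing brace of the label set.
            head, _, rest = line.rpartition("}")
            key = head + "}"
        else:
            key, rest = line.split(maxsplit=1)
        # The first field is the value; a second field, if present, is a timestamp.
        samples[key] = float(rest.split()[0])
    return samples
```

Usage: pass the decoded body of `GET /v1/metrics`, for example `urllib.request.urlopen("http://localhost:8000/v1/metrics").read().decode()`.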