API Reference
OpenAPI Schema
The OpenAPI specification details the endpoints for NVIDIA NIM for VLMs:
/v1/models - List available models
/v1/health/ready - Service readiness check
/v1/health/live - Service liveness check
/v1/chat/completions - OpenAI-compatible chat endpoint
/inference/chat_completion - Llama Stack compatible chat endpoint
API Examples
Use the examples in this section to get started with the API.
List Models
cURL Request
Use the following command to list the available models.
curl -X 'GET' 'http://0.0.0.0:8000/v1/models'
Response
{
  "object": "list",
  "data": [
    {
      "id": "meta/llama-3.2-11b-vision-instruct",
      "object": "model",
      "created": 1724796510,
      "owned_by": "system",
      "root": "meta/llama-3.2-11b-vision-instruct",
      "parent": null,
      "max_model_len": 131072,
      "permission": [
        {
          "id": "modelperm-c2e069f426cc43088eb408f388578289",
          "object": "model_permission",
          "created": 1724796510,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    }
  ]
}
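Python Example
The same request can be issued from Python. The following is a minimal sketch using the requests library; it assumes the service is reachable at http://0.0.0.0:8000, as in the cURL example above.
import requests

# List the models served by the endpoint and print each model ID with its context length.
response = requests.get("http://0.0.0.0:8000/v1/models")
response.raise_for_status()
for model in response.json()["data"]:
    print(model["id"], "- max_model_len:", model.get("max_model_len"))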
Check Health
Use the following command to check server health.
cURL Request
curl -X 'GET' 'http://0.0.0.0:8000/v1/health/ready'
Response
{
  "object": "health.response",
  "message": "Service is ready."
}
Check Service Liveness
Use the following command to check service liveness.
cURL Request
curl -X 'GET' 'http://0.0.0.0:8000/v1/health/live'
Response
{
  "object": "health.response",
  "message": "Service is live."
}
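Python Example
The readiness and liveness probes can also be polled from Python. The following is a minimal sketch using the requests library; it assumes the same host and port as the cURL examples.
import requests

# Poll both health endpoints and print the HTTP status code and message for each.
for probe in ("ready", "live"):
    r = requests.get(f"http://0.0.0.0:8000/v1/health/{probe}")
    print(probe, r.status_code, r.json().get("message"))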
OpenAI Chat Completions
Use the following command to query the OpenAI chat completions endpoint.
cURL Request
curl -X 'POST' \
'http://0.0.0.0:8000/v1/chat/completions' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
  "model": "meta/llama-3.2-11b-vision-instruct",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        }
      ]
    }
  ],
  "max_tokens": 256
}'
Response
{
  "id": "chat-8c5f5115fa464ab593963d5764498350",
  "object": "chat.completion",
  "created": 1729020253,
  "model": "meta/llama-3.2-11b-vision-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "This image shows a boardwalk in a field of tall grass. ..."
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 17,
    "total_tokens": 138,
    "completion_tokens": 121
  },
  "prompt_logprobs": null
}
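Python Example
Because the endpoint is OpenAI-compatible, the same request can be sent with the OpenAI Python SDK. The following is a minimal sketch; the api_key value is a placeholder (the cURL example above uses no authentication), and base_url points at the local service.
from openai import OpenAI

# Point the OpenAI client at the local OpenAI-compatible endpoint.
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

completion = client.chat.completions.create(
    model="meta/llama-3.2-11b-vision-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    },
                },
            ],
        }
    ],
    max_tokens=256,
)
print(completion.choices[0].message.content)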
Llama Stack Chat Completion
Use the following command to query the Llama Stack chat completion endpoint.
cURL Request
curl -X 'POST' \
'http://0.0.0.0:8000/ls/inference/chat_completion' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
  "model": "meta/llama-3.2-11b-vision-instruct",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "image": {
            "uri": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        },
        "What is in this image?"
      ]
    }
  ]
}'
Response
{
  "completion_message": {
    "role": "assistant",
    "content": "This image shows a boardwalk in a field of tall grass. ...",
    "stop_reason": "end_of_turn"
  },
  "logprobs": null
}
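Python Example
The Llama Stack request can also be sent from Python. The following is a minimal sketch that mirrors the cURL payload above using the requests library; it assumes the same /ls/inference/chat_completion path and no authentication.
import requests

# Build the Llama Stack style payload: an image part followed by a plain text prompt.
payload = {
    "model": "meta/llama-3.2-11b-vision-instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "image": {
                        "uri": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    }
                },
                "What is in this image?"
            ]
        }
    ]
}

r = requests.post("http://0.0.0.0:8000/ls/inference/chat_completion", json=payload)
r.raise_for_status()
print(r.json()["completion_message"]["content"])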