API Reference#
OpenAPI Schema#
The OpenAPI specification details the endpoints for NVIDIA NIM for VLMs:
/v1/models - List available models
/v1/health/ready - Health check
/v1/health/live - Service liveness check
/v1/chat/completions - OpenAI-compatible chat endpoint
/inference/chat_completion - Llama Stack compatible chat endpoint
API Examples#
Use the examples in this section to get started with the API.
List Models#
Use the following command to list the available models.
cURL Request
curl -X 'GET' 'http://0.0.0.0:8000/v1/models'
Response
{
  "object": "list",
  "data": [
    {
      "id": "meta/llama-3.2-11b-vision-instruct",
      "object": "model",
      "created": 1724796510,
      "owned_by": "system",
      "root": "meta/llama-3.2-11b-vision-instruct",
      "parent": null,
      "max_model_len": 131072,
      "permission": [
        {
          "id": "modelperm-c2e069f426cc43088eb408f388578289",
          "object": "model_permission",
          "created": 1724796510,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    }
  ]
}
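Because /v1/models is OpenAI-compatible, you can also list models with the OpenAI Python client. The following is a minimal sketch that assumes the server is reachable at the same local address as the cURL example and that the api_key value is a placeholder accepted by your deployment.

Python Request

# Minimal sketch: list models through the OpenAI-compatible /v1 routes.
# The base_url and placeholder api_key are assumptions for a local deployment.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

for model in client.models.list():
    print(model.id, model.owned_by)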
Check Health#
Use the following command to check whether the service is ready to receive requests.
cURL Request
curl -X 'GET' 'http://0.0.0.0:8000/v1/health/ready'
Response
{
  "object": "health.response",
  "message": "Service is ready."
}
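You can run the same readiness check from Python. The following sketch uses the requests package (an assumption; any HTTP client works) and the local address from the cURL example.

Python Request

# Minimal sketch: query the readiness endpoint and print the result.
import requests

resp = requests.get("http://0.0.0.0:8000/v1/health/ready", timeout=5)
print(resp.status_code, resp.json().get("message"))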
Check Service Liveness#
Use the following command to check service liveness.
cURL Request
curl -X 'GET' 'http://0.0.0.0:8000/v1/health/live'
Response
{
  "object": "health.response",
  "message": "Service is live."
}
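In scripts that start the service and then send traffic, it is common to poll liveness and readiness before issuing the first request. The following is a sketch under the same assumptions as the previous example; the retry count and interval are arbitrary choices.

Python Request

# Sketch of a startup gate: wait for liveness, then readiness, before sending requests.
import time
import requests

BASE_URL = "http://0.0.0.0:8000"

def wait_for(path, attempts=30, delay=2.0):
    for _ in range(attempts):
        try:
            if requests.get(BASE_URL + path, timeout=5).status_code == 200:
                return True
        except requests.RequestException:
            pass  # server may not be accepting connections yet
        time.sleep(delay)
    return False

if wait_for("/v1/health/live") and wait_for("/v1/health/ready"):
    print("Service is up")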
OpenAI Chat Completions#
Use the following command to query the OpenAI-compatible chat completions endpoint.
cURL Request
curl -X 'POST' \
'http://0.0.0.0:8000/v1/chat/completions' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
  "model": "meta/llama-3.2-11b-vision-instruct",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        }
      ]
    }
  ],
  "max_tokens": 256
}'
Response
{
  "id": "chat-8c5f5115fa464ab593963d5764498350",
  "object": "chat.completion",
  "created": 1729020253,
  "model": "meta/llama-3.2-11b-vision-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "This image shows a boardwalk in a field of tall grass. ..."
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 17,
    "total_tokens": 138,
    "completion_tokens": 121
  },
  "prompt_logprobs": null
}
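Because the endpoint is OpenAI-compatible, the same request can be sent with the OpenAI Python client. The following sketch assumes the local address from the cURL example and a placeholder api_key accepted by your deployment.

Python Request

# Minimal sketch: send the same multimodal chat request through the OpenAI client.
# base_url and the placeholder api_key are assumptions for a local deployment.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

response = client.chat.completions.create(
    model="meta/llama-3.2-11b-vision-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    },
                },
            ],
        }
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)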
Llama Stack Chat Completion#
Use the following command to query the Llama Stack compatible chat completion endpoint.
cURL Request
curl -X 'POST' \
'http://0.0.0.0:8000/inference/chat_completion' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
  "model": "meta/llama-3.2-11b-vision-instruct",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "image": {
            "uri": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        },
        "What is in this image?"
      ]
    }
  ]
}'
Response
{
  "completion_message": {
    "role": "assistant",
    "content": "This image shows a boardwalk in a field of tall grass. ...",
    "stop_reason": "end_of_turn"
  },
  "logprobs": null
}
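The same request can be sent from Python. The following sketch posts the payload from the cURL example with the requests package (an assumption; any HTTP client works) and reads the completion from the documented response shape.

Python Request

# Minimal sketch: post the Llama Stack style payload and print the completion text.
import requests

payload = {
    "model": "meta/llama-3.2-11b-vision-instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "image": {
                        "uri": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    }
                },
                "What is in this image?",
            ],
        }
    ],
}

resp = requests.post(
    "http://0.0.0.0:8000/inference/chat_completion",
    json=payload,
    timeout=120,
)
print(resp.json()["completion_message"]["content"])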