API Reference
OpenAPI Schema
The OpenAPI specification details the endpoints for NVIDIA NIM for VLMs:
/v1/models - List available models
/v1/health/ready - Service readiness check
/v1/health/live - Service liveness check
/v1/chat/completions - OpenAI-compatible chat endpoint
/inference/chat_completion - Llama Stack compatible chat endpoint
API Examples
Use the examples in this section to get started with the API.
List Models
cURL Request
Use the following command to list the available models.
curl -X 'GET' 'http://0.0.0.0:8000/v1/models'
Response
{
  "object": "list",
  "data": [
    {
      "id": "meta/llama-3.2-11b-vision-instruct",
      "object": "model",
      "created": 1724796510,
      "owned_by": "system",
      "root": "meta/llama-3.2-11b-vision-instruct",
      "parent": null,
      "max_model_len": 131072,
      "permission": [
        {
          "id": "modelperm-c2e069f426cc43088eb408f388578289",
          "object": "model_permission",
          "created": 1724796510,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    }
  ]
}
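Python Example
The same request can be issued from Python. The following is a minimal sketch using the requests library; it assumes the service is reachable at http://0.0.0.0:8000, as in the cURL example above.
import requests

# List the models served by the endpoint and print each model ID with its context length.
response = requests.get("http://0.0.0.0:8000/v1/models")
response.raise_for_status()
for model in response.json()["data"]:
    print(model["id"], "- max_model_len:", model.get("max_model_len"))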
Check Health
Use the following command to check server health.
cURL Request
curl -X 'GET' 'http://0.0.0.0:8000/v1/health/ready'
Response
{
  "object": "health.response",
  "message": "Service is ready."
}
Check Service Liveness
Use the following command to check service liveness.
cURL Request
curl -X 'GET' 'http://0.0.0.0:8000/v1/health/live'
Response
{
  "object": "health.response",
  "message": "Service is live."
}
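Python Example
The readiness and liveness probes can also be polled from Python. The following is a minimal sketch using the requests library; it assumes the same host and port as the cURL examples.
import requests

# Poll both health endpoints and print the HTTP status code and message for each.
for probe in ("ready", "live"):
    r = requests.get(f"http://0.0.0.0:8000/v1/health/{probe}")
    print(probe, r.status_code, r.json().get("message"))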
OpenAI Chat Completions
Use the following command to query the OpenAI chat completions endpoint.
cURL Request
curl -X 'POST' \
'http://0.0.0.0:8000/v1/chat/completions' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
  "model": "meta/llama-3.2-11b-vision-instruct",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        }
      ]
    }
  ],
  "max_tokens": 256
}'
Response
{
  "id": "chat-8c5f5115fa464ab593963d5764498350",
  "object": "chat.completion",
  "created": 1729020253,
  "model": "meta/llama-3.2-11b-vision-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "This image shows a boardwalk in a field of tall grass. ..."
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 17,
    "total_tokens": 138,
    "completion_tokens": 121
  },
  "prompt_logprobs": null
}
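Python Example
Because the endpoint is OpenAI-compatible, the same request can be sent with the OpenAI Python SDK. The following is a minimal sketch; the api_key value is a placeholder (the cURL example above uses no authentication), and base_url points at the local service.
from openai import OpenAI

# Point the OpenAI client at the local OpenAI-compatible endpoint.
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

completion = client.chat.completions.create(
    model="meta/llama-3.2-11b-vision-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    },
                },
            ],
        }
    ],
    max_tokens=256,
)
print(completion.choices[0].message.content)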
Llama Stack Chat Completion
Use the following command to query the Llama Stack chat completion endpoint.
cURL Request
curl -X 'POST' \
'http://0.0.0.0:8000/ls/inference/chat_completion' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
  "model": "meta/llama-3.2-11b-vision-instruct",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "image": {
            "uri": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        },
        "What is in this image?"
      ]
    }
  ]
}'
Response
{
  "completion_message": {
    "role": "assistant",
    "content": "This image shows a boardwalk in a field of tall grass. ...",
    "stop_reason": "end_of_turn"
  },
  "logprobs": null
}
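Python Example
The Llama Stack request can also be sent from Python. The following is a minimal sketch that mirrors the cURL payload above using the requests library; it assumes the same /ls/inference/chat_completion path and no authentication.
import requests

# Build the Llama Stack style payload: an image part followed by a plain text prompt.
payload = {
    "model": "meta/llama-3.2-11b-vision-instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "image": {
                        "uri": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    }
                },
                "What is in this image?"
            ]
        }
    ]
}

r = requests.post("http://0.0.0.0:8000/ls/inference/chat_completion", json=payload)
r.raise_for_status()
print(r.json()["completion_message"]["content"])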