API Reference

OpenAPI Schema

The OpenAPI specification details the endpoints for NVIDIA NIM for VLMs:

  • /v1/models - List available models

  • /v1/health/ready - Health check

  • /v1/health/live - Service liveness check

  • /v1/chat/completions - OpenAI-compatible chat endpoint

  • /inference/chat_completion - Llama Stack compatible chat endpoint

API Examples

Use the examples in this section to get started with the API.

List Models

Use the following command to list the available models.

cURL Request

curl -X 'GET' 'http://0.0.0.0:8000/v1/models'

Response

{
  "object": "list",
  "data": [
    {
      "id": "meta/llama-3.2-11b-vision-instruct",
      "object": "model",
      "created": 1724796510,
      "owned_by": "system",
      "root": "meta/llama-3.2-11b-vision-instruct",
      "parent": null,
      "max_model_len": 131072,
      "permission": [
        {
          "id": "modelperm-c2e069f426cc43088eb408f388578289",
          "object": "model_permission",
          "created": 1724796510,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    }
  ]
}
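
The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names `model_ids` and `list_models` are illustrative and not part of the NIM API, and the base URL is assumed to match the cURL example above.

```python
import json
import urllib.request

# Assumed to match the host and port used in the cURL example.
BASE_URL = "http://0.0.0.0:8000"


def model_ids(models_response: dict) -> list[str]:
    """Return the "id" of every entry in a parsed /v1/models response body."""
    return [entry["id"] for entry in models_response.get("data", [])]


def list_models(base_url: str = BASE_URL, timeout: float = 10.0) -> list[str]:
    """Send GET /v1/models and return the model ids the server reports."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return model_ids(json.load(resp))
```

With a server running, `list_models()` returns the `id` values from the response, which are the strings to pass as `model` in the chat requests.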

Check Health

Use the following command to check whether the server is ready to accept requests.

cURL Request

curl -X 'GET' 'http://0.0.0.0:8000/v1/health/ready'

Response

{
  "object": "health.response",
  "message": "Service is ready."
}

Check Service Liveness

Use the following command to check service liveness.

cURL Request

curl -X 'GET' 'http://0.0.0.0:8000/v1/health/live'

Response

{
  "object": "health.response",
  "message": "Service is live."
}
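
A client often needs to wait for the service before sending requests. The sketch below polls a health endpoint from Python. It assumes that a healthy endpoint answers with HTTP 200 and a JSON body, and that any connection error or other status means "not yet"; the function names, retry count, and delay are illustrative choices, not values documented for NIM.

```python
import json
import time
import urllib.error
import urllib.request

# Assumed to match the host and port used in the cURL examples.
BASE_URL = "http://0.0.0.0:8000"


def probe(path: str, base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Return True if GET base_url + path answers HTTP 200 with a JSON body."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            json.load(resp)  # raises ValueError if the body is not JSON
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed body.
        return False


def wait_until_ready(check=lambda: probe("/v1/health/ready"),
                     attempts: int = 30, delay_s: float = 2.0) -> bool:
    """Call check() up to `attempts` times, sleeping delay_s between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

Passing `check=lambda: probe("/v1/health/live")` polls the liveness endpoint instead.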

OpenAI Chat Completions

Use the following command to query the OpenAI chat completions endpoint.

cURL Request

curl -X 'POST' \
'http://0.0.0.0:8000/v1/chat/completions' \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
        "model": "meta/llama-3.2-11b-vision-instruct",
        "messages": [
            {
                "role":"user",
                "content": [
                    {
                        "type": "text",
                        "text": "What is in this image?"
                    },
                    {
                        "type": "image_url",
                        "image_url":
                            {
                                "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                            }
                    }
                ]
            }
        ],
        "max_tokens": 256
    }'

Response

{
  "id": "chat-8c5f5115fa464ab593963d5764498350",
  "object": "chat.completion",
  "created": 1729020253,
  "model": "meta/llama-3.2-11b-vision-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "This image shows a boardwalk in a field of tall grass. ..."
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 17,
    "total_tokens": 138,
    "completion_tokens": 121
  },
  "prompt_logprobs": null
}
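
The request body and the fields of interest in the response can be handled from Python as follows. This is a sketch using only the standard library; `build_chat_request`, `first_reply`, and `chat` are hypothetical helper names, and the base URL and model name are taken from the cURL example.

```python
import json
import urllib.request

# Both values are copied from the cURL example.
BASE_URL = "http://0.0.0.0:8000"
MODEL = "meta/llama-3.2-11b-vision-instruct"


def build_chat_request(prompt: str, image_url: str, max_tokens: int = 256) -> dict:
    """Build the JSON body: one user message with a text part and an image part."""
    return {
        "model": MODEL,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
        "max_tokens": max_tokens,
    }


def first_reply(response: dict) -> str:
    """Return the assistant text of the first choice in a parsed response."""
    return response["choices"][0]["message"]["content"]


def chat(prompt: str, image_url: str,
         base_url: str = BASE_URL, timeout: float = 120.0) -> dict:
    """POST the request to /v1/chat/completions and return the parsed response."""
    body = json.dumps(build_chat_request(prompt, image_url)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

With a server running, `first_reply(chat("What is in this image?", image_url))` returns the generated description, and the `usage` field of the parsed response carries the token counts.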

Llama Stack Chat Completion

Use the following command to query the Llama Stack chat completion endpoint.

cURL Request

curl -X 'POST' \
'http://0.0.0.0:8000/inference/chat_completion' \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
        "model": "meta/llama-3.2-11b-vision-instruct",
        "messages": [
            {
                "role":"user",
                "content": [
                    {
                        "image":
                            {
                                "uri": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                            }
                    },
                    "What is in this image?"
                ]
            }
        ]
    }'

Response

{
  "completion_message": {
    "role": "assistant",
    "content": "This image shows a boardwalk in a field of tall grass. ...",
    "stop_reason": "end_of_turn"
  },
  "logprobs": null
}
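
The Llama Stack request differs from the OpenAI one mainly in how the message content is laid out: the image is an object with an `image.uri` field and the prompt is a bare string. A Python sketch of that layout follows. The helper names are hypothetical, the endpoint path is the one given in the endpoint list at the top of this page, and the base URL and model name are taken from the cURL example.

```python
import json
import urllib.request

# Base URL and model name are copied from the cURL example.
BASE_URL = "http://0.0.0.0:8000"
MODEL = "meta/llama-3.2-11b-vision-instruct"
# Path as given in the endpoint list at the top of this page.
ENDPOINT_PATH = "/inference/chat_completion"


def build_llama_stack_request(prompt: str, image_uri: str) -> dict:
    """Build the JSON body: an image object followed by the prompt string."""
    return {
        "model": MODEL,
        "messages": [{
            "role": "user",
            "content": [
                {"image": {"uri": image_uri}},
                prompt,
            ],
        }],
    }


def reply_text(response: dict) -> str:
    """Return the assistant text from a parsed Llama Stack response."""
    return response["completion_message"]["content"]


def chat_completion(prompt: str, image_uri: str,
                    base_url: str = BASE_URL, timeout: float = 120.0) -> dict:
    """POST the request to the Llama Stack endpoint and return the parsed response."""
    body = json.dumps(build_llama_stack_request(prompt, image_uri)).encode("utf-8")
    req = urllib.request.Request(
        base_url + ENDPOINT_PATH,
        data=body,
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```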

Reference

NIM for VLMs API