Reference#

OpenAPI#

You can download the complete API spec

Warning

Every model has a maximum token length. The models section lists the maximum token length of each supported model. See the truncate field in the Reference for ways to handle sequences longer than the maximum token length.

Dynamic batching#

Dynamic batching allows the underlying Triton process in the NIM container to group one or more requests into a single batch, which can improve throughput under certain conditions, for example when serving many requests with small payloads. This feature is enabled by default. To tune it, set the NIM_TRITON_DYNAMIC_BATCHING_MAX_QUEUE_DELAY_MICROSECONDS environment variable, which controls how long a request may wait in the queue while a batch forms. The default value is 100 µs (microseconds).

For more information on dynamic batching, refer to the Triton User Guide.
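Because the variable is expressed in microseconds, it is easy to be off by a factor of 1,000 when thinking in milliseconds. The following is a minimal sketch of the unit conversion. The helper name `queue_delay_env` is hypothetical, and it assumes the variable expects a whole number of microseconds. How the resulting value reaches the container depends on your deployment.

```python
from typing import Dict

# Environment variable name taken from the text above.
ENV_VAR = "NIM_TRITON_DYNAMIC_BATCHING_MAX_QUEUE_DELAY_MICROSECONDS"


def queue_delay_env(delay_ms: float) -> Dict[str, str]:
    """Return an environment mapping with the delay converted from ms to microseconds.

    Hypothetical helper; assumes the variable takes an integer number of microseconds.
    """
    if delay_ms < 0:
        raise ValueError("delay must be non-negative")
    delay_us = round(delay_ms * 1000)  # 1 ms == 1000 microseconds
    return {ENV_VAR: str(delay_us)}


# 0.1 ms corresponds to the documented default of 100 microseconds.
env = queue_delay_env(0.1)
print(env)
```

A larger delay gives the server more time to collect requests into a batch at the cost of added per-request latency, so any change is best validated against your own workload.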

API Examples#

Use the examples in this section to get started with the API.

The complete API spec is available as an OpenAPI spec.

List Models#

cURL Request

Use the following command to list the available models.

curl "http://${HOSTNAME}:${SERVICE_PORT}/v1/models" \
-H 'Accept: application/json'

Response

{
  "object": "list",
  "data": [
    {
      "id": "nvidia/nv-rerankqa-mistral-4b-v3"
    }
  ]
}
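The same call can be made from Python. The sketch below uses only the standard library and assumes the response has the shape shown above. The function names `parse_model_ids` and `list_models` are illustrative, and the base URL is a placeholder for `http://${HOSTNAME}:${SERVICE_PORT}`.

```python
import json
import urllib.request
from typing import List


def parse_model_ids(body: str) -> List[str]:
    """Extract the model IDs from a /v1/models response body."""
    payload = json.loads(body)
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str, timeout: float = 10.0) -> List[str]:
    """Send GET {base_url}/v1/models and return the model IDs."""
    request = urllib.request.Request(
        f"{base_url}/v1/models", headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_model_ids(response.read().decode("utf-8"))


# Parsing the sample response shown above (no network access needed).
sample = '{"object": "list", "data": [{"id": "nvidia/nv-rerankqa-mistral-4b-v3"}]}'
print(parse_model_ids(sample))  # ['nvidia/nv-rerankqa-mistral-4b-v3']
```

Against a running service, `list_models("http://localhost:8000")` would return the same list, where the host and port are assumptions to replace with your own.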

Generate Rankings#

cURL Request

curl -X "POST" \
  "http://${HOSTNAME}:${SERVICE_PORT}/v1/ranking" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "nvidia/nv-rerankqa-mistral-4b-v3",
  "query": {"text": "which way should i go?"},
  "passages": [
    {"text": "two roads diverged in a yellow wood, and sorry i could not travel both and be one traveler, long i stood and looked down one as far as i could to where it bent in the undergrowth;"},
    {"text": "then took the other, as just as fair, and having perhaps the better claim because it was grassy and wanted wear, though as for that the passing there had worn them really about the same,"},
    {"text": "and both that morning equally lay in leaves no step had trodden black. oh, i marked the first for another day! yet knowing how way leads on to way i doubted if i should ever come back."},
    {"text": "i shall be telling this with a sigh somewhere ages and ages hence: two roads diverged in a wood, and i, i took the one less traveled by, and that has made all the difference."}
  ],
  "truncate": "END"
}'

Response

{
  "rankings": [
    {
      "index": 0,
      "logit": 0.7646484375
    },
    {
      "index": 3,
      "logit": -1.1044921875
    },
    {
      "index": 2,
      "logit": -2.71875
    },
    {
      "index": 1,
      "logit": -5.09765625
    }
  ]
}
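The response refers to passages by index, not by text, so a client usually needs to map each ranking back to the passage it describes. The following Python sketch builds the request body used above and performs that mapping. It assumes, based on the sample response, that `index` is the zero-based position of the passage in the request's `passages` array and that rankings arrive sorted by descending `logit`. All function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple


def build_ranking_request(
    model: str, query: str, passages: List[str], truncate: str = "END"
) -> Dict[str, Any]:
    """Build a /v1/ranking request body in the shape shown in the cURL example."""
    return {
        "model": model,
        "query": {"text": query},
        "passages": [{"text": passage} for passage in passages],
        "truncate": truncate,
    }


def order_passages(
    passages: List[str], response: Dict[str, Any]
) -> List[Tuple[str, float]]:
    """Pair each ranking with its passage, assuming 'index' is a position in 'passages'."""
    return [(passages[item["index"]], item["logit"]) for item in response["rankings"]]


def rank(base_url: str, payload: Dict[str, Any], timeout: float = 30.0) -> Dict[str, Any]:
    """POST the payload to {base_url}/v1/ranking and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/v1/ranking",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Offline demonstration using the sample response above and shortened passages.
passages = ["two roads diverged", "then took the other", "and both that morning", "i shall be telling"]
payload = build_ranking_request("nvidia/nv-rerankqa-mistral-4b-v3", "which way should i go?", passages)
sample_response = {
    "rankings": [
        {"index": 0, "logit": 0.7646484375},
        {"index": 3, "logit": -1.1044921875},
        {"index": 2, "logit": -2.71875},
        {"index": 1, "logit": -5.09765625},
    ]
}
for text, logit in order_passages(passages, sample_response):
    print(f"{logit:>10.4f}  {text}")
```

With a running service, `rank(base_url, payload)` would take the place of `sample_response`.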

Health Check#

cURL Request

Use the following command to query the health endpoints.

curl "http://${HOSTNAME}:${SERVICE_PORT}/v1/health/ready" \
-H 'Accept: application/json'
curl "http://${HOSTNAME}:${SERVICE_PORT}/v1/health/live" \
-H 'Accept: application/json'

Response

{
  "ready": true
}
{
  "live": true
}
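A common use of the readiness endpoint is to wait for the service to come up before sending requests. The sketch below polls `/v1/health/ready` and checks for the `"ready": true` body shown above. The function names, retry count, and delay are illustrative assumptions and not part of the API.

```python
import json
import time
import urllib.request
from typing import Callable


def is_ready(body: str) -> bool:
    """Return True only if the body is a JSON object with "ready": true."""
    try:
        return json.loads(body).get("ready") is True
    except (ValueError, AttributeError):
        # Not valid JSON, or valid JSON that is not an object.
        return False


def fetch(url: str, timeout: float = 5.0) -> str:
    """GET the URL and return the response body as text."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def wait_until_ready(
    base_url: str,
    fetcher: Callable[[str], str] = fetch,
    attempts: int = 30,
    delay_s: float = 2.0,
) -> bool:
    """Poll {base_url}/v1/health/ready until it reports ready or attempts run out."""
    for attempt in range(attempts):
        try:
            if is_ready(fetcher(f"{base_url}/v1/health/ready")):
                return True
        except OSError:
            # Covers connection errors and urllib's URLError/HTTPError.
            pass
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False


# Offline demonstration with a stand-in fetcher that returns the sample body.
print(wait_until_ready("http://localhost:8000", fetcher=lambda url: '{"ready": true}', attempts=1))
```

The `fetcher` parameter exists so the polling logic can be exercised without a live service. In normal use the default `fetch` is sufficient.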

Reference#

Text Reranking API