Parameters Extension#

This document describes Triton’s parameters extension. The parameters extension allows an inference request to provide custom parameters that cannot be provided as inputs. Because this extension is supported, Triton reports “parameters” in the extensions field of its Server Metadata. This extension uses the optional “parameters” field in the KServe Protocol in HTTP and GRPC.

The following parameters are reserved for Triton’s usage and should not be used as custom parameters:

  • sequence_id

  • priority

  • timeout

  • sequence_start

  • sequence_end

  • headers

  • All keys that start with the "triton_" prefix. Some examples in use today:

    • "triton_enable_empty_final_response" request parameter

    • "triton_final_response" response parameter

On both the GRPC and HTTP endpoints, do not use any name from the reserved list above as a custom parameter; doing so can cause unexpected behavior. The reserved parameters are not accessible in the Triton C-API.
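A client can guard against accidental collisions before sending a request. The following is a minimal sketch that checks custom parameter names against the reserved list above. The helper names `is_reserved` and `check_custom_parameters` are hypothetical and not part of any Triton client library, and the comparison is assumed to be an exact, case-sensitive match.

```python
# Reserved names copied from the list above.
RESERVED_NAMES = frozenset(
    {"sequence_id", "priority", "timeout", "sequence_start", "sequence_end", "headers"}
)
RESERVED_PREFIX = "triton_"


def is_reserved(name: str) -> bool:
    """Return True if `name` collides with a reserved parameter name or prefix."""
    return name in RESERVED_NAMES or name.startswith(RESERVED_PREFIX)


def check_custom_parameters(parameters: dict) -> None:
    """Raise ValueError listing any reserved names found in `parameters`."""
    clashes = sorted(name for name in parameters if is_reserved(name))
    if clashes:
        raise ValueError(f"reserved parameter names used: {clashes}")


# Passes silently: the name is not reserved.
check_custom_parameters({"my_custom_parameter": 42})
```

Calling `check_custom_parameters({"timeout": 5})` would raise `ValueError` instead of letting the request reach the server.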

HTTP/REST#

The following example shows how a request can include custom parameters.

POST /v2/models/mymodel/infer HTTP/1.1
Host: localhost:8000
Content-Type: application/json
Content-Length: <xx>
{
  "parameters" : { "my_custom_parameter" : 42 },
  "inputs" : [
    {
      "name" : "input0",
      "shape" : [ 2, 2 ],
      "datatype" : "UINT32",
      "data" : [ 1, 2, 3, 4 ]
    }
  ],
  "outputs" : [
    {
      "name" : "output0"
    }
  ]
}
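The same request can be assembled from Python using only the standard library. This is a sketch rather than the recommended client path: the function name `build_infer_request` is hypothetical, and the host, port, and model name are taken from the example above.

```python
import json
import urllib.request


def build_infer_request(url: str, parameters: dict) -> urllib.request.Request:
    """Build (but do not send) an inference request carrying custom parameters."""
    body = {
        "parameters": parameters,
        "inputs": [
            {
                "name": "input0",
                "shape": [2, 2],
                "datatype": "UINT32",
                "data": [1, 2, 3, 4],
            }
        ],
        "outputs": [{"name": "output0"}],
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_infer_request(
    "http://localhost:8000/v2/models/mymodel/infer",
    {"my_custom_parameter": 42},
)
# With a server running, the request could be sent with:
#   urllib.request.urlopen(request)
```

Serializing with `json.dumps` avoids hand-written JSON mistakes such as missing or trailing commas.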

GRPC#

The parameters field in the ModelInferRequest message can be used to send custom parameters.
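In the protobuf message, each parameter value is carried in a typed field. The sketch below shows, with plain dictionaries, how Python values might map onto those fields. The field names `bool_param`, `int64_param`, and `string_param` are assumed from the KServe protocol's `InferParameter` message and should be checked against the .proto file your server ships; the helper names are hypothetical.

```python
def infer_parameter_field(value) -> str:
    """Pick the assumed InferParameter field name for a Python value."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "bool_param"
    if isinstance(value, int):
        return "int64_param"
    if isinstance(value, str):
        return "string_param"
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


def to_request_parameters(parameters: dict) -> dict:
    """Return a dict shaped like the request's `parameters` map."""
    return {
        name: {infer_parameter_field(value): value}
        for name, value in parameters.items()
    }


request_parameters = to_request_parameters({"my_custom_parameter": 42})
```

A generated protobuf class or a client library would normally perform this mapping; the dictionaries here only illustrate the shape of the data.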

Forwarding HTTP/GRPC Headers as Parameters#

Triton can forward HTTP/GRPC headers as inference request parameters. When a regular expression is specified with --http-header-forward-pattern or --grpc-header-forward-pattern, Triton adds every header that matches the expression as a request parameter. Each forwarded header is added as a parameter with a string value. For example, to forward all headers that start with ‘PREFIX_’ from both HTTP and GRPC, add --http-header-forward-pattern PREFIX_.* --grpc-header-forward-pattern PREFIX_.* to your tritonserver command.

By default, the regular expression matches header names case-insensitively, in line with the HTTP protocol. To enforce case-sensitive matching, add the (?-i) prefix, which turns off case-insensitive mode, e.g. --http-header-forward-pattern (?-i)PREFIX_.*. Note that headers sent through the Python HTTP client may be automatically lower-cased by internal client libraries.
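To reason about which headers a pattern would forward, the selection can be approximated in Python. This is only a sketch: the function name `forwarded_parameters` is hypothetical, the pattern is assumed to match against the whole header name, and Python's `re` module handles inline flags differently from Triton's regex engine, so the effect of the (?-i) prefix is modeled with a boolean argument.

```python
import re


def forwarded_parameters(headers: dict, pattern: str, case_sensitive: bool = False) -> dict:
    """Return the headers that would become request parameters.

    Matching is case-insensitive unless `case_sensitive` is True, and every
    forwarded value is converted to a string.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    regex = re.compile(pattern, flags)
    return {
        name: str(value)
        for name, value in headers.items()
        if regex.fullmatch(name)
    }


headers = {"PREFIX_user": "alice", "prefix_trace": 123, "Accept": "*/*"}

# Default mode: both PREFIX_user and prefix_trace are selected.
default_match = forwarded_parameters(headers, "PREFIX_.*")

# Case-sensitive mode: only PREFIX_user is selected.
strict_match = forwarded_parameters(headers, "PREFIX_.*", case_sensitive=True)
```

Running the pattern against a sample of real request headers this way can help catch an overly broad expression before it is deployed.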

The forwarded headers can be accessed using the Python or C Backend APIs as inference request parameters.
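As an illustration for the Python backend, the sketch below reads a forwarded header from a request's parameters. It assumes the backend exposes the parameters as a JSON string (for example through a `request.parameters()` call inside the model's `execute` method), which should be verified against the backend version in use. The helper name `read_forwarded_header` is hypothetical, and the lookup ignores case because clients may lower-case header names.

```python
import json


def read_forwarded_header(parameters_json: str, header_name: str, default=None):
    """Look up a forwarded header in a JSON-encoded parameters string."""
    parameters = json.loads(parameters_json) if parameters_json else {}
    wanted = header_name.lower()
    for name, value in parameters.items():
        # Compare case-insensitively: the client may have lower-cased the name.
        if name.lower() == wanted:
            return value
    return default


# Stand-in for the JSON string a backend would receive (assumed shape).
sample_parameters = json.dumps({"prefix_user": "alice", "my_custom_parameter": 42})
user = read_forwarded_header(sample_parameters, "PREFIX_user")
```

Inside a model, the first argument would come from the request object rather than from a hand-built string.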