Schedule Policy Extension

This document describes Triton’s schedule-policy extension, which allows an inference request to provide parameters that influence how Triton handles and schedules the request. Because this extension is supported, Triton reports “schedule_policy” in the extensions field of its Server Metadata. Note that the policies are specific to the dynamic batcher and do not apply to the sequence batcher with the direct scheduling strategy.
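A client can check the reported extensions before relying on these parameters. The sketch below is a minimal illustration, assuming Triton's HTTP endpoint is reachable at `http://localhost:8000` and that server metadata is served at `GET /v2` as in the KServe v2 protocol; the helper names `supports_schedule_policy` and `fetch_server_metadata` are hypothetical, and the sample metadata values are illustrative only.

```python
import json
import urllib.request


def supports_schedule_policy(metadata: dict) -> bool:
    """Return True if a server-metadata response lists the schedule_policy extension."""
    return "schedule_policy" in metadata.get("extensions", [])


def fetch_server_metadata(base_url: str = "http://localhost:8000") -> dict:
    # Assumption: the server exposes its metadata at GET /v2 on this base URL.
    with urllib.request.urlopen(f"{base_url}/v2") as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Illustrative metadata shaped like a server-metadata response (no network call).
    sample = {
        "name": "triton",
        "version": "x.y.z",
        "extensions": ["classification", "schedule_policy"],
    }
    print(supports_schedule_policy(sample))
```

Keeping the check separate from the network call makes it easy to reuse with whatever client library you already use to fetch metadata.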

The schedule-policy extension uses request parameters to indicate the policy. The parameters and their types are:

  • “priority” : int64 value indicating the priority of the request. A value of zero selects the default priority level (i.e. the same behavior as not specifying the priority parameter). Lower values indicate higher priority levels, so the highest priority level is indicated by setting the parameter to 1, the next highest by 2, and so on.

  • “timeout” : int64 value indicating the timeout for the request, in microseconds. If the request cannot be completed within that time, Triton will take a model-specific action such as terminating the request.
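The semantics of the two parameters can be sketched as follows. This is a simplified illustration, not Triton's actual queue implementation: the `Request` record, the helper functions, and `default_level` (a hypothetical stand-in for the model's configured default priority level) are all assumptions. It shows that priority 0 maps to the default level, that lower values are scheduled first, and how a microsecond timeout translates into an elapsed-time check; what Triton actually does when a timeout expires is model-specific.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    # Hypothetical request record for illustration only.
    name: str
    priority: int = 0                 # 0 means "use the default priority level"
    timeout_us: Optional[int] = None  # None means "no per-request timeout"
    arrival_us: int = 0               # arrival time in microseconds


def effective_priority(req: Request, default_level: int) -> int:
    # A priority of 0 behaves the same as not specifying the parameter.
    return default_level if req.priority == 0 else req.priority


def order_for_scheduling(reqs: List[Request], default_level: int) -> List[Request]:
    # Lower values mean higher priority; sorted() is stable, so requests with
    # equal effective priority keep their input (arrival) order.
    return sorted(reqs, key=lambda r: effective_priority(r, default_level))


def has_timed_out(req: Request, now_us: int) -> bool:
    # The request has exceeded its budget once more than timeout_us
    # microseconds have elapsed since it arrived.
    return req.timeout_us is not None and now_us - req.arrival_us > req.timeout_us


if __name__ == "__main__":
    queue = [Request("a", priority=3), Request("b"), Request("c", priority=1)]
    print([r.name for r in order_for_scheduling(queue, default_level=2)])  # ['c', 'b', 'a']
    print(has_timed_out(Request("d", timeout_us=5_000, arrival_us=0), now_us=6_000))  # True
```

With a default level of 2, request "b" (priority 0) is placed between "c" (priority 1) and "a" (priority 3).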

Both parameters are optional. If a parameter is not specified, Triton handles the request using the default priority or timeout value appropriate for the model.
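Because both parameters are optional, a client only needs to include the ones it wants to override. The sketch below builds a JSON request body for a KServe v2 style inference request (sent with `POST /v2/models/<model>/infer`), placing the parameters in the request-level `parameters` object and omitting that object entirely when neither is set. The helper `build_infer_request` and the input tensor name `INPUT0` are hypothetical.

```python
import json
from typing import Optional


def build_infer_request(inputs: list,
                        priority: Optional[int] = None,
                        timeout_us: Optional[int] = None) -> dict:
    """Build an inference request body, adding only the schedule-policy
    parameters that were explicitly given."""
    body = {"inputs": inputs}
    params = {}
    if priority is not None:
        params["priority"] = priority   # 1 is the highest priority level
    if timeout_us is not None:
        params["timeout"] = timeout_us  # microseconds
    # With no parameters, the server falls back to the model's defaults.
    if params:
        body["parameters"] = params
    return body


if __name__ == "__main__":
    # Hypothetical FP32 input tensor named "INPUT0".
    inp = [{"name": "INPUT0", "shape": [1, 4], "datatype": "FP32",
            "data": [1.0, 2.0, 3.0, 4.0]}]
    # Highest priority level, 100 ms (100,000 microsecond) timeout.
    print(json.dumps(build_infer_request(inp, priority=1, timeout_us=100_000)))
```

Leaving out unset parameters, rather than sending zero, keeps the request's meaning unambiguous for the timeout, where zero has no documented "use the default" meaning.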