
message InferSharedMemory

The meta-data for the shared memory from which to read the input data and/or write the output data.

string name

The name given during registration of a shared memory region that holds the input data (or where the output data should be written).

uint64 offset

The offset, in bytes, from the start of the shared memory region. The memory block begins at start = offset and ends at end = offset + byte_size.

uint64 byte_size

Size of the memory block, in bytes.
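The offset and byte_size fields together select a byte range inside the registered region. The following is a minimal sketch of that arithmetic, not server code: the helper name `shared_memory_range` and the `region_size` argument are assumptions made for illustration and are not part of InferSharedMemory.

```python
def shared_memory_range(offset, byte_size, region_size):
    """Return the (start, end) byte range selected by offset and byte_size.

    region_size is the total size of the registered region. It is an
    assumption of this sketch, not a field of InferSharedMemory.
    """
    if offset < 0 or byte_size < 0:
        raise ValueError("offset and byte_size are unsigned values")
    start = offset
    end = offset + byte_size  # exclusive end of the block
    if end > region_size:
        raise ValueError("block extends past the end of the region")
    return start, end
```

For example, an offset of 64 and a byte_size of 128 select bytes 64 up to (but not including) 192 of the region.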

message InferRequestHeader

Meta-data for an inferencing request. The actual input data is delivered separately from this header: in the HTTP body for an HTTP request, or in the InferRequest message for a gRPC request.

enum Flag

Flags that can be associated with an inference request. All flags are packed bitwise into the ‘flags’ field and so the value of each must be a power-of-2.

enumerator Flag::FLAG_NONE = 0

Value indicating no flags are enabled.

enumerator Flag::FLAG_SEQUENCE_START = 1 << 0

This request is the start of a related sequence of requests.

enumerator Flag::FLAG_SEQUENCE_END = 1 << 1

This request is the end of a related sequence of requests.
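Because each flag is a distinct power of 2, flags can be combined with bitwise-or and tested with bitwise-and. The sketch below mirrors the enumerator values listed above in a Python `IntFlag`; the Python class is only an illustration and is not the generated protobuf type.

```python
from enum import IntFlag


class Flag(IntFlag):
    """Python mirror of the Flag enum, for illustration only."""
    FLAG_NONE = 0
    FLAG_SEQUENCE_START = 1 << 0
    FLAG_SEQUENCE_END = 1 << 1


# A one-request sequence is both the start and the end of the sequence.
flags = int(Flag.FLAG_SEQUENCE_START | Flag.FLAG_SEQUENCE_END)

# Test individual bits with bitwise-and.
is_start = bool(flags & Flag.FLAG_SEQUENCE_START)
is_end = bool(flags & Flag.FLAG_SEQUENCE_END)
```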

message Input

Meta-data for an input tensor provided as part of an inferencing request.

string name

The name of the input tensor.

int64 dims(repeated)

The shape of the input tensor, not including the batch dimension. Optional if the model configuration for this input explicitly specifies all dimensions of the shape. Required if the model configuration for this input has any wildcard dimensions (-1).

uint64 batch_byte_size

The size of the full batch of the input tensor, in bytes. Optional for tensors with fixed-sized datatypes. Required for tensors with a non-fixed-size datatype (like STRING).
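For a fixed-size datatype, the full-batch size follows from the shape, the batch size, and the size of one element. The sketch below assumes that relationship; the helper name and the small element-size table are illustrative assumptions, and any wildcard dimension (-1) must already be replaced by its actual value.

```python
from math import prod

# Assumed sizes, in bytes, of a few fixed-size datatypes (illustrative only).
ELEMENT_SIZE = {"TYPE_INT32": 4, "TYPE_FP32": 4, "TYPE_FP64": 8}


def batch_byte_size(dims, batch_size, data_type):
    """Size in bytes of a full batch of a fixed-size tensor.

    dims excludes the batch dimension, matching the 'dims' field.
    """
    if any(d < 0 for d in dims):
        raise ValueError("wildcard dimensions must be resolved first")
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return batch_size * prod(dims) * ELEMENT_SIZE[data_type]
```

A datatype such as STRING has no fixed element size, so its batch_byte_size cannot be derived this way and has to be supplied by the sender.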

InferSharedMemory shared_memory

The location in shared memory that contains the tensor data for this input. Using shared memory is optional, but if this message is used, all of its fields are required.

message Output

Meta-data for a requested output tensor as part of an inferencing request.

string name

The name of the output tensor.

message Class

Options for an output returned as a classification.

uint32 count

Indicates how many classification values should be returned for the output. The ‘count’ highest priority values are returned.

Class cls

Optional. If defined, return this output as a classification instead of raw data. The output tensor will be interpreted as probabilities, and the classifications associated with the highest probabilities will be returned.
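The selection described for a classification output can be pictured as a top-k pick over the output tensor. This is a rough sketch of the idea only, not the server's implementation; the function name `top_classes` is hypothetical.

```python
def top_classes(probabilities, count):
    """Return the 'count' highest-valued entries as (index, value) pairs."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:count]
```

With probabilities `[0.1, 0.6, 0.3]` and a count of 2, this picks index 1 first and index 2 second.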

InferSharedMemory shared_memory

The location in shared memory to which the result tensor data for this output will be written. Using shared memory is optional, but if this message is used, all of its fields are required.

uint64 id

The ID of the inference request. The response to the request will have the same ID in InferResponseHeader. The request sender can use the ID to correlate a response with its corresponding request if needed.

uint32 flags

The flags associated with this request. This field holds a bitwise-or of all flag values.

uint64 correlation_id

The correlation ID of the inference request. The default is 0, which indicates that the request has no correlation ID. The correlation ID is used to indicate that two or more inference requests are related to each other. How this relationship is handled by the inference server is determined by the model’s scheduling policy.
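One way to picture related requests is a sequence whose members all carry the same non-zero correlation ID, with the first and last marked by the sequence flags. The sketch below assumes that the flags and correlation_id are used together in this way; it builds plain dictionaries rather than real InferRequestHeader messages, and the function name is hypothetical.

```python
FLAG_NONE = 0
FLAG_SEQUENCE_START = 1 << 0
FLAG_SEQUENCE_END = 1 << 1


def sequence_headers(correlation_id, num_requests):
    """Build header fields for a sequence of related requests."""
    if correlation_id == 0:
        raise ValueError("0 means the request has no correlation ID")
    headers = []
    for i in range(num_requests):
        flags = FLAG_NONE
        if i == 0:
            flags |= FLAG_SEQUENCE_START  # first request of the sequence
        if i == num_requests - 1:
            flags |= FLAG_SEQUENCE_END  # last request of the sequence
        headers.append({"correlation_id": correlation_id, "flags": flags})
    return headers
```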

uint32 batch_size

The batch size of the inference request. This must be >= 1. For models that don’t support batching, batch_size must be 1.

Input input(repeated)

The input meta-data for the inputs provided with the inference request.

Output output(repeated)

The output meta-data for the outputs requested with the inference request.

uint32 priority

The priority value of this request. If priority handling is not enabled for the model, this value is ignored. The default value is 0, which indicates that the request will be assigned the default priority associated with the model.

uint64 timeout_microseconds

The timeout for this request, in microseconds. This value overrides the timeout specified by the model if the model allows timeout override and the value is less than the default timeout specified by the model. If the request cannot be processed within this timeout, it will be handled according to the model’s timeout policy. Note that a request for an ensemble model cannot override the timeout values of the composing models. The default value is 0, which indicates that the request does not override the model’s timeout value.
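The override rule can be condensed into a small decision function. This is only a sketch of the rule as described here; the function and parameter names are hypothetical and do not correspond to fields of the model configuration.

```python
def effective_timeout(request_timeout_us, model_timeout_us, allow_override):
    """Pick the timeout that applies to a request, in microseconds.

    The request value is used only when the model allows override, the
    value is non-zero (0 means "do not override"), and it is less than
    the model's default timeout.
    """
    if allow_override and 0 < request_timeout_us < model_timeout_us:
        return request_timeout_us
    return model_timeout_us
```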

message InferResponseHeader

Meta-data for the response to an inferencing request. The actual output data is delivered separately from this header: in the HTTP body for an HTTP request, or in the InferResponse message for a gRPC request.

message Output

Meta-data for an output tensor requested as part of an inferencing request.

string name

The name of the output tensor.

DataType data_type

The datatype of the output tensor.

message Raw

Meta-data for an output tensor being returned as raw data.

int64 dims(repeated)

The shape of the output tensor, not including the batch dimension.

uint64 batch_byte_size

The full size of the output tensor, in bytes. For a batch output, this is the size of the entire batch.

message Class

Information about each classification for this output.

int32 idx

The classification index.

float value

The classification value as a float (typically a probability).

string label

The label for the class (optional, only available if provided by the model).

message Classes

Meta-data for an output tensor being returned as classifications.

Class cls(repeated)

The top-k classes for this output.

Raw raw

If specified, deliver results for this output as raw tensor data. The actual output data is delivered in the HTTP body for an HTTP request, or in the InferResponse message for a gRPC request. Only one of ‘raw’ and ‘batch_classes’ may be specified.

Classes batch_classes(repeated)

If specified, deliver results for this output as classifications. There is one Classes object for each batch entry in the output. Only one of ‘raw’ and ‘batch_classes’ may be specified.

uint64 id

The ID of the inference response. The response will have the same ID as the request that originated it. The request sender can use the ID to correlate a response with its corresponding request if needed.
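A client that has several requests in flight can use the shared ID to pair responses with requests regardless of arrival order. The sketch below works on plain dictionaries standing in for the request and response headers; the `correlate` helper is hypothetical.

```python
def correlate(requests, responses):
    """Pair each response header with the request header that has the same id."""
    by_id = {req["id"]: req for req in requests}
    pairs = []
    for resp in responses:
        req = by_id.get(resp["id"])
        if req is None:
            raise KeyError(f"no request with id {resp['id']}")
        pairs.append((req, resp))
    return pairs
```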

string model_name

The name of the model that produced the outputs.

int64 model_version

The version of the model that produced the outputs.

uint32 batch_size

The batch size of the outputs. This will always be equal to the batch size of the inputs. For models that don’t support batching the batch_size will be 1.

Output output(repeated)

The outputs, in the same order as they were requested in InferRequestHeader.
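Since the outputs keep the requested order and each raw output reports its batch_byte_size, a client can slice the separately delivered data into per-output chunks. The sketch below assumes that the raw data for all outputs is concatenated in that same order and that classification outputs contribute no bytes to it; this layout is an assumption of the example, and the dictionaries stand in for real InferResponseHeader messages.

```python
def split_raw_outputs(outputs, body):
    """Slice a byte string into one chunk per raw output, keyed by name.

    Assumes raw outputs are concatenated in header order (an assumption
    of this sketch).
    """
    chunks = {}
    cursor = 0
    for out in outputs:
        raw = out.get("raw")
        if raw is None:
            continue  # classification results are carried in the header
        size = raw["batch_byte_size"]
        chunk = body[cursor:cursor + size]
        if len(chunk) != size:
            raise ValueError(f"body too short for output {out['name']}")
        chunks[out["name"]] = chunk
        cursor += size
    return chunks
```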