api.proto

message InferRequestHeader
Meta-data for an inferencing request. The actual input data is delivered separately from this header, in the HTTP body for an HTTP request, or in the InferRequest message for a gRPC request.
message Input
Meta-data for an input tensor provided as part of an inferencing request.
string name

The name of the input tensor.

uint64 byte_size

The size of the input tensor, in bytes. This is the size for one instance of the input, not the entire size of a batched input.
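
Taken together, the two fields above suggest a declaration along these lines (a minimal sketch in proto3 syntax; the field numbers are illustrative, not taken from the actual api.proto):

  message Input {
    // The name of the input tensor.
    string name = 1;

    // The size of one instance of the input, in bytes.
    uint64 byte_size = 2;
  }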

message Output
Meta-data for a requested output tensor as part of an inferencing request.
string name

The name of the output tensor.

uint64 byte_size

The size of the output tensor, in bytes. This is the size for one instance of the output, not the entire size of a batched output.

message Class
Options for an output returned as a classification.
uint32 count

Indicates how many classification values should be returned for the output. The ‘count’ highest-probability values are returned.

Class cls

Optional. If defined, return this output as a classification instead of as raw data. The output tensor will be interpreted as probabilities and the classifications associated with the highest probabilities will be returned.
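
Based on the descriptions above, the request-side Output message with its nested Class option could be declared roughly as follows (a sketch; field numbers are illustrative):

  message Output {
    message Class {
      // Number of classification values to return for this output.
      uint32 count = 1;
    }

    // The name of the output tensor.
    string name = 1;

    // The size of one instance of the output, in bytes.
    uint64 byte_size = 2;

    // If set, return this output as a classification rather than as raw data.
    Class cls = 3;
  }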

uint32 batch_size

The batch size of the inference request. This must be >= 1. For models that don’t support batching, batch_size must be 1.

Input input(repeated)

The input meta-data for the inputs provided with the inference request.

Output output(repeated)

The output meta-data for the outputs requested by the inference request.
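
Combining the fields above, the top-level request header could be declared roughly as follows, and a request for a hypothetical two-image batch could be written in protobuf text format as shown beneath it (field numbers, tensor names, and sizes are invented for illustration):

  message InferRequestHeader {
    uint32 batch_size = 1;
    repeated Input input = 2;
    repeated Output output = 3;
  }

  batch_size: 2
  input { name: "input0" byte_size: 602112 }
  output { name: "prob" byte_size: 4004 cls { count: 3 } }

Here ‘prob’ is requested as a classification, so the three highest-probability classes would be returned for each of the two batch entries.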

message InferResponseHeader
Meta-data for the response to an inferencing request. The actual output data is delivered separately from this header, in the HTTP body for an HTTP request, or in the InferResponse message for a gRPC request.
message Output
Meta-data for an output tensor requested as part of an inferencing request.
string name

The name of the output tensor.

message Raw
Meta-data for an output tensor being returned as raw data.
uint64 byte_size

The size of the output tensor, in bytes. This is the size for one instance of the output, not the entire size of a batched output.

message Class
Information about each classification for this output.
int32 idx

The classification index.

float value

The classification value as a float (typically a probability).

string label

The label for the class (optional, only available if provided by the model).

message Classes
Meta-data for an output tensor being returned as classifications.
Class cls(repeated)

The top-k classes for this output, where k is the ‘count’ specified in the request.

Raw raw

If specified, deliver results for this output as raw tensor data. The actual output data is delivered in the HTTP body for an HTTP request, or in the InferResponse message for a gRPC request. Only one of ‘raw’ and ‘batch_classes’ may be specified.

Classes batch_classes(repeated)

If specified, deliver results for this output as classifications. There is one Classes object for each batch entry in the output. Only one of ‘raw’ and ‘batch_classes’ may be specified.
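
Putting the Raw, Class, and Classes messages together, the response-side Output message could be declared roughly as follows (a sketch; field numbers are illustrative):

  message Output {
    message Raw {
      // The size of one instance of the output, in bytes.
      uint64 byte_size = 1;
    }

    message Class {
      // The classification index.
      int32 idx = 1;

      // The classification value (typically a probability).
      float value = 2;

      // The label for the class, if provided by the model.
      string label = 3;
    }

    message Classes {
      // The classes for one batch entry of the output.
      repeated Class cls = 1;
    }

    // The name of the output tensor.
    string name = 1;

    // Only one of 'raw' and 'batch_classes' is set for a given output.
    Raw raw = 2;
    repeated Classes batch_classes = 3;
  }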

string model_name

The name of the model that produced the outputs.

uint32 model_version

The version of the model that produced the outputs.

uint32 batch_size

The batch size of the outputs. This will always be equal to the batch size of the inputs. For models that don’t support batching, the batch_size will be 1.

Output output(repeated)

The outputs, in the same order as they were requested in InferRequestHeader.
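
As a sketch, the complete response header and a text-format example of a classification response for a batch of two might look like this (field numbers, the model name, and the class values are hypothetical):

  message InferResponseHeader {
    string model_name = 1;
    uint32 model_version = 2;
    uint32 batch_size = 3;
    repeated Output output = 4;
  }

  model_name: "resnet50"
  model_version: 1
  batch_size: 2
  output {
    name: "prob"
    batch_classes { cls { idx: 504 value: 0.87 label: "coffee mug" } }
    batch_classes { cls { idx: 817 value: 0.63 label: "sports car" } }
  }

Because ‘prob’ was requested as a classification with count 1, each batch entry gets one Classes object containing a single Class; had ‘raw’ been used instead, the actual tensor bytes would follow in the HTTP body or InferResponse message.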