api.proto

message InferRequestHeader
Meta-data for an inferencing request. The actual input data is delivered separately from this header, in the HTTP body for an HTTP request, or in the InferRequest message for a gRPC request.
message Input
Meta-data for an input tensor provided as part of an inferencing request.
string name

The name of the input tensor.

uint64 byte_size

The size of the input tensor, in bytes. This is the size for one instance of the input, not the entire size of a batched input.
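
Taken together, the two fields above suggest a declaration along these lines (a minimal sketch in proto3 syntax; the field numbers are illustrative, not taken from the actual api.proto):

  message Input {
    // The name of the input tensor.
    string name = 1;

    // The size of one instance of the input, in bytes.
    uint64 byte_size = 2;
  }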

message Output
Meta-data for a requested output tensor as part of an inferencing request.
string name

The name of the output tensor.

uint64 byte_size

The size of the output tensor, in bytes. This is the size for one instance of the output, not the entire size of a batched output.

message Class
Options for an output returned as a classification.
uint32 count

Indicates how many classification values should be returned for the output. The ‘count’ highest-probability values are returned.

Class cls

Optional. If defined, return this output as a classification instead of as raw data. The output tensor will be interpreted as probabilities and the classifications associated with the highest probabilities will be returned.
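
Based on the descriptions above, the request-side Output message with its nested Class option could be declared roughly as follows (a sketch; field numbers are illustrative):

  message Output {
    message Class {
      // Number of classification values to return for this output.
      uint32 count = 1;
    }

    // The name of the output tensor.
    string name = 1;

    // The size of one instance of the output, in bytes.
    uint64 byte_size = 2;

    // If set, return this output as a classification rather than as raw data.
    Class cls = 3;
  }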

uint32 batch_size

The batch size of the inference request. This must be >= 1. For models that don’t support batching, batch_size must be 1.

Input input(repeated)

The input meta-data for the inputs provided with the inference request.

Output output(repeated)

The output meta-data for the outputs requested by the inference request.
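
Combining the fields above, the top-level request header could be declared roughly as follows, and a request for a hypothetical two-image batch could be written in protobuf text format as shown beneath it (field numbers, tensor names, and sizes are invented for illustration):

  message InferRequestHeader {
    uint32 batch_size = 1;
    repeated Input input = 2;
    repeated Output output = 3;
  }

  batch_size: 2
  input { name: "input0" byte_size: 602112 }
  output { name: "prob" byte_size: 4004 cls { count: 3 } }

Here ‘prob’ is requested as a classification, so the three highest-probability classes would be returned for each of the two batch entries.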

message InferResponseHeader
Meta-data for the response to an inferencing request. The actual output data is delivered separately from this header, in the HTTP body for an HTTP request, or in the InferResponse message for a gRPC request.
message Output
Meta-data for an output tensor requested as part of an inferencing request.
string name

The name of the output tensor.

message Raw
Meta-data for an output tensor being returned as raw data.
uint64 byte_size

The size of the output tensor, in bytes. This is the size for one instance of the output, not the entire size of a batched output.

message Class
Information about each classification for this output.
int32 idx

The classification index.

float value

The classification value as a float (typically a probability).

string label

The label for the class (optional, only available if provided by the model).

message Classes
Meta-data for an output tensor being returned as classifications.
Class cls(repeated)

The top-k classes for this output, where k is the ‘count’ specified in the request.

Raw raw

If specified, deliver results for this output as raw tensor data. The actual output data is delivered in the HTTP body for an HTTP request, or in the InferResponse message for a gRPC request. Only one of ‘raw’ and ‘batch_classes’ may be specified.

Classes batch_classes(repeated)

If specified, deliver results for this output as classifications. There is one Classes object for each batch entry in the output. Only one of ‘raw’ and ‘batch_classes’ may be specified.
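
Putting the Raw, Class, and Classes messages together, the response-side Output message could be declared roughly as follows (a sketch; field numbers are illustrative):

  message Output {
    message Raw {
      // The size of one instance of the output, in bytes.
      uint64 byte_size = 1;
    }

    message Class {
      // The classification index.
      int32 idx = 1;

      // The classification value (typically a probability).
      float value = 2;

      // The label for the class, if provided by the model.
      string label = 3;
    }

    message Classes {
      // The classes for one batch entry of the output.
      repeated Class cls = 1;
    }

    // The name of the output tensor.
    string name = 1;

    // Only one of 'raw' and 'batch_classes' is set for a given output.
    Raw raw = 2;
    repeated Classes batch_classes = 3;
  }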

string model_name

The name of the model that produced the outputs.

uint32 model_version

The version of the model that produced the outputs.

uint32 batch_size

The batch size of the outputs. This will always be equal to the batch size of the inputs. For models that don’t support batching, the batch_size will be 1.

Output output(repeated)

The outputs, in the same order as they were requested in InferRequestHeader.
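
As a sketch, the complete response header and a text-format example of a classification response for a batch of two might look like this (field numbers, the model name, and the class values are hypothetical):

  message InferResponseHeader {
    string model_name = 1;
    uint32 model_version = 2;
    uint32 batch_size = 3;
    repeated Output output = 4;
  }

  model_name: "resnet50"
  model_version: 1
  batch_size: 2
  output {
    name: "prob"
    batch_classes { cls { idx: 504 value: 0.87 label: "coffee mug" } }
    batch_classes { cls { idx: 817 value: 0.63 label: "sports car" } }
  }

Because ‘prob’ was requested as a classification with count 1, each batch entry gets one Classes object containing a single Class; had ‘raw’ been used instead, the actual tensor bytes would follow in the HTTP body or InferResponse message.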