api.proto
- message InferRequestHeader
  Meta-data for an inferencing request. The actual input data is delivered separately from this header: in the HTTP body for an HTTP request, or in the InferRequest message for a gRPC request.

  - enum Flag
    Flags that can be associated with an inference request. All flags are packed bitwise into the 'flags' field, so the value of each must be a power of two.
    - enumerator Flag::FLAG_NONE = 0
      Value indicating no flags are enabled.
    - enumerator Flag::FLAG_SEQUENCE_START = 1 << 0
      This request is the start of a related sequence of requests.
    - enumerator Flag::FLAG_SEQUENCE_END = 1 << 1
      This request is the end of a related sequence of requests.
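Because every flag value is a distinct power of two, multiple flags can be combined in the 'flags' field with a bitwise OR and tested with a bitwise AND. A minimal plain-Python sketch (the constants mirror the enumerators above; this is not the generated protobuf API):

```python
# Flag values mirror the enum above: each is a distinct power of two.
FLAG_NONE = 0
FLAG_SEQUENCE_START = 1 << 0  # first request of a sequence
FLAG_SEQUENCE_END = 1 << 1    # last request of a sequence

# A single-request sequence both starts and ends the sequence,
# so both flags are packed into the same field.
flags = FLAG_SEQUENCE_START | FLAG_SEQUENCE_END

# Testing for an individual flag uses bitwise AND.
is_start = bool(flags & FLAG_SEQUENCE_START)
is_end = bool(flags & FLAG_SEQUENCE_END)
```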
message
Input
¶ Meta-data for an input tensor provided as part of an inferencing request.
    - string name
      The name of the input tensor.
    - int64 dims (repeated)
      The shape of the input tensor, not including the batch dimension. Optional if the model configuration for this input explicitly specifies all dimensions of the shape. Required if the model configuration for this input has any wildcard dimensions (-1).
    - uint64 batch_byte_size
      The size of the full batch of the input tensor, in bytes. Optional for tensors with a fixed-size datatype. Required for tensors with a non-fixed-size datatype (like STRING).
message
Output
¶ Meta-data for a requested output tensor as part of an inferencing request.
    - string name
      The name of the output tensor.
uint64
id
¶ The ID of the inference request. The response of the request will have the same ID in InferResponseHeader. The request sender can use the ID to correlate the response to corresponding request if needed.
  - uint32 flags
    The flags associated with this request. This field holds a bitwise-or of all flag values.
  - uint64 correlation_id
    The correlation ID of the inference request. Default is 0, which indicates that the request has no correlation ID. The correlation ID is used to indicate that two or more inference requests are related to each other. How this relationship is handled by the inference server is determined by the model's scheduling policy.
  - uint32 batch_size
    The batch size of the inference request. This must be >= 1. For models that don't support batching, batch_size must be 1.
- message InferResponseHeader
  Meta-data for the response to an inferencing request. The actual output data is delivered separately from this header: in the HTTP body for an HTTP request, or in the InferResponse message for a gRPC request.
  - message Output
    Meta-data for an output tensor requested as part of an inferencing request.
    - string name
      The name of the output tensor.
    - message Raw
      Meta-data for an output tensor being returned as raw data.
      - int64 dims (repeated)
        The shape of the output tensor, not including the batch dimension.
      - uint64 batch_byte_size
        The full size of the output tensor, in bytes. For a batch output, this is the size of the entire batch.
message
Class
¶ Information about each classification for this output.
      - int32 idx
        The classification index.
      - float value
        The classification value as a float (typically a probability).
      - string label
        The label for the class (optional, only available if provided by the model).
message
Classes
¶ Meta-data for an output tensor being returned as classifications.
    - Raw raw
      If specified, deliver results for this output as raw tensor data. The actual output data is delivered in the HTTP body for an HTTP request, or in the InferResponse message for a gRPC request. Only one of 'raw' and 'batch_classes' may be specified.
uint64
id
¶ The ID of the inference response. The response will have the same ID as the ID of its originated request. The request sender can use the ID to correlate the response to corresponding request if needed.
  - string model_name
    The name of the model that produced the outputs.
  - int64 model_version
    The version of the model that produced the outputs.
  - uint32 batch_size
    The batch size of the outputs. This will always equal the batch size of the inputs. For models that don't support batching, the batch_size will be 1.
  - Output output (repeated)
    The outputs, in the same order as they were requested in InferRequestHeader.