api.proto

- message InferRequestHeader: Meta-data for an inferencing request. The actual input data is delivered separately from this header, in the HTTP body for an HTTP request or in the InferRequest message for a gRPC request.
  - enum Flag: Flags that can be associated with an inference request. All flags are packed bitwise into the 'flags' field, so the value of each must be a power of 2.
    - enumerator Flag::FLAG_NONE = 0: Value indicating no flags are enabled.
    - enumerator Flag::FLAG_SEQUENCE_START = 1 << 0: This request is the start of a related sequence of requests.
    - enumerator Flag::FLAG_SEQUENCE_END = 1 << 1: This request is the end of a related sequence of requests.
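Because every flag value is a power of 2, multiple flags can be combined into the 'flags' field with a bitwise OR. A minimal sketch, assuming Python bindings generated from api.proto into an `api_pb2` module (the module name is an assumption):

```python
import api_pb2  # assumed name of the protoc-generated Python module for api.proto

header = api_pb2.InferRequestHeader()

# A sequence consisting of a single request both starts and ends the
# sequence, so both flag bits are OR'ed together into the 'flags' bitmask.
header.flags = (api_pb2.InferRequestHeader.FLAG_SEQUENCE_START |
                api_pb2.InferRequestHeader.FLAG_SEQUENCE_END)
```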
  - message Input: Meta-data for an input tensor provided as part of an inferencing request.
    - string name: The name of the input tensor.
    - int64 dims (repeated): The shape of the input tensor, not including the batch dimension. Optional if the model configuration for this input explicitly specifies all dimensions of the shape. Required if the model configuration for this input has any wildcard dimensions (-1).
    - uint64 batch_byte_size: The size of the full batch of the input tensor, in bytes. Optional for tensors with fixed-size datatypes. Required for tensors with a non-fixed-size datatype (like STRING).
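A short sketch of filling in one Input entry, under the same assumed `api_pb2` bindings (the tensor name, shape, and datatype are illustrative):

```python
import api_pb2  # assumed protoc-generated module for api.proto

inp = api_pb2.InferRequestHeader.Input()
inp.name = "INPUT0"             # hypothetical input tensor name
inp.dims.extend([3, 224, 224])  # shape without the batch dimension

# batch_byte_size is optional for fixed-size datatypes but required for
# STRING. Assuming FP32 elements and a batch of 2: 2 * 3*224*224 * 4 bytes.
inp.batch_byte_size = 2 * 3 * 224 * 224 * 4
```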
  - message Output: Meta-data for a requested output tensor as part of an inferencing request.
    - string name: The name of the output tensor.
  - uint64 id: The ID of the inference request. The response to this request will have the same ID in its InferResponseHeader. The request sender can use the ID to correlate the response with the corresponding request if needed.
  - uint32 flags: The flags associated with this request. This field holds a bitwise-or of all flag values.
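Conversely, an individual flag can be tested in the bitmask with a bitwise AND; a small sketch under the same `api_pb2` assumption:

```python
import api_pb2  # assumed protoc-generated module for api.proto

def is_sequence_end(header):
    """Return True if the FLAG_SEQUENCE_END bit is set in header.flags."""
    return bool(header.flags & api_pb2.InferRequestHeader.FLAG_SEQUENCE_END)
```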
  - uint64 correlation_id: The correlation ID of the inference request. Default is 0, which indicates that the request has no correlation ID. The correlation ID is used to indicate that two or more inference requests are related to each other. How this relationship is handled by the inference server is determined by the model's scheduling policy.
  - uint32 batch_size: The batch size of the inference request. This must be >= 1. For models that don't support batching, batch_size must be 1.
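Putting the request-level fields together, a hedged sketch of the header for the first request of a sequence (the values are illustrative; `api_pb2` is the assumed generated module):

```python
import api_pb2  # assumed protoc-generated module for api.proto

header = api_pb2.InferRequestHeader()
header.id = 1               # echoed back as InferResponseHeader.id
header.correlation_id = 42  # non-zero: relates this request to others in the same sequence
header.flags = api_pb2.InferRequestHeader.FLAG_SEQUENCE_START  # first request of the sequence
header.batch_size = 1       # must be >= 1; exactly 1 for models that don't support batching
```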
- message InferResponseHeader: Meta-data for the response to an inferencing request. The actual output data is delivered separately from this header, in the HTTP body for an HTTP request or in the InferResponse message for a gRPC request.
  - message Output: Meta-data for an output tensor requested as part of an inferencing request.
    - string name: The name of the output tensor.
    - message Raw: Meta-data for an output tensor being returned as raw data.
      - int64 dims (repeated): The shape of the output tensor, not including the batch dimension.
      - uint64 batch_byte_size: The full size of the output tensor, in bytes. For a batch output, this is the size of the entire batch.
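For a raw output, the reported byte size should be consistent with the shape and the batch size. A sketch of that check, assuming FP32 (4-byte) elements and the same `api_pb2` bindings as above:

```python
import api_pb2  # assumed protoc-generated module for api.proto

raw = api_pb2.InferResponseHeader.Output.Raw()
raw.dims.extend([1000])             # per-item shape, batch dimension excluded
raw.batch_byte_size = 8 * 1000 * 4  # assumed: batch of 8, FP32 elements

# batch_byte_size covers the entire batch: batch_size * elements * element size.
batch_size = 8
elements = 1
for d in raw.dims:
    elements *= d
assert raw.batch_byte_size == batch_size * elements * 4
```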
    - message Class: Information about each classification for this output.
      - int32 idx: The classification index.
      - float value: The classification value as a float (typically a probability).
      - string label: The label for the class (optional, only available if provided by the model).
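A sketch of how a single classification entry might look when read back (the values are illustrative; `api_pb2` is the assumed generated module):

```python
import api_pb2  # assumed protoc-generated module for api.proto

cls = api_pb2.InferResponseHeader.Output.Class()
cls.idx = 207                   # classification index
cls.value = 0.87                # typically a probability
cls.label = "golden retriever"  # only present if the model provides labels

print(f"class {cls.idx} ({cls.label or 'no label'}): {cls.value:.2f}")
```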
    - message Classes: Meta-data for an output tensor being returned as classifications.
    - Raw raw: If specified, deliver results for this output as raw tensor data. The actual output data is delivered in the HTTP body for an HTTP request, or in the InferResponse message for a gRPC request. Only one of 'raw' and 'batch_classes' may be specified.
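Because only one of 'raw' and 'batch_classes' is present, a client can branch on which one is populated. A sketch under the same `api_pb2` assumption ('batch_classes' is used here only as named in the description above):

```python
def describe_output(output):
    """Summarize one api_pb2.InferResponseHeader.Output (assumed bindings)."""
    if output.HasField("raw"):
        # Raw tensor data: shape and total byte count come from the Raw meta-data.
        return (f"{output.name}: raw tensor, dims={list(output.raw.dims)}, "
                f"{output.raw.batch_byte_size} bytes")
    # Otherwise the output was delivered as classifications.
    return f"{output.name}: classification results"
```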
  - uint64 id: The ID of the inference response. The response will have the same ID as the ID of its originating request. The request sender can use the ID to correlate the response with the corresponding request if needed.
  - string model_name: The name of the model that produced the outputs.
  - int64 model_version: The version of the model that produced the outputs.
  - uint32 batch_size: The batch size of the outputs. This will always be equal to the batch size of the inputs. For models that don't support batching, the batch_size will be 1.
  - Output output (repeated): The outputs, in the same order as they were requested in InferRequestHeader.
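Finally, a sketch of examining a returned InferResponseHeader (again assuming `api_pb2` bindings; all values come from the server):

```python
import api_pb2  # assumed protoc-generated module for api.proto

def summarize(resp):
    """Print a short summary of an api_pb2.InferResponseHeader."""
    # 'id' matches the id of the InferRequestHeader that produced this response.
    print(f"request {resp.id}: {resp.model_name} v{resp.model_version}, "
          f"batch_size={resp.batch_size}")
    # Outputs are listed in the same order they were requested.
    for out in resp.output:
        print("  output:", out.name)
```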