api.proto¶
- The meta-data for the shared memory from which to read the input data and/or to which to write the output data. 
 - The name given during registration of a shared memory region that holds the input data (or where the output data should be written). 
 - The offset from the start of the shared memory region. The block spans start = offset to end = offset + size. 
 - Size of the memory block, in bytes. 
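As a rough illustration of how the offset and size select a block, the sketch below slices a byte buffer that stands in for a registered region. The `read_block` helper and the `region` buffer are hypothetical names used only for this example; they are not part of api.proto.

```python
def read_block(region: bytes, offset: int, byte_size: int) -> bytes:
    """Return the block [offset, offset + byte_size) of a region (hypothetical helper)."""
    start = offset
    end = offset + byte_size
    if start < 0 or byte_size < 0 or end > len(region):
        raise ValueError("block falls outside the shared memory region")
    return region[start:end]


# Stand-in for a registered shared memory region: 16 bytes with values 0..15.
region = bytes(range(16))
block = read_block(region, offset=4, byte_size=8)
```

The bounds check is a design choice of this sketch; how the server treats an out-of-range block is not stated here.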
 
- 
message InferRequestHeader¶
- Meta-data for an inferencing request. The actual input data is delivered separately from this header: in the HTTP body for an HTTP request, or in the InferRequest message for a gRPC request. - 
enum Flag¶
- Flags that can be associated with an inference request. All flags are packed bitwise into the ‘flags’ field and so the value of each must be a power-of-2. - 
enumerator Flag::FLAG_NONE= 0¶
- Value indicating no flags are enabled. 
 - 
enumerator Flag::FLAG_SEQUENCE_START= 1 << 0¶
- This request is the start of a related sequence of requests. 
 - 
enumerator Flag::FLAG_SEQUENCE_END= 1 << 1¶
- This request is the end of a related sequence of requests. 
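Because each flag is a distinct power of 2, flags can be combined with bitwise OR and tested with bitwise AND. The following is a minimal sketch that mirrors the enumerator values above in a Python `IntFlag`; the Python class is an illustration, not the generated protobuf binding.

```python
from enum import IntFlag


class Flag(IntFlag):
    # Values mirror the enumerators documented above.
    FLAG_NONE = 0
    FLAG_SEQUENCE_START = 1 << 0
    FLAG_SEQUENCE_END = 1 << 1


# Pack two flags into a single integer, as the 'flags' field expects.
flags = int(Flag.FLAG_SEQUENCE_START | Flag.FLAG_SEQUENCE_END)

# Test individual bits with bitwise AND.
is_start = bool(flags & Flag.FLAG_SEQUENCE_START)
is_end = bool(flags & Flag.FLAG_SEQUENCE_END)
```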
 
 - 
message Input¶
- Meta-data for an input tensor provided as part of an inferencing request. - 
string name¶
- The name of the input tensor. 
 - 
int64 dims(repeated)¶
- The shape of the input tensor, not including the batch dimension. Optional if the model configuration for this input explicitly specifies all dimensions of the shape. Required if the model configuration for this input has any wildcard dimensions (-1). 
 - 
uint64 batch_byte_size¶
- The size of the full batch of the input tensor, in bytes. Optional for tensors with a fixed-size datatype. Required for tensors with a non-fixed-size datatype (like STRING). 
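For a fixed-size datatype, the full-batch size can be estimated from the shape, the batch size, and the per-element size. The sketch below assumes a densely packed tensor; `batch_byte_size` as a function and its `element_size` parameter are hypothetical names for illustration.

```python
from math import prod


def batch_byte_size(dims, batch_size, element_size):
    """Bytes in a full batch of a fixed-size-datatype tensor (hypothetical helper)."""
    if any(d < 0 for d in dims):
        raise ValueError("wildcard dimensions must be resolved before sizing")
    return batch_size * prod(dims) * element_size


# A batch of 8 tensors of shape [3, 224, 224] with 4-byte elements.
size = batch_byte_size([3, 224, 224], batch_size=8, element_size=4)
```

Note that `dims` excludes the batch dimension, which is why the batch size is multiplied in separately.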
 - The location in shared memory that contains the tensor data for this input. Using shared memory is optional, but if this message is used, all of its fields are required. 
 
 - 
message Output¶
- Meta-data for a requested output tensor as part of an inferencing request. - 
string name¶
- The name of the output tensor. 
 - 
message Class¶
- Options for an output returned as a classification. - 
uint32 count¶
- Indicates how many classification values should be returned for the output. The ‘count’ highest priority values are returned. 
 
 - 
Class cls¶
- Optional. If defined, return this output as a classification instead of raw data. The output tensor will be interpreted as probabilities, and the classifications associated with the highest probabilities will be returned. 
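To make the classification option concrete, here is a small sketch of selecting the `count` highest-probability entries from an output tensor. The `top_classes` function, its tuple layout, and the label list are assumptions made for illustration; the server's actual ordering and tie-breaking are not specified here.

```python
def top_classes(probabilities, count, labels=None):
    """Return the `count` highest-probability entries as (idx, value, label) tuples."""
    # Pair each value with its index, then sort by value, highest first.
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return [
        (idx, value, labels[idx] if labels else "")
        for idx, value in ranked[:count]
    ]


result = top_classes([0.1, 0.7, 0.2], count=2, labels=["cat", "dog", "bird"])
```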
 - The location in shared memory where the result tensor data for this output will be written. Using shared memory is optional, but if this message is used, all of its fields are required. 
 
 - 
uint64 id¶
- The ID of the inference request. The response to the request will carry the same ID in InferResponseHeader, so the request sender can use the ID to match a response to its corresponding request if needed. 
 - 
uint32 flags¶
- The flags associated with this request. This field holds a bitwise-or of all flag values. 
 - 
uint64 correlation_id¶
- The correlation ID of the inference request. Default is 0, which indicates that the request has no correlation ID. The correlation ID is used to indicate that two or more inference requests are related to each other. How this relationship is handled by the inference server is determined by the model’s scheduling policy. 
 - 
uint32 batch_size¶
- The batch size of the inference request. This must be >= 1. For models that don’t support batching, batch_size must be 1. 
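The fields above can be combined to describe a sequence of related requests: every request shares a non-zero correlation ID, and the first and last carry the start and end flags. The sketch below builds plain dictionaries rather than real protobuf messages; `sequence_headers` and the choice of IDs are hypothetical.

```python
FLAG_NONE = 0
FLAG_SEQUENCE_START = 1 << 0
FLAG_SEQUENCE_END = 1 << 1


def sequence_headers(correlation_id, num_requests):
    """Build header fields for a sequence of related requests (plain dicts)."""
    if correlation_id == 0:
        raise ValueError("0 means the request has no correlation ID")
    headers = []
    for i in range(num_requests):
        flags = FLAG_NONE
        if i == 0:
            flags |= FLAG_SEQUENCE_START  # first request of the sequence
        if i == num_requests - 1:
            flags |= FLAG_SEQUENCE_END  # last request of the sequence
        headers.append(
            {
                "id": i + 1,
                "flags": flags,
                "correlation_id": correlation_id,
                "batch_size": 1,  # must be >= 1
            }
        )
    return headers


headers = sequence_headers(correlation_id=42, num_requests=3)
```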
 
- 
message InferResponseHeader¶
- Meta-data for the response to an inferencing request. The actual output data is delivered separately from this header: in the HTTP body for an HTTP request, or in the InferResponse message for a gRPC request. - 
message Output¶
- Meta-data for an output tensor requested as part of an inferencing request. - 
string name¶
- The name of the output tensor. 
 - 
message Raw¶
- Meta-data for an output tensor being returned as raw data. - 
int64 dims(repeated)¶
- The shape of the output tensor, not including the batch dimension. 
 - 
uint64 batch_byte_size¶
- The full size of the output tensor, in bytes. For a batch output, this is the size of the entire batch. 
 
 - 
message Class¶
- Information about each classification for this output. - 
int32 idx¶
- The classification index. 
 - 
float value¶
- The classification value as a float (typically a probability). 
 - 
string label¶
- The label for the class (optional, only available if provided by the model). 
 
 - 
message Classes¶
- Meta-data for an output tensor being returned as classifications. 
 - 
Raw raw¶
- If specified, deliver results for this output as raw tensor data. The actual output data is delivered in the HTTP body for an HTTP request, or in the InferResponse message for a gRPC request. Only one of ‘raw’ and ‘batch_classes’ may be specified.
 
 - 
uint64 id¶
- The ID of the inference response. The response will have the same ID as its originating request. The request sender can use the ID to match the response to the corresponding request if needed. 
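One way a client might use the shared ID is to keep a table of outstanding requests and look each response up by its ID. This is a minimal sketch of such bookkeeping; `pending`, `track_request`, and `match_response` are hypothetical names and not part of any client library.

```python
pending = {}  # request id -> caller-side context (hypothetical bookkeeping)


def track_request(request_id, context):
    """Remember the context of a request that has been sent."""
    pending[request_id] = context


def match_response(response_id):
    """Return and forget the context stored for the request with this ID."""
    return pending.pop(response_id)


track_request(7, "image-a")
track_request(8, "image-b")

# Responses may arrive in any order; the ID identifies the request.
matched = match_response(8)
```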
 - 
string model_name¶
- The name of the model that produced the outputs. 
 - 
int64 model_version¶
- The version of the model that produced the outputs. 
 - 
uint32 batch_size¶
- The batch size of the outputs. This will always be equal to the batch size of the inputs. For models that don’t support batching, the batch_size will be 1. 
 - 
Output output(repeated)¶
- The outputs, in the same order as they were requested in InferRequestHeader.
 