api.proto
====================

.. cpp:namespace:: nvidia::inferenceserver

.. cpp:var:: message InferRequestHeader

   Meta-data for an inferencing request. The actual input data is
   delivered separate from this header, in the HTTP body for an HTTP
   request, or in the :cpp:var:`InferRequest` message for a gRPC
   request.

   .. cpp:var:: message Input

      Meta-data for an input tensor provided as part of an inferencing
      request.

      .. cpp:var:: string name

         The name of the input tensor.

      .. cpp:var:: uint64 byte_size

         The size of the input tensor, in bytes. This is the size for
         one instance of the input, not the entire size of a batched
         input.

   .. cpp:var:: message Output

      Meta-data for a requested output tensor as part of an inferencing
      request.

      .. cpp:var:: string name

         The name of the output tensor.

      .. cpp:var:: uint64 byte_size

         The size of the output tensor, in bytes. This is the size for
         one instance of the output, not the entire size of a batched
         output.

      .. cpp:var:: message Class

         Options for an output returned as a classification.

         .. cpp:var:: uint32 count

            Indicates how many classification values should be
            returned for the output. The 'count' highest priority
            values are returned.

      .. cpp:var:: Class cls

         Optional. If defined return this output as a classification
         instead of raw data. The output tensor will be interpreted as
         probabilities and the classifications associated with the
         highest probabilities will be returned.

   .. cpp:var:: uint32 batch_size

      The batch size of the inference request. This must be >= 1. For
      models that don't support batching, batch_size must be 1.

   .. cpp:var:: Input input (repeated)

      The input meta-data for the inputs provided with the inference
      request.

   .. cpp:var:: Output output (repeated)

      The output meta-data for the outputs requested with the
      inference request.
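   As an illustration, the following sketch fills in an
   :cpp:var:`InferRequestHeader` using the standard protobuf-generated
   C++ API. The generated header name (``api.pb.h``) and the tensor
   names and sizes are assumptions for the example, not part of the
   protocol.

   .. code-block:: cpp

      // Sketch only: build a request header for a single-instance
      // (batch_size == 1) request with one input and one output that
      // is returned as a top-5 classification.
      #include "api.pb.h"  // assumed name of the header generated from api.proto

      nvidia::inferenceserver::InferRequestHeader MakeRequestHeader()
      {
        nvidia::inferenceserver::InferRequestHeader request;
        request.set_batch_size(1);  // must be >= 1; exactly 1 if the model doesn't batch

        // One input; byte_size is for a single instance, not the whole batch.
        auto* input = request.add_input();
        input->set_name("input0");  // hypothetical tensor name
        input->set_byte_size(3 * 224 * 224 * sizeof(float));

        // Request one output as a classification of the 5 highest probabilities.
        auto* output = request.add_output();
        output->set_name("prob");  // hypothetical tensor name
        output->mutable_cls()->set_count(5);

        return request;
      }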
.. cpp:var:: message InferResponseHeader

   Meta-data for the response to an inferencing request. The actual
   output data is delivered separate from this header, in the HTTP
   body for an HTTP request, or in the :cpp:var:`InferResponse`
   message for a gRPC request.

   .. cpp:var:: message Output

      Meta-data for an output tensor requested as part of an
      inferencing request.

      .. cpp:var:: string name

         The name of the output tensor.

      .. cpp:var:: message Raw

         Meta-data for an output tensor being returned as raw data.

         .. cpp:var:: uint64 byte_size

            The size of the output tensor, in bytes. This is the size
            for one instance of the output, not the entire size of a
            batched output.

      .. cpp:var:: message Class

         Information about each classification for this output.

         .. cpp:var:: int32 idx

            The classification index.

         .. cpp:var:: float value

            The classification value as a float (typically a
            probability).

         .. cpp:var:: string label

            The label for the class (optional, only available if
            provided by the model).

      .. cpp:var:: message Classes

         Meta-data for an output tensor being returned as
         classifications.

         .. cpp:var:: Class cls (repeated)

            The topk classes for this output.

      .. cpp:var:: Raw raw

         If specified deliver results for this output as raw tensor
         data. The actual output data is delivered in the HTTP body
         for an HTTP request, or in the :cpp:var:`InferResponse`
         message for a gRPC request. Only one of 'raw' and
         'batch_classes' may be specified.

      .. cpp:var:: Classes batch_classes (repeated)

         If specified deliver results for this output as
         classifications. There is one :cpp:var:`Classes` object for
         each batch entry in the output. Only one of 'raw' and
         'batch_classes' may be specified.

   .. cpp:var:: string model_name

      The name of the model that produced the outputs.

   .. cpp:var:: uint32 model_version

      The version of the model that produced the outputs.

   .. cpp:var:: uint32 batch_size

      The batch size of the outputs. This will always be equal to the
      batch size of the inputs. For models that don't support batching
      the batch_size will be 1.

   .. cpp:var:: Output output (repeated)

      The outputs, in the same order as they were requested in
      :cpp:var:`InferRequestHeader`.
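   As an illustration, the following sketch walks an
   :cpp:var:`InferResponseHeader`, distinguishing raw outputs from
   classification outputs, again using the standard protobuf-generated
   C++ API. The generated header name (``api.pb.h``) is an assumption.

   .. code-block:: cpp

      // Sketch only: print a summary of a response header. The raw
      // tensor bytes themselves are delivered outside this header
      // (HTTP body or InferResponse message).
      #include <iostream>
      #include "api.pb.h"  // assumed name of the header generated from api.proto

      void PrintResponseHeader(
          const nvidia::inferenceserver::InferResponseHeader& response)
      {
        std::cout << "model " << response.model_name() << " version "
                  << response.model_version() << " batch-size "
                  << response.batch_size() << std::endl;

        // Outputs are in the same order as they were requested.
        for (const auto& output : response.output()) {
          if (output.has_raw()) {
            std::cout << output.name() << ": raw, " << output.raw().byte_size()
                      << " bytes per batch instance" << std::endl;
          } else {
            // One Classes entry per batch instance.
            for (const auto& classes : output.batch_classes()) {
              for (const auto& cls : classes.cls()) {
                std::cout << output.name() << ": idx=" << cls.idx()
                          << " value=" << cls.value()
                          << " label=" << cls.label() << std::endl;
              }
            }
          }
        }
      }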