Class InferContext
Defined in File request.h
Nested Relationships
Class Documentation
class InferContext
An InferContext object is used to run inference on an inference server for a specific model.
Once created, an InferContext object can be used repeatedly to perform inference using the model. Options that control how inference is performed can be changed between inference runs.
An InferContext object can use either the HTTP protocol or the GRPC protocol, depending on which Create function is used (InferHttpContext::Create, InferGrpcContext::Create or InferGrpcStreamContext::Create). For example:
    std::unique_ptr<InferContext> ctx;
    InferHttpContext::Create(&ctx, "localhost:8000", "mnist");
    ...
    std::unique_ptr<Options> options0;
    Options::Create(&options0);
    options0->SetBatchSize(b);
    options0->AddClassResult(output, topk);
    ctx->SetRunOptions(*options0);
    ...
    ctx->Run(&results0);  // run using options0
    ctx->Run(&results1);  // run using options0
    ...
    std::unique_ptr<Options> options1;
    Options::Create(&options1);
    options1->AddRawResult(output);
    ctx->SetRunOptions(*options1);
    ...
    ctx->Run(&results2);  // run using options1
    ctx->Run(&results3);  // run using options1
    ...
Note
InferContext::Create methods are thread-safe. All other InferContext methods, and the methods of nested classes, are not thread-safe.
The Run() and AsyncRun() calls are not thread-safe, but a new Run() can be invoked as soon as the previous one completes. However, GetAsyncRunResults() is thread-safe. The returned result objects are owned by the caller and may be retained and accessed even after the InferContext object is destroyed.
For more parallelism, multiple InferContext objects can access the same inference server with no serialization requirements across those objects.
Public Types
using OnCompleteFn = std::function<void(InferContext *, const std::shared_ptr<Request>&)>
Public Functions
virtual ~InferContext() = 0
virtual const std::string &ModelName() const = 0
Return: The name of the model being used for this context.
virtual int64_t ModelVersion() const = 0
Return: The version of the model being used for this context. -1 indicates that the latest (that is, the highest-numbered) version of the model is being used.
virtual uint64_t MaxBatchSize() const = 0
Return: The maximum batch size supported by the context. A maximum batch size of 0 indicates that the context does not support batching, so only a single inference at a time can be performed.
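The batch-size rule can be expressed as a small validation helper. This is a standalone sketch, not part of the client library: the function name IsBatchSizeAllowed is hypothetical, and it assumes that a non-batching model (maximum batch size 0) accepts exactly one inference per request.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: checks a requested batch size against the value
// reported by MaxBatchSize(). A maximum of 0 means the model does not
// support batching, so only a single inference per request is allowed.
bool IsBatchSizeAllowed(uint64_t max_batch_size, size_t requested) {
  if (requested == 0) {
    return false;  // at least one inference is always needed
  }
  if (max_batch_size == 0) {
    return requested == 1;  // non-batching model: single inference only
  }
  return requested <= max_batch_size;
}
```

A client could run a check like this before calling Options::SetBatchSize() to fail early instead of waiting for the server to reject the request.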
virtual CorrelationID CorrelationId() const = 0
Return: The correlation ID associated with the context.
virtual const std::vector<std::shared_ptr<Input>> &Inputs() const = 0
Return: The inputs of the model.
virtual const std::vector<std::shared_ptr<Output>> &Outputs() const = 0
Return: The outputs of the model.
Get a named input.
Get a named output.
virtual Error SetRunOptions(const Options &options) = 0
Set the options to use for all subsequent Run() invocations.
Return: Error object indicating success or failure.
Parameters:
  options: The options.
virtual int64_t ByteSize(const DimsList &shape, DataType dtype) const = 0
Get the byte size of an input or output given its shape and datatype.
Return: The size in bytes of this input/output. This is the size of one instance of the input/output, not the size of an entire batched input/output. When the byte size is not known, for example for non-fixed-sized types like TYPE_STRING or for inputs/outputs with variable-size dimensions, this returns -1.
Parameters:
  shape: The shape of the input/output.
  dtype: The datatype of the input/output.
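The per-instance byte-size rule can be sketched as a pure function over a shape and an element type. This is an illustration only: the DType enum and the function names are hypothetical stand-ins for the library's DataType, and only a few types are modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical subset of data types; the real DataType enum has more values.
enum class DType { kUINT8, kINT32, kFP32, kFP64, kSTRING };

// Size in bytes of one element, or -1 for non-fixed-sized types.
int64_t ElementByteSize(DType dtype) {
  switch (dtype) {
    case DType::kUINT8: return 1;
    case DType::kINT32: return 4;
    case DType::kFP32: return 4;
    case DType::kFP64: return 8;
    case DType::kSTRING: return -1;
  }
  return -1;
}

// Byte size of a single (unbatched) instance with the given shape.
// Returns -1 if the type is not fixed-sized or any dimension is
// variable-size (reported as -1).
int64_t InstanceByteSize(const std::vector<int64_t>& shape, DType dtype) {
  int64_t size = ElementByteSize(dtype);
  if (size < 0) return -1;
  for (int64_t dim : shape) {
    if (dim < 0) return -1;
    size *= dim;
  }
  return size;
}
```

For fixed-sized types, multiplying this value by the batch size gives the size of the whole batch, which matches how Input::TotalByteSize() is described for fixed-sized types.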
virtual Error GetStat(Stat *stat) const = 0
Get the current statistics of the InferContext.
virtual Error Run(ResultMap *results) = 0
Send a synchronous request to the inference server to perform an inference that produces results for the outputs specified in the most recent call to SetRunOptions().
virtual Error AsyncRun(OnCompleteFn callback) = 0
Send an asynchronous request to the inference server to perform an inference with the options specified in the most recent call to SetRunOptions().
Once the request completes, the InferContext pointer and the Request object are passed to the provided ‘callback’ function. When the callback is invoked, ownership of the Request object is transferred to the caller. The caller can then either retrieve the results inside the callback function or defer retrieval to a different thread so that the InferContext is not blocked. See GetAsyncRunResults().
Return: Error object indicating success or failure.
Parameters:
  callback: The callback function to be invoked on request completion.
Get the results of the asynchronous request referenced by ‘async_request’.
Return: Error object indicating success or failure.
Parameters:
  async_request: Request handle exposed to the ‘callback’ function of AsyncRun().
  results: Returns Result objects holding inference results as a map from output name to Result object.
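Deferring result retrieval out of the callback is usually done with a thread-safe queue: the callback only enqueues the request handle and returns, and a worker thread does the rest. The sketch below models that pattern with standard-library types only. FakeRequest, CompletionQueue and ProcessCompletions are hypothetical names; the real callback has the OnCompleteFn signature and receives an InferContext pointer plus a const std::shared_ptr<Request>&.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for InferContext::Request; the real class is opaque.
struct FakeRequest {
  int id;
};

// Thread-safe FIFO that hands completed requests from the callback to a worker.
class CompletionQueue {
 public:
  void Push(std::shared_ptr<FakeRequest> req) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      items_.push(std::move(req));
    }
    cv_.notify_one();
  }

  // Blocks until an item is available; a nullptr item signals shutdown.
  std::shared_ptr<FakeRequest> Pop() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !items_.empty(); });
    auto req = items_.front();
    items_.pop();
    return req;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::shared_ptr<FakeRequest>> items_;
};

// Simulates n completions delivered through a callback and drains them on a
// worker thread; returns the request ids in the order the worker saw them.
std::vector<int> ProcessCompletions(int n) {
  CompletionQueue queue;
  std::vector<int> processed;
  std::thread worker([&] {
    while (auto req = queue.Pop()) {
      // With the real client, GetAsyncRunResults() would be called here.
      processed.push_back(req->id);
    }
  });
  // The callback takes ownership of the handle and returns immediately.
  auto on_complete = [&](std::shared_ptr<FakeRequest> req) {
    queue.Push(std::move(req));
  };
  for (int i = 0; i < n; ++i) {
    on_complete(std::make_shared<FakeRequest>(FakeRequest{i}));
  }
  queue.Push(nullptr);  // shutdown sentinel
  worker.join();
  return processed;
}
```

Keeping the callback this small is the point of the design: the note that GetAsyncRunResults() is thread-safe is what makes calling it from the worker thread acceptable.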
class Input
An input to the model.
Public Functions
virtual ~Input() = 0
virtual const std::string &Name() const = 0
Return: The name of the input.
virtual int64_t ByteSize() const = 0
Return: The size in bytes of this input. This is the size of one instance of the input, not the size of an entire batched input. When the byte size is not known, for example for non-fixed-sized types like TYPE_STRING or for inputs with variable-size dimensions, this returns -1.
virtual size_t TotalByteSize() const = 0
Return: The size in bytes of the entire batch of this input. For fixed-sized types this is just ByteSize() * batch-size, but for non-fixed-sized types like TYPE_STRING it is the only way to get the entire input size.
virtual ModelInput::Format Format() const = 0
Return: The format of the input.
virtual const DimsList &Dims() const = 0
Return: The dimensions/shape of the input specified in the model configuration. Variable-size dimensions are reported as -1.
virtual Error Reset() = 0
Prepare this input to receive new tensor values.
Forget any existing values that were set by previous calls to SetSharedMemory() or SetRaw().
Return: Error object indicating success or failure.
virtual const std::vector<int64_t> &Shape() const = 0
Get the shape for this input that was most recently set by SetShape().
Return: The shape, or an empty vector if SetShape() has not been called.
virtual Error SetShape(const std::vector<int64_t> &dims) = 0
Set the shape for this input.
The shape must be set for inputs that have variable-size dimensions and is optional for other inputs. The shape must be set before calling SetRaw() or SetFromString().
Return: Error object indicating success or failure.
Parameters:
  dims: The dimensions of the shape.
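The relationship between Dims() (where -1 marks a variable-size dimension) and the concrete shape passed to SetShape() can be sketched as two checks. This is a hypothetical client-side helper, not library code, and it assumes the shape must have the same rank as Dims() with every fixed dimension matching; the server may apply rules this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// True when the configured dims contain a variable-size dimension, which is
// the case where SetShape() must be called before setting tensor values.
bool RequiresShape(const std::vector<int64_t>& dims) {
  for (int64_t d : dims) {
    if (d < 0) return true;
  }
  return false;
}

// Hypothetical check that a concrete shape is compatible with the
// configured dims: same rank, no -1 in the shape, fixed dims equal.
bool ShapeMatchesDims(const std::vector<int64_t>& dims,
                      const std::vector<int64_t>& shape) {
  if (dims.size() != shape.size()) return false;
  for (size_t i = 0; i < dims.size(); ++i) {
    if (shape[i] < 0) return false;                      // must be concrete
    if (dims[i] >= 0 && dims[i] != shape[i]) return false;  // fixed dim differs
  }
  return true;
}
```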
virtual Error SetRaw(const uint8_t *input, size_t input_byte_size) = 0
Set tensor values for this input from a byte array.
The array is not copied, so it must not be modified or destroyed until this input is no longer needed (that is, until the Run() call(s) that use the input have completed). For batched inputs, this function must be called batch-size times to provide all tensor values for a batch of this input.
Return: Error object indicating success or failure.
Parameters:
  input: The pointer to the array holding the tensor value.
  input_byte_size: The size of the array in bytes; must match the size expected by the input.
virtual Error SetRaw(const std::vector<uint8_t> &input) = 0
Set tensor values for this input from a byte vector.
The vector is not copied, so it must not be modified or destroyed until this input is no longer needed (that is, until the Run() call(s) that use the input have completed). For batched inputs, this function must be called batch-size times to provide all tensor values for a batch of this input.
Return: Error object indicating success or failure.
Parameters:
  input: The vector holding tensor values.
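The two SetRaw() contracts (one call per batch entry, and no copy of the caller's buffer) can be illustrated with a small model class. RawInputSketch is hypothetical and only mimics the documented behavior; it shows why modifying a buffer after SetRaw() changes what a later Run() would read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of an input that, like SetRaw(), keeps references to
// caller-owned buffers instead of copying them.
class RawInputSketch {
 public:
  RawInputSketch(size_t batch_size, size_t instance_byte_size)
      : batch_size_(batch_size), instance_byte_size_(instance_byte_size) {}

  // One call per batch entry; fails on a size mismatch or too many calls.
  bool SetRaw(const uint8_t* data, size_t byte_size) {
    if (byte_size != instance_byte_size_ || refs_.size() == batch_size_) {
      return false;
    }
    refs_.push_back({data, byte_size});  // stores the pointer only, no copy
    return true;
  }

  // The input is complete only after exactly batch_size calls.
  bool Complete() const { return refs_.size() == batch_size_; }

  // Reads through the stored reference, so it sees the buffer's current bytes.
  uint8_t ByteAt(size_t b, size_t i) const {
    assert(i < refs_[b].size);
    return refs_[b].data[i];
  }

  // Mirrors Input::Reset(): forget all previously set values.
  void Reset() { refs_.clear(); }

 private:
  struct Ref {
    const uint8_t* data;
    size_t size;
  };
  size_t batch_size_;
  size_t instance_byte_size_;
  std::vector<Ref> refs_;
};
```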
virtual Error SetFromString(const std::vector<std::string> &input) = 0
Set tensor values for this input from a vector of strings.
This method can only be used for tensors with the STRING data type. The strings are assigned in row-major order to the elements of the tensor. The strings are copied, so ‘input’ does not need to be preserved as it must be with SetRaw(). For batched inputs, this function must be called batch-size times to provide all tensor values for a batch of this input.
Return: Error object indicating success or failure.
Parameters:
  input: The vector holding tensor string values.
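Row-major assignment means the last dimension varies fastest: for a 2 x 3 tensor, strings 0, 1, 2 fill row 0 and strings 3, 4, 5 fill row 1. The helpers below are a hypothetical sketch of that mapping; they are not part of the client library.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Multi-dimensional index of flat element `flat` in a tensor of the given
// shape, using row-major order (last dimension varies fastest).
std::vector<int64_t> RowMajorIndex(const std::vector<int64_t>& shape,
                                   int64_t flat) {
  std::vector<int64_t> index(shape.size());
  for (size_t d = shape.size(); d-- > 0;) {
    index[d] = flat % shape[d];
    flat /= shape[d];
  }
  return index;
}

// The string assigned to element (row, col) of a 2-D tensor with `cols`
// columns when strings are assigned in row-major order.
const std::string& StringAt(const std::vector<std::string>& values,
                            int64_t cols, int64_t row, int64_t col) {
  return values[row * cols + col];
}
```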
Set tensor values for this input by reference into a shared memory region.
The values are not copied, so the shared memory region and its contents must not be modified or destroyed until this input is no longer needed (that is, until the Run() call(s) that use the input have completed). This function must be called a single time for an input that is using shared memory. For batched inputs, the tensor values for the entire batch must be contiguous in a single shared memory region. Note: the options must be set using SetRunOptions() before calling SetSharedMemory(), since the batch size is needed for validation.
Return: Error object indicating success or failure.
Parameters:
  name: The user-given name for the registered shared memory region where the tensor values for this input are stored.
  offset: The offset into the shared memory region up to the start of the input tensor values.
  byte_size: The size, in bytes, of the input tensor data. Must match the size expected for the input batch.
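Because a batched input must occupy one contiguous span of the region, its layout is fully determined by the offset and the per-instance size. The sketch below checks that layout on the client side. ShmInputLayout, region_size, IsValidLayout and EntryOffset are hypothetical names, and the bounds check against the region size is an assumption of this sketch rather than documented library behavior.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of one input stored in a shared memory region.
struct ShmInputLayout {
  size_t region_size;  // total size of the registered region (assumed known)
  size_t offset;       // start of this input's tensor values
  size_t byte_size;    // value that would be passed as byte_size
};

// byte_size must equal batch_size * per-instance size and fit in the region.
bool IsValidLayout(const ShmInputLayout& l, size_t batch_size,
                   size_t instance_byte_size) {
  if (l.byte_size != batch_size * instance_byte_size) return false;
  return l.offset <= l.region_size && l.byte_size <= l.region_size - l.offset;
}

// Offset within the region where batch entry `b` begins.
size_t EntryOffset(const ShmInputLayout& l, size_t b,
                   size_t instance_byte_size) {
  return l.offset + b * instance_byte_size;
}
```

Writing the bounds test as `byte_size <= region_size - offset` avoids the unsigned overflow that `offset + byte_size <= region_size` could hit with very large values.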
class Options
Run options to be applied to all subsequent Run() invocations.
Public Functions
virtual ~Options() = 0
virtual bool Flag(InferRequestHeader::Flag flag) const = 0
Get the value of a request flag being used for all subsequent inferences.
Cannot be used with FLAG_NONE.
Return: The true/false value currently set for the flag. If ‘flag’ is FLAG_NONE, returns false.
Parameters:
  flag: The flag to get the value for.
virtual void SetFlag(InferRequestHeader::Flag flag, bool value) = 0
Set a request flag to be used for all subsequent inferences.
Parameters:
  flag: The flag to set. Cannot be used with FLAG_NONE.
  value: The true/false value to set for the flag. If ‘flag’ is FLAG_NONE, this does nothing.
virtual uint32_t Flags() const = 0
Get the value of all request flags being used for all subsequent inferences.
Return: The bitwise-or of flag values as a single uint32_t value.
virtual void SetFlags(uint32_t flags) = 0
Set all request flags to be used for all subsequent inferences.
Parameters:
  flags: The bitwise-or of flag values to set.
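Since Flags() and SetFlags() work on a bitwise-or of flag values, the single-flag accessors reduce to bit operations on a uint32_t. The sketch below mirrors the documented FLAG_NONE behavior. The enumerator values are assumptions chosen for illustration; real code should use the InferRequestHeader::Flag enumerators.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag values for illustration only.
enum Flag : uint32_t {
  FLAG_NONE = 0,
  FLAG_SEQUENCE_START = 1u << 0,
  FLAG_SEQUENCE_END = 1u << 1,
};

// Like SetFlag(): sets or clears one bit; FLAG_NONE leaves flags unchanged.
uint32_t SetFlag(uint32_t flags, Flag flag, bool value) {
  if (flag == FLAG_NONE) return flags;
  return value ? (flags | flag) : (flags & ~static_cast<uint32_t>(flag));
}

// Like Flag(): tests one bit; FLAG_NONE always reads as false.
bool GetFlag(uint32_t flags, Flag flag) {
  if (flag == FLAG_NONE) return false;
  return (flags & flag) != 0;
}
```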
virtual size_t BatchSize() const = 0
Return: The batch size to use for all subsequent inferences.
virtual void SetBatchSize(size_t batch_size) = 0
Set the batch size to use for all subsequent inferences.
Parameters:
  batch_size: The batch size.
virtual CorrelationID CorrelationId() const = 0
Return: The correlation ID for the inferences.
virtual void SetCorrelationId(CorrelationID correlation_id) = 0
Set the correlation ID to use for subsequent inferences.
Parameters:
  correlation_id: The value of the correlation ID to set. If non-zero, this correlation ID overrides the context’s correlation ID for all subsequent inference requests; otherwise, the context retains its current correlation ID.
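The override rule reduces to a single conditional. In this sketch CorrelationID is assumed to be an unsigned 64-bit integer, and EffectiveCorrelationId is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Assumed to be an unsigned 64-bit integer for this sketch.
using CorrelationID = uint64_t;

// Correlation ID used for a request: a non-zero value set through
// Options::SetCorrelationId() overrides the context's ID; zero keeps it.
CorrelationID EffectiveCorrelationId(CorrelationID context_id,
                                     CorrelationID option_id) {
  return option_id != 0 ? option_id : context_id;
}
```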
Add ‘output’ to the list of requested RAW results.
Run() will return the output’s full tensor as a result.
Return: Error object indicating success or failure.
Parameters:
  output: The output.
Add ‘output’ to the list of requested CLASS results.
Run() will return the highest ‘k’ values of ‘output’ as a result.
Return: Error object indicating success or failure.
Parameters:
  output: The output.
  k: Set how many class results to return for the output.
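A CLASS result keeps only the ‘k’ highest values of an output together with their positions. The sketch below computes that selection on the client side with std::partial_sort. TopKEntry and TopK are hypothetical, the real ClassResult struct may carry additional fields such as a label, and breaking ties by lower index is this sketch's choice rather than documented server behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical (index, value) pair standing in for ClassResult.
struct TopKEntry {
  size_t idx;
  float value;
};

// Returns the k highest values of `output` in descending order.
// Ties are broken by lower index first.
std::vector<TopKEntry> TopK(const std::vector<float>& output, size_t k) {
  std::vector<TopKEntry> entries;
  entries.reserve(output.size());
  for (size_t i = 0; i < output.size(); ++i) {
    entries.push_back({i, output[i]});
  }
  k = std::min(k, entries.size());
  std::partial_sort(entries.begin(), entries.begin() + k, entries.end(),
                    [](const TopKEntry& a, const TopKEntry& b) {
                      return a.value > b.value ||
                             (a.value == b.value && a.idx < b.idx);
                    });
  entries.resize(k);
  return entries;
}
```

std::partial_sort only orders the first k elements, which is cheaper than a full sort when k is much smaller than the number of classes.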
Indicate that the result values for this output should be placed in a shared memory region instead of being returned in the inference response.
The shared memory region must not be modified or destroyed until the server has written the output completely. For batched outputs, all tensor values are copied into a contiguous space in a single shared memory region.
Return: Error object indicating success or failure.
Parameters:
  output: The output.
  name: The user-given name for the registered shared memory region where the tensor values for this output should be stored.
  offset: The offset into the shared memory region up to the start of the output tensor values.
  byte_size: The size, in bytes, of the output tensor data. Must match the size expected by the output.
class Output
An output from the model.
class Request
Handle to an inference request.
The request handle is used to get request results if the request is sent by AsyncRun().
class Result
An inference result corresponding to an output.
Public Types
Public Functions
virtual ~Result() = 0
virtual const std::string &ModelName() const = 0
Return: The name of the model that produced this result.
virtual int64_t ModelVersion() const = 0
Return: The version of the model that produced this result.
virtual const std::shared_ptr<Output> GetOutput() const = 0
Return: The Output object corresponding to this result.
virtual Error GetRawShape(std::vector<int64_t> *shape) const = 0
Get the shape of a raw result.
The shape does not include the batch dimension.
Return: Error object indicating success or failure.
Parameters:
  shape: Returns the shape.
virtual Error GetRaw(size_t batch_idx, const std::vector<uint8_t> **buf) const = 0
Get a reference to the entire raw result data for a specific batch entry.
Returns an error if this result is not in RAW format. WARNING: this call may require creating a copy of the result data. To avoid this potential copy overhead, use GetRaw(size_t, const uint8_t**, size_t*).
Return: Error object indicating success or failure.
Parameters:
  batch_idx: The batch entry to return results for.
  buf: Returns the vector of result bytes.
virtual Error GetRaw(size_t batch_idx, const uint8_t **buf, size_t *byte_size) const = 0
Get a reference to the entire raw result data for a specific batch entry.
Returns an error if this result is not in RAW format.
Return: Error object indicating success or failure.
Parameters:
  batch_idx: The batch entry to return results for.
  buf: Returns a pointer to the buffer holding the result bytes.
  byte_size: Returns the size of the result buffer, in bytes.
virtual Error GetRawAtCursor(size_t batch_idx, const uint8_t **buf, size_t adv_byte_size) = 0
Get a reference to raw result data for a specific batch entry at the current “cursor” and advance the cursor by the specified number of bytes.
More typically, use the GetRawAtCursor<T>() method to return the data as a specific type T. Use ResetCursor() to reset the cursor to the beginning of the result. Returns an error if this result is not in RAW format.
Return: Error object indicating success or failure.
Parameters:
  batch_idx: The batch entry to return results for.
  buf: Returns a pointer to ‘adv_byte_size’ bytes of data.
  adv_byte_size: The number of bytes of data to get a reference to.
template<typename T> Error GetRawAtCursor(size_t batch_idx, T *out)
Read a value for a specific batch entry at the current “cursor” from the result tensor as the specified type T and advance the cursor.
Use ResetCursor() to reset the cursor to the beginning of the result. Returns an error if this result is not in RAW format.
Return: Error object indicating success or failure.
Parameters:
  batch_idx: The batch entry to return results for.
  out: Returns the value at the cursor.
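The cursor methods amount to a per-batch-entry read position over a byte buffer, with typed reads that copy sizeof(T) bytes and advance. RawCursorSketch below is a hypothetical model of that behavior built on std::vector, not the library's implementation; it uses std::memcpy for the typed read so no alignment of the underlying bytes is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical per-batch-entry cursor over raw result bytes.
class RawCursorSketch {
 public:
  explicit RawCursorSketch(std::vector<std::vector<uint8_t>> entries)
      : entries_(std::move(entries)), cursors_(entries_.size(), 0) {}

  // Pointer to `adv` bytes at the cursor, advancing it; nullptr if fewer
  // than `adv` bytes remain for this batch entry.
  const uint8_t* RawAtCursor(size_t batch_idx, size_t adv) {
    const std::vector<uint8_t>& buf = entries_[batch_idx];
    size_t& cur = cursors_[batch_idx];
    if (buf.size() - cur < adv) return nullptr;
    const uint8_t* p = buf.data() + cur;
    cur += adv;
    return p;
  }

  // Typed read: copies sizeof(T) bytes at the cursor into *out.
  template <typename T>
  bool ReadAtCursor(size_t batch_idx, T* out) {
    const uint8_t* p = RawAtCursor(batch_idx, sizeof(T));
    if (p == nullptr) return false;
    std::memcpy(out, p, sizeof(T));
    return true;
  }

  void ResetCursor(size_t batch_idx) { cursors_[batch_idx] = 0; }
  void ResetCursors() { std::fill(cursors_.begin(), cursors_.end(), 0); }

 private:
  std::vector<std::vector<uint8_t>> entries_;
  std::vector<size_t> cursors_;
};
```

Each batch entry has its own cursor, so reading one entry does not move the position of another.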
virtual Error GetClassCount(size_t batch_idx, size_t *cnt) const = 0
Get the number of class results for a batch entry.
Returns an error if this result is not in CLASS format.
Return: Error object indicating success or failure.
Parameters:
  batch_idx: The index in the batch.
  cnt: Returns the number of ClassResult entries for the batch entry.
virtual Error GetClassAtCursor(size_t batch_idx, ClassResult *result) = 0
Get the ClassResult result for a specific batch entry at the current cursor.
Use ResetCursor() to reset the cursor to the beginning of the result. Returns an error if this result is not in CLASS format.
Return: Error object indicating success or failure.
Parameters:
  batch_idx: The index in the batch.
  result: Returns the ClassResult value for the batch entry at the cursor.
virtual Error ResetCursors() = 0
Reset the cursor to the beginning of the result for all batch entries.
Return: Error object indicating success or failure.
struct ClassResult
The result value for CLASS format results.
struct Stat
Cumulative statistics of the InferContext.
Note
For the GRPC protocol, ‘cumulative_send_time_ns’ represents the time for marshaling the infer request, and ‘cumulative_receive_time_ns’ represents the time for unmarshaling the infer response.
Public Members
size_t completed_request_count
Total number of requests completed.
uint64_t cumulative_total_request_time_ns
Time from the request start until the response is completely received.
uint64_t cumulative_send_time_ns
Time from the request start until the last byte is sent.
uint64_t cumulative_receive_time_ns
Time from receiving the first byte of the response until the response is completely received.
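Because every Stat member is cumulative, per-request averages over an interval come from the difference between two GetStat() snapshots divided by the number of requests completed in between. The sketch copies the Stat fields into a local StatSketch struct so it compiles on its own; AverageLatencyNs and AverageBetween are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Local copy of the Stat fields so the sketch is self-contained.
struct StatSketch {
  size_t completed_request_count = 0;
  uint64_t cumulative_total_request_time_ns = 0;
  uint64_t cumulative_send_time_ns = 0;
  uint64_t cumulative_receive_time_ns = 0;
};

struct AverageLatencyNs {
  uint64_t total;
  uint64_t send;
  uint64_t receive;
};

// Per-request averages (integer nanoseconds) between two snapshots;
// returns zeros when no request completed in the interval.
AverageLatencyNs AverageBetween(const StatSketch& before,
                                const StatSketch& after) {
  const size_t n =
      after.completed_request_count - before.completed_request_count;
  if (n == 0) return {0, 0, 0};
  return {
      (after.cumulative_total_request_time_ns -
       before.cumulative_total_request_time_ns) / n,
      (after.cumulative_send_time_ns - before.cumulative_send_time_ns) / n,
      (after.cumulative_receive_time_ns - before.cumulative_receive_time_ns) /
          n,
  };
}
```

Taking differences rather than dividing the raw cumulative values keeps earlier traffic (for example, warm-up requests) out of the measured interval.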