Class InferContext
- Defined in File request.h
Nested Relationships
Inheritance Relationships
Derived Types
- public nvidia::inferenceserver::client::InferGrpcContext (Class InferGrpcContext)
- public nvidia::inferenceserver::client::InferHttpContext (Class InferHttpContext)
Class Documentation
- class InferContext
  An InferContext object is used to run inference on an inference server for a specific model.
  Once created, an InferContext object can be used repeatedly to perform inference using the model. Options that control how inference is performed can be changed in between inference runs.
  An InferContext object can use either the HTTP protocol or the gRPC protocol, depending on the Create function used (InferHttpContext::Create or InferGrpcContext::Create). For example:
    std::unique_ptr<InferContext> ctx;
    InferHttpContext::Create(&ctx, "localhost:8000", "mnist");
    ...
    std::unique_ptr<Options> options0;
    Options::Create(&options0);
    options0->SetBatchSize(b);
    options0->AddClassResult(output, topk);
    ctx->SetRunOptions(*options0);
    ...
    ctx->Run(&results0);  // run using options0
    ctx->Run(&results1);  // run using options0
    ...
    std::unique_ptr<Options> options1;
    Options::Create(&options1);
    options1->AddRawResult(output);
    ctx->SetRunOptions(*options1);
    ...
    ctx->Run(&results2);  // run using options1
    ctx->Run(&results3);  // run using options1
    ...
- Note
- InferContext::Create methods are thread-safe. All other InferContext methods, and nested class methods are not thread-safe.
- The Run() calls are not thread-safe but a new Run() can be invoked as soon as the previous completes. The returned result objects are owned by the caller and may be retained and accessed even after the InferContext object is destroyed.
- AsyncRun() and GetAsyncRunStatus() calls are not thread-safe. Moreover, calling one method while the other is running results in undefined behavior, since both modify shared data internally.
- For more parallelism multiple InferContext objects can access the same inference server with no serialization requirements across those objects.
Subclassed by nvidia::inferenceserver::client::InferGrpcContext, nvidia::inferenceserver::client::InferHttpContext
Public Functions
- virtual ~InferContext()
  Destroy the inference context.
- const std::string &ModelName() const
  - Return
  - The name of the model being used for this context.
- int ModelVersion() const
  - Return
  - The version of the model being used for this context. -1 indicates that the latest (i.e. highest version number) version of that model is being used.
- uint64_t MaxBatchSize() const
  - Return
  - The maximum batch size supported by the context. A maximum batch size of zero indicates that the context does not support batching, and so only a single inference at a time can be performed.
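  For example, a caller might use this value to clamp the batch size it requests (a minimal sketch; 'ctx' and 'options' are assumed to be an existing InferContext and Options as in the example above, and the desired batch size is a placeholder):
    // Respect the context's batch-size limit before configuring options.
    // A MaxBatchSize() of zero means the context does not support batching.
    size_t batch_size = 8;  // desired batch size (placeholder value)
    const uint64_t max_batch = ctx->MaxBatchSize();
    if (max_batch == 0) {
      batch_size = 1;  // only single inferences are possible
    } else if (batch_size > max_batch) {
      batch_size = static_cast<size_t>(max_batch);
    }
    options->SetBatchSize(batch_size);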
- Error GetInput(const std::string &name, std::shared_ptr<Input> *input) const
  Get a named input.
- Error GetOutput(const std::string &name, std::shared_ptr<Output> *output) const
  Get a named output.
- Error SetRunOptions(const Options &options)
  Set the options to use for all subsequent Run() invocations.
  - Return
  - Error object indicating success or failure.
  - Parameters
  - options: The options.
- Error GetStat(Stat *stat)
  Get the current statistics of the InferContext.
- virtual Error Run(std::vector<std::unique_ptr<Result>> *results) = 0
  Send a synchronous request to the inference server to perform an inference and produce results for the outputs specified in the most recent call to SetRunOptions().
  The Result objects holding the output values are returned in the same order as the outputs are specified in the options.
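  For example, a single synchronous inference might look like the following sketch ("data" is a placeholder input name and 'input_bytes' a previously filled std::vector<uint8_t>):
    // Look up the input by name, provide its tensor values, and run.
    std::shared_ptr<InferContext::Input> input;
    ctx->GetInput("data", &input);   // "data" is a placeholder name
    input->Reset();                  // forget values from any earlier run
    input->SetRaw(input_bytes);      // one instance; call once per batch entry
    std::vector<std::unique_ptr<InferContext::Result>> results;
    Error err = ctx->Run(&results);  // blocks until inference completes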
- virtual Error AsyncRun(std::shared_ptr<Request> *async_request) = 0
  Send an asynchronous request to the inference server to perform an inference and produce results for the outputs specified in the most recent call to SetRunOptions().
  - Return
  - Error object indicating success or failure.
  - Parameters
  - async_request: Returns a Request object that can be used to retrieve the results of this request.
- virtual Error GetAsyncRunResults(std::vector<std::unique_ptr<Result>> *results, const std::shared_ptr<Request> &async_request, bool wait) = 0
  Get the results of the asynchronous request referenced by 'async_request'.
  The Result objects holding the output values are returned in the same order as the outputs are specified in the options when AsyncRun() was called.
  - Return
  - Error object indicating success or failure. Success will be returned only if the request has completed successfully. UNAVAILABLE will be returned if 'wait' is false and the request is not ready.
  - Parameters
  - results: Returns Result objects holding the inference results.
  - async_request: The Request object of the request whose results are to be retrieved.
  - wait: If true, block until the request completes. Otherwise, return immediately.
- Error GetReadyAsyncRequest(std::shared_ptr<Request> *async_request, bool wait)
  Get any one completed asynchronous request.
  - Return
  - Error object indicating success or failure. Success will be returned only if a completed request was returned. UNAVAILABLE will be returned if 'wait' is false and no request is ready.
  - Parameters
  - async_request: Returns the Request object holding the completed request.
  - wait: If true, block until a request completes. Otherwise, return immediately.
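  Put together, an asynchronous exchange might look like this sketch (error handling omitted; 'ctx' is an existing InferContext with its inputs already set):
    // Issue the request without blocking.
    std::shared_ptr<InferContext::Request> request;
    ctx->AsyncRun(&request);
    // ... do other work while the server runs the inference ...
    // Wait for that specific request and collect its results.
    std::vector<std::unique_ptr<InferContext::Result>> results;
    ctx->GetAsyncRunResults(&results, request, true /* wait */);
  Alternatively, GetReadyAsyncRequest() returns whichever outstanding request finishes first when several AsyncRun() calls are in flight.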
Protected Functions
- InferContext(const std::string&, int, bool)
- virtual void AsyncTransfer() = 0
- Error UpdateStat(const RequestTimers &timer)
Protected Attributes
- AsyncReqMap ongoing_async_requests_
- const std::string model_name_
- const int model_version_
- const bool verbose_
- uint64_t max_batch_size_
- uint64_t total_input_byte_size_
- uint64_t batch_size_
- uint64_t async_request_id_
- InferRequestHeader infer_request_
- std::thread worker_
- std::mutex mutex_
- std::condition_variable cv_
- bool exiting_
- class Input
  An input to the model.
Public Functions
- virtual ~Input()
  Destroy the input.
- virtual const std::string &Name() const = 0
  - Return
  - The name of the input.
- virtual size_t ByteSize() const = 0
  - Return
  - The size in bytes of this input. This is the size for one instance of the input, not the entire size of a batched input.
- virtual ModelInput::Format Format() const = 0
  - Return
  - The format of the input.
- virtual const DimsList &Dims() const = 0
  - Return
  - The dimensions/shape of the input.
- virtual Error Reset() = 0
  Prepare this input to receive new tensor values.
  Forget any existing values that were set by previous calls to SetRaw().
  - Return
  - Error object indicating success or failure.
- virtual Error SetRaw(const uint8_t *input, size_t input_byte_size) = 0
  Set tensor values for this input from a byte array.
  The array is not copied, so it must not be modified or destroyed until this input is no longer needed (that is, until the Run() call(s) that use the input have completed). For batched inputs this function must be called batch-size times to provide all tensor values for a batch of this input.
  - Return
  - Error object indicating success or failure.
  - Parameters
  - input: The pointer to the array holding the tensor values.
  - input_byte_size: The size of the array in bytes; must match the size expected by the input.
- virtual Error SetRaw(const std::vector<uint8_t> &input) = 0
  Set tensor values for this input from a byte vector.
  The vector is not copied, so it must not be modified or destroyed until this input is no longer needed (that is, until the Run() call(s) that use the input have completed). For batched inputs this function must be called batch-size times to provide all tensor values for a batch of this input.
  - Return
  - Error object indicating success or failure.
  - Parameters
  - input: The vector holding tensor values.
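  For example, each batch entry's bytes are supplied with a separate SetRaw() call (a minimal sketch; 'batch' is a placeholder std::vector of per-entry byte vectors that must outlive the Run() using them):
    // A batch of size batch.size() requires one SetRaw() call per entry.
    input->Reset();
    for (const std::vector<uint8_t>& entry : batch) {
      // SetRaw() does not copy, so 'entry' must remain valid until the
      // Run() that consumes this input has completed.
      input->SetRaw(entry);
    }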
- class Options
  Run options to be applied to all subsequent Run() invocations.
Public Functions
- virtual ~Options()
- virtual size_t BatchSize() const = 0
  - Return
  - The batch size to use for all subsequent inferences.
- virtual void SetBatchSize(size_t batch_size) = 0
  Set the batch size to use for all subsequent inferences.
  - Parameters
  - batch_size: The batch size.
- class Output
  An output from the model.
Public Functions
- virtual ~Output()
  Destroy the output.
- virtual const std::string &Name() const = 0
  - Return
  - The name of the output.
- virtual size_t ByteSize() const = 0
  - Return
  - The size in bytes of this output. This is the size for one instance of the output, not the entire size of a batched output.
- virtual const DimsList &Dims() const = 0
  - Return
  - The dimensions/shape of the output.
- class Request
  Handle to an inference request.
  The request handle is used to get request results if the request was sent by AsyncRun().
- class RequestTimers
  Timer to record the timestamps for different stages of request handling.
Public Types
- enum Kind
  The kind of the timer.
  Values:
  - REQUEST_START: The start of request handling.
  - REQUEST_END: The end of request handling.
  - SEND_START: The start of sending request bytes to the server (i.e. first byte).
  - SEND_END: The end of sending request bytes to the server (i.e. last byte).
  - RECEIVE_START: The start of receiving response bytes from the server (i.e. first byte).
  - RECEIVE_END: The end of receiving response bytes from the server (i.e. last byte).
class
Result
¶ An inference result corresponding to an output.
Public Functions
- virtual ~Result()
  Destroy the result.
- virtual const std::string &ModelName() const = 0
  - Return
  - The name of the model that produced this result.
- virtual uint32_t ModelVersion() const = 0
  - Return
  - The version of the model that produced this result.
- virtual const std::shared_ptr<Output> GetOutput() const = 0
  - Return
  - The Output object corresponding to this result.
- virtual Error GetRaw(size_t batch_idx, const std::vector<uint8_t> **buf) const = 0
  Get a reference to the entire raw result data for a specific batch entry.
  Returns an error if this result is not in RAW format.
  - Return
  - Error object indicating success or failure.
  - Parameters
  - batch_idx: The index of the batch entry whose results are returned.
  - buf: Returns the vector of result bytes.
- virtual Error GetRawAtCursor(size_t batch_idx, const uint8_t **buf, size_t adv_byte_size) = 0
  Get a reference to the raw result data for a specific batch entry at the current "cursor", and advance the cursor by the specified number of bytes.
  More typically, use the GetRawAtCursor<T>() method to return the data as a specific type T. Use ResetCursor() to reset the cursor to the beginning of the result. Returns an error if this result is not in RAW format.
  - Return
  - Error object indicating success or failure.
  - Parameters
  - batch_idx: The index of the batch entry whose results are returned.
  - buf: Returns a pointer to 'adv_byte_size' bytes of data.
  - adv_byte_size: The number of bytes of data to get a reference to.
- template <typename T> Error GetRawAtCursor(size_t batch_idx, T *out)
  Read a value for a specific batch entry at the current "cursor" from the result tensor as the specified type T, and advance the cursor.
  Use ResetCursor() to reset the cursor to the beginning of the result. Returns an error if this result is not in RAW format.
  - Return
  - Error object indicating success or failure.
  - Parameters
  - batch_idx: The index of the batch entry whose results are returned.
  - out: Returns the value at the cursor.
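  For example, a RAW float32 output can be consumed value by value (a minimal sketch assuming batch entry 0 and an output whose values are floats):
    // Read every float of batch entry 0 from a RAW-format result.
    const size_t count = result->GetOutput()->ByteSize() / sizeof(float);
    for (size_t i = 0; i < count; ++i) {
      float v;
      result->GetRawAtCursor(0 /* batch_idx */, &v);  // advances the cursor
      // ... use v ...
    }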
- virtual Error GetClassCount(size_t batch_idx, size_t *cnt) const = 0
  Get the number of class results for a batch entry.
  Returns an error if this result is not in CLASS format.
  - Return
  - Error object indicating success or failure.
  - Parameters
  - batch_idx: The index in the batch.
  - cnt: Returns the number of ClassResult entries for the batch entry.
- virtual Error GetClassAtCursor(size_t batch_idx, ClassResult *result) = 0
  Get the ClassResult for a specific batch entry at the current cursor.
  Use ResetCursor() to reset the cursor to the beginning of the result. Returns an error if this result is not in CLASS format.
  - Return
  - Error object indicating success or failure.
  - Parameters
  - batch_idx: The index in the batch.
  - result: Returns the ClassResult value for the batch entry at the cursor.
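  Used together with GetClassCount(), all class results for a batch entry can be visited (a minimal sketch assuming a CLASS-format result and that each call advances the cursor, as with the raw-result cursor methods):
    // Visit every ClassResult for batch entry 0.
    size_t cnt = 0;
    result->GetClassCount(0 /* batch_idx */, &cnt);
    for (size_t i = 0; i < cnt; ++i) {
      InferContext::Result::ClassResult cls;
      result->GetClassAtCursor(0 /* batch_idx */, &cls);
      // ... inspect cls ...
    }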
- struct ClassResult
  The result value for CLASS format results.
- struct Stat
  Cumulative statistics of the InferContext.
  - Note
  - For the gRPC protocol, 'cumulative_send_time_ns' represents the time to marshal the infer request, and 'cumulative_receive_time_ns' represents the time to unmarshal the infer response.
Public Members
- size_t completed_request_count
  Total number of requests completed.
- uint64_t cumulative_total_request_time_ns
  Time from the request start until the response is completely received.
- uint64_t cumulative_send_time_ns
  Time from the request start until the last byte is sent.
- uint64_t cumulative_receive_time_ns
  Time from receiving the first byte of the response until the response is completely received.
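  Since these counters are cumulative, per-request averages follow by dividing by completed_request_count, as in this sketch ('ctx' is an existing InferContext):
    // Derive the mean per-request latency from the cumulative counters.
    InferContext::Stat stat;
    ctx->GetStat(&stat);
    if (stat.completed_request_count > 0) {
      // Mean time from request start to completely received response.
      const uint64_t avg_request_ns =
          stat.cumulative_total_request_time_ns / stat.completed_request_count;
      // ... report avg_request_ns ...
    }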