NVIDIA TensorRT Inference Server
1.8.0 -0000000
Struct cudaIpcMemHandle_t
Defined in File request.h
Struct Documentation
struct cudaIpcMemHandle_t