NVIDIA Triton Inference Server
2.1.0 -0000000
Define TRITONSERVER_EXPORT
Defined in File tritonserver.h
Define Documentation
TRITONSERVER_EXPORT