NVIDIA Triton Inference Server
2.0.0

Namespace nvidia::inferenceserver

Contents

  • Namespaces

Namespaces

  • Namespace nvidia::inferenceserver::client

  • Namespace nvidia::inferenceserver::custom
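The client namespace provides the C++ client API (for example the Error, InferenceServerGrpcClient, and InferenceServerHttpClient classes documented there), while the custom namespace holds the custom-backend interface (for example CustomInstance). The following is a minimal sketch of how application code typically refers to the client namespace; the header name grpc_client.h, the exact Create/IsServerLive signatures, and the localhost:8001 endpoint are assumptions based on the Triton C++ client library and may differ in your release.

    // Minimal sketch: create a GRPC client from the
    // nvidia::inferenceserver::client namespace and check server liveness.
    // grpc_client.h and the exact signatures below are assumptions; consult
    // the client library headers shipped with your Triton release.
    #include <iostream>
    #include <memory>
    #include "grpc_client.h"  // assumed header for the GRPC client API

    namespace nic = nvidia::inferenceserver::client;

    int main() {
      std::unique_ptr<nic::InferenceServerGrpcClient> client;
      // Connect to a locally running Triton instance (assumed endpoint).
      nic::Error err =
          nic::InferenceServerGrpcClient::Create(&client, "localhost:8001");
      if (!err.IsOk()) {
        std::cerr << "unable to create client: " << err << std::endl;
        return 1;
      }
      bool live = false;
      err = client->IsServerLive(&live);  // liveness check via the client API
      std::cout << "server live: " << (err.IsOk() && live) << std::endl;
      return 0;
    }

A short namespace alias such as nic keeps call sites readable while still making the owning namespace explicit.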

© Copyright 2018-2020, NVIDIA Corporation
