- Deploying your trained model using Triton
- Triton Architecture
- Model Repository
- Repository Agent
- Model Configuration
- Request Cancellation
- Optimization
- Ragged Batching
- Rate Limiter
- Model Analyzer
- Model Management
- Custom Operations
- Decoupled Backends and Models
- Triton Response Cache
- Metrics
- Triton Server Trace
- Triton Inference Server Support for Jetson and JetPack
- Version 1 to Version 2 Migration
- Secure Deployment Considerations
- HTTP/REST and GRPC Protocol
- Inference Protocols and APIs
  - Binary Tensor Data Extension
  - Classification Extension
  - Generate Extension
  - Logging Extension
  - Model Configuration Extension
  - Model Repository Extension
  - Schedule Policy Extension
  - Sequence Extension
  - Shared-Memory Extension
  - Statistics Extension
  - Trace Extension
  - Parameters Extension
- Triton Performance Analyzer
  - Features
  - Quick Start
  - Documentation
  - Contributing
  - Reporting problems, asking questions
- Perf Analyzer Documentation
  - Recommended Installation Method
  - Alternative Installation Methods
  - Quick Start
  - Perf Analyzer CLI
  - Inference Load Modes
  - Input Data
  - Shared Memory
  - Measurement Modes
  - Metrics
  - Reports
  - Benchmarking Triton via HTTP or gRPC endpoint
  - Benchmarking Triton directly via C API
  - Benchmarking TensorFlow Serving
  - Benchmarking TorchServe
  - Advantages of using Perf Analyzer over third-party benchmark suites
- GenAI-Perf
  - Installation
  - Basic Usage
  - Model Inputs
  - Metrics
  - CLI
  - Known Issues
- Python Backend
  - Business Logic Scripting
  - Interoperability and GPU Support
  - Frameworks
  - Custom Metrics
  - Examples
  - Running with Inferentia
  - Logging
  - Adding Custom Parameters in the Model Configuration
  - Reporting problems, asking questions
  - Using Triton with Inferentia 1
  - Using Triton with Inferentia 2, or Trn1
  - Auto-Complete Example
  - BLS Example
  - Example of using BLS with decoupled models
  - Custom Metrics Example
  - Decoupled Model Examples
  - Model Instance Kind Example
  - JAX Example
  - Preprocessing Using Python Backend Example