- Deploying your trained model using Triton
- Triton Architecture
- Model Repository
- Repository Agent
- Model Configuration
- Request Cancellation
- Optimization
- Ragged Batching
- Rate Limiter
- Model Analyzer
- Model Management
- Custom Operations
- Decoupled Backends and Models
- Triton Response Cache
- Metrics
- Triton Server Trace
- Triton Inference Server Support for Jetson and JetPack
- Version 1 to Version 2 Migration
- Secure Deployment Considerations
- HTTP/REST and GRPC Protocol
- Inference Protocols and APIs
  - Binary Tensor Data Extension
  - Classification Extension
  - Generate Extension
  - Logging Extension
  - Model Configuration Extension
  - Model Repository Extension
  - Schedule Policy Extension
  - Sequence Extension
  - Shared-Memory Extension
  - Statistics Extension
  - Trace Extension
  - Parameters Extension
- Triton Performance Analyzer
  - Features
  - Quick Start
  - Documentation
  - Contributing
  - Reporting problems, asking questions
- Perf Analyzer Documentation
  - Recommended Installation Method
  - Alternative Installation Methods
  - Quick Start
  - Perf Analyzer CLI
  - Inference Load Modes
  - Input Data
  - Shared Memory
  - Measurement Modes
  - Metrics
  - Reports
  - Benchmarking Triton via HTTP or gRPC endpoint
  - Benchmarking Triton directly via C API
  - Benchmarking TensorFlow Serving
  - Benchmarking TorchServe
  - Advantages of using Perf Analyzer over third-party benchmark suites
- GenAI-Perf
  - Installation
  - Basic Usage
  - Model Inputs
  - Metrics
  - CLI
  - Known Issues
- Python Backend
  - Business Logic Scripting
  - Interoperability and GPU Support
  - Frameworks
  - Custom Metrics
  - Examples
  - Running with Inferentia
  - Logging
  - Adding Custom Parameters in the Model Configuration
  - Reporting problems, asking questions
  - Using Triton with Inferentia 1
  - Using Triton with Inferentia 2, or Trn1
  - Auto-Complete Example
  - BLS Example
  - Example of using BLS with decoupled models
  - Custom Metrics Example
  - Decoupled Model Examples
  - Model Instance Kind Example
  - JAX Example
  - Preprocessing Using Python Backend Example