Getting Started
User Guide
- Deploying your trained model using Triton
- Triton Architecture
- Model Repository
- Repository Agent
- Model Configuration
- Request Cancellation
- Optimization
- Ragged Batching
- Rate Limiter
- Model Analyzer
- Model Management
- Custom Operations
- Decoupled Backends and Models
- Triton Response Cache
- Metrics
- Triton Server Trace
- Triton Inference Server Support for Jetson and JetPack
- Version 1 to Version 2 Migration
- Secure Deployment Considerations
Debugging
Protocol Guides
- HTTP/REST and GRPC Protocol
- Inference Protocols and APIs
- Binary Tensor Data Extension
- Classification Extension
- Generate Extension
- Logging Extension
- Model Configuration Extension
- Model Repository Extension
- Schedule Policy Extension
- Sequence Extension
- Shared-Memory Extension
- Statistics Extension
- Trace Extension
- Parameters Extension
Customization Guide
Examples
Client
Performance Analyzer
- Triton Performance Analyzer
- Features
- Quick Start
- Documentation
- Contributing
- Reporting problems, asking questions
- Perf Analyzer Documentation
- Recommended Installation Method
- Alternative Installation Methods
- Quick Start
- Perf Analyzer CLI
- Inference Load Modes
- Input Data
- Shared Memory
- Measurement Modes
- Metrics
- Reports
- Benchmarking Triton via HTTP or gRPC endpoint
- Benchmarking Triton directly via C API
- Benchmarking TensorFlow Serving
- Benchmarking TorchServe
- Advantages of using Perf Analyzer over third-party benchmark suites
- GenAI-Perf
- GenAI-Perf Compare Subcommand
- Profile Embeddings Models with GenAI-Perf
- Generated File Structures
- Profile Multiple LoRA Adapters
- Profile Vision-Language Models with GenAI-Perf
- Profile Ranking Models with GenAI-Perf
- Tutorials
Python Backend
- Python Backend
- Business Logic Scripting
- Interoperability and GPU Support
- Frameworks
- Custom Metrics
- Examples
- Running with Inferentia
- Logging
- Adding Custom Parameters in the Model Configuration
- Development with VSCode
- Reporting problems, asking questions
- Using Triton with Inferentia 1
- Using Triton with Inferentia 2, or Trn1
- Auto-Complete Example
- BLS Example
- Example of using BLS with decoupled models
- Custom Metrics Example
- Decoupled Model Examples
- Model Instance Kind Example
- JAX Example
- Preprocessing Using Python Backend Example
