triton-client#
Rust client library for NVIDIA Triton Inference Server.
This crate provides a type-safe, async Rust API for communicating with Triton Inference Server over gRPC. It wraps the Triton gRPC protocol with ergonomic builder patterns and strong typing while providing zero-cost access to the underlying protobuf types when needed.
Features#
- Async/await – Built on `tokio` and `tonic` for efficient async I/O.
- Type-safe builders – `InferInput` and `InferRequestBuilder` catch errors at compile time.
- High performance – Uses raw byte tensor encoding for minimal serialization overhead.
- Streaming inference – First-class support for `ModelStreamInfer`.
- Full API coverage – Health checks, metadata, model management, shared memory, tracing, and logging.
- Zero-cost escape hatch – Access the raw protobuf types via the `generated` module.
Quick Start#
Add triton-client to your Cargo.toml:
[dependencies]
triton-client = { path = "src/rust/triton-client" }
tokio = { version = "1", features = ["full"] }
Basic Example#
use triton_client::prelude::*;

#[tokio::main]
async fn main() -> triton_client::error::Result<()> {
    // Connect to Triton
    let client = TritonClient::connect("http://localhost:8001").await?;

    // Check health
    assert!(client.is_server_live().await?);
    assert!(client.is_server_ready().await?);

    // Query server metadata
    let metadata = client.server_metadata().await?;
    println!("Server: {} v{}", metadata.name, metadata.version);

    // Build an inference request
    let input = InferInput::new("input0", vec![1, 16], DataType::Fp32)
        .with_data_f32(&[0.0; 16]);
    let request = InferRequestBuilder::new("my_model")
        .model_version("1")
        .input(input)
        .output("output0")
        .build();

    // Run inference
    let response = client.infer(request).await?;
    let output = response.output_as_f32(0)?;
    println!("Output: {:?}", output);

    Ok(())
}
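The `output_as_f32` call above returns typed values from a tensor that travels over the wire as raw bytes. As a rough illustration of that decoding step, the sketch below turns a byte buffer into `f32` values using only the standard library. It assumes little-endian, row-major FP32 data; `decode_f32_le` is a hypothetical helper written for this example and is not part of the crate's API.

```rust
/// Decode a raw little-endian byte buffer into f32 values.
/// Hypothetical helper for illustration; not the crate's implementation.
/// Returns an error if the length is not a multiple of 4.
fn decode_f32_le(raw: &[u8]) -> Result<Vec<f32>, String> {
    if raw.len() % 4 != 0 {
        return Err(format!(
            "buffer length {} is not a multiple of 4",
            raw.len()
        ));
    }
    Ok(raw
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}

fn main() {
    // Simulate the raw bytes of a [1, 3] FP32 output tensor.
    let raw: Vec<u8> = [1.0f32, -2.5, 0.0]
        .iter()
        .flat_map(|v| v.to_le_bytes())
        .collect();

    let values = decode_f32_le(&raw).expect("valid buffer");
    println!("{:?}", values);
}
```

Rejecting buffers whose length is not a multiple of the element size surfaces a datatype mismatch as an error instead of silently dropping trailing bytes.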
Connection Options#
use std::time::Duration;
use triton_client::client::{ClientOptions, TritonClient};
let options = ClientOptions::default()
    .connect_timeout(Duration::from_secs(10))
    .request_timeout(Duration::from_secs(60))
    .max_message_size(256 * 1024 * 1024)
    .keep_alive_interval(Duration::from_secs(30))
    .keep_alive_timeout(Duration::from_secs(10));
let client = TritonClient::connect_with_options("http://localhost:8001", options).await?;
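When choosing a value for `max_message_size`, it can help to estimate how large a request's tensors will be. The sketch below computes the byte size of a fixed-width tensor from its shape and element size, then checks how many items fit under the 256 MiB limit used above. The image shape is a made-up example and `tensor_byte_size` is a hypothetical helper; the estimate ignores protobuf and gRPC framing overhead, so treat it as an upper bound on batch size rather than an exact figure.

```rust
/// Bytes needed for a fixed-width tensor: element size times element count.
/// Hypothetical helper for illustration. Returns None on overflow or if any
/// dimension is negative (for example, an unresolved dynamic dimension).
fn tensor_byte_size(shape: &[i64], bytes_per_element: usize) -> Option<usize> {
    let mut total = bytes_per_element;
    for &dim in shape {
        if dim < 0 {
            return None;
        }
        total = total.checked_mul(dim as usize)?;
    }
    Some(total)
}

fn main() {
    // Same limit as in the connection options above.
    let max_message_size: usize = 256 * 1024 * 1024;

    // Assumed example: one [1, 3, 224, 224] FP32 image (4 bytes per element).
    let per_image = tensor_byte_size(&[1, 3, 224, 224], 4).expect("shape fits in usize");
    let max_batch = max_message_size / per_image;

    println!(
        "{} bytes per image; at most {} images per message before overhead",
        per_image, max_batch
    );
}
```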
Model Management#
// List available models
let models = client.repository_index().await?;
for model in &models {
    println!("{} v{} [{}]", model.name, model.version, model.state);
}
// Load / unload models
client.load_model("my_model").await?;
client.unload_model("my_model").await?;
Streaming Inference#
use tokio_stream::StreamExt;
let requests = tokio_stream::iter(vec![request1, request2, request3]);
let mut stream = client.infer_stream(requests).await?;
while let Some(result) = stream.next().await {
    let response = result?;
    println!("Stream response: {}", response.model_name());
}
API Reference#
TritonClient#
| Method | Description |
|---|---|
| `connect` | Connect with default options |
| `connect_with_options` | Connect with custom options |
| `is_server_live` | Check if the server process is running |
| `is_server_ready` | Check if the server is ready for inference |
|  | Check if a model is ready |
| `server_metadata` | Get server name, version, extensions |
|  | Get model inputs/outputs metadata |
|  | Get full model configuration |
| `infer` | Run a single inference request |
| `infer_stream` | Run streaming inference |
|  | Get inference statistics |
| `repository_index` | List models in the repository |
| `load_model` | Load a model |
| `unload_model` | Unload a model |
|  | Query system shared memory |
|  | Query CUDA shared memory |
|  | Get/set trace settings |
|  | Get/set log settings |
InferInput#
Builder for input tensors. Supports all Triton data types:
// Numeric data
InferInput::new("input", vec![1, 4], DataType::Fp32).with_data_f32(&[1.0, 2.0, 3.0, 4.0])
InferInput::new("input", vec![1, 4], DataType::Int64).with_data_i64(&[1, 2, 3, 4])
// String / bytes data
InferInput::new("text", vec![1, 2], DataType::Bytes).with_data_bytes(&[b"hello", b"world"])
// Raw data (FP16, BF16, or any format)
InferInput::new("input", vec![1, 2], DataType::Fp16).with_data_raw(raw_bytes)
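To make the raw byte layout behind these builder methods more concrete, here is a minimal standard-library sketch of how such tensors are commonly serialized for Triton: fixed-width types as flattened little-endian bytes, and BYTES tensors as a 4-byte little-endian length prefix followed by each element's bytes. The function names are hypothetical and this is not the crate's implementation. The BF16 conversion uses plain truncation, which is a simplification; a production conversion would normally round to nearest.

```rust
/// Flatten f32 values into little-endian bytes (4 bytes per element).
fn encode_f32_le(values: &[f32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Encode a BYTES tensor: each element is written as a 4-byte
/// little-endian length followed by the element's bytes.
fn encode_bytes_tensor(elements: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    for e in elements {
        out.extend_from_slice(&(e.len() as u32).to_le_bytes());
        out.extend_from_slice(e);
    }
    out
}

/// Convert an f32 to BF16 bits by keeping the top 16 bits (truncation,
/// no rounding). Suitable as input for a raw-bytes path after calling
/// `to_le_bytes` on the result.
fn f32_to_bf16_bits_truncate(x: f32) -> u16 {
    (x.to_bits() >> 16) as u16
}

fn main() {
    let fp32 = encode_f32_le(&[1.0, 2.0, 3.0, 4.0]);
    let text = encode_bytes_tensor(&[&b"hello"[..], &b"world"[..]]);
    let bf16: Vec<u8> = [1.0f32, -2.0]
        .iter()
        .flat_map(|v| f32_to_bf16_bits_truncate(*v).to_le_bytes())
        .collect();

    println!(
        "FP32: {} bytes, BYTES: {} bytes, BF16: {} bytes",
        fp32.len(),
        text.len(),
        bf16.len()
    );
}
```

Whatever path is used, the number of encoded elements has to match the product of the shape dimensions, so a `[1, 4]` FP32 input needs exactly 16 bytes.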
Testing#
# Unit tests (no server required)
cargo test
# Integration tests (requires running Triton server)
TRITON_TEST_URL=http://localhost:8001 cargo test
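One way integration tests can be gated on `TRITON_TEST_URL` is to read the variable at the start of each test and return early when it is missing. The sketch below shows that pattern with the standard library only. It is an assumption about how such gating could work, not a description of this crate's test suite, and `normalize_url` and `triton_test_url` are hypothetical names.

```rust
use std::env;

/// Trim whitespace and treat an empty value the same as an unset one.
fn normalize_url(raw: Option<String>) -> Option<String> {
    raw.map(|s| s.trim().to_string()).filter(|s| !s.is_empty())
}

/// Read the server URL for integration tests from the environment.
fn triton_test_url() -> Option<String> {
    normalize_url(env::var("TRITON_TEST_URL").ok())
}

fn main() {
    // An integration test would do the same check and return early on None.
    match triton_test_url() {
        Some(url) => println!("running integration tests against {}", url),
        None => println!("TRITON_TEST_URL not set; skipping integration tests"),
    }
}
```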
Building#
Requires protoc (Protocol Buffers compiler) to be installed:
# macOS
brew install protobuf
# Ubuntu/Debian
apt-get install protobuf-compiler
Then build:
cargo build
License#
BSD-3-Clause. See the license header in each source file for details.