Triton Java API

This is a Triton Java API contributed by the Alibaba Cloud PAI team. It is based on Triton's HTTP/REST protocol and is designed for both ease of use and performance.

This Java API mimics Triton's official Python client API, with similar classes and methods:

  • triton.client.InferInput describes each input to the model.

  • triton.client.InferRequestedOutput describes each output from the model.

  • triton.client.InferenceServerClient is the main inference class.

Currently the Java API supports only a subset of the entire Triton protocol.

A minimal example looks like this:

package triton.client.example;

import java.util.Arrays;
import java.util.List;

import triton.client.InferInput;
import triton.client.InferRequestedOutput;
import triton.client.InferResult;
import triton.client.InferenceServerClient;
import triton.client.pojo.DataType;

public class MinExample {
    public static void main(String[] args) throws Exception {
        // true: exchange tensor data in binary form instead of as JSON arrays.
        boolean isBinary = true;
        InferInput inputIds = new InferInput("input_ids", new long[] {1, 32}, DataType.INT32);
        int[] inputIdsData = new int[32];
        Arrays.fill(inputIdsData, 1); // fill with some data.
        inputIds.setData(inputIdsData, isBinary);

        InferInput inputMask = new InferInput("input_mask", new long[] {1, 32}, DataType.INT32);
        int[] inputMaskData = new int[32];
        Arrays.fill(inputMaskData, 1);
        inputMask.setData(inputMaskData, isBinary);

        InferInput segmentIds = new InferInput("segment_ids", new long[] {1, 32}, DataType.INT32);
        int[] segmentIdsData = new int[32];
        Arrays.fill(segmentIdsData, 0);
        segmentIds.setData(segmentIdsData, isBinary);
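
        List<InferInput> inputs = Arrays.asList(inputIds, inputMask, segmentIds);
        List<InferRequestedOutput> outputs = Arrays.asList(new InferRequestedOutput("logits", isBinary));

        // "localhost:8000" assumes Triton's default HTTP port on the local machine; the two
        // 5000 values are assumed to be connection and network timeouts in milliseconds.
        InferenceServerClient client = new InferenceServerClient("localhost:8000", 5000, 5000);
        InferResult result = client.infer("roberta", inputs, outputs);
        float[] logits = result.getOutputAsFloat("logits");
        System.out.println(Arrays.toString(logits));
    }
}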
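For debugging, it can help to know what such a request looks like on the wire. In Triton's standard (KServe v2) HTTP/REST protocol, an inference request is a JSON body POSTed to /v2/models/<model>/infer, and a requested output can carry the parameter binary_data to ask for raw bytes in the response. The sketch below builds such a body by hand for INT32 inputs. It is a minimal illustration under assumptions: the class and method names are hypothetical and not part of triton.client, inputs are encoded as plain JSON arrays, and tensor names are assumed to need no JSON escaping. When inputs are sent in binary form, Triton's binary tensor data extension instead appends raw tensor bytes after the JSON header, which this sketch does not attempt.

```java
import java.util.Arrays;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/**
 * Hypothetical helper (not part of triton.client): renders a KServe v2 / Triton
 * HTTP JSON inference body for INT32 inputs.
 */
public class V2InferRequestJson {

    // Render one INT32 input tensor as an entry of the "inputs" array.
    // Assumes the name contains no characters that need JSON escaping.
    public static String inputJson(String name, long[] shape, int[] data) {
        String shapeJson = Arrays.stream(shape)
                .mapToObj(v -> Long.toString(v))
                .collect(Collectors.joining(",", "[", "]"));
        String dataJson = IntStream.of(data)
                .mapToObj(v -> Integer.toString(v))
                .collect(Collectors.joining(",", "[", "]"));
        return "{\"name\":\"" + name + "\",\"shape\":" + shapeJson
                + ",\"datatype\":\"INT32\",\"data\":" + dataJson + "}";
    }

    // Render one requested output; binary_data=true asks the server for raw bytes.
    public static String outputJson(String name, boolean binary) {
        return "{\"name\":\"" + name + "\",\"parameters\":{\"binary_data\":" + binary + "}}";
    }

    // Assemble the full request body from pre-rendered input and output entries.
    public static String requestJson(String[] inputs, String[] outputs) {
        return "{\"inputs\":[" + String.join(",", inputs) + "],\"outputs\":["
                + String.join(",", outputs) + "]}";
    }

    public static void main(String[] args) {
        int[] ones = new int[4];
        Arrays.fill(ones, 1);
        String body = requestJson(
                new String[] {inputJson("input_ids", new long[] {1, 4}, ones)},
                new String[] {outputJson("logits", true)});
        // This body would be POSTed to /v2/models/roberta/infer.
        System.out.println(body);
    }
}
```

Building the string by hand keeps the sketch dependency-free; real code would normally use a JSON library so that names and values are escaped correctly.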

Supported and Unsupported Java client features

Supported Java client features:

The HTTP client is supported with limited capability. Currently supported:

  • Synchronous inference requests

gRPC has very limited support; see the gRPC-generated Java client for details.
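Since only synchronous HTTP inference is supported, it may help to see what a blocking inference call amounts to at the HTTP level. The following sketch uses the JDK's own java.net.http client (Java 11+) rather than triton.client. The class name, the localhost:8000 endpoint (Triton's default HTTP port) and the 5000 ms timeouts are assumptions; the JSON body is a placeholder that a real request must replace with the model's actual input names, shapes and data.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/**
 * Hypothetical sketch (not part of triton.client): a blocking v2 inference call
 * made with the JDK's built-in HTTP client.
 */
public class SyncInferSketch {

    // Build a POST to /v2/models/{model}/infer carrying a JSON body.
    public static HttpRequest buildInferRequest(String endpoint, String model, String jsonBody) {
        return HttpRequest.newBuilder()
                .uri(URI.create("http://" + endpoint + "/v2/models/" + model + "/infer"))
                .timeout(Duration.ofMillis(5000)) // assumed per-request timeout
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    // send() blocks the calling thread until the response arrives (synchronous).
    public static String infer(HttpClient client, HttpRequest request) throws Exception {
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException(
                    "Inference failed: HTTP " + response.statusCode() + " " + response.body());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofMillis(5000)) // assumed connection timeout
                .build();
        // Placeholder body; it must match the model's real inputs to succeed.
        String body = "{\"inputs\":[{\"name\":\"input_ids\",\"shape\":[1,2],"
                + "\"datatype\":\"INT32\",\"data\":[1,1]}]}";
        HttpRequest request = buildInferRequest("localhost:8000", "roberta", body);
        System.out.println(infer(client, request)); // requires a running Triton server
    }
}
```

HttpClient.send blocks until the server replies. Its counterpart HttpClient.sendAsync returns a CompletableFuture instead, which is one way an application could approximate asynchronous requests on its own.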

Unsupported Java client features:

gRPC client:

  • A full-featured Java gRPC client and corresponding tests

HTTP client:

  1. Asynchronous inference requests

  2. Streaming inference requests

  3. SSL or HTTPS protocol communications

  4. Requesting/Receiving Server Metadata Information

  5. Requesting/Receiving Model Metadata Information

  6. Requesting/Receiving Model Inference Statistics

  7. Sending inference requests using Shared Memory (System, GPU)

  8. Sending multiple synchronous inferences to the server

  9. Protocol extensions
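Even though the Java API does not wrap them, server metadata, model metadata and model statistics (items 4 to 6) can still be fetched with plain HTTP GET requests against the standard endpoints: /v2 for server metadata, /v2/models/<model> for model metadata, and /v2/models/<model>/stats for Triton's statistics extension. The sketch below is a hypothetical workaround using java.net.http (Java 11+); it is not part of triton.client, and localhost:8000 is assumed to be the server's HTTP endpoint.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Hypothetical workaround (not part of triton.client): query Triton's metadata
 * and statistics endpoints directly over HTTP.
 */
public class MetadataSketch {

    // Server metadata: GET /v2
    public static URI serverMetadataUri(String endpoint) {
        return URI.create("http://" + endpoint + "/v2");
    }

    // Model metadata: GET /v2/models/{model}
    public static URI modelMetadataUri(String endpoint, String model) {
        return URI.create("http://" + endpoint + "/v2/models/" + model);
    }

    // Model statistics (Triton statistics extension): GET /v2/models/{model}/stats
    public static URI modelStatsUri(String endpoint, String model) {
        return URI.create("http://" + endpoint + "/v2/models/" + model + "/stats");
    }

    // Blocking GET that returns the JSON response body as a string.
    public static String getJson(HttpClient client, URI uri) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String endpoint = "localhost:8000"; // assumed default Triton HTTP endpoint
        System.out.println(getJson(client, serverMetadataUri(endpoint)));
        System.out.println(getJson(client, modelMetadataUri(endpoint, "roberta")));
        System.out.println(getJson(client, modelStatsUri(endpoint, "roberta")));
    }
}
```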

Building Java Examples

The Java examples can be found in the examples folder. To compile them, run:

$ cd client/src/java
$ mvn clean install -Ddir=examples

You will then find the compiled examples in your target folder, under examples, and the compiled jar at target/java-api-0.0.1.jar.