Recommendations

We recommend integrating the Triton-enabled SDK with the service as shown in Figure 1. The user's service application, the SDK library, and the Triton server reside on the same machine, and the SDK library transfers video data between the service application and the server through CUDA shared memory. Requests can reach the same Triton server in three ways: multiple service applications can each send requests, a single service application can create several threads that each send requests, and a single thread can send a batch of requests using the Maxine SDK batching APIs.
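The request patterns described above can be illustrated with a minimal Python sketch. It does not use the Maxine SDK or any Triton client API: `StubServer`, `infer`, `worker`, and `run_demo` are hypothetical names introduced only for illustration, and the stub does not model CUDA shared memory. It shows only the shape of the interaction, with several threads sharing one server and a single thread sending one batched request.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StubServer:
    """Hypothetical stand-in for one shared inference server (not a real API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.handled = 0  # total number of frames received so far

    def infer(self, frames):
        # Accept a batch (list) of frames and return one result per frame.
        with self._lock:
            self.handled += len(frames)
        return [f"processed:{frame}" for frame in frames]


def worker(server, stream_id, num_frames):
    # One thread per video stream; every thread uses the same server object.
    results = []
    for i in range(num_frames):
        results.extend(server.infer([f"stream{stream_id}-frame{i}"]))
    return results


def run_demo():
    server = StubServer()

    # Several threads in one application send requests to the same server.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(worker, server, sid, 3) for sid in range(4)]
        threaded = [future.result() for future in futures]

    # A single thread sends a batch of requests in one call.
    batched = server.infer([f"batch-frame{i}" for i in range(8)])

    return server.handled, threaded, batched


if __name__ == "__main__":
    handled, threaded, batched = run_demo()
    print(handled, len(threaded), len(batched))
```

In an actual deployment, the stub would be replaced by calls into the SDK library, which handles the transfer to the Triton server. The case of multiple service applications follows the same pattern with separate processes instead of threads.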