Server#
The server consists of a Triton Inference Server (https://developer.nvidia.com/nvidia-triton-inference-server), the Maxine SDK model repository, and the Maxine SDK backend library.
The Triton-enabled AR and VFX SDKs use the following Triton Inference Server features to deliver higher throughput and multistream support.
Dynamic batching combines separate requests into batches at runtime for concurrent execution, which provides higher throughput. For more details, refer to the Triton Inference Server documentation on dynamic batching.
Sequence batching provides support for processing multiple input video streams concurrently. For more details, refer to the Triton Inference Server documentation on sequence batching.
Concurrent execution on single-GPU or multi-GPU systems. The Triton server can create multiple instances of a feature on a single GPU or across multiple GPUs, which aids in parallel processing of requests. For more details, refer to the Triton Inference Server documentation on concurrent model execution.
The Maxine SDK backend library and the Maxine SDK model repository implement the Maxine features on the Triton server. The Maxine SDK model repository contains the models and configuration files.
The server is supplied with default configuration files, which can be used as is. We recommend that users do not edit the tensor names, types, or shapes, or the sequence batching and ensemble architecture, in the configuration files. However, some of the parameters in the configuration files can be modified to optimize performance and to enable or disable certain features, as discussed later. Refer to https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_configuration.html for more details about the Triton model configuration files.
Configuring AR SDK Server#
Each feature is implemented as a folder in the model repository. The following table lists the location for the configuration file for each feature.
| Feature | Configuration File Location |
|---|---|
| Face Detection | FaceBox |
| Face Keypoints, 68 Keypoints, mode 0 | FaceKeypoints68Mode0 |
| Face Keypoints, 68 Keypoints, mode 1 | FaceKeypoints68Mode1 |
| Face Keypoints, 126 Keypoints, mode 0 | FaceKeypoints126Mode0 |
| Face Keypoints, 126 Keypoints, mode 1 | FaceKeypoints126Mode1 |
| Gaze Redirection, 68 Keypoints | GazeRedirectionKey68 |
| Gaze Redirection, 126 Keypoints | GazeRedirectionKey126 |
| VideoLivePortrait, Perf model, Mode1, without FrameSelection | VideoLPPerfMode1FS0 |
| VideoLivePortrait, Perf model, Mode1, with FrameSelection | VideoLPPerfMode1FS1 |
| VideoLivePortrait, Perf model, Mode2, without FrameSelection | VideoLPPerfMode2FS0 |
| VideoLivePortrait, Perf model, Mode2, with FrameSelection | VideoLPPerfMode2FS1 |
| VideoLivePortrait, Perf model, Mode3, without FrameSelection | VideoLPPerfMode3FS0 |
| VideoLivePortrait, Perf model, Mode3, with FrameSelection | VideoLPPerfMode3FS1 |
| VideoLivePortrait, Qual model, Mode1, without FrameSelection | VideoLPQualMode1FS0 |
| VideoLivePortrait, Qual model, Mode1, with FrameSelection | VideoLPQualMode1FS1 |
| VideoLivePortrait, Qual model, Mode2, without FrameSelection | VideoLPQualMode2FS0 |
| VideoLivePortrait, Qual model, Mode2, with FrameSelection | VideoLPQualMode2FS1 |
| VideoLivePortrait, Qual model, Mode3, without FrameSelection | VideoLPQualMode3FS0 |
| VideoLivePortrait, Qual model, Mode3, with FrameSelection | VideoLPQualMode3FS1 |
| LipSync | LipSync |
The following parameters can be modified in the configuration file:
Maximum batch size
The property max_batch_size in the configuration file sets the maximum size of the batch Triton uses with the dynamic batcher. We recommend that this parameter be set to a value equal to the expected number of active video streams.
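As a sketch, the relevant line in a feature's config.pbtxt would look like the following (the value 8 is purely illustrative; set it to your expected number of active streams):

```protobuf
# config.pbtxt (fragment) -- illustrative value only.
# Set max_batch_size to the expected number of active video streams.
max_batch_size: 8
```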
Dynamic batching parameters
The dynamic batching can be optimized by setting the following properties.
max_candidate_sequences: The maximum number of possible concurrent video streams.
max_queue_delay_microseconds: The amount of time, in microseconds, that the dynamic batcher will wait to complete the batch.
max_sequence_idle_microseconds: The amount of time, in microseconds, that an idle input video stream is kept active.
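In a Triton config.pbtxt, these properties live inside the sequence_batching block; the sketch below uses illustrative values and assumes the Oldest scheduling strategy (the exact structure depends on the files shipped with the SDK, which should otherwise be left unchanged):

```protobuf
# config.pbtxt (fragment) -- illustrative values only.
sequence_batching {
  # Release a stream that has been idle for 5 seconds.
  max_sequence_idle_microseconds: 5000000
  oldest {
    # Allow up to 8 concurrent video streams.
    max_candidate_sequences: 8
    # Wait at most 1 ms for the dynamic batcher to fill a batch.
    max_queue_delay_microseconds: 1000
  }
}
```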
Instance group
The instance group property can be used to create multiple instances of the feature on the Triton server, either on the same GPU or on multiple GPUs. Refer to triton-inference-server/server for more details. Note that the kind field should always be set to GPU.
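A minimal sketch of the instance_group property follows; the instance count and GPU indices are illustrative, and KIND_GPU is the configuration-file spelling of the GPU kind:

```protobuf
# config.pbtxt (fragment) -- illustrative values only.
instance_group [
  {
    # Create two instances of the feature,
    count: 2
    # placed on GPUs 0 and 1; kind must remain GPU (KIND_GPU).
    kind: KIND_GPU
    gpus: [ 0, 1 ]
  }
]
```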
Configuring VFX SDK Server#
The AI Green Screen feature is implemented as a Triton ensemble (triton-inference-server/server). In the model repository, the folders AigsEnsembleMode0 and AigsEnsembleMode1 contain the ensembles for AI Green Screen mode 0 and mode 1, respectively. The corresponding models and configuration files for mode 0 and mode 1 are in AigsStatefulModelMode0 and AigsStatefulModelMode1.
The following parameters may be modified in the configuration files in the AigsStatefulModelMode0 and AigsStatefulModelMode1 folders:
Dynamic batching parameters
The dynamic batching can be optimized by setting the following properties.
max_candidate_sequences: The maximum number of possible concurrent video streams.
max_queue_delay_microseconds: The amount of time, in microseconds, that the dynamic batcher will wait to complete the batch.
max_sequence_idle_microseconds: The amount of time, in microseconds, that an idle input video stream is kept active.
Instance group
The instance group property can be used to create multiple instances of the feature on the Triton server, either on the same GPU or on multiple GPUs. Refer to triton-inference-server/server for more details. Note that the kind field should always be set to GPU.
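Putting the two tunable properties together, the config.pbtxt in AigsStatefulModelMode0 or AigsStatefulModelMode1 might be adjusted as sketched below. All values are illustrative, and the tensor names, shapes, and the surrounding sequence-batching and ensemble structure supplied with the SDK should not be changed:

```protobuf
# config.pbtxt (fragment) -- illustrative values only.
sequence_batching {
  # Release a stream that has been idle for 5 seconds.
  max_sequence_idle_microseconds: 5000000
  oldest {
    # Allow up to 4 concurrent video streams.
    max_candidate_sequences: 4
    # Wait at most 1 ms for the dynamic batcher to fill a batch.
    max_queue_delay_microseconds: 1000
  }
}
# One instance of the feature on GPU 0; kind must remain GPU (KIND_GPU).
instance_group [
  { count: 1, kind: KIND_GPU, gpus: [ 0 ] }
]
```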