Server#
The server consists of a Triton Inference Server (https://developer.nvidia.com/nvidia-triton-inference-server), the Maxine SDK model repository, and the Maxine SDK backend library.
The Triton-enabled AR and VFX SDKs use the following Triton Inference Server features to deliver higher throughput and multistream support.
Dynamic batching combines separate requests into batches at runtime for concurrent execution, which provides higher throughput. For more details, refer to the Triton Inference Server documentation on dynamic batching.
Sequence batching provides support for processing multiple input video streams concurrently. For more details, refer to the Triton Inference Server documentation on sequence batching.
Concurrent execution on single-GPU or multi-GPU systems. The Triton server can create multiple instances of a feature on a single GPU or across multiple GPUs, which aids in parallel processing of requests. For more details, refer to the Triton Inference Server documentation on concurrent model execution.
The Maxine SDK backend library and the Maxine SDK model repository implement the Maxine features on the Triton server. The Maxine SDK model repository contains the models and configuration files.
The server is supplied with default configuration files, which can be used as is. We recommend that users do not edit the tensor names, types, or shapes, or the sequence batching and ensemble architecture, in the configuration files. However, some of the parameters in the configuration files can be modified to optimize performance and to enable or disable certain features, as discussed later. Refer to https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_configuration.html for more details about the Triton model configuration files.
Configuring AR SDK Server#
Each feature is implemented as a folder in the model repository. The following table lists the location for the configuration file for each feature.
| Feature | Configuration File Location |
|---|---|
| Face Detection | FaceBox |
| Face Keypoints, 68 Keypoints, mode 0 | FaceKeypoints68Mode0 |
| Face Keypoints, 68 Keypoints, mode 1 | FaceKeypoints68Mode1 |
| Face Keypoints, 126 Keypoints, mode 0 | FaceKeypoints126Mode0 |
| Face Keypoints, 126 Keypoints, mode 1 | FaceKeypoints126Mode1 |
| Gaze Redirection, 68 Keypoints | GazeRedirectionKey68 |
| Gaze Redirection, 126 Keypoints | GazeRedirectionKey126 |
| VideoLivePortrait, Perf model, Mode1, without FrameSelection | VideoLPPerfMode1FS0 |
| VideoLivePortrait, Perf model, Mode1, with FrameSelection | VideoLPPerfMode1FS1 |
| VideoLivePortrait, Perf model, Mode2, without FrameSelection | VideoLPPerfMode2FS0 |
| VideoLivePortrait, Perf model, Mode2, with FrameSelection | VideoLPPerfMode2FS1 |
| VideoLivePortrait, Perf model, Mode3, without FrameSelection | VideoLPPerfMode3FS0 |
| VideoLivePortrait, Perf model, Mode3, with FrameSelection | VideoLPPerfMode3FS1 |
| VideoLivePortrait, Qual model, Mode1, without FrameSelection | VideoLPQualMode1FS0 |
| VideoLivePortrait, Qual model, Mode1, with FrameSelection | VideoLPQualMode1FS1 |
| VideoLivePortrait, Qual model, Mode2, without FrameSelection | VideoLPQualMode2FS0 |
| VideoLivePortrait, Qual model, Mode2, with FrameSelection | VideoLPQualMode2FS1 |
| VideoLivePortrait, Qual model, Mode3, without FrameSelection | VideoLPQualMode3FS0 |
| VideoLivePortrait, Qual model, Mode3, with FrameSelection | VideoLPQualMode3FS1 |
| LipSync | LipSync |
The following parameters can be modified in the configuration file:
Maximum batch size
The property max_batch_size in the configuration file sets the maximum size of the batch Triton uses with the dynamic batcher. We recommend that this parameter be set to a value equal to the expected number of active video streams.
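As a sketch, the relevant line in a feature's config.pbtxt would look like the following (the value 8 is purely illustrative; set it to your expected number of active streams):

```protobuf
# config.pbtxt (fragment) -- illustrative value only.
# Set max_batch_size to the expected number of active video streams.
max_batch_size: 8
```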
Dynamic batching parameters
The dynamic batching can be optimized by setting the following properties.
max_candidate_sequences: The maximum number of possible concurrent video streams.
max_queue_delay_microseconds: The amount of time, in microseconds, that the dynamic batcher will wait to complete the batch.
max_sequence_idle_microseconds: The amount of time, in microseconds, that an idle input video stream is kept active.
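In a Triton config.pbtxt, these properties live inside the sequence_batching block; the sketch below uses illustrative values and assumes the Oldest scheduling strategy (the exact structure depends on the files shipped with the SDK, which should otherwise be left unchanged):

```protobuf
# config.pbtxt (fragment) -- illustrative values only.
sequence_batching {
  # Release a stream that has been idle for 5 seconds.
  max_sequence_idle_microseconds: 5000000
  oldest {
    # Allow up to 8 concurrent video streams.
    max_candidate_sequences: 8
    # Wait at most 1 ms for the dynamic batcher to fill a batch.
    max_queue_delay_microseconds: 1000
  }
}
```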
Instance group
The instance group property can be used to create multiple instances of the feature on the Triton server, either on the same GPU or on multiple GPUs. Refer to triton-inference-server/server for more details. Note that the kind field should always be set to GPU.
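A minimal sketch of the instance_group property follows; the instance count and GPU indices are illustrative, and KIND_GPU is the configuration-file spelling of the GPU kind:

```protobuf
# config.pbtxt (fragment) -- illustrative values only.
instance_group [
  {
    # Create two instances of the feature,
    count: 2
    # placed on GPUs 0 and 1; kind must remain GPU (KIND_GPU).
    kind: KIND_GPU
    gpus: [ 0, 1 ]
  }
]
```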
Configuring VFX SDK Server#
The AI Green Screen feature is implemented as a Triton ensemble (triton-inference-server/server). In the model repository, the folders AigsEnsembleMode0 and AigsEnsembleMode1 contain the ensembles for AI Green Screen mode 0 and mode 1, respectively. The corresponding models and configuration files for mode 0 and mode 1 are in AigsStatefulModelMode0 and AigsStatefulModelMode1.
The following parameters may be modified in the configuration files in the AigsStatefulModelMode0 and AigsStatefulModelMode1 folders:
Dynamic batching parameters
The dynamic batching can be optimized by setting the following properties.
max_candidate_sequences: The maximum number of possible concurrent video streams.
max_queue_delay_microseconds: The amount of time, in microseconds, that the dynamic batcher will wait to complete the batch.
max_sequence_idle_microseconds: The amount of time, in microseconds, that an idle input video stream is kept active.
Instance group
The instance group property can be used to create multiple instances of the feature on the Triton server, either on the same GPU or on multiple GPUs. Refer to triton-inference-server/server for more details. Note that the kind field should always be set to GPU.
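Putting the two tunable properties together, the config.pbtxt in AigsStatefulModelMode0 or AigsStatefulModelMode1 might be adjusted as sketched below. All values are illustrative, and the tensor names, shapes, and the surrounding sequence-batching and ensemble structure supplied with the SDK should not be changed:

```protobuf
# config.pbtxt (fragment) -- illustrative values only.
sequence_batching {
  # Release a stream that has been idle for 5 seconds.
  max_sequence_idle_microseconds: 5000000
  oldest {
    # Allow up to 4 concurrent video streams.
    max_candidate_sequences: 4
    # Wait at most 1 ms for the dynamic batcher to fill a batch.
    max_queue_delay_microseconds: 1000
  }
}
# One instance of the feature on GPU 0; kind must remain GPU (KIND_GPU).
instance_group [
  { count: 1, kind: KIND_GPU, gpus: [ 0 ] }
]
```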