Advanced Usage#

Model Caching#

When the container launches for the first time, it downloads the required models from NGC. To avoid downloading the models on subsequent runs, you can cache them locally by using a cache directory:

# Create the cache directory on the host machine
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
chmod 777 "$LOCAL_NIM_CACHE"

# Choose manifest profile id based on target architecture.
export MANIFEST_PROFILE_ID=<enter_valid_manifest_profile_id>

# Run the container with the cache directory mounted in the appropriate location
docker run -it --rm --name=maxine-audio2face-2d-nim \
  --runtime=nvidia \
  --gpus all \
  --shm-size=8GB \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e NIM_MANIFEST_PROFILE=$MANIFEST_PROFILE_ID \
  -e NIM_HTTP_API_PORT=8000 \
  -p 8000:8000 \
  -p 8001:8001 \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  nvcr.io/nim/nvidia/maxine-audio2face-2d:latest

For more information about MANIFEST_PROFILE_ID, refer to Model Manifest Profiles.
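
Some NIM containers ship a list-model-profiles utility that prints the profile IDs available in the image. If this image includes it (an assumption; the Model Manifest Profiles page is the authoritative list), you can enumerate the IDs before choosing one:

# Hypothetical invocation, patterned after other NIM images
docker run -it --rm --runtime=nvidia --gpus all \
  -e NGC_API_KEY=$NGC_API_KEY \
  nvcr.io/nim/nvidia/maxine-audio2face-2d:latest \
  list-model-profiles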

SSL enablement#

Audio2Face-2D NIM provides an SSL mode to ensure secure communication between clients and the server by encrypting data in transit. To enable SSL, you need to provide the path to the SSL certificate and key files in the container. The following example shows how to do this:

export NGC_API_KEY=<add-your-api-key>
# Host directory containing ssl_cert_server.pem and ssl_key_server.pem
SSL_CERT=<path/to/ssl_certificate_directory>

docker run -it --rm --name=maxine-audio2face-2d-nim \
  --runtime=nvidia \
  --gpus all \
  --shm-size=8GB \
  -v $SSL_CERT:/opt/nim/crt/:ro \
  -e NGC_API_KEY=$NGC_API_KEY \
  -p 8000:8000 \
  -p 8001:8001 \
  -e NIM_SSL_MODE="mtls" \
  -e NIM_SSL_CERT_PATH="/opt/nim/crt/ssl_cert_server.pem" \
  -e NIM_SSL_KEY_PATH="/opt/nim/crt/ssl_key_server.pem" \
  nvcr.io/nim/nvidia/maxine-audio2face-2d:latest

NIM_SSL_MODE can be set to “mtls”, “tls”, or “disabled”. When set to “mtls”, the container uses mutual TLS authentication, in which both the server and the client present certificates. When set to “tls”, the container uses standard server-side TLS authentication. For more information, refer to NIM SSL Configuration.

Be sure to verify the permissions of the SSL certificate and key files on the host machine. The container will not be able to access the files if they are not readable by the user running the container.
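
For local testing, you can generate a self-signed certificate and key with openssl, using filenames that match the NIM_SSL_CERT_PATH and NIM_SSL_KEY_PATH values above (a minimal sketch for testing only; in production, use certificates issued by your certificate authority):

# Generate a self-signed certificate and key for local testing
openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
  -subj "/CN=localhost" \
  -keyout ssl_key_server.pem \
  -out ssl_cert_server.pem

# Make both files readable by the user running the container;
# tighten key permissions as your security policy allows
chmod 644 ssl_cert_server.pem ssl_key_server.pem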

Multiple Concurrent Inputs#

To run the server in multi-input concurrent mode, set the environment variable MAXINE_MAX_CONCURRENCY_PER_GPU in the server container to an integer greater than 1. The server then accepts up to that many concurrent inputs per GPU.

Because Triton distributes the workload equally across all GPUs, a server with NUM_GPUS GPUs supports a total of NUM_GPUS * MAXINE_MAX_CONCURRENCY_PER_GPU concurrent inputs.
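
For example, the following launch (patterned on the caching command above; the value 4 is illustrative) configures each GPU to accept up to four concurrent inputs, for a total of eight on a two-GPU machine:

docker run -it --rm --name=maxine-audio2face-2d-nim \
  --runtime=nvidia \
  --gpus all \
  --shm-size=8GB \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e MAXINE_MAX_CONCURRENCY_PER_GPU=4 \
  -e NIM_HTTP_API_PORT=8000 \
  -p 8000:8000 \
  -p 8001:8001 \
  nvcr.io/nim/nvidia/maxine-audio2face-2d:latest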

NIM Service Configuration Parameters#

Model Selection and Animation Mode#

  • model_selection: Model selection - Performance or Quality

    • Supported values: ModelSelection.MODEL_SELECTION_PERF, ModelSelection.MODEL_SELECTION_QUALITY

    • Default: ModelSelection.MODEL_SELECTION_QUALITY

  • animation_crop_mode: Audio2Face animation cropping mode

    • Supported values:

      • AnimationCroppingMode.ANIMATION_CROPPING_MODE_FACEBOX

      • AnimationCroppingMode.ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING

      • AnimationCroppingMode.ANIMATION_CROPPING_MODE_INSET_BLENDING

    • Default: AnimationCroppingMode.ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING

Gaze and Eye Movement#

  • enable_lookaway: Flag to enable gaze lookaway

    • Supported values: 0, 1

    • Default: 0

  • lookaway_max_offset: Maximum integer value of gaze offset when lookaway is enabled

    • Range: [5, 25]

    • Default: 20

  • lookaway_interval_min: Minimum interval, in frames, between random lookaway events

    • Range: [1, 600]

    • Default: 90

  • lookaway_interval_range: Range, in frames, used when picking a random lookaway interval

    • Range: [1, 600]

    • Default: 240

  • blink_frequency: Frequency of eye blinks per minute

    • Range: [0, 120]

    • Default: 6

  • blink_duration: Duration of an eye blink

    • Range: [2, 150]

    • Default: 10

Mouth Expression and Head Pose#

  • mouth_expression_multiplier: Multiplier to exaggerate mouth expression

    • Range: [1.0f, 2.0f]

    • Default:

      • 1.4f (Quality mode)

      • 1.0f (Performance mode)

  • head_pose_mode: Head Pose Animation mode

    • Supported values:

      • HeadPoseMode.HEAD_POSE_MODE_RETAIN_FROM_PORTRAIT_IMAGE

      • HeadPoseMode.HEAD_POSE_MODE_PRE_DEFINED_ANIMATION

      • HeadPoseMode.HEAD_POSE_MODE_USER_DEFINED_ANIMATION

    • Default: HeadPoseMode.HEAD_POSE_MODE_RETAIN_FROM_PORTRAIT_IMAGE

  • head_pose_multiplier: Multiplier to dampen the range of Head Pose Animation

    • Range: [0.0f, 1.0f]

    • Default:

      • 1.0f (Quality mode)

      • 0.4f (Performance mode)

User-Defined Head Pose Animation#

  • input_head_rotation: Quaternion defining head pose rotation

    • Format: [qx, qy, qz, qw]

    • Clamped range: values outside ±20° (Euler angles) are clamped to this range (see the sketch after this list)

    • Default: NA

  • input_head_translation: Vector3f defining head pose translation

    • Format: [tx, ty, sz]

    • Range: [±0.03, ±0.02, 0.97-1.03]

    • Default: NA
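
The following sketch (a hypothetical helper, not part of the NIM client API; the yaw-Z/pitch-Y/roll-X axis convention is an assumption) builds a [qx, qy, qz, qw] rotation and a translation vector within the documented ranges:

    import math

    def euler_to_quaternion(yaw_deg, pitch_deg, roll_deg):
        """Convert Euler angles in degrees to [qx, qy, qz, qw].

        Assumes the common yaw (Z), pitch (Y), roll (X) convention; the
        axis order expected by the service may differ.
        """
        yaw, pitch, roll = (math.radians(a) for a in (yaw_deg, pitch_deg, roll_deg))
        cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
        cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
        cr, sr = math.cos(roll / 2), math.sin(roll / 2)
        return [
            sr * cp * cy - cr * sp * sy,  # qx
            cr * sp * cy + sr * cp * sy,  # qy
            cr * cp * sy - sr * sp * cy,  # qz
            cr * cp * cy + sr * sp * sy,  # qw
        ]

    # Keep each angle within +/-20 degrees to avoid server-side clamping
    input_head_rotation = euler_to_quaternion(10.0, -5.0, 0.0)

    # [tx, ty, sz] within the documented ranges [±0.03, ±0.02, 0.97-1.03]
    input_head_translation = [0.01, -0.005, 1.0]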

Note

Running both MODEL_SELECTION_QUALITY and MODEL_SELECTION_PERF modes simultaneously in a single NIM launch requires a high-end GPU. On lower-end GPUs, we recommend relaunching the NIM when switching between performance and quality modes. Because the Triton Inference Server loads the models into GPU memory, keeping both loaded on a lower-end GPU can exhaust memory and cause out-of-memory errors.

Setting Parameters for A2F2D NIM#

Python#

feature_params is a Python dictionary that holds feature parameter name-value pairs. The key portrait_image is required; all other keys are optional, and default values are used for any keys that are omitted.

    feature_params = {
        "portrait_image": portrait_image_encoded,
        "model_selection": ModelSelection.MODEL_SELECTION_QUALITY,
        "animation_crop_mode": AnimationCroppingMode.ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING,
        "enable_lookaway": 1,
        "lookaway_max_offset": 20,
        "lookaway_interval_min": 240,
        "lookaway_interval_range": 90,
        "blink_frequency": 15,
        "blink_duration": 6,
        "mouth_expression_multiplier": 1.4,
        "head_pose_mode": head_pose_mode,
        "head_pose_multiplier": 1.0,
        "input_head_rotation": rotation_data_stream,
        "input_head_translation": translation_data_stream,
    }
    # Note: input_head_rotation and input_head_translation are required only
    # when head_pose_mode = HeadPoseMode.HEAD_POSE_MODE_USER_DEFINED_ANIMATION

NodeJS#

The parameter portrait_image is required and all the other keys are optional. In the absence of optional keys, default values are used.

  
    /*
    * APIs for all of the parameters
    * +-----------------------------------------------------------------+
    * | Param                       | API calls                         |
    * +-----------------------------------------------------------------+
    * | model_selection             | setModelSelection(val)            |
    * +-----------------------------------------------------------------+
    * | animation_crop_mode         | setAnimationCropMode(val)         |
    * +-----------------------------------------------------------------+
    * | enable_lookaway             | setEnableLookaway(val)            |
    * +-----------------------------------------------------------------+
    * | lookaway_max_offset         | setLookawayMaxOffset(val)         |
    * +-----------------------------------------------------------------+
    * | lookaway_interval_min       | setLookawayIntervalMin(val)       |
    * +-----------------------------------------------------------------+
    * | lookaway_interval_range     | setLookawayIntervalRange(val)     |
    * +-----------------------------------------------------------------+
    * | blink_frequency             | setBlinkFrequency(val)            |
    * +-----------------------------------------------------------------+
    * | blink_duration              | setBlinkDuration(val)             |
    * +-----------------------------------------------------------------+
    * | mouth_expression_multiplier | setMouthExpressionMultiplier(val) |
    * +-----------------------------------------------------------------+
    * | head_pose_mode              | setHeadPoseMode(val)              |
    * +-----------------------------------------------------------------+
    * | head_pose_multiplier        | setHeadPoseMultiplier(val)        |
    * +-----------------------------------------------------------------+
    * | input_head_rotation         | setInputHeadRotation(val)         |
    * +-----------------------------------------------------------------+
    * | input_head_translation      | setInputHeadTranslation(val)      |
    * +-----------------------------------------------------------------+
    */
    // Note: input_head_rotation and input_head_translation are required only
    // when head_pose_mode = 3 (HEAD_POSE_MODE_USER_DEFINED_ANIMATION)