Audio2Face-3D NIM Container Deployment and Configuration Guide#

This guide provides comprehensive instructions for deploying, configuring, and running the Audio2Face-3D NIM Docker container available through the NGC registry.

Before proceeding, please review the Architecture Overview page to understand the core concepts, services, and requirements for running Audio2Face-3D.

Audio2Face-3D can be configured extensively through configuration files and environment variables, both of which can be customized via a custom entrypoint.

Prerequisites#

To run the microservice, you will need:

Access to the NGC Docker registry
Personal NGC API Key
Active login to the nvcr.io registry
NVIDIA Container Toolkit configured with Docker

For detailed hardware and software requirements, consult the Support Matrix page.

Configuration files#

Audio2Face-3D uses three configuration file types, each aimed at a specific user role:

Stylization Configuration (Artist-focused): Parameters typically adjusted by artists for creative control
Deployment Configuration (DevOps-focused): Parameters related to deployment and infrastructure
Advanced Configuration (Expert-focused): Specialized parameters for specific use cases

Warning

These deployment-time configuration files differ from runtime configuration files in both case convention (snake_case vs. camelCase) and structure. For reference, see this runtime configuration example: config_james.yml.

1. Stylization Configuration Files#

The system provides three variant-specific configuration files:

Claire
James
Mark

Each variant corresponds to a specific AI model and has its own predefined default values. The James configuration is the microservice's default.

Model selection (regression vs. diffusion)#

The stylization configuration supports two inference modes:

Regression: set a2f.inference_type: regression and choose a model under a2f.regression_model.inference_model_id (for example james_v2.3.1, claire_v2.3.1, or mark_v2.3).
Diffusion: set a2f.inference_type: diffusion and configure a2f.diffusion_model:
- inference_model_id: diffusion model id (for example multi_v3.2)
- identity: which identity to use with a multi-identity diffusion model (for example james, claire, or mark)
- constant_noise: when true, uses deterministic noise for diffusion inference (more stable/repeatable results); when false, uses non-deterministic noise (more variation between runs)

Only the configuration block matching a2f.inference_type is used at runtime.
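The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the container's code: it assumes the stylization file has already been parsed into a plain dict (for example with a YAML loader), and the helper name `active_model` is hypothetical.

```python
from __future__ import annotations

from typing import Any


def active_model(stylization: dict[str, Any]) -> dict[str, Any]:
    """Return the model settings that apply for the configured a2f.inference_type.

    Hypothetical helper: only the block matching inference_type is read;
    the other block is ignored, mirroring the documented runtime rule.
    """
    a2f = stylization["a2f"]
    inference_type = a2f["inference_type"]
    if inference_type == "regression":
        block = a2f["regression_model"]
        return {"inference_type": "regression",
                "inference_model_id": block["inference_model_id"]}
    if inference_type == "diffusion":
        block = a2f["diffusion_model"]
        return {"inference_type": "diffusion",
                "inference_model_id": block["inference_model_id"],
                "identity": block["identity"],
                "constant_noise": block["constant_noise"]}
    raise ValueError(f"unknown a2f.inference_type: {inference_type!r}")


# Values taken from the James stylization config in this guide.
james = {
    "a2f": {
        "inference_type": "regression",
        "regression_model": {"inference_model_id": "james_v2.3.1"},
        "diffusion_model": {"inference_model_id": "multi_v3.2",
                            "identity": "james", "constant_noise": True},
    }
}
print(active_model(james))

# Switching the type makes the diffusion block the one that applies.
james["a2f"]["inference_type"] = "diffusion"
print(active_model(james))
```

Both blocks can stay in the file at all times; changing `inference_type` alone switches which one is read.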

Model and profile selection precedence (A2F-3D NIM)#

In the NIM container, the stylization configuration (stylization_config.yaml) is the source of truth for which model the service will run (regression vs. diffusion, model id, identity, and diffusion options such as constant_noise).

However, depending on environment variables, the NIM startup logic may update the configuration and/or select a different TRT profile:

If you set NIM_MANIFEST_PROFILE / NIM_MODEL_PROFILE: the container treats this as an explicit profile choice. The startup logic will attempt to update the stylization config to match the profile’s character (claire/james/mark ⇒ regression; multi ⇒ diffusion with multi_v3.2_james by default).
If you set PERF_A2F_MODEL: the container may update the stylization config to the selected preconfigured model. This is intended for benchmarking and can conflict with custom stylization configs.
If NIM_DISABLE_MODEL_DOWNLOAD=true: the container will try to use TRT engines in /tmp/a2x (a mounted cache) and will generate TRT engines locally if needed. In this mode, when cached engines are present, profile selection explicitly tries to honor the model id in the stylization config (this requires both a2e.trt and {model_id}.trt).
If NIM_DISABLE_MODEL_DOWNLOAD=false: the container will download engines for the selected profile. Ensure the chosen profile matches the model referenced by your stylization config so that the downloaded TRT engines match what the service will load.
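The consistency advice in the last rule can be expressed as a small check. This is a hypothetical sketch, not the NIM startup logic: it assumes the profile's character (claire, james, mark, or multi) has already been extracted as a string, which the container does internally in a way not shown here. The regression model IDs are the examples given earlier in this guide, and reading "multi_v3.2_james" as model multi_v3.2 with identity james is an assumption. The function names are illustrative only.

```python
from __future__ import annotations

# Example regression model IDs per character, taken from this guide.
REGRESSION_MODEL_BY_CHARACTER = {
    "claire": "claire_v2.3.1",
    "james": "james_v2.3.1",
    "mark": "mark_v2.3",
}


def settings_for_profile(character: str) -> dict[str, str]:
    """Map a profile's character to the a2f settings the startup logic aims for (sketch)."""
    character = character.lower()
    if character in REGRESSION_MODEL_BY_CHARACTER:
        return {"inference_type": "regression",
                "inference_model_id": REGRESSION_MODEL_BY_CHARACTER[character]}
    if character == "multi":
        # Assumption: "multi_v3.2_james" means model multi_v3.2 with identity james.
        return {"inference_type": "diffusion",
                "inference_model_id": "multi_v3.2",
                "identity": "james"}
    raise ValueError(f"unrecognized profile character: {character!r}")


def engines_agree(character: str, a2f: dict) -> bool:
    """True if the profile's engine model ID matches the one the stylization config loads."""
    block_name = "regression_model" if a2f["inference_type"] == "regression" else "diffusion_model"
    expected = settings_for_profile(character)["inference_model_id"]
    return expected == a2f[block_name]["inference_model_id"]


custom_a2f = {
    "inference_type": "diffusion",
    "regression_model": {"inference_model_id": "claire_v2.3.1"},
    "diffusion_model": {"inference_model_id": "multi_v3.2", "identity": "claire"},
}
print(engines_agree("multi", custom_a2f))   # same multi_v3.2 engine, identity aside
print(engines_agree("claire", custom_a2f))  # profile would fetch a regression engine instead
```

The check compares only the model ID because the engine file on disk is named after it ({model_id}.trt); the diffusion identity does not change which engine is needed.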

Additional NIM startup behaviors (what the container actually does)#

The NIM entrypoint script performs additional deployment-time wiring that is useful to understand when debugging:

Config locations (overridable via env vars):
- STYLIZATION_CONFIG_PATH (defaults to /apps/configs/stylization_config.yaml)
- DEPLOYMENT_CONFIG_PATH (defaults to /apps/configs/deployment_config.yaml)
- ADVANCED_CONFIG_PATH (defaults to /apps/configs/advanced_config.yaml)
Engine locations:
- Downloaded TRT engines land in /opt/nim/workspace and are copied into /tmp/a2x before the service starts.
- When NIM_DISABLE_MODEL_DOWNLOAD=true, TRT engines are expected in /tmp/a2x (for example via a mounted cache), and missing engines can be generated locally via ./service/generate_trt_models.py.
Pre-start validation:
- Before launching a2f_pipeline.run, the script validates that /tmp/a2x/a2e.trt exists and that /tmp/a2x/{model_id}.trt exists, where model_id is read from the stylization config based on a2f.inference_type.
Other config updates used in benchmarking/deployment:
- PERF_MAX_STREAM may update the stream count in deployment_config.yaml.
- PERF_ENABLE_SMOOTHING may update the temporal smoothing setting in advanced_config.yaml.
- NIM_SSL_MODE / TLS mounts may update TLS fields in deployment_config.yaml.
- NIM_GRPC_PORT_TIMEOUT controls how long the container waits for the gRPC port to come up before declaring startup failure.
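The config path overrides and the pre-start engine check described above can be sketched as follows. This is an illustrative approximation, not the entrypoint script itself: the function names are hypothetical, and the model ID is passed in directly rather than read from the stylization config based on a2f.inference_type, as the real script does.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path
from typing import Mapping

# Documented defaults; each path can be overridden by the environment variable of the same name.
CONFIG_DEFAULTS = {
    "STYLIZATION_CONFIG_PATH": "/apps/configs/stylization_config.yaml",
    "DEPLOYMENT_CONFIG_PATH": "/apps/configs/deployment_config.yaml",
    "ADVANCED_CONFIG_PATH": "/apps/configs/advanced_config.yaml",
}


def resolve_config_paths(env: Mapping[str, str] | None = None) -> dict[str, Path]:
    """Return the config file paths, preferring env var overrides over the defaults."""
    env = os.environ if env is None else env
    return {name: Path(env.get(name, default)) for name, default in CONFIG_DEFAULTS.items()}


def missing_engines(model_id: str, engine_dir: str | Path = "/tmp/a2x") -> list[Path]:
    """List the TRT engines the pre-start check requires but cannot find."""
    engine_dir = Path(engine_dir)
    required = [engine_dir / "a2e.trt", engine_dir / f"{model_id}.trt"]
    return [path for path in required if not path.is_file()]


# Example: override only the stylization config path.
paths = resolve_config_paths({"STYLIZATION_CONFIG_PATH": "/custom/stylization_config.yaml"})
print(paths["STYLIZATION_CONFIG_PATH"], paths["DEPLOYMENT_CONFIG_PATH"])

# Example: a scratch directory standing in for /tmp/a2x that holds only a2e.trt.
with tempfile.TemporaryDirectory() as scratch:
    (Path(scratch) / "a2e.trt").touch()
    print(missing_engines("james_v2.3.1", scratch))  # reports the missing model engine
```

Running a check like this against a mounted cache before starting the container can surface a missing {model_id}.trt earlier than the startup failure would.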

Claire Configuration#

claire_stylization_config.yaml

# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true # Enable audio2emotion, AI-generated audio-driven emotion
  live_transition_time: 0.5 # Controls the smoothness of the output transition toward the target value across frames; higher values result in smoother transitions. Each frame updates at a rate of <frame time length> / <live transition time> (capped at 1.0) toward the raw result.
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions

a2f:
  # regression / diffusion
  inference_type: regression

  regression_model:
    inference_model_id: claire_v2.3.1

  diffusion_model:
    inference_model_id: multi_v3.2
    identity: claire
    # If true, use deterministic noise for diffusion inference (more stable/repeatable results).
    # If false, use non-deterministic noise (more variation between runs).
    constant_noise: true

  # Enable or disable tongue blendshapes output
  enable_tongue_blendshapes: false

  face_params:
    eyelid_offset: 0.0 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.0 # Controls the magnitude of the input audio
    lip_close_offset: 0.0 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.25 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.0 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
    tongue_strength: 1.3 # Controls the range of motion of the tongue
    tongue_height_offset: 0.0 # Controls the height of the tongue
    tongue_depth_offset: 0.0 # Controls the depth of the tongue

  blendshape_params: # Modulates the effect of each blendshape: gain * w + offset
    # Clamps blendshape weights to [0.0, 1.0] range. Recommended for production to ensure compatibility with renderers.
    enable_clamping_bs_weight: true
    weight_multipliers: # Modulates the effect of each blendshape. blendshape_values * weight_multipliers + weight_offsets (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 1.0
      EyeLookInLeft: 1.0
      EyeLookOutLeft: 1.0
      EyeLookUpLeft: 1.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 1.0
      EyeLookInRight: 1.0
      EyeLookOutRight: 1.0
      EyeLookUpRight: 1.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 1.0
      JawLeft: 1.0
      JawRight: 1.0
      JawOpen: 1.0
      MouthClose: 1.0
      MouthFunnel: 1.0
      MouthPucker: 1.0
      MouthLeft: 1.0
      MouthRight: 1.0
      MouthSmileLeft: 1.0
      MouthSmileRight: 1.0
      MouthFrownLeft: 1.0
      MouthFrownRight: 1.0
      MouthDimpleLeft: 1.0
      MouthDimpleRight: 1.0
      MouthStretchLeft: 1.0
      MouthStretchRight: 1.0
      MouthRollLower: 1.0
      MouthRollUpper: 1.0
      MouthShrugLower: 1.0
      MouthShrugUpper: 1.0
      MouthPressLeft: 1.0
      MouthPressRight: 1.0
      MouthLowerDownLeft: 1.0
      MouthLowerDownRight: 1.0
      MouthUpperUpLeft: 1.0
      MouthUpperUpRight: 1.0
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 1.0
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 1.0
      NoseSneerRight: 1.0
      TongueOut: 1.0
      TongueTipUp: 1.0
      TongueTipDown: 1.0
      TongueTipLeft: 1.0
      TongueTipRight: 1.0
      TongueRollUp: 1.0
      TongueRollDown: 1.0
      TongueRollLeft: 1.0
      TongueRollRight: 1.0
      TongueUp: 1.0
      TongueDown: 1.0
      TongueLeft: 1.0
      TongueRight: 1.0
      TongueIn: 1.0
      TongueStretch: 1.0
      TongueWide: 1.0
      TongueNarrow: 1.0
    weight_offsets: # Modulates the effect of each blendshape. blendshape_values * weight_multipliers + weight_offsets (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0
      TongueTipUp: 0.0
      TongueTipDown: 0.0
      TongueTipLeft: 0.0
      TongueTipRight: 0.0
      TongueRollUp: 0.0
      TongueRollDown: 0.0
      TongueRollLeft: 0.0
      TongueRollRight: 0.0
      TongueUp: 0.0
      TongueDown: 0.0
      TongueLeft: 0.0
      TongueRight: 0.0
      TongueIn: 0.0
      TongueStretch: 0.0
      TongueWide: 0.0
      TongueNarrow: 0.0

    active_poses: # Specifies which blendshapes are active for each pose (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: 1
      EyeLookDownLeft: 0
      EyeLookInLeft: 0
      EyeLookOutLeft: 0
      EyeLookUpLeft: 0
      EyeSquintLeft: 1
      EyeWideLeft: 1
      EyeBlinkRight: 1
      EyeLookDownRight: 0
      EyeLookInRight: 0
      EyeLookOutRight: 0
      EyeLookUpRight: 0
      EyeSquintRight: 1
      EyeWideRight: 1
      JawForward: 1
      JawLeft: 1
      JawRight: 1
      JawOpen: 1
      MouthClose: 1
      MouthFunnel: 1
      MouthPucker: 1
      MouthLeft: 1
      MouthRight: 1
      MouthSmileLeft: 1
      MouthSmileRight: 1
      MouthFrownLeft: 1
      MouthFrownRight: 1
      MouthDimpleLeft: 1
      MouthDimpleRight: 1
      MouthStretchLeft: 1
      MouthStretchRight: 1
      MouthRollLower: 1
      MouthRollUpper: 1
      MouthShrugLower: 1
      MouthShrugUpper: 1
      MouthPressLeft: 1
      MouthPressRight: 1
      MouthLowerDownLeft: 1
      MouthLowerDownRight: 1
      MouthUpperUpLeft: 1
      MouthUpperUpRight: 1
      BrowDownLeft: 1
      BrowDownRight: 1
      BrowInnerUp: 1
      BrowOuterUpLeft: 1
      BrowOuterUpRight: 1
      CheekPuff: 1
      CheekSquintLeft: 1
      CheekSquintRight: 1
      NoseSneerLeft: 1
      NoseSneerRight: 1
      TongueOut: 0
      TongueTipUp: 1
      TongueTipDown: 1
      TongueTipLeft: 1
      TongueTipRight: 1
      TongueRollUp: 1
      TongueRollDown: 1
      TongueRollLeft: 1
      TongueRollRight: 1
      TongueUp: 1
      TongueDown: 1
      TongueLeft: 1
      TongueRight: 1
      TongueIn: 1
      TongueStretch: 1
      TongueWide: 1
      TongueNarrow: 1

    cancel_poses: # Specifies which blendshapes are cancelled for each pose (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: -1
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: -1
      EyeBlinkRight: -1
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: -1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: -1
      MouthSmileRight: -1
      MouthFrownLeft: -1
      MouthFrownRight: -1
      MouthDimpleLeft: -1
      MouthDimpleRight: -1
      MouthStretchLeft: -1
      MouthStretchRight: -1
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: -1
      MouthPressRight: -1
      MouthLowerDownLeft: -1
      MouthLowerDownRight: -1
      MouthUpperUpLeft: -1
      MouthUpperUpRight: -1
      BrowDownLeft: -1
      BrowDownRight: -1
      BrowInnerUp: -1
      BrowOuterUpLeft: -1
      BrowOuterUpRight: -1
      CheekPuff: -1
      CheekSquintLeft: -1
      CheekSquintRight: -1
      NoseSneerLeft: -1
      NoseSneerRight: -1
      TongueOut: -1
      TongueTipUp: -1
      TongueTipDown: -1
      TongueTipLeft: -1
      TongueTipRight: -1
      TongueRollUp: -1
      TongueRollDown: -1
      TongueRollLeft: -1
      TongueRollRight: -1
      TongueUp: -1
      TongueDown: -1
      TongueLeft: -1
      TongueRight: -1
      TongueIn: -1
      TongueStretch: -1
      TongueWide: -1
      TongueNarrow: -1

    symmetry_poses: # Specifies which blendshapes are symmetrical for each pose (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: 0
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: 1
      EyeBlinkRight: 0
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: 1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: 2
      MouthSmileRight: 2
      MouthFrownLeft: 3
      MouthFrownRight: 3
      MouthDimpleLeft: 4
      MouthDimpleRight: 4
      MouthStretchLeft: 5
      MouthStretchRight: 5
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: 6
      MouthPressRight: 6
      MouthLowerDownLeft: 7
      MouthLowerDownRight: 7
      MouthUpperUpLeft: 8
      MouthUpperUpRight: 8
      BrowDownLeft: 10
      BrowDownRight: 10
      BrowInnerUp: -1
      BrowOuterUpLeft: 9
      BrowOuterUpRight: 9
      CheekPuff: -1
      CheekSquintLeft: 11
      CheekSquintRight: 11
      NoseSneerLeft: 12
      NoseSneerRight: 12
      TongueOut: -1
      TongueTipUp: -1
      TongueTipDown: -1
      TongueTipLeft: -1
      TongueTipRight: -1
      TongueRollUp: -1
      TongueRollDown: -1
      TongueRollLeft: -1
      TongueRollRight: -1
      TongueUp: -1
      TongueDown: -1
      TongueLeft: -1
      TongueRight: -1
      TongueIn: -1
      TongueStretch: -1
      TongueWide: -1
      TongueNarrow: -1

James Configuration#

james_stylization_config.yaml

# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true # Enable audio2emotion, AI-generated audio-driven emotion
  live_transition_time: 0.5 # Controls the smoothness of the output transition toward the target value across frames; higher values result in smoother transitions. Each frame updates at a rate of <frame time length> / <live transition time> (capped at 1.0) toward the raw result.
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions

a2f:
  # regression / diffusion
  inference_type: regression

  regression_model:
    inference_model_id: james_v2.3.1

  diffusion_model:
    inference_model_id: multi_v3.2
    identity: james
    # If true, use deterministic noise for diffusion inference (more stable/repeatable results).
    # If false, use non-deterministic noise (more variation between runs).
    constant_noise: true

  # Enable or disable tongue blendshapes output
  enable_tongue_blendshapes: false

  face_params:
    eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.0 # Controls the magnitude of the input audio
    lip_close_offset: -0.02 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.2 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.0 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
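    tongue_strength: 1.3 # Controls the range of motion of the tongue
    tongue_height_offset: 0.0 # Controls the height of the tongue
    tongue_depth_offset: 0.0 # Controls the depth of the tongue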

  blendshape_params: # Modulates the effect of each blendshape: gain * w + offset
    # Clamps blendshape weights to [0.0, 1.0] range. Recommended for production to ensure compatibility with renderers.
    enable_clamping_bs_weight: true
    
    weight_multipliers: # Modulates the effect of each blendshape. blendshape_values * weight_multipliers + weight_offsets (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 1.0
      EyeLookInLeft: 1.0
      EyeLookOutLeft: 1.0
      EyeLookUpLeft: 1.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 1.0
      EyeLookInRight: 1.0
      EyeLookOutRight: 1.0
      EyeLookUpRight: 1.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 1.0
      JawLeft: 1.0
      JawRight: 1.0
      JawOpen: 1.0
      MouthClose: 1.0
      MouthFunnel: 1.0
      MouthPucker: 1.0
      MouthLeft: 1.0
      MouthRight: 1.0
      MouthSmileLeft: 1.0
      MouthSmileRight: 1.0
      MouthFrownLeft: 1.0
      MouthFrownRight: 1.0
      MouthDimpleLeft: 1.0
      MouthDimpleRight: 1.0
      MouthStretchLeft: 1.0
      MouthStretchRight: 1.0
      MouthRollLower: 1.0
      MouthRollUpper: 1.0
      MouthShrugLower: 1.0
      MouthShrugUpper: 1.0
      MouthPressLeft: 1.0
      MouthPressRight: 1.0
      MouthLowerDownLeft: 1.0
      MouthLowerDownRight: 1.0
      MouthUpperUpLeft: 1.0
      MouthUpperUpRight: 1.0
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 1.0
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 1.0
      NoseSneerRight: 1.0
      TongueOut: 1.0
      TongueTipUp: 1.0
      TongueTipDown: 1.0
      TongueTipLeft: 1.0
      TongueTipRight: 1.0
      TongueRollUp: 1.0
      TongueRollDown: 1.0
      TongueRollLeft: 1.0
      TongueRollRight: 1.0
      TongueUp: 1.0
      TongueDown: 1.0
      TongueLeft: 1.0
      TongueRight: 1.0
      TongueIn: 1.0
      TongueStretch: 1.0
      TongueWide: 1.0
      TongueNarrow: 1.0

    weight_offsets: # Modulates the effect of each blendshape. blendshape_values * weight_multipliers + weight_offsets (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0
      TongueTipUp: 0.0
      TongueTipDown: 0.0
      TongueTipLeft: 0.0
      TongueTipRight: 0.0
      TongueRollUp: 0.0
      TongueRollDown: 0.0
      TongueRollLeft: 0.0
      TongueRollRight: 0.0
      TongueUp: 0.0
      TongueDown: 0.0
      TongueLeft: 0.0
      TongueRight: 0.0
      TongueIn: 0.0
      TongueStretch: 0.0
      TongueWide: 0.0
      TongueNarrow: 0.0

    active_poses: # Specifies which blendshapes are active for each pose (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: 1
      EyeLookDownLeft: 0
      EyeLookInLeft: 0
      EyeLookOutLeft: 0
      EyeLookUpLeft: 0
      EyeSquintLeft: 1
      EyeWideLeft: 1
      EyeBlinkRight: 1
      EyeLookDownRight: 0
      EyeLookInRight: 0
      EyeLookOutRight: 0
      EyeLookUpRight: 0
      EyeSquintRight: 1
      EyeWideRight: 1
      JawForward: 1
      JawLeft: 1
      JawRight: 1
      JawOpen: 1
      MouthClose: 1
      MouthFunnel: 1
      MouthPucker: 1
      MouthLeft: 1
      MouthRight: 1
      MouthSmileLeft: 1
      MouthSmileRight: 1
      MouthFrownLeft: 1
      MouthFrownRight: 1
      MouthDimpleLeft: 1
      MouthDimpleRight: 1
      MouthStretchLeft: 1
      MouthStretchRight: 1
      MouthRollLower: 1
      MouthRollUpper: 1
      MouthShrugLower: 1
      MouthShrugUpper: 1
      MouthPressLeft: 1
      MouthPressRight: 1
      MouthLowerDownLeft: 1
      MouthLowerDownRight: 1
      MouthUpperUpLeft: 1
      MouthUpperUpRight: 1
      BrowDownLeft: 1
      BrowDownRight: 1
      BrowInnerUp: 1
      BrowOuterUpLeft: 1
      BrowOuterUpRight: 1
      CheekPuff: 1
      CheekSquintLeft: 1
      CheekSquintRight: 1
      NoseSneerLeft: 1
      NoseSneerRight: 1
      TongueOut: 0
      TongueTipUp: 1
      TongueTipDown: 1
      TongueTipLeft: 1
      TongueTipRight: 1
      TongueRollUp: 1
      TongueRollDown: 1
      TongueRollLeft: 1
      TongueRollRight: 1
      TongueUp: 1
      TongueDown: 1
      TongueLeft: 1
      TongueRight: 1
      TongueIn: 1
      TongueStretch: 1
      TongueWide: 1
      TongueNarrow: 1

    cancel_poses: # Specifies which blendshapes are cancelled for each pose (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: -1
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: -1
      EyeBlinkRight: -1
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: -1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: -1
      MouthSmileRight: -1
      MouthFrownLeft: -1
      MouthFrownRight: -1
      MouthDimpleLeft: -1
      MouthDimpleRight: -1
      MouthStretchLeft: -1
      MouthStretchRight: -1
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: -1
      MouthPressRight: -1
      MouthLowerDownLeft: -1
      MouthLowerDownRight: -1
      MouthUpperUpLeft: -1
      MouthUpperUpRight: -1
      BrowDownLeft: -1
      BrowDownRight: -1
      BrowInnerUp: -1
      BrowOuterUpLeft: -1
      BrowOuterUpRight: -1
      CheekPuff: -1
      CheekSquintLeft: -1
      CheekSquintRight: -1
      NoseSneerLeft: -1
      NoseSneerRight: -1
      TongueOut: -1
      TongueTipUp: -1
      TongueTipDown: -1
      TongueTipLeft: -1
      TongueTipRight: -1
      TongueRollUp: -1
      TongueRollDown: -1
      TongueRollLeft: -1
      TongueRollRight: -1
      TongueUp: -1
      TongueDown: -1
      TongueLeft: -1
      TongueRight: -1
      TongueIn: -1
      TongueStretch: -1
      TongueWide: -1
      TongueNarrow: -1

    symmetry_poses: # Specifies which blendshapes are symmetrical for each pose (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: 0
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: 1
      EyeBlinkRight: 0
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: 1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: 2
      MouthSmileRight: 2
      MouthFrownLeft: 3
      MouthFrownRight: 3
      MouthDimpleLeft: 4
      MouthDimpleRight: 4
      MouthStretchLeft: 5
      MouthStretchRight: 5
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: 6
      MouthPressRight: 6
      MouthLowerDownLeft: 7
      MouthLowerDownRight: 7
      MouthUpperUpLeft: 8
      MouthUpperUpRight: 8
      BrowDownLeft: 10
      BrowDownRight: 10
      BrowInnerUp: -1
      BrowOuterUpLeft: 9
      BrowOuterUpRight: 9
      CheekPuff: -1
      CheekSquintLeft: 11
      CheekSquintRight: 11
      NoseSneerLeft: 12
      NoseSneerRight: 12
      TongueOut: -1
      TongueTipUp: -1
      TongueTipDown: -1
      TongueTipLeft: -1
      TongueTipRight: -1
      TongueRollUp: -1
      TongueRollDown: -1
      TongueRollLeft: -1
      TongueRollRight: -1
      TongueUp: -1
      TongueDown: -1
      TongueLeft: -1
      TongueRight: -1
      TongueIn: -1
      TongueStretch: -1
      TongueWide: -1
      TongueNarrow: -1
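The blendshape_params comments describe each output weight as blendshape_values * weight_multipliers + weight_offsets, with enable_clamping_bs_weight limiting the result to [0.0, 1.0]. The Python sketch below illustrates that arithmetic. It assumes clamping is applied after the multiplier and offset, and that a blendshape missing from either map uses the neutral values 1.0 and 0.0 (the defaults every entry has in these configs); active_poses, cancel_poses, and symmetry_poses are not modeled. The function name is hypothetical.

```python
from __future__ import annotations


def apply_blendshape_params(weights: dict[str, float],
                            multipliers: dict[str, float],
                            offsets: dict[str, float],
                            enable_clamping: bool = True) -> dict[str, float]:
    """Apply weight * multiplier + offset per blendshape, optionally clamped to [0, 1]."""
    out = {}
    for name, weight in weights.items():
        value = weight * multipliers.get(name, 1.0) + offsets.get(name, 0.0)
        if enable_clamping:
            value = min(max(value, 0.0), 1.0)  # keep renderer-safe range
        out[name] = value
    return out


raw = {"JawOpen": 0.7, "MouthSmileLeft": 0.9, "BrowDownLeft": 0.1}
multipliers = {"JawOpen": 1.2}                           # exaggerate jaw motion
offsets = {"MouthSmileLeft": 0.2, "BrowDownLeft": -0.3}  # bias smile up, brow down
print(apply_blendshape_params(raw, multipliers, offsets))
print(apply_blendshape_params(raw, multipliers, offsets, enable_clamping=False))
```

Without clamping, an offset can push a weight outside [0.0, 1.0] (here roughly 1.1 and -0.2), which is why the configs recommend keeping enable_clamping_bs_weight on for production renderers.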

Mark Configuration#

mark_stylization_config.yaml

# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true # Enable audio2emotion, AI-generated audio-driven emotion
  live_transition_time: 0.5 # Controls the smoothness of the output transition toward the target value across frames; higher values result in smoother transitions. Each frame updates at a rate of <frame time length> / <live transition time> (capped at 1.0) toward the raw result.
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions

a2f:
  # regression / diffusion
  inference_type: regression

  regression_model:
    inference_model_id: mark_v2.3

  diffusion_model:
    inference_model_id: multi_v3.2
    identity: mark
    # If true, use deterministic noise for diffusion inference (more stable/repeatable results).
    # If false, use non-deterministic noise (more variation between runs).
    constant_noise: true

  # Enable or disable tongue blendshapes output
  enable_tongue_blendshapes: false

  face_params:
    eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.3 # Controls the magnitude of the input audio
    lip_close_offset: -0.03 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.0023 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.4 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.1 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
    tongue_strength: 1.3 # Controls the range of motion of the tongue
    tongue_height_offset: 0.0 # Controls the height of the tongue
    tongue_depth_offset: 0.0 # Controls the depth of the tongue

  blendshape_params: # Modulates the effect of each blendshape: gain * w + offset
    # Clamps blendshape weights to [0.0, 1.0] range. Recommended for production to ensure compatibility with renderers.
    enable_clamping_bs_weight: true
    
    weight_multipliers: # Modulates the effect of each blendshape: blendshape_values * weight_multipliers + weight_offsets (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 1.0
      EyeLookInLeft: 1.0
      EyeLookOutLeft: 1.0
      EyeLookUpLeft: 1.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 1.0
      EyeLookInRight: 1.0
      EyeLookOutRight: 1.0
      EyeLookUpRight: 1.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 1.0
      JawLeft: 1.0
      JawRight: 1.0
      JawOpen: 1.0
      MouthClose: 1.0
      MouthFunnel: 1.0
      MouthPucker: 1.0
      MouthLeft: 1.0
      MouthRight: 1.0
      MouthSmileLeft: 1.0
      MouthSmileRight: 1.0
      MouthFrownLeft: 1.0
      MouthFrownRight: 1.0
      MouthDimpleLeft: 1.0
      MouthDimpleRight: 1.0
      MouthStretchLeft: 1.0
      MouthStretchRight: 1.0
      MouthRollLower: 1.0
      MouthRollUpper: 1.0
      MouthShrugLower: 1.0
      MouthShrugUpper: 1.0
      MouthPressLeft: 1.0
      MouthPressRight: 1.0
      MouthLowerDownLeft: 1.0
      MouthLowerDownRight: 1.0
      MouthUpperUpLeft: 1.0
      MouthUpperUpRight: 1.0
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 1.0
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 1.0
      NoseSneerRight: 1.0
      TongueOut: 1.0
      TongueTipUp: 1.0
      TongueTipDown: 1.0
      TongueTipLeft: 1.0
      TongueTipRight: 1.0
      TongueRollUp: 1.0
      TongueRollDown: 1.0
      TongueRollLeft: 1.0
      TongueRollRight: 1.0
      TongueUp: 1.0
      TongueDown: 1.0
      TongueLeft: 1.0
      TongueRight: 1.0
      TongueIn: 1.0
      TongueStretch: 1.0
      TongueWide: 1.0
      TongueNarrow: 1.0

    weight_offsets: # Modulates the effect of each blendshape: blendshape_values * weight_multipliers + weight_offsets (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0
      TongueTipUp: 0.0
      TongueTipDown: 0.0
      TongueTipLeft: 0.0
      TongueTipRight: 0.0
      TongueRollUp: 0.0
      TongueRollDown: 0.0
      TongueRollLeft: 0.0
      TongueRollRight: 0.0
      TongueUp: 0.0
      TongueDown: 0.0
      TongueLeft: 0.0
      TongueRight: 0.0
      TongueIn: 0.0
      TongueStretch: 0.0
      TongueWide: 0.0
      TongueNarrow: 0.0

    active_poses: # Specifies which blendshapes are active for each pose (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: 1
      EyeLookDownLeft: 0
      EyeLookInLeft: 0
      EyeLookOutLeft: 0
      EyeLookUpLeft: 0
      EyeSquintLeft: 1
      EyeWideLeft: 1
      EyeBlinkRight: 1
      EyeLookDownRight: 0
      EyeLookInRight: 0
      EyeLookOutRight: 0
      EyeLookUpRight: 0
      EyeSquintRight: 1
      EyeWideRight: 1
      JawForward: 1
      JawLeft: 1
      JawRight: 1
      JawOpen: 1
      MouthClose: 1
      MouthFunnel: 1
      MouthPucker: 1
      MouthLeft: 1
      MouthRight: 1
      MouthSmileLeft: 1
      MouthSmileRight: 1
      MouthFrownLeft: 1
      MouthFrownRight: 1
      MouthDimpleLeft: 1
      MouthDimpleRight: 1
      MouthStretchLeft: 1
      MouthStretchRight: 1
      MouthRollLower: 1
      MouthRollUpper: 1
      MouthShrugLower: 1
      MouthShrugUpper: 1
      MouthPressLeft: 1
      MouthPressRight: 1
      MouthLowerDownLeft: 1
      MouthLowerDownRight: 1
      MouthUpperUpLeft: 1
      MouthUpperUpRight: 1
      BrowDownLeft: 1
      BrowDownRight: 1
      BrowInnerUp: 1
      BrowOuterUpLeft: 1
      BrowOuterUpRight: 1
      CheekPuff: 1
      CheekSquintLeft: 1
      CheekSquintRight: 1
      NoseSneerLeft: 1
      NoseSneerRight: 1
      TongueOut: 0
      TongueTipUp: 1
      TongueTipDown: 1
      TongueTipLeft: 1
      TongueTipRight: 1
      TongueRollUp: 1
      TongueRollDown: 1
      TongueRollLeft: 1
      TongueRollRight: 1
      TongueUp: 1
      TongueDown: 1
      TongueLeft: 1
      TongueRight: 1
      TongueIn: 1
      TongueStretch: 1
      TongueWide: 1
      TongueNarrow: 1

    cancel_poses: # Specifies which blendshapes are cancelled for each pose (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: -1
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: -1
      EyeBlinkRight: -1
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: -1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: -1
      MouthSmileRight: -1
      MouthFrownLeft: -1
      MouthFrownRight: -1
      MouthDimpleLeft: -1
      MouthDimpleRight: -1
      MouthStretchLeft: -1
      MouthStretchRight: -1
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: -1
      MouthPressRight: -1
      MouthLowerDownLeft: -1
      MouthLowerDownRight: -1
      MouthUpperUpLeft: -1
      MouthUpperUpRight: -1
      BrowDownLeft: -1
      BrowDownRight: -1
      BrowInnerUp: -1
      BrowOuterUpLeft: -1
      BrowOuterUpRight: -1
      CheekPuff: -1
      CheekSquintLeft: -1
      CheekSquintRight: -1
      NoseSneerLeft: -1
      NoseSneerRight: -1
      TongueOut: -1
      TongueTipUp: -1
      TongueTipDown: -1
      TongueTipLeft: -1
      TongueTipRight: -1
      TongueRollUp: -1
      TongueRollDown: -1
      TongueRollLeft: -1
      TongueRollRight: -1
      TongueUp: -1
      TongueDown: -1
      TongueLeft: -1
      TongueRight: -1
      TongueIn: -1
      TongueStretch: -1
      TongueWide: -1
      TongueNarrow: -1

    symmetry_poses: # Specifies which blendshapes are symmetrical for each pose (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: 0
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: 1
      EyeBlinkRight: 0
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: 1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: 2
      MouthSmileRight: 2
      MouthFrownLeft: 3
      MouthFrownRight: 3
      MouthDimpleLeft: 4
      MouthDimpleRight: 4
      MouthStretchLeft: 5
      MouthStretchRight: 5
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: 6
      MouthPressRight: 6
      MouthLowerDownLeft: 7
      MouthLowerDownRight: 7
      MouthUpperUpLeft: 8
      MouthUpperUpRight: 8
      BrowDownLeft: 10
      BrowDownRight: 10
      BrowInnerUp: -1
      BrowOuterUpLeft: 9
      BrowOuterUpRight: 9
      CheekPuff: -1
      CheekSquintLeft: 11
      CheekSquintRight: 11
      NoseSneerLeft: 12
      NoseSneerRight: 12
      TongueOut: -1
      TongueTipUp: -1
      TongueTipDown: -1
      TongueTipLeft: -1
      TongueTipRight: -1
      TongueRollUp: -1
      TongueRollDown: -1
      TongueRollLeft: -1
      TongueRollRight: -1
      TongueUp: -1
      TongueDown: -1
      TongueLeft: -1
      TongueRight: -1
      TongueIn: -1
      TongueStretch: -1
      TongueWide: -1
      TongueNarrow: -1
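The comments in this file describe two simple per-frame formulas: blendshape weights are transformed as `blendshape_values * weight_multipliers + weight_offsets` (clamped to [0.0, 1.0] when `enable_clamping_bs_weight` is true), and `live_transition_time` moves the output toward its raw target at a rate of frame time divided by transition time, capped at 1.0. The Python sketch below only illustrates that arithmetic; it is not the service's implementation, and applying the clamp after the gain and offset is an assumption.

```python
from typing import Dict


def apply_blendshape_params(
    weights: Dict[str, float],
    weight_multipliers: Dict[str, float],
    weight_offsets: Dict[str, float],
    enable_clamping_bs_weight: bool = True,
) -> Dict[str, float]:
    """weight * multiplier + offset per blendshape, optionally clamped to [0.0, 1.0]."""
    result = {}
    for name, w in weights.items():
        # Missing entries default to the identity transform (multiplier 1.0, offset 0.0).
        value = w * weight_multipliers.get(name, 1.0) + weight_offsets.get(name, 0.0)
        if enable_clamping_bs_weight:
            value = min(max(value, 0.0), 1.0)  # clamp applied last (assumption)
        result[name] = value
    return result


def live_transition_step(current: float, target: float,
                         frame_time_s: float, live_transition_time_s: float) -> float:
    """Move `current` toward `target` by frame_time / live_transition_time, capped at 1.0."""
    if live_transition_time_s <= 0:
        return target  # treat a zero transition time as an instant jump (assumption)
    rate = min(frame_time_s / live_transition_time_s, 1.0)
    return current + rate * (target - current)


weights = {"JawOpen": 0.8, "MouthSmileLeft": 0.5}
print(apply_blendshape_params(weights, {"JawOpen": 1.5}, {"MouthSmileLeft": -0.6}))
# JawOpen: 0.8 * 1.5 = 1.2, clamped to 1.0; MouthSmileLeft: 0.5 - 0.6 = -0.1, clamped to 0.0

# At 30 FPS with live_transition_time: 0.5, each frame covers 1/15 of the remaining gap.
print(live_transition_step(0.0, 1.0, 1 / 30, 0.5))
```

With the default multipliers of 1.0 and offsets of 0.0 shown above, the transform leaves in-range weights unchanged.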

2. Deployment Configuration File#

3. Advanced Configuration File#

advanced_config.yaml

input_sanitization:
  # Maximum length of a UUID
  max_len_uuid: 50
  # Maximum sample rate
  max_sample_rate: 144000
  # Minimum sample rate
  min_sample_rate: 16000
  # Maximum processing time, in seconds
  # After this timeout, the connection to A2F is closed
  max_processing_duration_second: 300
  # Maximum duration, in seconds, of a single audio buffer sent over the gRPC stream
  max_audio_buffer_size_second: 10
  # Maximum duration, in seconds, of the audio clip to process
  max_audio_clip_size_second: 300
  # Maximum time the A2F Controller waits without
  # receiving data from A2F before closing the connection
  max_wait_time_idle_ms: 30000
  # Stops serving a user if their FPS is lower than low_fps
  # for more than low_fps_max_duration_second seconds.
  # For real-time applications, less than 30 FPS is slower than real time,
  # so if users provide audio to the service at less than 30 FPS,
  # the interactive experience will stutter.
  low_fps: 29
  low_fps_max_duration_second: 7

# WARN: Deprecated
garbage_collector:
  # Enable or disable the garbage collector.
  # This is only used with bidirectional connections, where the service holds data
  # waiting for the client to pick it up.
  enabled: false
  # How often the garbage collector runs, in seconds
  interval_run_second: 10
  # If the garbage collector finds streams holding
  # more than max_size_stored_data_second seconds of data, it deletes data
  # until the amount falls below this threshold.
  # Clients are expected to retrieve data promptly so that
  # the service doesn't retain data excessively.
  max_size_stored_data_second: 60


pipeline_parameters:
  # Queue sizes between pipeline components
  # These can be tuned:
  # Higher values can increase throughput but also increase latency
  # Lower values reduce latency but can reduce overall throughput
  # If in doubt, keep the default values
  queue_size_after_a2e: 1
  queue_size_after_a2f: 300
  queue_size_after_streammux: 1
  
  # ===== Blendshape Streaming Control =====
  # Controls how blendshapes are sent to the client
  
  # Burst Mode: Send all frames as fast as possible (~20-30ms total)
  # WARNING: May cause AnimGraph buffer overflow and lip sync issues in Tokkio
  # Values:
  #   false = Rate-limited streaming (RECOMMENDED for production)
  #   true  = Burst mode
  burst_mode: false
  
  # Streaming Frame Rate (only used when burst_mode = false)
  # Controls the rate at which blendshapes are sent to the client
  # Delay per frame = 1000 / blendshape_streaming_fps milliseconds
  # 
  # Common values:
  #   30      = 33ms delay (good for bandwidth-constrained networks)
  #   60      = 16ms delay (standard frame rate)
  #   90      = 11ms delay - DEFAULT (Tokkio compatibility)
  #   120     = 8ms delay (low latency)
  #   240     = 4ms delay (very low latency)
  #   500     = 2ms delay (near-burst performance)
  #   1000+   = 1ms or less delay (effectively burst-like)
  #
  # Recommended settings:
  #   - Tokkio/Production: 90
  #   - Low latency: 120-240
  #   - Bandwidth-constrained: 30-60
  blendshape_streaming_fps: 90

  streammux:
    # Do not change this config; this is internal
    adaptive_batching: 0
    # Minimum FPS for all streams, expressed as the fraction
    # overall_min_fps_n / overall_min_fps_d.
    # The pipeline will not slow down below this value if:
    # * compute allows it
    # * the audio upload speed allows it
    # Here: 40 / 1 = 40 FPS
    # Numerator:
    overall_min_fps_n: 40
    # Denominator:
    overall_min_fps_d: 1

a2f:  
  # Temporal smoothing for blendshape output
  # Set to false for debugging individual frames without smoothing
  temporal_smoothing: true
  
  # GPU device ID to use for A2F inference
  device_id: 0
  
  # GPU Blendshape Solver
  # When true, blendshape solving runs entirely on GPU, improving performance
  # by avoiding CPU-GPU data transfers during the solve step.
  # When false, blendshape solving runs on CPU.
  # Recommended: true (default) for production deployments
  use_gpu_solver: true

a2e:
  inference_interval: 10
  device_id: 0 # GPU device ID to use for A2E inference


trt_model_generation:
  a2e:
    # The Audio2Emotion engine currently supports only FP32 precision
    precision: "fp32"
    min_shape: 1
    optimal_shape: 10
    maximum_shape: 10
  a2f:
    precision: "fp16"
    min_shape: 1
    optimal_shape: 10
    maximum_shape: 10

resampling:
  # Size of chunks used during resampling process
  chunk_size: 6400
  # Resampling quality parameter
  # - Range: 1.0 to 10.0
  # - Higher values yield superior audio quality
  # - Lower values prioritize computational performance
  quality: 1.0
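As a rough illustration of the streaming and low-FPS settings above, the following Python sketch computes the per-frame delay implied by `blendshape_streaming_fps` (1000 / fps milliseconds, as stated in the comments) and models the `low_fps` / `low_fps_max_duration_second` cutoff. The `LowFpsMonitor` class is hypothetical: how the service actually measures a client's FPS is not documented here, and treating burst mode as zero delay is a simplification.

```python
from typing import Optional


def frame_delay_ms(blendshape_streaming_fps: float, burst_mode: bool = False) -> float:
    """Per-frame send delay: 1000 / blendshape_streaming_fps ms, or 0 in burst mode."""
    if burst_mode:
        return 0.0  # simplification: burst mode sends frames as fast as possible
    if blendshape_streaming_fps <= 0:
        raise ValueError("blendshape_streaming_fps must be positive")
    return 1000.0 / blendshape_streaming_fps


class LowFpsMonitor:
    """Hypothetical model of the low_fps / low_fps_max_duration_second rule."""

    def __init__(self, low_fps: float = 29, max_duration_s: float = 7) -> None:
        self.low_fps = low_fps
        self.max_duration_s = max_duration_s
        self._below_since: Optional[float] = None

    def update(self, now_s: float, fps: float) -> bool:
        """Record an FPS sample at time now_s; return True if the stream should be stopped."""
        if fps >= self.low_fps:
            self._below_since = None  # back at or above the threshold: reset the timer
            return False
        if self._below_since is None:
            self._below_since = now_s  # start timing the low-FPS period
        return now_s - self._below_since > self.max_duration_s


for fps in (30, 60, 90, 120):
    print(fps, "FPS ->", round(frame_delay_ms(fps), 1), "ms per frame")
```

The delays printed by the loop match the rounded values listed in the `blendshape_streaming_fps` comment.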

Key Advanced Configuration Parameters#

A2F Parameters (a2f section):

temporal_smoothing: When true, applies temporal smoothing to blendshape output for smoother animations. Set to false for debugging individual frames without smoothing.
device_id: GPU device ID to use for A2F inference (default: 0).
use_gpu_solver: Controls where blendshape solving is performed:
- true (default, recommended): Blendshape solving runs entirely on GPU. This improves performance by keeping computation on the GPU and avoiding CPU-GPU data transfers during the solve step.
- false: Blendshape solving runs on CPU.

A2E Parameters (a2e section):

inference_interval: Controls how frequently Audio2Emotion inference runs. Higher values reduce compute cost but decrease temporal fidelity of emotion detection.
device_id: GPU device ID to use for A2E inference (default: 0).

These configuration files contain the system's default values. To apply custom configurations, launch A2F-3D NIM with a custom entrypoint and mount your configuration files inside the container, as described in the following sections.

Configuration Usage#

To override default configurations, mount your custom configuration files in a Docker volume at /mnt/configs. For convenience, set up your environment with these commands:

$ mkdir -p ~/.cache/audio2face-3d-configs
$ export LOCAL_CONFIGS=~/.cache/audio2face-3d-configs

Copy the default configuration files into your LOCAL_CONFIGS directory. The directory should then contain:

$ ls $LOCAL_CONFIGS
advanced_config.yaml
claire_stylization_config.yaml
deployment_config.yaml
james_stylization_config.yaml
mark_stylization_config.yaml
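Before launching the container, it can help to confirm that the directory contains every file you intend to mount. The following Python sketch checks for the five file names listed above; the script and its `missing_configs` helper are hypothetical and not part of the NIM. If you only mount partial override files, adjust `EXPECTED_CONFIGS` accordingly.

```python
import os
from pathlib import Path

# File names from the listing above.
EXPECTED_CONFIGS = [
    "advanced_config.yaml",
    "claire_stylization_config.yaml",
    "deployment_config.yaml",
    "james_stylization_config.yaml",
    "mark_stylization_config.yaml",
]


def missing_configs(config_dir: str) -> list:
    """Return the expected file names that are not regular files in config_dir."""
    directory = Path(config_dir).expanduser()
    return [name for name in EXPECTED_CONFIGS if not (directory / name).is_file()]


if __name__ == "__main__":
    config_dir = os.environ.get("LOCAL_CONFIGS", "~/.cache/audio2face-3d-configs")
    missing = missing_configs(config_dir)
    if missing:
        print("Missing in", config_dir + ":", ", ".join(missing))
    else:
        print("All default configuration files are present in", config_dir)
```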

Model Cache Management#

Enable local model caching to speed up subsequent service launches. Configure a cache location with appropriate permissions as shown below. Note that NIM_DISABLE_MODEL_DOWNLOAD must be set to true in the docker run command for cached models to be used properly; see the Model Caching section of Getting Started for details.

$ mkdir -p ~/.cache/audio2face-3d
$ chmod 755 ~/.cache/audio2face-3d
$ export LOCAL_NIM_CACHE=~/.cache/audio2face-3d

Note

The container runs as UID 1000. On most single-user Linux systems your user is also UID 1000 (verify with id -u), so chmod 755 is sufficient. If your UID differs, the container cannot write to the cache and engine generation will fail with Saving engine to file failed / Engine generation failed, or model downloads will fail with PermissionError: Permission denied: '/tmp/a2x/a2e.trt'. To fix this, grant ownership with sudo chown 1000:1000 ~/.cache/audio2face-3d, or for quick prototyping use chmod 777. See Permission Denied on Cache Directory (/tmp/a2x) in Troubleshooting for details.
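To check the ownership condition described in this note before starting the container, the following hypothetical Python sketch tests whether UID 1000 could write to the cache directory, based only on the directory's owner and permission bits. It is deliberately conservative: it ignores group membership and ACLs, so a negative result means "check manually" rather than "definitely not writable".

```python
import os
import stat
from pathlib import Path

CONTAINER_UID = 1000  # the container runs as UID 1000 (from the note above)


def writable_by(uid: int, owner_uid: int, mode: int) -> bool:
    """Conservative check: can `uid` create files in a directory with this owner and mode?

    Group membership and ACLs are ignored (assumption), so only the owner and
    "other" permission bits are considered. Creating files in a directory needs
    both write and execute (search) permission.
    """
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR) and bool(mode & stat.S_IXUSR)
    return bool(mode & stat.S_IWOTH) and bool(mode & stat.S_IXOTH)


def cache_dir_ok(cache_dir: str, uid: int = CONTAINER_UID) -> bool:
    st = Path(cache_dir).expanduser().stat()
    return writable_by(uid, st.st_uid, st.st_mode)


if __name__ == "__main__":
    cache = os.environ.get("LOCAL_NIM_CACHE", "~/.cache/audio2face-3d")
    if not Path(cache).expanduser().is_dir():
        print(f"{cache} does not exist yet; create it first.")
    elif cache_dir_ok(cache):
        print(f"{cache}: UID {CONTAINER_UID} should be able to write here.")
    else:
        print(f"{cache}: UID {CONTAINER_UID} may not be able to write here; "
              f"consider 'sudo chown 1000:1000 {cache}'.")
```

This mirrors the note: `chmod 755` is enough when the directory is owned by UID 1000, while a directory owned by another UID needs the "other" write bit (as with `chmod 777`) or a `chown`.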

Launching A2F-3D NIM with Custom Entrypoint#

$ docker run -it --rm --name audio2face-3d \
 --gpus all \
 --network=host \
 --entrypoint /bin/bash -w /opt/nvidia/a2f_pipeline \
 -e NIM_SKIP_A2F_START=true \
 -e NIM_DISABLE_MODEL_DOWNLOAD=true \
 -e NGC_API_KEY=$NGC_API_KEY \
 -v "$LOCAL_NIM_CACHE:/tmp/a2x" \
 -v "$LOCAL_CONFIGS:/mnt/configs/" \
 nvcr.io/nim/nvidia/audio2face-3d:2.0

This command creates a Docker container with GPU support (--gpus all) and host network access (--network=host). For granular port control, replace --network=host with specific port mappings using -p.

The command mounts volumes for model caching (-v "$LOCAL_NIM_CACHE:/tmp/a2x") and configuration overrides (-v "$LOCAL_CONFIGS:/mnt/configs/"). It also disables downloading TRT engines from NGC (-e NIM_DISABLE_MODEL_DOWNLOAD=true). Omit either mount if you don't need the corresponding functionality.

Once inside the container shell:

ubuntu@host-name:/opt/nvidia/a2f_pipeline$

Note

The following commands should be executed within the container unless specified otherwise.

TensorRT Engine Generation#

Generate the TensorRT engine for your GPU using the provided Python application:

usage: generate_trt_models.py [-h] [--stylization-config STYLIZATION_CONFIG] [--advanced-config ADVANCED_CONFIG]

Generates TRT models for A2F Service.

options:
  -h, --help            show this help message and exit
  --stylization-config  STYLIZATION_CONFIG
                        file path to the stylization config
  --advanced-config     ADVANCED_CONFIG
                        file path to the advanced config

Generate Audio2Emotion and Audio2Face TRT engines with default configurations:

$ ./service/generate_trt_models.py

Important

The TRT generation utility in this release is intended for the built-in model set shipped with the container. It only supports the following A2F inference_model_id values:

james_v2.3.1
claire_v2.3.1
mark_v2.3
multi_v3.2
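Because the TRT generation utility only accepts these IDs, a quick pre-check of a stylization configuration can save a failed engine build. The following Python sketch, a hypothetical helper that is not part of the container, operates on the `a2f` section already loaded into a dict (for example with PyYAML's `yaml.safe_load`), picks the model ID according to `a2f.inference_type`, and checks it against the list above. Defaulting to `regression` when `inference_type` is absent is an assumption based on the shipped configuration files.

```python
from typing import Any, Dict

# IDs listed above as supported by generate_trt_models.py in this release.
SUPPORTED_TRT_MODEL_IDS = {"james_v2.3.1", "claire_v2.3.1", "mark_v2.3", "multi_v3.2"}

# Maps a2f.inference_type to the sub-section holding inference_model_id.
MODEL_SECTIONS = {"regression": "regression_model", "diffusion": "diffusion_model"}


def active_model_id(a2f: Dict[str, Any]) -> str:
    """Return the inference_model_id selected by a2f.inference_type."""
    inference_type = a2f.get("inference_type", "regression")  # assumed default
    section = MODEL_SECTIONS.get(inference_type)
    if section is None:
        raise ValueError(f"unknown inference_type: {inference_type!r}")
    return a2f[section]["inference_model_id"]


def check_trt_support(a2f: Dict[str, Any]) -> str:
    """Raise ValueError if the active model ID is not supported by the TRT utility."""
    model_id = active_model_id(a2f)
    if model_id not in SUPPORTED_TRT_MODEL_IDS:
        raise ValueError(
            f"{model_id!r} is not supported by generate_trt_models.py; "
            f"expected one of {sorted(SUPPORTED_TRT_MODEL_IDS)}"
        )
    return model_id


# The a2f section of mark_stylization_config.yaml, reduced to the relevant keys.
mark_a2f = {
    "inference_type": "regression",
    "regression_model": {"inference_model_id": "mark_v2.3"},
    "diffusion_model": {"inference_model_id": "multi_v3.2", "identity": "mark"},
}
print(check_trt_support(mark_a2f))  # mark_v2.3
print(check_trt_support({**mark_a2f, "inference_type": "diffusion"}))  # multi_v3.2
```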

Note

TRT engines are GPU-specific and must be regenerated when changing deployment hardware. While generated engines can be backed up, they’re only compatible with identical hardware configurations.

Service Initialization#

Launch the Audio2Face-3D service with a2f_pipeline.run. To list the available options:

$ a2f_pipeline.run -h
Usage: a2f_pipeline.run [--help] [--version] [--stylization-config] [--deployment-config] [--advanced-config]

Optional arguments:
  -h, --help                     shows help message and exits
  -v, --version                  prints version information and exits
  --stylization-config           file path to the stylization config
  --deployment-config            file path to the deployment config
  --advanced-config              file path to the advanced config

Start with default configuration:

$ /usr/local/bin/a2f_pipeline.run

Successful initialization produces:

[2024-04-23 12:44:33.066] [  global  ] [info] Running...

Streamlined Configuration Updates#

To switch to the Claire model, execute these commands within the container:

$ ./service/generate_trt_models.py --stylization-config /mnt/configs/claire_stylization_config.yaml \
   --advanced-config /mnt/configs/advanced_config.yaml
$ a2f_pipeline.run --stylization-config /mnt/configs/claire_stylization_config.yaml \
                   --deployment-config /mnt/configs/deployment_config.yaml \
                   --advanced-config /mnt/configs/advanced_config.yaml

Warning

The current generate_trt_models.py utility doesn’t support cache invalidation. To regenerate models after configuration updates, manually remove the corresponding TRT model from /tmp/a2x/.
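A hypothetical helper for this manual cleanup step: the Python sketch below lists (and optionally deletes) TensorRT engine files in the cache directory so that the next generate_trt_models.py run rebuilds them. It assumes engines are stored as `*.trt` files directly under `/tmp/a2x`; only `a2e.trt` is confirmed by the error message quoted in the caching note, so review the dry-run output before deleting anything.

```python
from pathlib import Path
from typing import List


def find_trt_engines(cache_dir: str = "/tmp/a2x") -> List[Path]:
    """List *.trt files directly under cache_dir (assumed engine naming)."""
    return sorted(p for p in Path(cache_dir).glob("*.trt") if p.is_file())


def remove_trt_engines(cache_dir: str = "/tmp/a2x", dry_run: bool = True) -> List[Path]:
    """Delete the engines found by find_trt_engines; with dry_run=True, only report them."""
    engines = find_trt_engines(cache_dir)
    for engine in engines:
        if dry_run:
            print("would remove", engine)
        else:
            engine.unlink()
            print("removed", engine)
    return engines


if __name__ == "__main__":
    remove_trt_engines(dry_run=True)  # rerun with dry_run=False once the list looks right
```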

Flexible Configuration Management#

The configuration system employs an override mechanism, allowing partial configuration updates without specifying all parameters. For the A2F stylization configuration, you can override only the keys you need, including nested keys under a2f.regression_model or a2f.diffusion_model.
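The override behavior can be pictured as a recursive merge: keys present in your file replace the defaults, and all other keys keep their default values. The Python sketch below illustrates that idea under this assumption; it is not the service's actual loader, and how lists are merged is not documented. The default values in the example are illustrative, not the shipped ones.

```python
import copy
from typing import Any, Dict


def merge_overrides(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of defaults with overrides applied recursively.

    Nested dicts are merged key by key; any other value (scalar or list)
    in overrides replaces the default outright (assumed behavior).
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Illustrative defaults and a partial override in the style of Example 1.
defaults = {
    "a2f": {
        "inference_type": "regression",
        "regression_model": {"inference_model_id": "james_v2.3.1"},
        "enable_tongue_blendshapes": False,
        "face_params": {"input_strength": 1.0},
    }
}
override = {
    "a2f": {
        "regression_model": {"inference_model_id": "mark_v2.3"},
        "enable_tongue_blendshapes": True,
    }
}

merged = merge_overrides(defaults, override)
print(merged["a2f"]["regression_model"]["inference_model_id"])  # mark_v2.3
print(merged["a2f"]["face_params"]["input_strength"])  # 1.0 (kept from defaults)
```

The merge returns a new dict, so the defaults stay untouched and can be reused for other overrides.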

Example 1: Using Mark Stylization#

Create short_mark_stylization_config.yaml in $LOCAL_CONFIGS with:

a2f:
  inference_type: regression
  regression_model:
    inference_model_id: mark_v2.3
  enable_tongue_blendshapes: true

Execute within the container:

$ ./service/generate_trt_models.py --stylization-config /mnt/configs/short_mark_stylization_config.yaml
$ a2f_pipeline.run --stylization-config /mnt/configs/short_mark_stylization_config.yaml

These commands produce the same result as the complete Mark configuration file with enable_tongue_blendshapes set to true: the service uses the Mark regression model and default parameters for any unspecified keys.

Warning

Use the appropriate configuration flag for your file type:

--stylization-config # for the <any>_stylization_config.yaml
--deployment-config # for the deployment_config.yaml
--advanced-config # for the advanced_config.yaml

Advanced Stylization#

The blendshape tuning in the stylization configuration above is simplified for new users.

Advanced users can find additional tuning parameters in the section below.

Advanced Blendshape tuning

Three additional parameters can be set for blendshape tuning:

active_poses: Which blendshapes are active: 1 for active, 0 for inactive.
cancel_poses: Which blendshapes cancel each other. Blendshapes assigned the same number cancel each other; -1 means no-op.
symmetry_poses: Which blendshapes are symmetric to each other. Blendshapes assigned the same number form a symmetric pair; -1 means no-op.
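One way to read these maps is to group blendshape names by their number. The following Python sketch does this for excerpts of the `symmetry_poses` and `active_poses` values from the configuration files in this guide. The helper names are hypothetical, and the interpretation (a shared number means the blendshapes are grouped, -1 means no-op, and 0 in `active_poses` means inactive) follows the descriptions above rather than the service's source.

```python
from collections import defaultdict
from typing import Dict, List


def pose_groups(poses: Dict[str, int]) -> Dict[int, List[str]]:
    """Group blendshape names that share the same number; -1 (no-op) is skipped."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for name, index in poses.items():
        if index != -1:
            groups[index].append(name)
    return dict(groups)


def inactive_blendshapes(active_poses: Dict[str, int]) -> List[str]:
    """Blendshapes marked 0 (inactive) in active_poses."""
    return [name for name, flag in active_poses.items() if flag == 0]


# Excerpts copied from the symmetry_poses and active_poses maps in this guide.
symmetry_excerpt = {
    "EyeBlinkLeft": 0, "EyeBlinkRight": 0,
    "EyeWideLeft": 1, "EyeWideRight": 1,
    "MouthSmileLeft": 2, "MouthSmileRight": 2,
    "BrowInnerUp": -1,
}
active_excerpt = {"EyeBlinkLeft": 1, "EyeLookDownLeft": 0, "JawOpen": 1, "TongueOut": 0}

for index, names in sorted(pose_groups(symmetry_excerpt).items()):
    print(index, names)
print(inactive_blendshapes(active_excerpt))  # ['EyeLookDownLeft', 'TongueOut']
```

Applied to a full `symmetry_poses` map, each group should contain exactly one Left and one Right blendshape, which makes this a handy sanity check after editing the numbers by hand.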

claire_stylization_config.yaml

# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true
  live_transition_time: 0.5
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions

a2f:
  # regression / diffusion
  inference_type: regression

  regression_model:
    inference_model_id: claire_v2.3.1

  diffusion_model:
    inference_model_id: multi_v3.2
    identity: claire
    # If true, use deterministic noise for diffusion inference (more stable/repeatable results).
    # If false, use non-deterministic noise (more variation between runs).
    constant_noise: true

  # Enable or disable tongue blendshapes output
  enable_tongue_blendshapes: false

  face_params:
    eyelid_offset: 0.0 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.0 # Controls the magnitude of the input audio
    lip_close_offset: 0.0 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.25 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.0 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
    tongue_strength: 1.3 # Controls the range of motion of the tongue
    tongue_height_offset: 0.0 # Controls the height of the tongue
    tongue_depth_offset: 0.0 # Controls the depth of the tongue

  blendshape_params: # Modulates the effect of each blendshape: gain * w + offset
    # Clamps blendshape weights to [0.0, 1.0] range. Recommended for production to ensure compatibility with renderers.
    enable_clamping_bs_weight: true

    weight_multipliers:
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 1.0
      EyeLookInLeft: 1.0
      EyeLookOutLeft: 1.0
      EyeLookUpLeft: 1.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 1.0
      EyeLookInRight: 1.0
      EyeLookOutRight: 1.0
      EyeLookUpRight: 1.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 1.0
      JawLeft: 1.0
      JawRight: 1.0
      JawOpen: 1.0
      MouthClose: 1.0
      MouthFunnel: 1.0
      MouthPucker: 1.0
      MouthLeft: 1.0
      MouthRight: 1.0
      MouthSmileLeft: 1.0
      MouthSmileRight: 1.0
      MouthFrownLeft: 1.0
      MouthFrownRight: 1.0
      MouthDimpleLeft: 1.0
      MouthDimpleRight: 1.0
      MouthStretchLeft: 1.0
      MouthStretchRight: 1.0
      MouthRollLower: 1.0
      MouthRollUpper: 1.0
      MouthShrugLower: 1.0
      MouthShrugUpper: 1.0
      MouthPressLeft: 1.0
      MouthPressRight: 1.0
      MouthLowerDownLeft: 1.0
      MouthLowerDownRight: 1.0
      MouthUpperUpLeft: 1.0
      MouthUpperUpRight: 1.0
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 1.0
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 1.0
      NoseSneerRight: 1.0
      TongueOut: 1.0
      TongueTipUp: 1.0
      TongueTipDown: 1.0
      TongueTipLeft: 1.0
      TongueTipRight: 1.0
      TongueRollUp: 1.0
      TongueRollDown: 1.0
      TongueRollLeft: 1.0
      TongueRollRight: 1.0
      TongueUp: 1.0
      TongueDown: 1.0
      TongueLeft: 1.0
      TongueRight: 1.0
      TongueIn: 1.0
      TongueStretch: 1.0
      TongueWide: 1.0
      TongueNarrow: 1.0

    weight_offsets:
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0
      TongueTipUp: 0.0
      TongueTipDown: 0.0
      TongueTipLeft: 0.0
      TongueTipRight: 0.0
      TongueRollUp: 0.0
      TongueRollDown: 0.0
      TongueRollLeft: 0.0
      TongueRollRight: 0.0
      TongueUp: 0.0
      TongueDown: 0.0
      TongueLeft: 0.0
      TongueRight: 0.0
      TongueIn: 0.0
      TongueStretch: 0.0
      TongueWide: 0.0
      TongueNarrow: 0.0

    active_poses:
      EyeBlinkLeft: 1
      EyeLookDownLeft: 0
      EyeLookInLeft: 0
      EyeLookOutLeft: 0
      EyeLookUpLeft: 0
      EyeSquintLeft: 1
      EyeWideLeft: 1
      EyeBlinkRight: 1
      EyeLookDownRight: 0
      EyeLookInRight: 0
      EyeLookOutRight: 0
      EyeLookUpRight: 0
      EyeSquintRight: 1
      EyeWideRight: 1
      JawForward: 1
      JawLeft: 1
      JawRight: 1
      JawOpen: 1
      MouthClose: 1
      MouthFunnel: 1
      MouthPucker: 1
      MouthLeft: 1
      MouthRight: 1
      MouthSmileLeft: 1
      MouthSmileRight: 1
      MouthFrownLeft: 1
      MouthFrownRight: 1
      MouthDimpleLeft: 1
      MouthDimpleRight: 1
      MouthStretchLeft: 1
      MouthStretchRight: 1
      MouthRollLower: 1
      MouthRollUpper: 1
      MouthShrugLower: 1
      MouthShrugUpper: 1
      MouthPressLeft: 1
      MouthPressRight: 1
      MouthLowerDownLeft: 1
      MouthLowerDownRight: 1
      MouthUpperUpLeft: 1
      MouthUpperUpRight: 1
      BrowDownLeft: 1
      BrowDownRight: 1
      BrowInnerUp: 1
      BrowOuterUpLeft: 1
      BrowOuterUpRight: 1
      CheekPuff: 1
      CheekSquintLeft: 1
      CheekSquintRight: 1
      NoseSneerLeft: 1
      NoseSneerRight: 1
      TongueOut: 0
      TongueTipUp: 1
      TongueTipDown: 1
      TongueTipLeft: 1
      TongueTipRight: 1
      TongueRollUp: 1
      TongueRollDown: 1
      TongueRollLeft: 1
      TongueRollRight: 1
      TongueUp: 1
      TongueDown: 1
      TongueLeft: 1
      TongueRight: 1
      TongueIn: 1
      TongueStretch: 1
      TongueWide: 1
      TongueNarrow: 1

    cancel_poses:
      EyeBlinkLeft: -1
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: -1
      EyeBlinkRight: -1
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: -1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: -1
      MouthSmileRight: -1
      MouthFrownLeft: -1
      MouthFrownRight: -1
      MouthDimpleLeft: -1
      MouthDimpleRight: -1
      MouthStretchLeft: -1
      MouthStretchRight: -1
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: -1
      MouthPressRight: -1
      MouthLowerDownLeft: -1
      MouthLowerDownRight: -1
      MouthUpperUpLeft: -1
      MouthUpperUpRight: -1
      BrowDownLeft: -1
      BrowDownRight: -1
      BrowInnerUp: -1
      BrowOuterUpLeft: -1
      BrowOuterUpRight: -1
      CheekPuff: -1
      CheekSquintLeft: -1
      CheekSquintRight: -1
      NoseSneerLeft: -1
      NoseSneerRight: -1
      TongueOut: -1
      TongueTipUp: -1
      TongueTipDown: -1
      TongueTipLeft: -1
      TongueTipRight: -1
      TongueRollUp: -1
      TongueRollDown: -1
      TongueRollLeft: -1
      TongueRollRight: -1
      TongueUp: -1
      TongueDown: -1
      TongueLeft: -1
      TongueRight: -1
      TongueIn: -1
      TongueStretch: -1
      TongueWide: -1
      TongueNarrow: -1

    symmetry_poses:
      EyeBlinkLeft: 0
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: 1
      EyeBlinkRight: 0
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: 1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: 2
      MouthSmileRight: 2
      MouthFrownLeft: 3
      MouthFrownRight: 3
      MouthDimpleLeft: 4
      MouthDimpleRight: 4
      MouthStretchLeft: 5
      MouthStretchRight: 5
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: 6
      MouthPressRight: 6
      MouthLowerDownLeft: 7
      MouthLowerDownRight: 7
      MouthUpperUpLeft: 8
      MouthUpperUpRight: 8
      BrowDownLeft: 10
      BrowDownRight: 10
      BrowInnerUp: -1
      BrowOuterUpLeft: 9
      BrowOuterUpRight: 9
      CheekPuff: -1
      CheekSquintLeft: 11
      CheekSquintRight: 11
      NoseSneerLeft: 12
      NoseSneerRight: 12
      TongueOut: -1
      TongueTipUp: -1
      TongueTipDown: -1
      TongueTipLeft: -1
      TongueTipRight: -1
      TongueRollUp: -1
      TongueRollDown: -1
      TongueRollLeft: -1
      TongueRollRight: -1
      TongueUp: -1
      TongueDown: -1
      TongueLeft: -1
      TongueRight: -1
      TongueIn: -1
      TongueStretch: -1
      TongueWide: -1
      TongueNarrow: -1
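In `symmetry_poses`, blendshapes that share a non-negative id appear to form a group. In every file shown here, each group pairs a `...Left` shape with its `...Right` counterpart (for example `MouthSmileLeft: 2` and `MouthSmileRight: 2`), and `-1` presumably marks shapes that belong to no group. This page does not describe how the service uses these groups at runtime. The minimal Python sketch below therefore reads the map only as a data structure and checks that every group is a clean Left/Right pair, which can catch copy-paste mistakes when editing a custom config. The helper names are hypothetical.

```python
from collections import defaultdict


def symmetry_groups(symmetry_poses: dict[str, int]) -> dict[int, list[str]]:
    """Group blendshape names by symmetry id; ids below 0 (e.g. -1) are skipped."""
    groups: dict[int, list[str]] = defaultdict(list)
    for name, group_id in symmetry_poses.items():
        if group_id >= 0:
            groups[group_id].append(name)
    return dict(groups)


def check_left_right_pairs(groups: dict[int, list[str]]) -> list[str]:
    """Return problems found; an empty list means every group is a Left/Right pair."""
    problems = []
    for group_id, names in sorted(groups.items()):
        if len(names) != 2:
            problems.append(f"group {group_id}: expected 2 shapes, got {names}")
            continue
        stems, sides = set(), set()
        for name in names:
            if name.endswith("Left"):
                stems.add(name[: -len("Left")])
                sides.add("Left")
            elif name.endswith("Right"):
                stems.add(name[: -len("Right")])
                sides.add("Right")
            else:
                stems.add(name)
                sides.add("none")
        # A valid pair has one Left and one Right shape with the same stem.
        if sides != {"Left", "Right"} or len(stems) != 1:
            problems.append(f"group {group_id}: {names} is not a Left/Right pair")
    return problems


if __name__ == "__main__":
    excerpt = {
        "EyeBlinkLeft": 0, "EyeBlinkRight": 0,
        "BrowInnerUp": -1,
        "MouthSmileLeft": 2, "MouthSmileRight": 2,
    }
    groups = symmetry_groups(excerpt)
    print(groups)
    print(check_left_right_pairs(groups))
```

In practice you would pass the full `symmetry_poses` mapping after loading the YAML file with a parser of your choice.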

james_stylization_config.yaml

# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true
  live_transition_time: 0.5
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activates blending of the preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if loaded) relative to generated emotions

a2f:
  # regression / diffusion
  inference_type: regression

  regression_model:
    inference_model_id: james_v2.3.1

  diffusion_model:
    inference_model_id: multi_v3.2
    identity: james
    # If true, use deterministic noise for diffusion inference (more stable/repeatable results).
    # If false, use non-deterministic noise (more variation between runs).
    constant_noise: true

  # Enable or disable tongue blendshape output
  enable_tongue_blendshapes: false

  face_params:
    eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.0 # Controls the magnitude of the input audio
    lip_close_offset: -0.02 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.2 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.0 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
    tongue_strength: 1.3 # Controls the range of motion of the tongue
    tongue_height_offset: 0.0 # Controls the height of the tongue
    tongue_depth_offset: 0.0 # Controls the depth of the tongue

  blendshape_params: # Modulates the effect of each blendshape: gain * w + offset
    # Clamps blendshape weights to [0.0, 1.0] range. Recommended for production to ensure compatibility with renderers.
    enable_clamping_bs_weight: true

    weight_multipliers:
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 1.0
      EyeLookInLeft: 1.0
      EyeLookOutLeft: 1.0
      EyeLookUpLeft: 1.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 1.0
      EyeLookInRight: 1.0
      EyeLookOutRight: 1.0
      EyeLookUpRight: 1.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 1.0
      JawLeft: 1.0
      JawRight: 1.0
      JawOpen: 1.0
      MouthClose: 1.0
      MouthFunnel: 1.0
      MouthPucker: 1.0
      MouthLeft: 1.0
      MouthRight: 1.0
      MouthSmileLeft: 1.0
      MouthSmileRight: 1.0
      MouthFrownLeft: 1.0
      MouthFrownRight: 1.0
      MouthDimpleLeft: 1.0
      MouthDimpleRight: 1.0
      MouthStretchLeft: 1.0
      MouthStretchRight: 1.0
      MouthRollLower: 1.0
      MouthRollUpper: 1.0
      MouthShrugLower: 1.0
      MouthShrugUpper: 1.0
      MouthPressLeft: 1.0
      MouthPressRight: 1.0
      MouthLowerDownLeft: 1.0
      MouthLowerDownRight: 1.0
      MouthUpperUpLeft: 1.0
      MouthUpperUpRight: 1.0
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 1.0
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 1.0
      NoseSneerRight: 1.0
      TongueOut: 1.0
      TongueTipUp: 1.0
      TongueTipDown: 1.0
      TongueTipLeft: 1.0
      TongueTipRight: 1.0
      TongueRollUp: 1.0
      TongueRollDown: 1.0
      TongueRollLeft: 1.0
      TongueRollRight: 1.0
      TongueUp: 1.0
      TongueDown: 1.0
      TongueLeft: 1.0
      TongueRight: 1.0
      TongueIn: 1.0
      TongueStretch: 1.0
      TongueWide: 1.0
      TongueNarrow: 1.0

    weight_offsets:
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0
      TongueTipUp: 0.0
      TongueTipDown: 0.0
      TongueTipLeft: 0.0
      TongueTipRight: 0.0
      TongueRollUp: 0.0
      TongueRollDown: 0.0
      TongueRollLeft: 0.0
      TongueRollRight: 0.0
      TongueUp: 0.0
      TongueDown: 0.0
      TongueLeft: 0.0
      TongueRight: 0.0
      TongueIn: 0.0
      TongueStretch: 0.0
      TongueWide: 0.0
      TongueNarrow: 0.0

    active_poses:
      EyeBlinkLeft: 1
      EyeLookDownLeft: 0
      EyeLookInLeft: 0
      EyeLookOutLeft: 0
      EyeLookUpLeft: 0
      EyeSquintLeft: 1
      EyeWideLeft: 1
      EyeBlinkRight: 1
      EyeLookDownRight: 0
      EyeLookInRight: 0
      EyeLookOutRight: 0
      EyeLookUpRight: 0
      EyeSquintRight: 1
      EyeWideRight: 1
      JawForward: 1
      JawLeft: 1
      JawRight: 1
      JawOpen: 1
      MouthClose: 1
      MouthFunnel: 1
      MouthPucker: 1
      MouthLeft: 1
      MouthRight: 1
      MouthSmileLeft: 1
      MouthSmileRight: 1
      MouthFrownLeft: 1
      MouthFrownRight: 1
      MouthDimpleLeft: 1
      MouthDimpleRight: 1
      MouthStretchLeft: 1
      MouthStretchRight: 1
      MouthRollLower: 1
      MouthRollUpper: 1
      MouthShrugLower: 1
      MouthShrugUpper: 1
      MouthPressLeft: 1
      MouthPressRight: 1
      MouthLowerDownLeft: 1
      MouthLowerDownRight: 1
      MouthUpperUpLeft: 1
      MouthUpperUpRight: 1
      BrowDownLeft: 1
      BrowDownRight: 1
      BrowInnerUp: 1
      BrowOuterUpLeft: 1
      BrowOuterUpRight: 1
      CheekPuff: 1
      CheekSquintLeft: 1
      CheekSquintRight: 1
      NoseSneerLeft: 1
      NoseSneerRight: 1
      TongueOut: 0
      TongueTipUp: 1
      TongueTipDown: 1
      TongueTipLeft: 1
      TongueTipRight: 1
      TongueRollUp: 1
      TongueRollDown: 1
      TongueRollLeft: 1
      TongueRollRight: 1
      TongueUp: 1
      TongueDown: 1
      TongueLeft: 1
      TongueRight: 1
      TongueIn: 1
      TongueStretch: 1
      TongueWide: 1
      TongueNarrow: 1

    cancel_poses:
      EyeBlinkLeft: -1
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: -1
      EyeBlinkRight: -1
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: -1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: -1
      MouthSmileRight: -1
      MouthFrownLeft: -1
      MouthFrownRight: -1
      MouthDimpleLeft: -1
      MouthDimpleRight: -1
      MouthStretchLeft: -1
      MouthStretchRight: -1
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: -1
      MouthPressRight: -1
      MouthLowerDownLeft: -1
      MouthLowerDownRight: -1
      MouthUpperUpLeft: -1
      MouthUpperUpRight: -1
      BrowDownLeft: -1
      BrowDownRight: -1
      BrowInnerUp: -1
      BrowOuterUpLeft: -1
      BrowOuterUpRight: -1
      CheekPuff: -1
      CheekSquintLeft: -1
      CheekSquintRight: -1
      NoseSneerLeft: -1
      NoseSneerRight: -1
      TongueOut: -1
      TongueTipUp: -1
      TongueTipDown: -1
      TongueTipLeft: -1
      TongueTipRight: -1
      TongueRollUp: -1
      TongueRollDown: -1
      TongueRollLeft: -1
      TongueRollRight: -1
      TongueUp: -1
      TongueDown: -1
      TongueLeft: -1
      TongueRight: -1
      TongueIn: -1
      TongueStretch: -1
      TongueWide: -1
      TongueNarrow: -1

    symmetry_poses:
      EyeBlinkLeft: 0
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: 1
      EyeBlinkRight: 0
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: 1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: 2
      MouthSmileRight: 2
      MouthFrownLeft: 3
      MouthFrownRight: 3
      MouthDimpleLeft: 4
      MouthDimpleRight: 4
      MouthStretchLeft: 5
      MouthStretchRight: 5
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: 6
      MouthPressRight: 6
      MouthLowerDownLeft: 7
      MouthLowerDownRight: 7
      MouthUpperUpLeft: 8
      MouthUpperUpRight: 8
      BrowDownLeft: 10
      BrowDownRight: 10
      BrowInnerUp: -1
      BrowOuterUpLeft: 9
      BrowOuterUpRight: 9
      CheekPuff: -1
      CheekSquintLeft: 11
      CheekSquintRight: 11
      NoseSneerLeft: 12
      NoseSneerRight: 12
      TongueOut: -1
      TongueTipUp: -1
      TongueTipDown: -1
      TongueTipLeft: -1
      TongueTipRight: -1
      TongueRollUp: -1
      TongueRollDown: -1
      TongueRollLeft: -1
      TongueRollRight: -1
      TongueUp: -1
      TongueDown: -1
      TongueLeft: -1
      TongueRight: -1
      TongueIn: -1
      TongueStretch: -1
      TongueWide: -1
      TongueNarrow: -1
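The comments on `blendshape_params` describe a per-blendshape affine adjustment (`gain * w + offset`, spelled out elsewhere as `blendshape_values * weight_multipliers + weight_offsets`), and `enable_clamping_bs_weight` clamps the result to `[0.0, 1.0]`. A minimal Python sketch of that arithmetic follows. It assumes clamping happens after the multiplier and offset are applied, and that a blendshape missing from either map falls back to a multiplier of 1.0 and an offset of 0.0. It does not model `active_poses`, `cancel_poses`, or `symmetry_poses`, and the function name is hypothetical.

```python
def apply_blendshape_params(
    weights: dict[str, float],
    multipliers: dict[str, float],
    offsets: dict[str, float],
    clamp: bool = True,
) -> dict[str, float]:
    """Apply w * multiplier + offset to each blendshape weight.

    ASSUMPTION: clamping to [0.0, 1.0] (enable_clamping_bs_weight) is applied
    last, and missing entries default to multiplier 1.0 and offset 0.0.
    """
    result = {}
    for name, w in weights.items():
        value = w * multipliers.get(name, 1.0) + offsets.get(name, 0.0)
        if clamp:
            value = min(1.0, max(0.0, value))
        result[name] = value
    return result


if __name__ == "__main__":
    raw = {"JawOpen": 0.5, "MouthSmileLeft": 0.75, "BrowInnerUp": 0.125}
    multipliers = {"JawOpen": 0.5, "MouthSmileLeft": 2.0}
    offsets = {"BrowInnerUp": -0.25}
    # MouthSmileLeft overshoots to 1.5 and BrowInnerUp drops to -0.125,
    # so clamping pulls them back to 1.0 and 0.0 respectively.
    print(apply_blendshape_params(raw, multipliers, offsets))
    print(apply_blendshape_params(raw, multipliers, offsets, clamp=False))
```

Running the same input with `clamp=False` shows why clamping is recommended for production: without it, weights can leave the range that renderers typically expect.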

mark_stylization_config.yaml

# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true
  live_transition_time: 0.5
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activates blending of the preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if loaded) relative to generated emotions

a2f:
  # regression / diffusion
  inference_type: regression

  regression_model:
    inference_model_id: mark_v2.3

  diffusion_model:
    inference_model_id: multi_v3.2
    identity: mark
    # If true, use deterministic noise for diffusion inference (more stable/repeatable results).
    # If false, use non-deterministic noise (more variation between runs).
    constant_noise: true

  # Enable or disable tongue blendshape output
  enable_tongue_blendshapes: false

  face_params:
    eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.3 # Controls the magnitude of the input audio
    lip_close_offset: -0.03 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.0023 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.4 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.1 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
    tongue_strength: 1.3 # Controls the range of motion of the tongue
    tongue_height_offset: 0.0 # Controls the height of the tongue
    tongue_depth_offset: 0.0 # Controls the depth of the tongue

  blendshape_params: # Modulates the effect of each blendshape: gain * w + offset
    # Clamps blendshape weights to [0.0, 1.0] range. Recommended for production to ensure compatibility with renderers.
    enable_clamping_bs_weight: true

    weight_multipliers:
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 1.0
      EyeLookInLeft: 1.0
      EyeLookOutLeft: 1.0
      EyeLookUpLeft: 1.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 1.0
      EyeLookInRight: 1.0
      EyeLookOutRight: 1.0
      EyeLookUpRight: 1.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 1.0
      JawLeft: 1.0
      JawRight: 1.0
      JawOpen: 1.0
      MouthClose: 1.0
      MouthFunnel: 1.0
      MouthPucker: 1.0
      MouthLeft: 1.0
      MouthRight: 1.0
      MouthSmileLeft: 1.0
      MouthSmileRight: 1.0
      MouthFrownLeft: 1.0
      MouthFrownRight: 1.0
      MouthDimpleLeft: 1.0
      MouthDimpleRight: 1.0
      MouthStretchLeft: 1.0
      MouthStretchRight: 1.0
      MouthRollLower: 1.0
      MouthRollUpper: 1.0
      MouthShrugLower: 1.0
      MouthShrugUpper: 1.0
      MouthPressLeft: 1.0
      MouthPressRight: 1.0
      MouthLowerDownLeft: 1.0
      MouthLowerDownRight: 1.0
      MouthUpperUpLeft: 1.0
      MouthUpperUpRight: 1.0
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 1.0
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 1.0
      NoseSneerRight: 1.0
      TongueOut: 1.0
      TongueTipUp: 1.0
      TongueTipDown: 1.0
      TongueTipLeft: 1.0
      TongueTipRight: 1.0
      TongueRollUp: 1.0
      TongueRollDown: 1.0
      TongueRollLeft: 1.0
      TongueRollRight: 1.0
      TongueUp: 1.0
      TongueDown: 1.0
      TongueLeft: 1.0
      TongueRight: 1.0
      TongueIn: 1.0
      TongueStretch: 1.0
      TongueWide: 1.0
      TongueNarrow: 1.0

    weight_offsets:
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0
      TongueTipUp: 0.0
      TongueTipDown: 0.0
      TongueTipLeft: 0.0
      TongueTipRight: 0.0
      TongueRollUp: 0.0
      TongueRollDown: 0.0
      TongueRollLeft: 0.0
      TongueRollRight: 0.0
      TongueUp: 0.0
      TongueDown: 0.0
      TongueLeft: 0.0
      TongueRight: 0.0
      TongueIn: 0.0
      TongueStretch: 0.0
      TongueWide: 0.0
      TongueNarrow: 0.0

    active_poses:
      EyeBlinkLeft: 1
      EyeLookDownLeft: 0
      EyeLookInLeft: 0
      EyeLookOutLeft: 0
      EyeLookUpLeft: 0
      EyeSquintLeft: 1
      EyeWideLeft: 1
      EyeBlinkRight: 1
      EyeLookDownRight: 0
      EyeLookInRight: 0
      EyeLookOutRight: 0
      EyeLookUpRight: 0
      EyeSquintRight: 1
      EyeWideRight: 1
      JawForward: 1
      JawLeft: 1
      JawRight: 1
      JawOpen: 1
      MouthClose: 1
      MouthFunnel: 1
      MouthPucker: 1
      MouthLeft: 1
      MouthRight: 1
      MouthSmileLeft: 1
      MouthSmileRight: 1
      MouthFrownLeft: 1
      MouthFrownRight: 1
      MouthDimpleLeft: 1
      MouthDimpleRight: 1
      MouthStretchLeft: 1
      MouthStretchRight: 1
      MouthRollLower: 1
      MouthRollUpper: 1
      MouthShrugLower: 1
      MouthShrugUpper: 1
      MouthPressLeft: 1
      MouthPressRight: 1
      MouthLowerDownLeft: 1
      MouthLowerDownRight: 1
      MouthUpperUpLeft: 1
      MouthUpperUpRight: 1
      BrowDownLeft: 1
      BrowDownRight: 1
      BrowInnerUp: 1
      BrowOuterUpLeft: 1
      BrowOuterUpRight: 1
      CheekPuff: 1
      CheekSquintLeft: 1
      CheekSquintRight: 1
      NoseSneerLeft: 1
      NoseSneerRight: 1
      TongueOut: 0
      TongueTipUp: 1
      TongueTipDown: 1
      TongueTipLeft: 1
      TongueTipRight: 1
      TongueRollUp: 1
      TongueRollDown: 1
      TongueRollLeft: 1
      TongueRollRight: 1
      TongueUp: 1
      TongueDown: 1
      TongueLeft: 1
      TongueRight: 1
      TongueIn: 1
      TongueStretch: 1
      TongueWide: 1
      TongueNarrow: 1

    cancel_poses:
      EyeBlinkLeft: -1
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: -1
      EyeBlinkRight: -1
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: -1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: -1
      MouthSmileRight: -1
      MouthFrownLeft: -1
      MouthFrownRight: -1
      MouthDimpleLeft: -1
      MouthDimpleRight: -1
      MouthStretchLeft: -1
      MouthStretchRight: -1
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: -1
      MouthPressRight: -1
      MouthLowerDownLeft: -1
      MouthLowerDownRight: -1
      MouthUpperUpLeft: -1
      MouthUpperUpRight: -1
      BrowDownLeft: -1
      BrowDownRight: -1
      BrowInnerUp: -1
      BrowOuterUpLeft: -1
      BrowOuterUpRight: -1
      CheekPuff: -1
      CheekSquintLeft: -1
      CheekSquintRight: -1
      NoseSneerLeft: -1
      NoseSneerRight: -1
      TongueOut: -1
      TongueTipUp: -1
      TongueTipDown: -1
      TongueTipLeft: -1
      TongueTipRight: -1
      TongueRollUp: -1
      TongueRollDown: -1
      TongueRollLeft: -1
      TongueRollRight: -1
      TongueUp: -1
      TongueDown: -1
      TongueLeft: -1
      TongueRight: -1
      TongueIn: -1
      TongueStretch: -1
      TongueWide: -1
      TongueNarrow: -1

    symmetry_poses:
      EyeBlinkLeft: 0
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: 1
      EyeBlinkRight: 0
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: 1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: 2
      MouthSmileRight: 2
      MouthFrownLeft: 3
      MouthFrownRight: 3
      MouthDimpleLeft: 4
      MouthDimpleRight: 4
      MouthStretchLeft: 5
      MouthStretchRight: 5
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: 6
      MouthPressRight: 6
      MouthLowerDownLeft: 7
      MouthLowerDownRight: 7
      MouthUpperUpLeft: 8
      MouthUpperUpRight: 8
      BrowDownLeft: 10
      BrowDownRight: 10
      BrowInnerUp: -1
      BrowOuterUpLeft: 9
      BrowOuterUpRight: 9
      CheekPuff: -1
      CheekSquintLeft: 11
      CheekSquintRight: 11
      NoseSneerLeft: 12
      NoseSneerRight: 12
      TongueOut: -1
      TongueTipUp: -1
      TongueTipDown: -1
      TongueTipLeft: -1
      TongueTipRight: -1
      TongueRollUp: -1
      TongueRollDown: -1
      TongueRollLeft: -1
      TongueRollRight: -1
      TongueUp: -1
      TongueDown: -1
      TongueLeft: -1
      TongueRight: -1
      TongueIn: -1
      TongueStretch: -1
      TongueWide: -1
      TongueNarrow: -1
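Two of the `a2e.post_processing_params` settings in the James and Mark files describe simple arithmetic. `live_blend_coef` is the coefficient for exponential smoothing of emotions, and `max_emotions` caps how many emotion sliders stay engaged, keeping the highest-weighted ones. The minimal Python sketch below illustrates both under stated assumptions. This page does not say which side of the blend the coefficient weights, and the real service may order these steps differently or combine them with `emotion_strength`, `emotion_contrast`, and the preferred emotion. The function names are hypothetical.

```python
def limit_emotions(emotions: dict[str, float], max_emotions: int) -> dict[str, float]:
    """Keep the `max_emotions` highest-weighted emotions and set the others to 0.0."""
    ranked = sorted(emotions, key=lambda name: emotions[name], reverse=True)
    keep = set(ranked[:max_emotions])
    return {name: (w if name in keep else 0.0) for name, w in emotions.items()}


def smooth_emotions(
    previous: dict[str, float],
    current: dict[str, float],
    live_blend_coef: float,
) -> dict[str, float]:
    """One step of exponential smoothing over emotion weights.

    ASSUMPTION: live_blend_coef weights the previous smoothed value, so a
    value closer to 1.0 gives slower, smoother transitions. Emotions missing
    from `previous` are treated as 0.0.
    """
    return {
        name: live_blend_coef * previous.get(name, 0.0) + (1.0 - live_blend_coef) * w
        for name, w in current.items()
    }


if __name__ == "__main__":
    raw = {"joy": 0.5, "amazement": 0.25, "anger": 0.125, "sadness": 0.0625}
    limited = limit_emotions(raw, max_emotions=3)  # "sadness" is set to 0.0
    smoothed = smooth_emotions({"joy": 1.0}, limited, live_blend_coef=0.7)
    print(limited)
    print(smoothed)
```

Calling `smooth_emotions` once per frame, feeding each result back in as `previous`, gives the usual exponential moving average.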

Configuration files for Unreal Engine MetaHuman#

If you plan to connect A2F-3D to MetaHuman characters, use the configuration files adapted for them. Compared with the default configuration files, these differ only in their blendshape multipliers and offsets.
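Because the MetaHuman variants are described as differing from the defaults in their blendshape multipliers and offsets, a small diff helper can list exactly which entries were retuned. The minimal Python sketch below assumes both files have already been parsed into plain dictionaries, for example with a YAML library such as PyYAML (not required here). The helper name is hypothetical, and the sample values are a short hand-picked excerpt: 1.0 multipliers as in the James and Mark default files, and retuned values as in the MetaHuman Claire file.

```python
def changed_multipliers(
    default: dict[str, float],
    metahuman: dict[str, float],
    tol: float = 1e-9,
) -> dict[str, tuple]:
    """Return {blendshape: (default_value, metahuman_value)} for entries that differ.

    A value of None means the blendshape is missing from that file.
    """
    changed = {}
    for name in sorted(set(default) | set(metahuman)):
        a, b = default.get(name), metahuman.get(name)
        if a is None or b is None or abs(a - b) > tol:
            changed[name] = (a, b)
    return changed


if __name__ == "__main__":
    default = {"JawOpen": 1.0, "JawLeft": 1.0, "CheekPuff": 1.0, "EyeLookDownLeft": 1.0}
    metahuman = {"JawOpen": 1.0, "JawLeft": 0.2, "CheekPuff": 0.2, "EyeLookDownLeft": 0.0}
    for name, (old, new) in changed_multipliers(default, metahuman).items():
        print(f"{name}: {old} -> {new}")
```

The same helper works for `weight_offsets`; pass those maps instead to see which offsets were changed.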

MetaHuman Stylization Configuration Files

claire_stylization_config.yaml

# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true
  live_transition_time: 0.5
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activates blending of the preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if loaded) relative to generated emotions

a2f:
  # regression / diffusion
  inference_type: regression

  regression_model:
    inference_model_id: claire_v2.3.1

  diffusion_model:
    inference_model_id: multi_v3.2
    identity: claire
    # If true, use deterministic noise for diffusion inference (more stable/repeatable results).
    # If false, use non-deterministic noise (more variation between runs).
    constant_noise: true

  # Enable or disable tongue blendshape output
  enable_tongue_blendshapes: false

  face_params:
    eyelid_offset: 0.0 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.0 # Controls the magnitude of the input audio
    lip_close_offset: 0.0 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.25 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.0 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
    tongue_strength: 1.3 # Controls the range of motion of the tongue
    tongue_height_offset: 0.0 # Controls the height of the tongue
    tongue_depth_offset: 0.0 # Controls the depth of the tongue

  blendshape_params: # Modulates the effect of each blendshape: gain * w + offset
    # Clamps blendshape weights to [0.0, 1.0] range. Recommended for production to ensure compatibility with renderers.
    enable_clamping_bs_weight: true

    weight_multipliers:
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 0.7
      JawLeft: 0.2
      JawRight: 0.2
      JawOpen: 1.0
      MouthClose: 1.0
      MouthFunnel: 1.2
      MouthPucker: 1.2
      MouthLeft: 0.2
      MouthRight: 0.2
      MouthSmileLeft: 0.8
      MouthSmileRight: 0.8
      MouthFrownLeft: 0.4
      MouthFrownRight: 0.4
      MouthDimpleLeft: 0.7
      MouthDimpleRight: 0.7
      MouthStretchLeft: 0.1
      MouthStretchRight: 0.1
      MouthRollLower: 0.9
      MouthRollUpper: 0.5
      MouthShrugLower: 0.9
      MouthShrugUpper: 0.4
      MouthPressLeft: 0.8
      MouthPressRight: 0.8
      MouthLowerDownLeft: 0.8
      MouthLowerDownRight: 0.8
      MouthUpperUpLeft: 0.8
      MouthUpperUpRight: 0.8
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 0.2
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 0.8
      NoseSneerRight: 0.8
      TongueOut: 0.0
      TongueTipUp: 1.0
      TongueTipDown: 1.0
      TongueTipLeft: 1.0
      TongueTipRight: 1.0
      TongueRollUp: 1.0
      TongueRollDown: 1.0
      TongueRollLeft: 1.0
      TongueRollRight: 1.0
      TongueUp: 1.0
      TongueDown: 1.0
      TongueLeft: 1.0
      TongueRight: 1.0
      TongueIn: 1.0
      TongueStretch: 1.0
      TongueWide: 1.0
      TongueNarrow: 1.0

    weight_offsets: # Modulates the effect of each blendshape: blendshape_values * weight_multipliers + weight_offsets
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0
      TongueTipUp: 0.0
      TongueTipDown: 0.0
      TongueTipLeft: 0.0
      TongueTipRight: 0.0
      TongueRollUp: 0.0
      TongueRollDown: 0.0
      TongueRollLeft: 0.0
      TongueRollRight: 0.0
      TongueUp: 0.0
      TongueDown: 0.0
      TongueLeft: 0.0
      TongueRight: 0.0
      TongueIn: 0.0
      TongueStretch: 0.0
      TongueWide: 0.0
      TongueNarrow: 0.0

    active_poses: # Define which poses are active and which ones are not
      EyeBlinkLeft: 1
      EyeLookDownLeft: 0
      EyeLookInLeft: 0
      EyeLookOutLeft: 0
      EyeLookUpLeft: 0
      EyeSquintLeft: 1
      EyeWideLeft: 1
      EyeBlinkRight: 1
      EyeLookDownRight: 0
      EyeLookInRight: 0
      EyeLookOutRight: 0
      EyeLookUpRight: 0
      EyeSquintRight: 1
      EyeWideRight: 1
      JawForward: 1
      JawLeft: 1
      JawRight: 1
      JawOpen: 1
      MouthClose: 1
      MouthFunnel: 1
      MouthPucker: 1
      MouthLeft: 1
      MouthRight: 1
      MouthSmileLeft: 1
      MouthSmileRight: 1
      MouthFrownLeft: 1
      MouthFrownRight: 1
      MouthDimpleLeft: 1
      MouthDimpleRight: 1
      MouthStretchLeft: 1
      MouthStretchRight: 1
      MouthRollLower: 1
      MouthRollUpper: 1
      MouthShrugLower: 1
      MouthShrugUpper: 1
      MouthPressLeft: 1
      MouthPressRight: 1
      MouthLowerDownLeft: 1
      MouthLowerDownRight: 1
      MouthUpperUpLeft: 1
      MouthUpperUpRight: 1
      BrowDownLeft: 1
      BrowDownRight: 1
      BrowInnerUp: 1
      BrowOuterUpLeft: 1
      BrowOuterUpRight: 1
      CheekPuff: 1
      CheekSquintLeft: 1
      CheekSquintRight: 1
      NoseSneerLeft: 1
      NoseSneerRight: 1
      TongueOut: 0
      TongueTipUp: 1
      TongueTipDown: 1
      TongueTipLeft: 1
      TongueTipRight: 1
      TongueRollUp: 1
      TongueRollDown: 1
      TongueRollLeft: 1
      TongueRollRight: 1
      TongueUp: 1
      TongueDown: 1
      TongueLeft: 1
      TongueRight: 1
      TongueIn: 1
      TongueStretch: 1
      TongueWide: 1
      TongueNarrow: 1

    cancel_poses: # Define which poses cancel each other
      EyeBlinkLeft: -1
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: -1
      EyeBlinkRight: -1
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: -1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: -1
      MouthSmileRight: -1
      MouthFrownLeft: -1
      MouthFrownRight: -1
      MouthDimpleLeft: -1
      MouthDimpleRight: -1
      MouthStretchLeft: -1
      MouthStretchRight: -1
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: -1
      MouthPressRight: -1
      MouthLowerDownLeft: -1
      MouthLowerDownRight: -1
      MouthUpperUpLeft: -1
      MouthUpperUpRight: -1
      BrowDownLeft: -1
      BrowDownRight: -1
      BrowInnerUp: -1
      BrowOuterUpLeft: -1
      BrowOuterUpRight: -1
      CheekPuff: -1
      CheekSquintLeft: -1
      CheekSquintRight: -1
      NoseSneerLeft: -1
      NoseSneerRight: -1
      TongueOut: -1
      TongueTipUp: -1
      TongueTipDown: -1
      TongueTipLeft: -1
      TongueTipRight: -1
      TongueRollUp: -1
      TongueRollDown: -1
      TongueRollLeft: -1
      TongueRollRight: -1
      TongueUp: -1
      TongueDown: -1
      TongueLeft: -1
      TongueRight: -1
      TongueIn: -1
      TongueStretch: -1
      TongueWide: -1
      TongueNarrow: -1

    symmetry_poses: # Define which poses are symmetric to each other
      EyeBlinkLeft: 0
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: 1
      EyeBlinkRight: 0
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: 1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: 2
      MouthSmileRight: 2
      MouthFrownLeft: 3
      MouthFrownRight: 3
      MouthDimpleLeft: 4
      MouthDimpleRight: 4
      MouthStretchLeft: 5
      MouthStretchRight: 5
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: 6
      MouthPressRight: 6
      MouthLowerDownLeft: 7
      MouthLowerDownRight: 7
      MouthUpperUpLeft: 8
      MouthUpperUpRight: 8
      BrowDownLeft: 10
      BrowDownRight: 10
      BrowInnerUp: -1
      BrowOuterUpLeft: 9
      BrowOuterUpRight: 9
      CheekPuff: -1
      CheekSquintLeft: 11
      CheekSquintRight: 11
      NoseSneerLeft: 12
      NoseSneerRight: 12
      TongueOut: -1
      TongueTipUp: -1
      TongueTipDown: -1
      TongueTipLeft: -1
      TongueTipRight: -1
      TongueRollUp: -1
      TongueRollDown: -1
      TongueRollLeft: -1
      TongueRollRight: -1
      TongueUp: -1
      TongueDown: -1
      TongueLeft: -1
      TongueRight: -1
      TongueIn: -1
      TongueStretch: -1
      TongueWide: -1
      TongueNarrow: -1

james_stylization_config.yaml

# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true
  live_transition_time: 0.5
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if it is loaded) relative to generated emotions

a2f:
  # regression / diffusion
  inference_type: regression

  regression_model:
    inference_model_id: james_v2.3.1

  diffusion_model:
    inference_model_id: multi_v3.2
    identity: james
    # If true, use deterministic noise for diffusion inference (more stable/repeatable results).
    # If false, use non-deterministic noise (more variation between runs).
    constant_noise: true

  # Enable or disable tongue blendshapes output
  enable_tongue_blendshapes: false

  face_params:
    eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.0 # Controls the magnitude of the input audio
    lip_close_offset: -0.02 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.2 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.0 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
    tongue_strength: 1.3 # Controls the range of motion of the tongue
    tongue_height_offset: 0.0 # Controls the height of the tongue
    tongue_depth_offset: 0.0 # Controls the depth of the tongue

  blendshape_params: # Modulates the effect of each blendshape. Gain * w + offset
    # Clamps blendshape weights to [0.0, 1.0] range. Recommended for production to ensure compatibility with renderers.
    enable_clamping_bs_weight: true

    weight_multipliers:
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 0.7
      JawLeft: 0.2
      JawRight: 0.2
      JawOpen: 0.8
      MouthClose: 0.3
      MouthFunnel: 1.0
      MouthPucker: 1.0
      MouthLeft: 0.2
      MouthRight: 0.2
      MouthSmileLeft: 1.2
      MouthSmileRight: 1.2
      MouthFrownLeft: 0.5
      MouthFrownRight: 0.5
      MouthDimpleLeft: 0.8
      MouthDimpleRight: 0.8
      MouthStretchLeft: 0.05
      MouthStretchRight: 0.05
      MouthRollLower: 0.8
      MouthRollUpper: 0.5
      MouthShrugLower: 1.0
      MouthShrugUpper: 0.4
      MouthPressLeft: 0.8
      MouthPressRight: 0.8
      MouthLowerDownLeft: 0.8
      MouthLowerDownRight: 0.8
      MouthUpperUpLeft: 0.8
      MouthUpperUpRight: 0.8
      BrowDownLeft: 1.2
      BrowDownRight: 1.2
      BrowInnerUp: 1.3
      BrowOuterUpLeft: 0.8
      BrowOuterUpRight: 0.8
      CheekPuff: 0.2
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 0.8
      NoseSneerRight: 0.8
      TongueOut: 0.0
      TongueTipUp: 1.0
      TongueTipDown: 1.0
      TongueTipLeft: 1.0
      TongueTipRight: 1.0
      TongueRollUp: 1.0
      TongueRollDown: 1.0
      TongueRollLeft: 1.0
      TongueRollRight: 1.0
      TongueUp: 1.0
      TongueDown: 1.0
      TongueLeft: 1.0
      TongueRight: 1.0
      TongueIn: 1.0
      TongueStretch: 1.0
      TongueWide: 1.0
      TongueNarrow: 1.0

    weight_offsets: # Modulates the effect of each blendshape. blendshape_values * weight_multipliers + weight_offsets
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0
      TongueTipUp: 0.0
      TongueTipDown: 0.0
      TongueTipLeft: 0.0
      TongueTipRight: 0.0
      TongueRollUp: 0.0
      TongueRollDown: 0.0
      TongueRollLeft: 0.0
      TongueRollRight: 0.0
      TongueUp: 0.0
      TongueDown: 0.0
      TongueLeft: 0.0
      TongueRight: 0.0
      TongueIn: 0.0
      TongueStretch: 0.0
      TongueWide: 0.0
      TongueNarrow: 0.0

    active_poses: # Define which poses are active and which ones are not
      EyeBlinkLeft: 1
      EyeLookDownLeft: 0
      EyeLookInLeft: 0
      EyeLookOutLeft: 0
      EyeLookUpLeft: 0
      EyeSquintLeft: 1
      EyeWideLeft: 1
      EyeBlinkRight: 1
      EyeLookDownRight: 0
      EyeLookInRight: 0
      EyeLookOutRight: 0
      EyeLookUpRight: 0
      EyeSquintRight: 1
      EyeWideRight: 1
      JawForward: 1
      JawLeft: 1
      JawRight: 1
      JawOpen: 1
      MouthClose: 1
      MouthFunnel: 1
      MouthPucker: 1
      MouthLeft: 1
      MouthRight: 1
      MouthSmileLeft: 1
      MouthSmileRight: 1
      MouthFrownLeft: 1
      MouthFrownRight: 1
      MouthDimpleLeft: 1
      MouthDimpleRight: 1
      MouthStretchLeft: 1
      MouthStretchRight: 1
      MouthRollLower: 1
      MouthRollUpper: 1
      MouthShrugLower: 1
      MouthShrugUpper: 1
      MouthPressLeft: 1
      MouthPressRight: 1
      MouthLowerDownLeft: 1
      MouthLowerDownRight: 1
      MouthUpperUpLeft: 1
      MouthUpperUpRight: 1
      BrowDownLeft: 1
      BrowDownRight: 1
      BrowInnerUp: 1
      BrowOuterUpLeft: 1
      BrowOuterUpRight: 1
      CheekPuff: 1
      CheekSquintLeft: 1
      CheekSquintRight: 1
      NoseSneerLeft: 1
      NoseSneerRight: 1
      TongueOut: 0
      TongueTipUp: 1
      TongueTipDown: 1
      TongueTipLeft: 1
      TongueTipRight: 1
      TongueRollUp: 1
      TongueRollDown: 1
      TongueRollLeft: 1
      TongueRollRight: 1
      TongueUp: 1
      TongueDown: 1
      TongueLeft: 1
      TongueRight: 1
      TongueIn: 1
      TongueStretch: 1
      TongueWide: 1
      TongueNarrow: 1

    cancel_poses: # Define which poses cancel each other
      EyeBlinkLeft: -1
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: -1
      EyeBlinkRight: -1
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: -1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: -1
      MouthSmileRight: -1
      MouthFrownLeft: -1
      MouthFrownRight: -1
      MouthDimpleLeft: -1
      MouthDimpleRight: -1
      MouthStretchLeft: -1
      MouthStretchRight: -1
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: -1
      MouthPressRight: -1
      MouthLowerDownLeft: -1
      MouthLowerDownRight: -1
      MouthUpperUpLeft: -1
      MouthUpperUpRight: -1
      BrowDownLeft: -1
      BrowDownRight: -1
      BrowInnerUp: -1
      BrowOuterUpLeft: -1
      BrowOuterUpRight: -1
      CheekPuff: -1
      CheekSquintLeft: -1
      CheekSquintRight: -1
      NoseSneerLeft: -1
      NoseSneerRight: -1
      TongueOut: -1
      TongueTipUp: -1
      TongueTipDown: -1
      TongueTipLeft: -1
      TongueTipRight: -1
      TongueRollUp: -1
      TongueRollDown: -1
      TongueRollLeft: -1
      TongueRollRight: -1
      TongueUp: -1
      TongueDown: -1
      TongueLeft: -1
      TongueRight: -1
      TongueIn: -1
      TongueStretch: -1
      TongueWide: -1
      TongueNarrow: -1

    symmetry_poses: # Define which poses are symmetric to each other
      EyeBlinkLeft: 0
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: 1
      EyeBlinkRight: 0
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: 1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: 2
      MouthSmileRight: 2
      MouthFrownLeft: 3
      MouthFrownRight: 3
      MouthDimpleLeft: 4
      MouthDimpleRight: 4
      MouthStretchLeft: 5
      MouthStretchRight: 5
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: 6
      MouthPressRight: 6
      MouthLowerDownLeft: 7
      MouthLowerDownRight: 7
      MouthUpperUpLeft: 8
      MouthUpperUpRight: 8
      BrowDownLeft: 10
      BrowDownRight: 10
      BrowInnerUp: -1
      BrowOuterUpLeft: 9
      BrowOuterUpRight: 9
      CheekPuff: -1
      CheekSquintLeft: 11
      CheekSquintRight: 11
      NoseSneerLeft: 12
      NoseSneerRight: 12
      TongueOut: -1
      TongueTipUp: -1
      TongueTipDown: -1
      TongueTipLeft: -1
      TongueTipRight: -1
      TongueRollUp: -1
      TongueRollDown: -1
      TongueRollLeft: -1
      TongueRollRight: -1
      TongueUp: -1
      TongueDown: -1
      TongueLeft: -1
      TongueRight: -1
      TongueIn: -1
      TongueStretch: -1
      TongueWide: -1
      TongueNarrow: -1

mark_stylization_config.yaml

# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true
  live_transition_time: 0.5
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if it is loaded) relative to generated emotions

a2f:
  # regression / diffusion
  inference_type: regression

  regression_model:
    inference_model_id: mark_v2.3

  diffusion_model:
    inference_model_id: multi_v3.2
    identity: mark
    # If true, use deterministic noise for diffusion inference (more stable/repeatable results).
    # If false, use non-deterministic noise (more variation between runs).
    constant_noise: true

  # Enable or disable tongue blendshapes output
  enable_tongue_blendshapes: false

  face_params:
    eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.3 # Controls the magnitude of the input audio
    lip_close_offset: -0.03 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.0023 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.4 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.1 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
    tongue_strength: 1.3 # Controls the range of motion of the tongue
    tongue_height_offset: 0.0 # Controls the height of the tongue
    tongue_depth_offset: 0.0 # Controls the depth of the tongue

  blendshape_params: # Modulates the effect of each blendshape. Gain * w + offset
    # Clamps blendshape weights to [0.0, 1.0] range. Recommended for production to ensure compatibility with renderers.
    enable_clamping_bs_weight: true

    weight_multipliers:
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 0.7
      JawLeft: 0.2
      JawRight: 0.2
      JawOpen: 1.0
      MouthClose: 0.2
      MouthFunnel: 1.2
      MouthPucker: 1.2
      MouthLeft: 0.2
      MouthRight: 0.2
      MouthSmileLeft: 0.8
      MouthSmileRight: 0.8
      MouthFrownLeft: 0.5
      MouthFrownRight: 0.5
      MouthDimpleLeft: 0.8
      MouthDimpleRight: 0.8
      MouthStretchLeft: 0.05
      MouthStretchRight: 0.05
      MouthRollLower: 0.8
      MouthRollUpper: 0.5
      MouthShrugLower: 0.9
      MouthShrugUpper: 0.4
      MouthPressLeft: 0.8
      MouthPressRight: 0.8
      MouthLowerDownLeft: 0.8
      MouthLowerDownRight: 0.8
      MouthUpperUpLeft: 0.8
      MouthUpperUpRight: 0.8
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 0.2
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 0.8
      NoseSneerRight: 0.8
      TongueOut: 0.0
      TongueTipUp: 1.0
      TongueTipDown: 1.0
      TongueTipLeft: 1.0
      TongueTipRight: 1.0
      TongueRollUp: 1.0
      TongueRollDown: 1.0
      TongueRollLeft: 1.0
      TongueRollRight: 1.0
      TongueUp: 1.0
      TongueDown: 1.0
      TongueLeft: 1.0
      TongueRight: 1.0
      TongueIn: 1.0
      TongueStretch: 1.0
      TongueWide: 1.0
      TongueNarrow: 1.0

    weight_offsets: # Modulates the effect of each blendshape. blendshape_values * weight_multipliers + weight_offsets
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0
      TongueTipUp: 0.0
      TongueTipDown: 0.0
      TongueTipLeft: 0.0
      TongueTipRight: 0.0
      TongueRollUp: 0.0
      TongueRollDown: 0.0
      TongueRollLeft: 0.0
      TongueRollRight: 0.0
      TongueUp: 0.0
      TongueDown: 0.0
      TongueLeft: 0.0
      TongueRight: 0.0
      TongueIn: 0.0
      TongueStretch: 0.0
      TongueWide: 0.0
      TongueNarrow: 0.0

    active_poses: # Define which poses are active and which ones are not
      EyeBlinkLeft: 1
      EyeLookDownLeft: 0
      EyeLookInLeft: 0
      EyeLookOutLeft: 0
      EyeLookUpLeft: 0
      EyeSquintLeft: 1
      EyeWideLeft: 1
      EyeBlinkRight: 1
      EyeLookDownRight: 0
      EyeLookInRight: 0
      EyeLookOutRight: 0
      EyeLookUpRight: 0
      EyeSquintRight: 1
      EyeWideRight: 1
      JawForward: 1
      JawLeft: 1
      JawRight: 1
      JawOpen: 1
      MouthClose: 1
      MouthFunnel: 1
      MouthPucker: 1
      MouthLeft: 1
      MouthRight: 1
      MouthSmileLeft: 1
      MouthSmileRight: 1
      MouthFrownLeft: 1
      MouthFrownRight: 1
      MouthDimpleLeft: 1
      MouthDimpleRight: 1
      MouthStretchLeft: 1
      MouthStretchRight: 1
      MouthRollLower: 1
      MouthRollUpper: 1
      MouthShrugLower: 1
      MouthShrugUpper: 1
      MouthPressLeft: 1
      MouthPressRight: 1
      MouthLowerDownLeft: 1
      MouthLowerDownRight: 1
      MouthUpperUpLeft: 1
      MouthUpperUpRight: 1
      BrowDownLeft: 1
      BrowDownRight: 1
      BrowInnerUp: 1
      BrowOuterUpLeft: 1
      BrowOuterUpRight: 1
      CheekPuff: 1
      CheekSquintLeft: 1
      CheekSquintRight: 1
      NoseSneerLeft: 1
      NoseSneerRight: 1
      TongueOut: 0
      TongueTipUp: 1
      TongueTipDown: 1
      TongueTipLeft: 1
      TongueTipRight: 1
      TongueRollUp: 1
      TongueRollDown: 1
      TongueRollLeft: 1
      TongueRollRight: 1
      TongueUp: 1
      TongueDown: 1
      TongueLeft: 1
      TongueRight: 1
      TongueIn: 1
      TongueStretch: 1
      TongueWide: 1
      TongueNarrow: 1

    cancel_poses: # Define which poses cancel each other
      EyeBlinkLeft: -1
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: -1
      EyeBlinkRight: -1
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: -1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: -1
      MouthSmileRight: -1
      MouthFrownLeft: -1
      MouthFrownRight: -1
      MouthDimpleLeft: -1
      MouthDimpleRight: -1
      MouthStretchLeft: -1
      MouthStretchRight: -1
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: -1
      MouthPressRight: -1
      MouthLowerDownLeft: -1
      MouthLowerDownRight: -1
      MouthUpperUpLeft: -1
      MouthUpperUpRight: -1
      BrowDownLeft: -1
      BrowDownRight: -1
      BrowInnerUp: -1
      BrowOuterUpLeft: -1
      BrowOuterUpRight: -1
      CheekPuff: -1
      CheekSquintLeft: -1
      CheekSquintRight: -1
      NoseSneerLeft: -1
      NoseSneerRight: -1
      TongueOut: -1
      TongueTipUp: -1
      TongueTipDown: -1
      TongueTipLeft: -1
      TongueTipRight: -1
      TongueRollUp: -1
      TongueRollDown: -1
      TongueRollLeft: -1
      TongueRollRight: -1
      TongueUp: -1
      TongueDown: -1
      TongueLeft: -1
      TongueRight: -1
      TongueIn: -1
      TongueStretch: -1
      TongueWide: -1
      TongueNarrow: -1

    symmetry_poses: # Define which poses are symmetric to each other
      EyeBlinkLeft: 0
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: 1
      EyeBlinkRight: 0
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: 1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: 2
      MouthSmileRight: 2
      MouthFrownLeft: 3
      MouthFrownRight: 3
      MouthDimpleLeft: 4
      MouthDimpleRight: 4
      MouthStretchLeft: 5
      MouthStretchRight: 5
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: 6
      MouthPressRight: 6
      MouthLowerDownLeft: 7
      MouthLowerDownRight: 7
      MouthUpperUpLeft: 8
      MouthUpperUpRight: 8
      BrowDownLeft: 10
      BrowDownRight: 10
      BrowInnerUp: -1
      BrowOuterUpLeft: 9
      BrowOuterUpRight: 9
      CheekPuff: -1
      CheekSquintLeft: 11
      CheekSquintRight: 11
      NoseSneerLeft: 12
      NoseSneerRight: 12
      TongueOut: -1
      TongueTipUp: -1
      TongueTipDown: -1
      TongueTipLeft: -1
      TongueTipRight: -1
      TongueRollUp: -1
      TongueRollDown: -1
      TongueRollLeft: -1
      TongueRollRight: -1
      TongueUp: -1
      TongueDown: -1
      TongueLeft: -1
      TongueRight: -1
      TongueIn: -1
      TongueStretch: -1
      TongueWide: -1
      TongueNarrow: -1

Parameter Tuning Guide#

Audio2Face-3D takes inference parameters from three sources: the inference model SDK, the deployment-time configuration files, and runtime input. In general, deployment-time parameters override the matching defaults in the model files, and runtime parameters override both deployment-time parameters and model defaults.

For runtime parameters, see the proto definitions of AudioStreamHeader, FaceParameters, BlendShapeParameters, EmotionParameters, and EmotionPostProcessingParameters.
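The precedence rule above can be pictured as a layered merge. The sketch below is a minimal illustration, not the service's actual implementation: it assumes that nested groups such as face_params are merged key by key, so a later layer only needs to carry the values it changes. The function name merge_params and all values are hypothetical.

```python
from __future__ import annotations

from typing import Any, Mapping


def merge_params(*layers: Mapping[str, Any]) -> dict[str, Any]:
    """Merge parameter layers; later layers override earlier ones.

    Nested mappings (for example face_params) are merged key by key,
    and the input layers are never modified.
    """
    merged: dict[str, Any] = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
                merged[key] = merge_params(merged[key], value)
            elif isinstance(value, Mapping):
                merged[key] = merge_params(value)  # copy, so inputs stay untouched
            else:
                merged[key] = value
    return merged


# Hypothetical values, for illustration only.
model_defaults = {"face_params": {"lower_face_strength": 1.0, "skin_strength": 1.0}}
deployment_config = {"face_params": {"lower_face_strength": 1.2}}
runtime_input = {"face_params": {"skin_strength": 0.9}}

effective = merge_params(model_defaults, deployment_config, runtime_input)
print(effective)
# {'face_params': {'lower_face_strength': 1.2, 'skin_strength': 0.9}}
```

The order of the arguments encodes the precedence: model defaults first, runtime input last.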

Blendshape parameters

Currently, the default blendshape parameters included in the model data are tuned for use with Metahuman avatars. For our default avatars (Claire, Mark, Ben), all 52 values of weight_multipliers in the stylization config should be set to 1.0.
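A small sketch of that adjustment, applied to a weight_multipliers mapping that has already been loaded into a Python dict (for example, with a YAML parser of your choice). It assumes, based on the 52 face + 16 tongue split described in this section, that TongueOut belongs to the 52 face blendshapes and that the other Tongue* keys are the tongue-specific ones. The helper name is hypothetical.

```python
from __future__ import annotations


def neutral_face_multipliers(weight_multipliers: dict[str, float]) -> dict[str, float]:
    """Return a copy with every face blendshape multiplier set to 1.0.

    Tongue-specific keys (for example TongueTipUp) keep their current values;
    TongueOut is treated as one of the 52 face blendshapes.
    """
    return {
        name: 1.0 if (name == "TongueOut" or not name.startswith("Tongue")) else value
        for name, value in weight_multipliers.items()
    }


# Excerpt of the defaults in james_stylization_config.yaml.
james_excerpt = {"JawOpen": 0.8, "MouthSmileLeft": 1.2, "TongueOut": 0.0, "TongueTipUp": 1.0}
print(neutral_face_multipliers(james_excerpt))
# {'JawOpen': 1.0, 'MouthSmileLeft': 1.0, 'TongueOut': 1.0, 'TongueTipUp': 1.0}
```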

Blendshape Clamping

The enable_clamping_bs_weight parameter controls whether blendshape weight values are constrained to the range [0.0, 1.0]. Blendshape clamping is a post-processing step that ensures blendshape weights stay within the standard range expected by most animation systems. The A2F neural network can produce values outside this range, so clamping normalizes them for compatibility with downstream renderers.

Clamping is applied after the weight multipliers and offsets.

Setting	Behavior
`enable_clamping_bs_weight: true`	Values guaranteed 0.0-1.0. Safe for renderers expecting normalized weights. Recommended for production.
`enable_clamping_bs_weight: false`	Values can exceed range (e.g., 1.2, -0.1). Preserves full model output fidelity. Useful for debugging/analysis.
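The per-blendshape post-processing can be sketched as follows, using the formula from the weight_offsets comment in the stylization configs (blendshape_values * weight_multipliers + weight_offsets) and clamping last, as described above. This is an illustrative helper with a hypothetical name, not the service's code; the example multiplier of 1.2 matches MouthSmileLeft in the James config.

```python
def postprocess_weight(
    raw: float,
    multiplier: float,
    offset: float,
    enable_clamping_bs_weight: bool = True,
) -> float:
    """Apply raw * multiplier + offset, then optionally clamp to [0.0, 1.0]."""
    weight = raw * multiplier + offset
    if enable_clamping_bs_weight:
        # Clamping happens after the multiplier and offset.
        weight = min(max(weight, 0.0), 1.0)
    return weight


print(postprocess_weight(1.0, 1.2, 0.0))                                    # 1.0 (clamped)
print(postprocess_weight(1.0, 1.2, 0.0, enable_clamping_bs_weight=False))   # 1.2
print(postprocess_weight(0.0, 1.0, -0.1, enable_clamping_bs_weight=False))  # -0.1
```

With clamping disabled, the out-of-range values from the table (1.2, -0.1) pass through unchanged.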

Note

Tongue blendshape parameters are available when tongue output is enabled (enable_tongue_blendshapes: true). In that case, the service outputs 68 blendshape weights (52 face + 16 tongue) and the runtime BlendShapeParameters maps can include the tongue keys in addition to TongueOut.
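If a client needs to validate the frames it receives, a check along these lines may help. The 16 tongue names are the tongue-specific keys that appear in the stylization configs. The assumption that the output contains 52 weights when tongue output is disabled is inferred from the 52 + 16 split above, and the dict-per-frame representation and function names are hypothetical.

```python
from __future__ import annotations

# Tongue-specific blendshapes from the stylization configs.
# TongueOut is not listed here: it is one of the 52 face blendshapes.
TONGUE_BLENDSHAPES = (
    "TongueTipUp", "TongueTipDown", "TongueTipLeft", "TongueTipRight",
    "TongueRollUp", "TongueRollDown", "TongueRollLeft", "TongueRollRight",
    "TongueUp", "TongueDown", "TongueLeft", "TongueRight",
    "TongueIn", "TongueStretch", "TongueWide", "TongueNarrow",
)
FACE_BLENDSHAPE_COUNT = 52


def expected_blendshape_count(enable_tongue_blendshapes: bool) -> int:
    """Number of blendshape weights per frame for the given setting."""
    if enable_tongue_blendshapes:
        return FACE_BLENDSHAPE_COUNT + len(TONGUE_BLENDSHAPES)
    return FACE_BLENDSHAPE_COUNT


def check_frame(weights: dict[str, float], enable_tongue_blendshapes: bool) -> None:
    """Raise ValueError if a frame does not have the expected number of weights."""
    expected = expected_blendshape_count(enable_tongue_blendshapes)
    if len(weights) != expected:
        raise ValueError(f"expected {expected} blendshape weights, got {len(weights)}")


print(expected_blendshape_count(True))   # 68
print(expected_blendshape_count(False))  # 52
```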

Environment variables#

The following table describes the environment variables that can be passed to the Audio2Face-3D NIM with the -e flag of a docker run command:

Variable	Required	Values	Notes
NGC_API_KEY	No	Any string representing a valid NGC API Key	Required only if you want to download TRT engines from NGC. In that case, set this variable to your personal NGC API key.
NIM_LOGGING_JSONL	No	true / false	Enables (true) or disables (false) JSON Lines format logging to stdout.
NIM_MANIFEST_PROFILE	No	Any valid manifest profile string	Set this to the manifest profile ID listed for your GPU in Supported Models.
NIM_DISABLE_MODEL_DOWNLOAD	No	true / false	Disables (true) or enables (false) automatic TRT engine downloads from NGC. When set to true, no downloads occur and TRT engines are generated locally instead; if pre-cached models are mounted, local generation is skipped. Note: local TRT engine generation currently fails on RTX 50 series GPUs.
NIM_SKIP_A2F_START	No	true / false	If set to true, the container will not start the A2F-3D service at startup.
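As a sketch, the table can be turned into docker run arguments programmatically. The helper below is hypothetical and not part of the NIM: it validates the true/false variables and passes NGC_API_KEY without a value, which makes Docker copy it from the host environment instead of writing the key on the command line. The image name is a placeholder; use the image path from NGC.

```python
from __future__ import annotations

import shlex

# Variables from the table above that accept only "true" or "false".
BOOLEAN_VARS = {"NIM_LOGGING_JSONL", "NIM_DISABLE_MODEL_DOWNLOAD", "NIM_SKIP_A2F_START"}


def env_args(env: dict[str, str | None]) -> list[str]:
    """Convert environment variables into docker run -e arguments."""
    args: list[str] = []
    for name, value in env.items():
        if value is None:
            # "-e NAME" without a value copies NAME from the host environment.
            args += ["-e", name]
            continue
        if name in BOOLEAN_VARS and value not in ("true", "false"):
            raise ValueError(f"{name} must be 'true' or 'false', got {value!r}")
        args += ["-e", f"{name}={value}"]
    return args


IMAGE = "AUDIO2FACE_3D_IMAGE"  # placeholder: substitute the image path from NGC
cmd = [
    "docker", "run", "--rm",
    "--gpus", "all",  # GPU access through the NVIDIA Container Toolkit
    *env_args({
        "NGC_API_KEY": None,
        "NIM_DISABLE_MODEL_DOWNLOAD": "false",
        "NIM_LOGGING_JSONL": "true",
    }),
    IMAGE,
]
print(shlex.join(cmd))
```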

Volumes#

The following table describes the container paths to which local directories can be mounted with the Docker flag -v {LOCAL_PATH}:{PATH_IN_CONTAINER}.

Container path	Required	Notes
/tmp/a2x/	No, but if this volume is not mounted, the container must download or generate the models again each time it starts	Path for AI models. Must have read, write, and execute permissions (for example, 777).
/mnt/configs/	Only if you want to override configuration parameters	Path for configuration override files
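A minimal sketch for preparing the mounts, assuming Docker runs on the same host: it creates the local model directory, grants it the 777 permissions listed in the table, and returns the matching -v arguments. The function name and local directory names are hypothetical.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path

MODEL_PATH_IN_CONTAINER = "/tmp/a2x/"
CONFIG_PATH_IN_CONTAINER = "/mnt/configs/"


def volume_args(model_dir: str, config_dir: str | None = None) -> list[str]:
    """Create the local model directory and return docker run -v arguments."""
    models = Path(model_dir).resolve()
    models.mkdir(parents=True, exist_ok=True)
    # The volumes table requires read, write and execute permissions (777).
    os.chmod(models, 0o777)
    args = ["-v", f"{models}:{MODEL_PATH_IN_CONTAINER}"]
    if config_dir is not None:
        # Optional: only needed to override configuration parameters.
        args += ["-v", f"{Path(config_dir).resolve()}:{CONFIG_PATH_IN_CONTAINER}"]
    return args


# Demonstration in a throwaway directory; in practice, use a persistent path
# so the models survive container restarts.
with tempfile.TemporaryDirectory() as tmp:
    print(volume_args(os.path.join(tmp, "a2x-models")))
```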

Quick Deployment of Audio2Face-3D Microservices#

Instead of deploying Audio2Face-3D and starting the model manually, you can deploy them together quickly with the docker-compose file by following the quick-start instructions in the NVIDIA Audio2Face-3D Samples repo.