Audio2Face-3D NIM Container Deployment and Configuration Guide
This guide explains how to deploy, configure, and run the Audio2Face-3D NIM Docker container, which is available through the NGC registry.
Before proceeding, please review the Architecture Overview page to understand the core concepts, services, and requirements for running Audio2Face-3D.
Audio2Face-3D offers extensive configuration capabilities through configuration files and environment variables, which can be customized via a custom entrypoint.
Prerequisites
To run the microservice, you will need:
Access to the NGC Docker registry
Active login to the nvcr.io registry
NVIDIA Container Toolkit configured with Docker
For detailed hardware and software requirements, consult the Support Matrix page.
Configuration files
Audio2Face-3D uses three types of configuration file, each aimed at a specific user role:
Stylization Configuration (Artist-focused): Parameters typically adjusted by artists for creative control
Deployment Configuration (DevOps-focused): Parameters related to deployment and infrastructure
Advanced Configuration (Expert-focused): Specialized parameters for specific use cases
Warning
These deployment-time configuration files differ from runtime configuration files in both case convention (snake_case vs. camelCase) and structure. For reference, see this runtime configuration example: config_james.yml.
1. Stylization Configuration Files
The system provides three variant-specific configuration files:
Claire
James
Mark
Each variant corresponds to a specific AI model with predefined default values. The James configuration is the default for the microservice.
Model selection (regression vs. diffusion)
The stylization configuration supports two inference modes:
Regression: set a2f.inference_type: regression and choose a model under a2f.regression_model.inference_model_id (for example james_v2.3.1, claire_v2.3.1, or mark_v2.3).
Diffusion: set a2f.inference_type: diffusion and configure a2f.diffusion_model:
  inference_model_id: diffusion model id (for example multi_v3.2)
  identity: which identity to use with a multi-identity diffusion model (for example james, claire, or mark)
  constant_noise: when true, uses deterministic noise for diffusion inference (more stable/repeatable results); when false, uses non-deterministic noise (more variation between runs)
Only the configuration block matching a2f.inference_type is used at runtime.
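The selection rule above can be sketched in a few lines of Python. This is only an illustration of how the two blocks relate, not the service's own code: the function name active_model_id is hypothetical, and the configuration is written as a plain dict rather than loaded from the YAML file.

```python
def active_model_id(stylization_config):
    """Return the model id from the block that matches a2f.inference_type."""
    a2f = stylization_config["a2f"]
    inference_type = a2f["inference_type"]
    if inference_type == "regression":
        return a2f["regression_model"]["inference_model_id"]
    if inference_type == "diffusion":
        return a2f["diffusion_model"]["inference_model_id"]
    raise ValueError(f"unsupported a2f.inference_type: {inference_type!r}")


# Values mirror the James defaults listed in this guide.
config = {
    "a2f": {
        "inference_type": "regression",
        "regression_model": {"inference_model_id": "james_v2.3.1"},
        "diffusion_model": {
            "inference_model_id": "multi_v3.2",
            "identity": "james",
            "constant_noise": True,
        },
    }
}
print(active_model_id(config))
```

Switching inference_type to diffusion makes the same function return the diffusion model id, while the regression block stays in the file but is ignored.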
Model and profile selection precedence (A2F-3D NIM)
In the NIM container, the stylization configuration (stylization_config.yaml) is the source of truth for which model the service will run (regression vs. diffusion, model id, identity, and diffusion options like constant_noise).
However, the NIM startup logic may update configuration and/or select a different TRT profile depending on environment variables:
If you set NIM_MANIFEST_PROFILE / NIM_MODEL_PROFILE: the container treats this as an explicit profile choice. The startup logic will attempt to update the stylization config to match the profile’s character (claire/james/mark ⇒ regression; multi ⇒ diffusion with multi_v3.2_james by default).
If you set PERF_A2F_MODEL: the container may update the stylization config to the selected pre-configured model. This is intended for benchmarking and can conflict with custom stylization configs.
If NIM_DISABLE_MODEL_DOWNLOAD=true: the container will try to use TRT engines in /tmp/a2x (mounted cache), and will locally generate TRT engines if needed. In this mode, when cached engines are present, profile selection explicitly tries to honor the model id in the stylization config (requires a2e.trt and {model_id}.trt).
If NIM_DISABLE_MODEL_DOWNLOAD=false: the container will download engines for the selected profile. Ensure the chosen profile matches the model referenced by your stylization config so that the downloaded TRT engines match what the service will load.
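When debugging, it can help to write down which of these branches a given set of environment variables would trigger. The sketch below paraphrases the rules above. It is not the container's startup script: the function name describe_startup is hypothetical, and it assumes that an unset NIM_DISABLE_MODEL_DOWNLOAD behaves like false and that the value is compared case-insensitively.

```python
def describe_startup(env):
    """Summarize which documented startup branches apply to an env mapping."""
    notes = []
    if env.get("NIM_MANIFEST_PROFILE") or env.get("NIM_MODEL_PROFILE"):
        notes.append("explicit profile: stylization config may be updated to match it")
    if env.get("PERF_A2F_MODEL"):
        notes.append("benchmark model: stylization config may be overwritten")
    # Assumption: an unset variable behaves like "false".
    if env.get("NIM_DISABLE_MODEL_DOWNLOAD", "false").strip().lower() == "true":
        notes.append("download disabled: use or generate TRT engines in /tmp/a2x")
    else:
        notes.append("download enabled: fetch TRT engines for the selected profile")
    return notes


for note in describe_startup({"NIM_DISABLE_MODEL_DOWNLOAD": "true"}):
    print(note)
```

Passing os.environ instead of a literal dict gives the summary for the current shell.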
Additional NIM startup behaviors (what the container actually does)
The NIM entrypoint script performs additional deployment-time wiring that is useful to understand when debugging:
Config locations (overridable via env vars):
STYLIZATION_CONFIG_PATH (defaults to /apps/configs/stylization_config.yaml)
DEPLOYMENT_CONFIG_PATH (defaults to /apps/configs/deployment_config.yaml)
ADVANCED_CONFIG_PATH (defaults to /apps/configs/advanced_config.yaml)
Engine locations:
Downloaded TRT engines land in /opt/nim/workspace and are copied into /tmp/a2x before the service starts.
When NIM_DISABLE_MODEL_DOWNLOAD=true, TRT engines are expected in /tmp/a2x (for example via a mounted cache), and missing engines can be generated locally via ./service/generate_trt_models.py.
Pre-start validation:
Before launching a2f_pipeline.run, the script validates that /tmp/a2x/a2e.trt exists and that /tmp/a2x/{model_id}.trt exists, where model_id is read from the stylization config based on a2f.inference_type.
Other config updates used in benchmarking/deployment:
PERF_MAX_STREAM may update deployment_config.yaml stream count.
PERF_ENABLE_SMOOTHING may update advanced_config.yaml temporal smoothing.
NIM_SSL_MODE / TLS mounts may update TLS fields in deployment_config.yaml.
NIM_GRPC_PORT_TIMEOUT controls how long the container waits for the gRPC port to come up before declaring startup failure.
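The config-location defaults and the pre-start engine check described above can be reproduced outside the container, for example to check a mounted cache before starting it. The following is a minimal sketch under stated assumptions. The helper names resolve_config_paths and missing_engines are hypothetical, and the real entrypoint may perform more checks than the two file lookups shown here.

```python
import os
import tempfile
from pathlib import Path

# Defaults as listed in this guide; each can be overridden by an env var.
DEFAULT_CONFIG_PATHS = {
    "STYLIZATION_CONFIG_PATH": "/apps/configs/stylization_config.yaml",
    "DEPLOYMENT_CONFIG_PATH": "/apps/configs/deployment_config.yaml",
    "ADVANCED_CONFIG_PATH": "/apps/configs/advanced_config.yaml",
}


def resolve_config_paths(env=None):
    """Return the effective config path for each variable."""
    env = os.environ if env is None else env
    return {name: env.get(name, default) for name, default in DEFAULT_CONFIG_PATHS.items()}


def missing_engines(model_id, engine_dir="/tmp/a2x"):
    """List required TRT engine file names that are absent from engine_dir."""
    required = [Path(engine_dir) / "a2e.trt", Path(engine_dir) / f"{model_id}.trt"]
    return [path.name for path in required if not path.is_file()]


# Demo against a throwaway directory instead of the real /tmp/a2x.
with tempfile.TemporaryDirectory() as engine_dir:
    (Path(engine_dir) / "a2e.trt").touch()
    print(resolve_config_paths({"STYLIZATION_CONFIG_PATH": "/custom/stylization.yaml"}))
    print(missing_engines("james_v2.3.1", engine_dir))
```

In the demo only a2e.trt is present, so the model engine is reported as missing; an empty list means both files the validation step looks for are in place.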
Claire Configuration
claire_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0
a2e:
  enabled: true # Enable audio2emotion, ai-generated audio-driven emotion
  live_transition_time: 0.5 # Controls the smoothness of the output transition toward the target value across frames; higher values result in smoother transitions. Each frame updates at a rate of <frame time length> / <live transition time> (capped at 1.0) toward the raw result.
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if is loaded) relative to generated emotions
a2f:
  # regression / diffusion
  inference_type: regression
  regression_model:
    inference_model_id: claire_v2.3.1
  diffusion_model:
    inference_model_id: multi_v3.2
    identity: claire
    # If true, use deterministic noise for diffusion inference (more stable/repeatable results).
    # If false, use non-deterministic noise (more variation between runs).
    constant_noise: true
  # Enable or disable tongue blendshapes output
  enable_tongue_blendshapes: false
  face_params:
    eyelid_offset: 0.0 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.0 # Controls the magnitude of the input audio
    lip_close_offset: 0.0 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.25 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.0 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
    tongue_strength: 1.3 # Controls the range of motion of the tongue
    tongue_height_offset: 0.0 # Controls the height of the tongue
    tongue_depth_offset: 0.0 # Controls the depth of the tongue
  blendshape_params: # Modulates the effect of each blendshapes. Gain * w + offset
    # Clamps blendshape weights to [0.0, 1.0] range. Recommended for production to ensure compatibility with renderers.
    enable_clamping_bs_weight: true
    weight_multipliers: # Modulates the effect of each blendshapes. blendshape_values * weight_multipliers + weight_offsets (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 1.0
      EyeLookInLeft: 1.0
      EyeLookOutLeft: 1.0
      EyeLookUpLeft: 1.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 1.0
      EyeLookInRight: 1.0
      EyeLookOutRight: 1.0
      EyeLookUpRight: 1.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 1.0
      JawLeft: 1.0
      JawRight: 1.0
      JawOpen: 1.0
      MouthClose: 1.0
      MouthFunnel: 1.0
      MouthPucker: 1.0
      MouthLeft: 1.0
      MouthRight: 1.0
      MouthSmileLeft: 1.0
      MouthSmileRight: 1.0
      MouthFrownLeft: 1.0
      MouthFrownRight: 1.0
      MouthDimpleLeft: 1.0
      MouthDimpleRight: 1.0
      MouthStretchLeft: 1.0
      MouthStretchRight: 1.0
      MouthRollLower: 1.0
      MouthRollUpper: 1.0
      MouthShrugLower: 1.0
      MouthShrugUpper: 1.0
      MouthPressLeft: 1.0
      MouthPressRight: 1.0
      MouthLowerDownLeft: 1.0
      MouthLowerDownRight: 1.0
      MouthUpperUpLeft: 1.0
      MouthUpperUpRight: 1.0
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 1.0
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 1.0
      NoseSneerRight: 1.0
      TongueOut: 1.0
      TongueTipUp: 1.0
      TongueTipDown: 1.0
      TongueTipLeft: 1.0
      TongueTipRight: 1.0
      TongueRollUp: 1.0
      TongueRollDown: 1.0
      TongueRollLeft: 1.0
      TongueRollRight: 1.0
      TongueUp: 1.0
      TongueDown: 1.0
      TongueLeft: 1.0
      TongueRight: 1.0
      TongueIn: 1.0
      TongueStretch: 1.0
      TongueWide: 1.0
      TongueNarrow: 1.0
    weight_offsets: # Modulates the effect of each blendshapes. blendshape_values * weight_multipliers + weight_offsets (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0
      TongueTipUp: 0.0
      TongueTipDown: 0.0
      TongueTipLeft: 0.0
      TongueTipRight: 0.0
      TongueRollUp: 0.0
      TongueRollDown: 0.0
      TongueRollLeft: 0.0
      TongueRollRight: 0.0
      TongueUp: 0.0
      TongueDown: 0.0
      TongueLeft: 0.0
      TongueRight: 0.0
      TongueIn: 0.0
      TongueStretch: 0.0
      TongueWide: 0.0
      TongueNarrow: 0.0
    active_poses: # Specifies which blendshapes are active for each pose (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: 1
      EyeLookDownLeft: 0
      EyeLookInLeft: 0
      EyeLookOutLeft: 0
      EyeLookUpLeft: 0
      EyeSquintLeft: 1
      EyeWideLeft: 1
      EyeBlinkRight: 1
      EyeLookDownRight: 0
      EyeLookInRight: 0
      EyeLookOutRight: 0
      EyeLookUpRight: 0
      EyeSquintRight: 1
      EyeWideRight: 1
      JawForward: 1
      JawLeft: 1
      JawRight: 1
      JawOpen: 1
      MouthClose: 1
      MouthFunnel: 1
      MouthPucker: 1
      MouthLeft: 1
      MouthRight: 1
      MouthSmileLeft: 1
      MouthSmileRight: 1
      MouthFrownLeft: 1
      MouthFrownRight: 1
      MouthDimpleLeft: 1
      MouthDimpleRight: 1
      MouthStretchLeft: 1
      MouthStretchRight: 1
      MouthRollLower: 1
      MouthRollUpper: 1
      MouthShrugLower: 1
      MouthShrugUpper: 1
      MouthPressLeft: 1
      MouthPressRight: 1
      MouthLowerDownLeft: 1
      MouthLowerDownRight: 1
      MouthUpperUpLeft: 1
      MouthUpperUpRight: 1
      BrowDownLeft: 1
      BrowDownRight: 1
      BrowInnerUp: 1
      BrowOuterUpLeft: 1
      BrowOuterUpRight: 1
      CheekPuff: 1
      CheekSquintLeft: 1
      CheekSquintRight: 1
      NoseSneerLeft: 1
      NoseSneerRight: 1
      TongueOut: 0
      TongueTipUp: 1
      TongueTipDown: 1
      TongueTipLeft: 1
      TongueTipRight: 1
      TongueRollUp: 1
      TongueRollDown: 1
      TongueRollLeft: 1
      TongueRollRight: 1
      TongueUp: 1
      TongueDown: 1
      TongueLeft: 1
      TongueRight: 1
      TongueIn: 1
      TongueStretch: 1
      TongueWide: 1
      TongueNarrow: 1
    cancel_poses: # Specifies which blendshapes are cancelled for each pose (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: -1
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: -1
      EyeBlinkRight: -1
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: -1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: -1
      MouthSmileRight: -1
      MouthFrownLeft: -1
      MouthFrownRight: -1
      MouthDimpleLeft: -1
      MouthDimpleRight: -1
      MouthStretchLeft: -1
      MouthStretchRight: -1
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: -1
      MouthPressRight: -1
      MouthLowerDownLeft: -1
      MouthLowerDownRight: -1
      MouthUpperUpLeft: -1
      MouthUpperUpRight: -1
      BrowDownLeft: -1
      BrowDownRight: -1
      BrowInnerUp: -1
      BrowOuterUpLeft: -1
      BrowOuterUpRight: -1
      CheekPuff: -1
      CheekSquintLeft: -1
      CheekSquintRight: -1
      NoseSneerLeft: -1
      NoseSneerRight: -1
      TongueOut: -1
      TongueTipUp: -1
      TongueTipDown: -1
      TongueTipLeft: -1
      TongueTipRight: -1
      TongueRollUp: -1
      TongueRollDown: -1
      TongueRollLeft: -1
      TongueRollRight: -1
      TongueUp: -1
      TongueDown: -1
      TongueLeft: -1
      TongueRight: -1
      TongueIn: -1
      TongueStretch: -1
      TongueWide: -1
      TongueNarrow: -1
    symmetry_poses: # Specifies which blendshapes are symmetrical for each pose (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: 0
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: 1
      EyeBlinkRight: 0
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: 1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: 2
      MouthSmileRight: 2
      MouthFrownLeft: 3
      MouthFrownRight: 3
      MouthDimpleLeft: 4
      MouthDimpleRight: 4
      MouthStretchLeft: 5
      MouthStretchRight: 5
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: 6
      MouthPressRight: 6
      MouthLowerDownLeft: 7
      MouthLowerDownRight: 7
      MouthUpperUpLeft: 8
      MouthUpperUpRight: 8
      BrowDownLeft: 10
      BrowDownRight: 10
      BrowInnerUp: -1
      BrowOuterUpLeft: 9
      BrowOuterUpRight: 9
      CheekPuff: -1
      CheekSquintLeft: 11
      CheekSquintRight: 11
      NoseSneerLeft: 12
      NoseSneerRight: 12
      TongueOut: -1
      TongueTipUp: -1
      TongueTipDown: -1
      TongueTipLeft: -1
      TongueTipRight: -1
      TongueRollUp: -1
      TongueRollDown: -1
      TongueRollLeft: -1
      TongueRollRight: -1
      TongueUp: -1
      TongueDown: -1
      TongueLeft: -1
      TongueRight: -1
      TongueIn: -1
      TongueStretch: -1
      TongueWide: -1
      TongueNarrow: -1
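The comments on weight_multipliers and weight_offsets describe a per-blendshape formula, blendshape_values * weight_multipliers + weight_offsets, optionally followed by clamping to [0.0, 1.0]. A minimal sketch of that arithmetic is shown below. The function name apply_blendshape_params and the input values are hypothetical, and applying the clamp after the multiply-and-offset step is an assumption, since the configuration comments do not state the order.

```python
def apply_blendshape_params(values, multipliers, offsets, clamp=True):
    """Scale and shift each blendshape weight, then optionally clamp it."""
    result = {}
    for name, value in values.items():
        # Missing entries fall back to the neutral defaults 1.0 and 0.0.
        weight = value * multipliers.get(name, 1.0) + offsets.get(name, 0.0)
        if clamp:
            # Assumption: clamping happens after the multiply-and-offset step.
            weight = min(1.0, max(0.0, weight))
        result[name] = weight
    return result


raw = {"JawOpen": 0.5, "MouthSmileLeft": 0.9}
styled = apply_blendshape_params(
    raw,
    multipliers={"JawOpen": 1.5, "MouthSmileLeft": 1.5},
    offsets={"JawOpen": 0.1},
)
print(styled)
```

With clamping enabled, the smile weight saturates at 1.0 instead of exceeding the range that renderers typically expect, which matches the stated purpose of enable_clamping_bs_weight.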
James Configuration
james_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0
a2e:
  enabled: true # Enable audio2emotion, ai-generated audio-driven emotion
  live_transition_time: 0.5 # Controls the smoothness of the output transition toward the target value across frames; higher values result in smoother transitions. Each frame updates at a rate of <frame time length> / <live transition time> (capped at 1.0) toward the raw result.
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if is loaded) relative to generated emotions
a2f:
  # regression / diffusion
  inference_type: regression
  regression_model:
    inference_model_id: james_v2.3.1
  diffusion_model:
    inference_model_id: multi_v3.2
    identity: james
    # If true, use deterministic noise for diffusion inference (more stable/repeatable results).
    # If false, use non-deterministic noise (more variation between runs).
    constant_noise: true
  # Enable or disable tongue blendshapes output
  enable_tongue_blendshapes: false
  face_params:
    eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.0 # Controls the magnitude of the input audio
    lip_close_offset: -0.02 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.2 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.0 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
    tongue_strength: 1.3
    tongue_height_offset: 0.0
    tongue_depth_offset: 0.0
  blendshape_params: # Modulates the effect of each blendshapes. Gain * w + offset
    # Clamps blendshape weights to [0.0, 1.0] range. Recommended for production to ensure compatibility with renderers.
    enable_clamping_bs_weight: true
    weight_multipliers: # Modulates the effect of each blendshapes. blendshape_values * weight_multipliers + weight_offsets (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 1.0
      EyeLookInLeft: 1.0
      EyeLookOutLeft: 1.0
      EyeLookUpLeft: 1.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 1.0
      EyeLookInRight: 1.0
      EyeLookOutRight: 1.0
      EyeLookUpRight: 1.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 1.0
      JawLeft: 1.0
      JawRight: 1.0
      JawOpen: 1.0
      MouthClose: 1.0
      MouthFunnel: 1.0
      MouthPucker: 1.0
      MouthLeft: 1.0
      MouthRight: 1.0
      MouthSmileLeft: 1.0
      MouthSmileRight: 1.0
      MouthFrownLeft: 1.0
      MouthFrownRight: 1.0
      MouthDimpleLeft: 1.0
      MouthDimpleRight: 1.0
      MouthStretchLeft: 1.0
      MouthStretchRight: 1.0
      MouthRollLower: 1.0
      MouthRollUpper: 1.0
      MouthShrugLower: 1.0
      MouthShrugUpper: 1.0
      MouthPressLeft: 1.0
      MouthPressRight: 1.0
      MouthLowerDownLeft: 1.0
      MouthLowerDownRight: 1.0
      MouthUpperUpLeft: 1.0
      MouthUpperUpRight: 1.0
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 1.0
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 1.0
      NoseSneerRight: 1.0
      TongueOut: 1.0
      TongueTipUp: 1.0
      TongueTipDown: 1.0
      TongueTipLeft: 1.0
      TongueTipRight: 1.0
      TongueRollUp: 1.0
      TongueRollDown: 1.0
      TongueRollLeft: 1.0
      TongueRollRight: 1.0
      TongueUp: 1.0
      TongueDown: 1.0
      TongueLeft: 1.0
      TongueRight: 1.0
      TongueIn: 1.0
      TongueStretch: 1.0
      TongueWide: 1.0
      TongueNarrow: 1.0
    weight_offsets: # Modulates the effect of each blendshapes. blendshape_values * weight_multipliers + weight_offsets (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0
      TongueTipUp: 0.0
      TongueTipDown: 0.0
      TongueTipLeft: 0.0
      TongueTipRight: 0.0
      TongueRollUp: 0.0
      TongueRollDown: 0.0
      TongueRollLeft: 0.0
      TongueRollRight: 0.0
      TongueUp: 0.0
      TongueDown: 0.0
      TongueLeft: 0.0
      TongueRight: 0.0
      TongueIn: 0.0
      TongueStretch: 0.0
      TongueWide: 0.0
      TongueNarrow: 0.0
    active_poses: # Specifies which blendshapes are active for each pose (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: 1
      EyeLookDownLeft: 0
      EyeLookInLeft: 0
      EyeLookOutLeft: 0
      EyeLookUpLeft: 0
      EyeSquintLeft: 1
      EyeWideLeft: 1
      EyeBlinkRight: 1
      EyeLookDownRight: 0
      EyeLookInRight: 0
      EyeLookOutRight: 0
      EyeLookUpRight: 0
      EyeSquintRight: 1
      EyeWideRight: 1
      JawForward: 1
      JawLeft: 1
      JawRight: 1
      JawOpen: 1
      MouthClose: 1
      MouthFunnel: 1
      MouthPucker: 1
      MouthLeft: 1
      MouthRight: 1
      MouthSmileLeft: 1
      MouthSmileRight: 1
      MouthFrownLeft: 1
      MouthFrownRight: 1
      MouthDimpleLeft: 1
      MouthDimpleRight: 1
      MouthStretchLeft: 1
      MouthStretchRight: 1
      MouthRollLower: 1
      MouthRollUpper: 1
      MouthShrugLower: 1
      MouthShrugUpper: 1
      MouthPressLeft: 1
      MouthPressRight: 1
      MouthLowerDownLeft: 1
      MouthLowerDownRight: 1
      MouthUpperUpLeft: 1
      MouthUpperUpRight: 1
      BrowDownLeft: 1
      BrowDownRight: 1
      BrowInnerUp: 1
      BrowOuterUpLeft: 1
      BrowOuterUpRight: 1
      CheekPuff: 1
      CheekSquintLeft: 1
      CheekSquintRight: 1
      NoseSneerLeft: 1
      NoseSneerRight: 1
      TongueOut: 0
      TongueTipUp: 1
      TongueTipDown: 1
      TongueTipLeft: 1
      TongueTipRight: 1
      TongueRollUp: 1
      TongueRollDown: 1
      TongueRollLeft: 1
      TongueRollRight: 1
      TongueUp: 1
      TongueDown: 1
      TongueLeft: 1
      TongueRight: 1
      TongueIn: 1
      TongueStretch: 1
      TongueWide: 1
      TongueNarrow: 1
    cancel_poses: # Specifies which blendshapes are cancelled for each pose (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: -1
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: -1
      EyeBlinkRight: -1
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: -1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: -1
      MouthSmileRight: -1
      MouthFrownLeft: -1
      MouthFrownRight: -1
      MouthDimpleLeft: -1
      MouthDimpleRight: -1
      MouthStretchLeft: -1
      MouthStretchRight: -1
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: -1
      MouthPressRight: -1
      MouthLowerDownLeft: -1
      MouthLowerDownRight: -1
      MouthUpperUpLeft: -1
      MouthUpperUpRight: -1
      BrowDownLeft: -1
      BrowDownRight: -1
      BrowInnerUp: -1
      BrowOuterUpLeft: -1
      BrowOuterUpRight: -1
      CheekPuff: -1
      CheekSquintLeft: -1
      CheekSquintRight: -1
      NoseSneerLeft: -1
      NoseSneerRight: -1
      TongueOut: -1
      TongueTipUp: -1
      TongueTipDown: -1
      TongueTipLeft: -1
      TongueTipRight: -1
      TongueRollUp: -1
      TongueRollDown: -1
      TongueRollLeft: -1
      TongueRollRight: -1
      TongueUp: -1
      TongueDown: -1
      TongueLeft: -1
      TongueRight: -1
      TongueIn: -1
      TongueStretch: -1
      TongueWide: -1
      TongueNarrow: -1
    symmetry_poses: # Specifies which blendshapes are symmetrical for each pose (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: 0
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: 1
      EyeBlinkRight: 0
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: 1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: 2
      MouthSmileRight: 2
      MouthFrownLeft: 3
      MouthFrownRight: 3
      MouthDimpleLeft: 4
      MouthDimpleRight: 4
      MouthStretchLeft: 5
      MouthStretchRight: 5
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: 6
      MouthPressRight: 6
      MouthLowerDownLeft: 7
      MouthLowerDownRight: 7
      MouthUpperUpLeft: 8
      MouthUpperUpRight: 8
      BrowDownLeft: 10
      BrowDownRight: 10
      BrowInnerUp: -1
      BrowOuterUpLeft: 9
      BrowOuterUpRight: 9
      CheekPuff: -1
      CheekSquintLeft: 11
      CheekSquintRight: 11
      NoseSneerLeft: 12
      NoseSneerRight: 12
      TongueOut: -1
      TongueTipUp: -1
      TongueTipDown: -1
      TongueTipLeft: -1
      TongueTipRight: -1
      TongueRollUp: -1
      TongueRollDown: -1
      TongueRollLeft: -1
      TongueRollRight: -1
      TongueUp: -1
      TongueDown: -1
      TongueLeft: -1
      TongueRight: -1
      TongueIn: -1
      TongueStretch: -1
      TongueWide: -1
      TongueNarrow: -1
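The comment on live_transition_time describes a per-frame update in which the output moves toward the raw result at a rate of frame time divided by live transition time, capped at 1.0. The sketch below illustrates that rule as a simple linear step. The function name smooth_step, the 30 fps frame time, and the handling of a non-positive transition time are assumptions made for the example, not details taken from the service.

```python
def smooth_step(current, target, frame_time, live_transition_time):
    """Move current toward target by the capped per-frame rate."""
    if live_transition_time <= 0.0:
        # Assumption: a non-positive transition time means no smoothing.
        rate = 1.0
    else:
        rate = min(1.0, frame_time / live_transition_time)
    return current + rate * (target - current)


# Hypothetical stream at 30 frames per second with the default 0.5 s setting.
frame_time = 1.0 / 30.0
value = 0.0
for frame in range(5):
    value = smooth_step(value, 1.0, frame_time, 0.5)
    print(frame, round(value, 4))
```

A larger live_transition_time lowers the per-frame rate, so the output takes more frames to approach the target, which is the smoother transition the comment refers to.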
Mark Configuration
mark_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0
a2e:
  enabled: true # Enable audio2emotion, ai-generated audio-driven emotion
  live_transition_time: 0.5 # Controls the smoothness of the output transition toward the target value across frames; higher values result in smoother transitions. Each frame updates at a rate of <frame time length> / <live transition time> (capped at 1.0) toward the raw result.
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if is loaded) relative to generated emotions
a2f:
  # regression / diffusion
  inference_type: regression
  regression_model:
    inference_model_id: mark_v2.3
  diffusion_model:
    inference_model_id: multi_v3.2
    identity: mark
    # If true, use deterministic noise for diffusion inference (more stable/repeatable results).
    # If false, use non-deterministic noise (more variation between runs).
    constant_noise: true
  # Enable or disable tongue blendshapes output
  enable_tongue_blendshapes: false
  face_params:
    eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.3 # Controls the magnitude of the input audio
    lip_close_offset: -0.03 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.0023 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.4 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.1 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
    tongue_strength: 1.3 # Controls the range of motion of the tongue
    tongue_height_offset: 0.0 # Controls the height of the tongue
    tongue_depth_offset: 0.0 # Controls the depth of the tongue
  blendshape_params: # Modulates the effect of each blendshapes. Gain * w + offset
    # Clamps blendshape weights to [0.0, 1.0] range. Recommended for production to ensure compatibility with renderers.
    enable_clamping_bs_weight: true
    weight_multipliers: # Modulates the effect of each blendshapes. blendshape_values * weight_multipliers + weight_offsets (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 1.0
      EyeLookInLeft: 1.0
      EyeLookOutLeft: 1.0
      EyeLookUpLeft: 1.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 1.0
      EyeLookInRight: 1.0
      EyeLookOutRight: 1.0
      EyeLookUpRight: 1.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 1.0
      JawLeft: 1.0
      JawRight: 1.0
      JawOpen: 1.0
      MouthClose: 1.0
      MouthFunnel: 1.0
      MouthPucker: 1.0
      MouthLeft: 1.0
      MouthRight: 1.0
      MouthSmileLeft: 1.0
      MouthSmileRight: 1.0
      MouthFrownLeft: 1.0
      MouthFrownRight: 1.0
      MouthDimpleLeft: 1.0
      MouthDimpleRight: 1.0
      MouthStretchLeft: 1.0
      MouthStretchRight: 1.0
      MouthRollLower: 1.0
      MouthRollUpper: 1.0
      MouthShrugLower: 1.0
      MouthShrugUpper: 1.0
      MouthPressLeft: 1.0
      MouthPressRight: 1.0
      MouthLowerDownLeft: 1.0
      MouthLowerDownRight: 1.0
      MouthUpperUpLeft: 1.0
      MouthUpperUpRight: 1.0
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 1.0
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 1.0
      NoseSneerRight: 1.0
      TongueOut: 1.0
      TongueTipUp: 1.0
      TongueTipDown: 1.0
      TongueTipLeft: 1.0
      TongueTipRight: 1.0
      TongueRollUp: 1.0
      TongueRollDown: 1.0
      TongueRollLeft: 1.0
      TongueRollRight: 1.0
      TongueUp: 1.0
      TongueDown: 1.0
      TongueLeft: 1.0
      TongueRight: 1.0
      TongueIn: 1.0
      TongueStretch: 1.0
      TongueWide: 1.0
      TongueNarrow: 1.0
    weight_offsets: # Modulates the effect of each blendshapes. blendshape_values * weight_multipliers + weight_offsets (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0
      TongueTipUp: 0.0
      TongueTipDown: 0.0
      TongueTipLeft: 0.0
      TongueTipRight: 0.0
      TongueRollUp: 0.0
      TongueRollDown: 0.0
      TongueRollLeft: 0.0
      TongueRollRight: 0.0
      TongueUp: 0.0
      TongueDown: 0.0
      TongueLeft: 0.0
      TongueRight: 0.0
      TongueIn: 0.0
      TongueStretch: 0.0
      TongueWide: 0.0
      TongueNarrow: 0.0
    active_poses: # Specifies which blendshapes are active for each pose (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: 1
      EyeLookDownLeft: 0
      EyeLookInLeft: 0
      EyeLookOutLeft: 0
      EyeLookUpLeft: 0
      EyeSquintLeft: 1
      EyeWideLeft: 1
      EyeBlinkRight: 1
      EyeLookDownRight: 0
      EyeLookInRight: 0
      EyeLookOutRight: 0
      EyeLookUpRight: 0
      EyeSquintRight: 1
      EyeWideRight: 1
      JawForward: 1
      JawLeft: 1
      JawRight: 1
      JawOpen: 1
      MouthClose: 1
      MouthFunnel: 1
      MouthPucker: 1
      MouthLeft: 1
      MouthRight: 1
      MouthSmileLeft: 1
      MouthSmileRight: 1
      MouthFrownLeft: 1
      MouthFrownRight: 1
      MouthDimpleLeft: 1
      MouthDimpleRight: 1
      MouthStretchLeft: 1
      MouthStretchRight: 1
      MouthRollLower: 1
      MouthRollUpper: 1
      MouthShrugLower: 1
      MouthShrugUpper: 1
      MouthPressLeft: 1
      MouthPressRight: 1
      MouthLowerDownLeft: 1
      MouthLowerDownRight: 1
      MouthUpperUpLeft: 1
      MouthUpperUpRight: 1
      BrowDownLeft: 1
      BrowDownRight: 1
      BrowInnerUp: 1
      BrowOuterUpLeft: 1
      BrowOuterUpRight: 1
      CheekPuff: 1
      CheekSquintLeft: 1
      CheekSquintRight: 1
      NoseSneerLeft: 1
      NoseSneerRight: 1
      TongueOut: 0
      TongueTipUp: 1
      TongueTipDown: 1
      TongueTipLeft: 1
      TongueTipRight: 1
      TongueRollUp: 1
      TongueRollDown: 1
      TongueRollLeft: 1
      TongueRollRight: 1
      TongueUp: 1
      TongueDown: 1
      TongueLeft: 1
      TongueRight: 1
      TongueIn: 1
      TongueStretch: 1
      TongueWide: 1
      TongueNarrow: 1
    cancel_poses: # Specifies which blendshapes are cancelled for each pose (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: -1
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: -1
      EyeBlinkRight: -1
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: -1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: -1
      MouthSmileRight: -1
      MouthFrownLeft: -1
      MouthFrownRight: -1
      MouthDimpleLeft: -1
      MouthDimpleRight: -1
      MouthStretchLeft: -1
      MouthStretchRight: -1
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: -1
      MouthPressRight: -1
      MouthLowerDownLeft: -1
      MouthLowerDownRight: -1
      MouthUpperUpLeft: -1
      MouthUpperUpRight: -1
      BrowDownLeft: -1
      BrowDownRight: -1
      BrowInnerUp: -1
      BrowOuterUpLeft: -1
      BrowOuterUpRight: -1
      CheekPuff: -1
      CheekSquintLeft: -1
      CheekSquintRight: -1
      NoseSneerLeft: -1
      NoseSneerRight: -1
      TongueOut: -1
      TongueTipUp: -1
      TongueTipDown: -1
      TongueTipLeft: -1
      TongueTipRight: -1
      TongueRollUp: -1
      TongueRollDown: -1
      TongueRollLeft: -1
      TongueRollRight: -1
      TongueUp: -1
      TongueDown: -1
      TongueLeft: -1
      TongueRight: -1
      TongueIn: -1
      TongueStretch: -1
      TongueWide: -1
      TongueNarrow: -1
    symmetry_poses: # Specifies which blendshapes are symmetrical for each pose (for more details, see the documentation for blendshape_params)
      EyeBlinkLeft: 0
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: 1
      EyeBlinkRight: 0
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: 1
      JawForward: -1
      JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: 2
MouthSmileRight: 2
MouthFrownLeft: 3
MouthFrownRight: 3
MouthDimpleLeft: 4
MouthDimpleRight: 4
MouthStretchLeft: 5
MouthStretchRight: 5
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: 6
MouthPressRight: 6
MouthLowerDownLeft: 7
MouthLowerDownRight: 7
MouthUpperUpLeft: 8
MouthUpperUpRight: 8
BrowDownLeft: 10
BrowDownRight: 10
BrowInnerUp: -1
BrowOuterUpLeft: 9
BrowOuterUpRight: 9
CheekPuff: -1
CheekSquintLeft: 11
CheekSquintRight: 11
NoseSneerLeft: 12
NoseSneerRight: 12
TongueOut: -1
TongueTipUp: -1
TongueTipDown: -1
TongueTipLeft: -1
TongueTipRight: -1
TongueRollUp: -1
TongueRollDown: -1
TongueRollLeft: -1
TongueRollRight: -1
TongueUp: -1
TongueDown: -1
TongueLeft: -1
TongueRight: -1
TongueIn: -1
TongueStretch: -1
TongueWide: -1
TongueNarrow: -1
2. Deployment Configuration File#
deployment_config.yaml
common:
# Number of streams to use simultaneously
# The recommended value depends on the GPU and your latency constraints
# Higher value means: more concurrent users and higher overall throughput
# Lower value means: fewer concurrent users, higher throughput per stream, lower latencies
stream_number: 10
# Pad each audio file with some 1.5 seconds of silent audio
add_silence_padding_after_audio: false
silence_padding_duration_second: 1.5
logging:
# Level of log wanted, info is recommended
# Can be one of:
# => trace
# => debug
# => info
# => warn
# => error
# => critical
# => disabled
log_level: info
# How often should FPS logs be printed per stream
fps_logging_interval_second: 1
endpoints:
# [Ignored in v2.0] The bidirectional endpoint is now the only supported mode.
# This field is still parsed for backwards compatibility but has no effect.
use_bidirectional: true
bidirectional:
server:
# port to open
url: 0.0.0.0:52000
# secure mode
# Can be one of:
# => disabled
# => tls
# => mtls
secure_mode: "disabled"
# Path to the certificate file
certificate_path: ""
# Path to the key file
key_path: ""
# Path to the CA file
ca_path: ""
# [Ignored in v2.0] Unidirectional endpoints have been removed.
# These fields are still parsed for backwards compatibility but have no effect.
unidirectional:
server:
url: 0.0.0.0:50000
client:
url: 0.0.0.0:51000
# Configs specific to telemetry
telemetry:
# Name of the service
service_name: audio2face
# Whether to enable metrics
metrics_enabled: false
# Whether to enable traces
traces_enabled: false
# Can be prometheus or otlp
metrics_exporter: prometheus
# Export interval in milliseconds
otel_metric_export_interval: 60000
# Export timeout in milliseconds
otel_metric_export_timeout: 30000
otlp_http_metrics_endpoint: http://localhost:4318/v1/metrics
otlp_http_traces_endpoint: http://localhost:4318/v1/traces
prometheus_endpoint: 0.0.0.0:9464
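As a rough illustration of the constraints documented in the comments above, the following sketch checks a deployment configuration that has already been loaded into a Python dictionary (for example with a YAML parser). The helper name check_deployment_config is hypothetical, and the dictionary nesting (logging.log_level, endpoints.bidirectional.server) is assumed from the order of the keys in the listing. The rule that tls needs a certificate and key while mtls also needs a CA file is an assumption based on how TLS is commonly configured, not a description of how the service validates its input.

```python
ALLOWED_LOG_LEVELS = {"trace", "debug", "info", "warn", "error", "critical", "disabled"}
ALLOWED_SECURE_MODES = {"disabled", "tls", "mtls"}


def check_deployment_config(config):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []

    log_level = config.get("logging", {}).get("log_level", "info")
    if log_level not in ALLOWED_LOG_LEVELS:
        problems.append(f"unknown log_level: {log_level!r}")

    # Assumed nesting: endpoints -> bidirectional -> server
    server = config.get("endpoints", {}).get("bidirectional", {}).get("server", {})
    mode = server.get("secure_mode", "disabled")
    if mode not in ALLOWED_SECURE_MODES:
        problems.append(f"unknown secure_mode: {mode!r}")

    # Assumption: tls needs certificate + key, mtls additionally needs the CA file.
    required = []
    if mode in ("tls", "mtls"):
        required += ["certificate_path", "key_path"]
    if mode == "mtls":
        required.append("ca_path")
    for key in required:
        if not server.get(key):
            problems.append(f"{key} must be set when secure_mode is {mode!r}")

    return problems


example = {
    "logging": {"log_level": "info"},
    "endpoints": {
        "bidirectional": {
            "server": {"secure_mode": "tls", "certificate_path": "", "key_path": ""}
        }
    },
}
for problem in check_deployment_config(example):
    print(problem)
```

Running a check like this before mounting the file can catch a misspelled log level or an empty certificate path earlier than a failed container start would.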
3. Advanced Configuration File#
advanced_config.yaml
input_sanitization:
# max size of UUID
max_len_uuid: 50
# Maximum samplerate
max_sample_rate: 144000
# Minimum samplerate
min_sample_rate: 16000
# Maximum amount in second for the processing time
# After this timeout the connection to A2F will be cut
max_processing_duration_second: 300
# Maximum size of 1 audio buffer sent over the grpc stream
max_audio_buffer_size_second: 10
# Maximum size of the audio clip to process
max_audio_clip_size_second: 300
# Maximum amount of time that A2F Controller will wait when not
# receiving data from A2F, before cutting the connection
max_wait_time_idle_ms: 30000
# Will stop serving a user if their FPS is lower than low_fps
# for more than low_fps_max_duration_second seconds
# For real time application less than 30 FPS means slower than realtime
# So if users provide audio to the service at less than 30 FPS then
# the interactive experience will stutter.
low_fps: 29
low_fps_max_duration_second: 7
# WARN: Deprecated
garbage_collector:
# enable or disable the garbage collector
# This is only used with bidirectional connection where the service is holding data
# waiting for the client to pick them up.
enabled: false
# how often the garbage collector should run
interval_run_second: 10
# If the garbage collector finds streams holding
# more than N seconds of data, it will delete data
# until the amount falls below this threshold.
# Clients are expected to retrieve data promptly so that
# the service doesn't retain the data excessively.
max_size_stored_data_second: 60
pipeline_parameters:
# Queues between pipeline components
# Can be tweaked:
# Higher values can lead to higher throughput but also to higher latencies
# Lower values lead to lower latencies and potentially lower overall throughput
# Leave these values to default in case of doubt
queue_size_after_a2e: 1
queue_size_after_a2f: 300
queue_size_after_streammux: 1
# ===== Blendshape Streaming Control =====
# Controls how blendshapes are sent to the client
# Burst Mode: Send all frames as fast as possible (~20-30ms total)
# WARNING: May cause AnimGraph buffer overflow and lip sync issues in Tokkio
# Values:
# false = Rate-limited streaming (RECOMMENDED for production)
# true = Burst mode
burst_mode: false
# Streaming Frame Rate (only used when burst_mode = false)
# Controls the rate at which blendshapes are sent to the client
# Delay per frame = 1000 / blendshape_streaming_fps milliseconds
#
# Common values:
# 30 = 33ms delay (good for bandwidth-constrained networks)
# 60 = 16ms delay (standard frame rate)
# 90 = 11ms delay - DEFAULT (Tokkio compatibility)
# 120 = 8ms delay (low latency)
# 240 = 4ms delay (very low latency)
# 500 = 2ms delay (near-burst performance)
# 1000+ = 1ms or less delay (effectively burst-like)
#
# Recommended settings:
# - Tokkio/Production: 90
# - Low latency: 120-240
# - Bandwidth-constrained: 30-60
blendshape_streaming_fps: 90
streammux:
# Do not change this config; this is internal
adaptive_batching: 0
# Minimum FPS for all streams
# Pipeline will not slow down under this value if:
# * compute allows it
# * upload speed of audio allows it
# Here 40 FPS
# Numerator for that config:
overall_min_fps_n: 40
# Denominator for that config:
overall_min_fps_d: 1
a2f:
# Temporal smoothing for blendshape output
# Set to false for debugging individual frames without smoothing
temporal_smoothing: true
# GPU device ID to use for A2F inference
device_id: 0
# GPU Blendshape Solver
# When true, blendshape solving runs entirely on GPU, improving performance
# by avoiding CPU-GPU data transfers during the solve step.
# When false, blendshape solving runs on CPU.
# Recommended: true (default) for production deployments
use_gpu_solver: true
a2e:
inference_interval: 10
device_id: 0 # Which gpu id to use
trt_model_generation:
a2e:
# Audio2Emotion engine currently does not support precision other than FP32
precision: "fp32"
min_shape: 1
optimal_shape: 10
maximum_shape: 10
a2f:
precision: "fp16"
min_shape: 1
optimal_shape: 10
maximum_shape: 10
resampling:
# Size of chunks used during resampling process
chunk_size: 6400
# Resampling quality parameter
# - Range: 1.0 to 10.0
# - Higher values yield superior audio quality
# - Lower values prioritize computational performance
quality: 1.0
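The blendshape_streaming_fps comments in advanced_config.yaml give the per-frame delay as 1000 / blendshape_streaming_fps milliseconds. The short sketch below reproduces the "Common values" table from that formula. The function name frame_delay_ms is hypothetical, and rounding down to whole milliseconds is inferred from the table (60 FPS is listed as 16 ms), not from the service's source code.

```python
def frame_delay_ms(blendshape_streaming_fps):
    """Per-frame delay in whole milliseconds: 1000 / fps, rounded down."""
    if blendshape_streaming_fps <= 0:
        raise ValueError("blendshape_streaming_fps must be positive")
    return 1000 // blendshape_streaming_fps


# Values taken from the "Common values" comment in advanced_config.yaml
for fps in (30, 60, 90, 120, 240, 500):
    print(f"{fps} FPS: {frame_delay_ms(fps)} ms between frames")
```

This only applies when burst_mode is false; in burst mode the frames are sent without the rate limit.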
Key Advanced Configuration Parameters#
A2F Parameters (a2f section):
temporal_smoothing: When true, applies temporal smoothing to blendshape output for smoother animations. Set to false for debugging individual frames without smoothing.
device_id: GPU device ID to use for A2F inference (default: 0).
use_gpu_solver: Controls where blendshape solving is performed:
  true (default, recommended): Blendshape solving runs entirely on GPU. This improves performance by keeping computation on the GPU and avoiding CPU-GPU data transfers during the solve step.
  false: Blendshape solving runs on CPU.
A2E Parameters (a2e section):
inference_interval: Controls how frequently Audio2Emotion inference runs. Higher values reduce compute cost but decrease temporal fidelity of emotion detection.
device_id: GPU device ID to use for A2E inference (default: 0).
These configuration files represent the system’s default values. To implement custom configurations, launch A2F-3D NIM with a custom entrypoint and mount your configuration files within the container as detailed in the following sections.
Configuration Usage#
To override default configurations, mount your custom configuration files in a Docker volume at /mnt/configs.
For convenience, set up your environment with these commands:
$ mkdir -p ~/.cache/audio2face-3d-configs
$ export LOCAL_CONFIGS=~/.cache/audio2face-3d-configs
Copy the default configurations into your LOCAL_CONFIGS directory, then list it to confirm that all five files are present:
$ ls $LOCAL_CONFIGS
advanced_config.yaml
claire_stylization_config.yaml
deployment_config.yaml
james_stylization_config.yaml
mark_stylization_config.yaml
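If you prefer to script this check, the sketch below reports which of the five file names from the listing above are missing from a directory. The helper name missing_configs is hypothetical, and the fallback path is the one created by the mkdir command above.

```python
import os
from pathlib import Path

# File names taken from the directory listing above
EXPECTED_CONFIGS = (
    "advanced_config.yaml",
    "claire_stylization_config.yaml",
    "deployment_config.yaml",
    "james_stylization_config.yaml",
    "mark_stylization_config.yaml",
)


def missing_configs(config_dir):
    """Return the expected file names that are absent from config_dir, sorted."""
    config_dir = Path(config_dir).expanduser()
    return sorted(name for name in EXPECTED_CONFIGS if not (config_dir / name).is_file())


local_configs = os.environ.get("LOCAL_CONFIGS", "~/.cache/audio2face-3d-configs")
print("missing:", missing_configs(local_configs))
```

An empty list means every default file is in place. Because of the override mechanism described later in this guide, a missing file is not necessarily an error.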
Model Cache Management#
Enable local model caching to speed up subsequent service launches. Configure a cache location with appropriate permissions,
as shown below. To use cached models properly, NIM_DISABLE_MODEL_DOWNLOAD must also be set to true in the docker run command.
This is explained in detail in the Model Caching section of
Getting Started.
$ mkdir -p ~/.cache/audio2face-3d
$ chmod 755 ~/.cache/audio2face-3d
$ export LOCAL_NIM_CACHE=~/.cache/audio2face-3d
Note
The container runs as UID 1000. On most single-user Linux systems your user is also
UID 1000 (verify with id -u), so chmod 755 is sufficient. If your UID differs,
grant ownership with sudo chown 1000:1000 ~/.cache/audio2face-3d, or for quick
prototyping use chmod 777.
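The permission rule in the note can be checked with a few lines of Python. This is a minimal sketch with a hypothetical helper name, writable_by_container. It looks only at the owner and "other" write bits, so group membership and ACLs are ignored and the result should be treated as a hint.

```python
import os
import stat
import tempfile

CONTAINER_UID = 1000  # UID the container runs as, per the note above


def writable_by_container(path, uid=CONTAINER_UID):
    """Rough check: is `path` a directory whose mode bits let `uid` write to it?"""
    info = os.stat(path)
    if not stat.S_ISDIR(info.st_mode):
        return False
    if info.st_uid == uid:
        # The directory belongs to the container user: the owner write bit decides.
        return bool(info.st_mode & stat.S_IWUSR)
    # Otherwise only the world-writable bit helps (the chmod 777 case).
    return bool(info.st_mode & stat.S_IWOTH)


# Demonstration on a throwaway directory instead of the real cache location
with tempfile.TemporaryDirectory() as demo_dir:
    os.chmod(demo_dir, 0o755)
    owner = os.stat(demo_dir).st_uid
    print(writable_by_container(demo_dir, uid=owner))
```

To check the real cache, pass os.path.expanduser("~/.cache/audio2face-3d") and keep the default uid of 1000.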
Launching A2F-3D NIM with Custom Entrypoint#
$ docker run -it --rm --name audio2face-3d \
--gpus all \
--network=host \
--entrypoint /bin/bash -w /opt/nvidia/a2f_pipeline \
-e NIM_SKIP_A2F_START=true \
-e NIM_DISABLE_MODEL_DOWNLOAD=true \
-e NGC_API_KEY=$NGC_API_KEY \
-v "$LOCAL_NIM_CACHE:/tmp/a2x" \
-v "$LOCAL_CONFIGS:/mnt/configs/" \
nvcr.io/nim/nvidia/audio2face-3d:2.0
This command creates a Docker container with GPU support (--gpus all) and host network access (--network=host).
For granular port control, replace --network=host with specific port mappings using -p.
The command mounts volumes for model caching (-v "$LOCAL_NIM_CACHE:/tmp/a2x") and
configuration overrides (-v "$LOCAL_CONFIGS:/mnt/configs/"). It also stops the download
of TRT engines (-e NIM_DISABLE_MODEL_DOWNLOAD=true) from NGC. Omit either mount if the corresponding functionality isn’t needed.
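The variations described above (optional mounts, -p instead of --network=host) can be expressed as a small helper that assembles the argument list. This is only a sketch: docker_run_args is a hypothetical name, and port 52000 is taken from the default deployment_config.yaml. Passing -e NGC_API_KEY without a value asks Docker to copy the variable from the host environment, which keeps the key out of any printed command line.

```python
import shlex

IMAGE = "nvcr.io/nim/nvidia/audio2face-3d:2.0"


def docker_run_args(cache_dir=None, configs_dir=None, ports=None):
    """Assemble the `docker run` argument list from this section.

    cache_dir / configs_dir: host directories to mount; None omits the mount.
    ports: host ports to publish one-to-one with -p; None keeps --network=host.
    """
    args = ["docker", "run", "-it", "--rm", "--name", "audio2face-3d", "--gpus", "all"]
    if ports:
        for port in ports:
            args += ["-p", f"{port}:{port}"]
    else:
        args.append("--network=host")
    args += [
        "--entrypoint", "/bin/bash", "-w", "/opt/nvidia/a2f_pipeline",
        "-e", "NIM_SKIP_A2F_START=true",
        "-e", "NIM_DISABLE_MODEL_DOWNLOAD=true",
        # A bare variable name makes docker copy the value from the host environment.
        "-e", "NGC_API_KEY",
    ]
    if cache_dir:
        args += ["-v", f"{cache_dir}:/tmp/a2x"]
    if configs_dir:
        args += ["-v", f"{configs_dir}:/mnt/configs/"]
    args.append(IMAGE)
    return args


# Publish only the default bidirectional endpoint port instead of sharing the host network.
print(shlex.join(docker_run_args(configs_dir="/path/to/configs", ports=[52000])))
```

The returned list can be passed to subprocess.run on the host. If metrics are enabled, the Prometheus port from the deployment configuration would need to be published as well.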
Once inside the container shell:
triton-server@host-name:/opt/nvidia/a2f_pipeline$
Launch the NIM server:
$ /opt/nim/start_server.sh &
The ampersand (&) runs the server as a background process. Press Enter if needed to return to the shell prompt.
Note
The following commands should be executed within the container unless specified otherwise.
TensorRT Engine Generation#
Generate the TensorRT engine for your GPU using the provided Python application:
usage: generate_trt_models.py [-h] [--stylization-config STYLIZATION_CONFIG] [--advanced-config ADVANCED_CONFIG]
Generates TRT models for A2F Service.
options:
-h, --help show this help message and exit
--stylization-config STYLIZATION_CONFIG
file path to the stylization config
--advanced-config ADVANCED_CONFIG
file path to the advanced config
Generate Audio2Emotion and Audio2Face TRT engines with default configurations:
$ ./service/generate_trt_models.py
Important
The TRT generation utility in this release is intended for the built-in model set shipped with the container.
It only supports the following A2F inference_model_id values:
james_v2.3.1
claire_v2.3.1
mark_v2.3
multi_v3.2
Note
TRT engines are GPU-specific and must be regenerated when changing deployment hardware. While generated engines can be backed up, they’re only compatible with identical hardware configurations.
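Because the utility only supports the built-in model IDs listed above, it can be useful to check a stylization configuration before starting a generation run. The sketch below works on a configuration that has already been loaded into a dictionary. The helper names are hypothetical, the lookup follows the a2f.inference_type and a2f.<type>_model.inference_model_id keys described in this guide, and falling back to regression when inference_type is absent is an assumption.

```python
# Model IDs listed in the Important note above
SUPPORTED_TRT_MODEL_IDS = {"james_v2.3.1", "claire_v2.3.1", "mark_v2.3", "multi_v3.2"}


def selected_model_id(stylization):
    """Return the inference_model_id selected by a2f.inference_type, or None."""
    a2f = stylization.get("a2f", {})
    # Assumption: regression is used when inference_type is not given.
    inference_type = a2f.get("inference_type", "regression")
    section = a2f.get(f"{inference_type}_model", {})
    return section.get("inference_model_id")


def can_generate_trt(stylization):
    """True if the selected model ID is one of the built-in, supported IDs."""
    return selected_model_id(stylization) in SUPPORTED_TRT_MODEL_IDS


mark = {
    "a2f": {
        "inference_type": "regression",
        "regression_model": {"inference_model_id": "mark_v2.3"},
    }
}
print(selected_model_id(mark), can_generate_trt(mark))
```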
Service Initialization#
Launch the Audio2Face-3D Service:
$ a2f_pipeline.run -h
Usage: a2f_pipeline.run [--help] [--version] [--stylization-config] [--deployment-config] [--advanced-config]
Optional arguments:
-h, --help shows help message and exits
-v, --version prints version information and exits
--stylization-config file path to the stylization config
--deployment-config file path to the deployment config
--advanced-config file path to the advanced config
Start with default configuration:
$ /usr/local/bin/a2f_pipeline.run
Successful initialization produces:
[2024-04-23 12:44:33.066] [ global ] [info] Running...
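Besides watching for the log line, you can probe the endpoint port from a script. The following sketch assumes the default bidirectional server URL from deployment_config.yaml (port 52000) and a service reachable on localhost, which is the case with --network=host. The helper name port_is_open is hypothetical. A successful TCP connection only shows that something is listening, not that the gRPC service is healthy.

```python
import socket


def port_is_open(host="127.0.0.1", port=52000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


print("endpoint reachable:", port_is_open())
```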
Streamlined Configuration Updates#
To switch to the Claire model, execute these commands within the container:
$ ./service/generate_trt_models.py --stylization-config /mnt/configs/claire_stylization_config.yaml \
--advanced-config /mnt/configs/advanced_config.yaml
$ a2f_pipeline.run --stylization-config /mnt/configs/claire_stylization_config.yaml \
--deployment-config /mnt/configs/deployment_config.yaml \
--advanced-config /mnt/configs/advanced_config.yaml
Warning
The current generate_trt_models.py utility doesn’t support cache invalidation. To regenerate models after configuration updates, manually remove the corresponding TRT model from /tmp/a2x/.
Flexible Configuration Management#
The configuration system employs an override mechanism, allowing partial configuration updates without specifying all parameters.
For the A2F stylization configuration, you can override only the keys you need, including nested keys under
a2f.regression_model or a2f.diffusion_model.
Example 1: Using Mark Stylization#
Create short_mark_stylization_config.yaml in $LOCAL_CONFIGS with:
a2f:
  inference_type: regression
  regression_model:
    inference_model_id: mark_v2.3
  enable_tongue_blendshapes: true
Execute within the container:
$ ./service/generate_trt_models.py --stylization-config /mnt/configs/short_mark_stylization_config.yaml
$ a2f_pipeline.run --stylization-config /mnt/configs/short_mark_stylization_config.yaml
Warning
The current generate_trt_models.py utility doesn’t support cache invalidation. To regenerate models after configuration updates, manually remove the corresponding TRT model from /tmp/a2x/.
These commands produce the same result as using the complete Mark configuration file: the service uses the Mark regression model and falls back to the default values for any unspecified keys.
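The override behavior can be pictured as a recursive merge of the partial file into the defaults. The sketch below is a conceptual model only, not the service's actual implementation. The function name apply_overrides is hypothetical, and the sample dictionaries are heavily trimmed with key placement chosen for illustration.

```python
import copy


def apply_overrides(defaults, overrides):
    """Return a copy of defaults with overrides merged in, recursing into nested mappings."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Trimmed, illustrative defaults (not the full default configuration)
defaults = {
    "a2e": {"enabled": True},
    "a2f": {
        "inference_type": "regression",
        "regression_model": {"inference_model_id": "james_v2.3.1"},
        "enable_tongue_blendshapes": False,
    },
}
# Keys from short_mark_stylization_config.yaml that differ from the defaults
overrides = {
    "a2f": {
        "regression_model": {"inference_model_id": "mark_v2.3"},
        "enable_tongue_blendshapes": True,
    }
}

merged = apply_overrides(defaults, overrides)
print(merged["a2f"]["regression_model"]["inference_model_id"])  # mark_v2.3
print(merged["a2e"]["enabled"])  # True, untouched default
```

Keys that the override does not mention, such as a2e.enabled here, keep their default values.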
Warning
Use the appropriate configuration flag for your file type:
--stylization-config # for the <any>_stylization_config.yaml
--deployment-config # for the deployment_config.yaml
--advanced-config # for the advanced_config.yaml
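When launch commands are generated by a script, the mapping above can be derived from the file name. This is a small sketch with hypothetical helper names (config_flag, pipeline_args). It relies on the file naming convention used in this guide, which is an assumption about your setup; the service itself is told the file type by the flag, not by the name.

```python
from pathlib import PurePosixPath


def config_flag(path):
    """Pick the command-line flag that matches a configuration file name."""
    name = PurePosixPath(path).name
    if name.endswith("_stylization_config.yaml"):
        return "--stylization-config"
    if name == "deployment_config.yaml":
        return "--deployment-config"
    if name == "advanced_config.yaml":
        return "--advanced-config"
    raise ValueError(f"cannot infer a flag from file name: {name}")


def pipeline_args(paths):
    """Build the a2f_pipeline.run argument list for the given configuration files."""
    args = ["a2f_pipeline.run"]
    for path in paths:
        args += [config_flag(path), path]
    return args


print(pipeline_args([
    "/mnt/configs/short_mark_stylization_config.yaml",
    "/mnt/configs/deployment_config.yaml",
]))
```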
Advanced Stylization#
The blendshape tuning in the stylization configuration above was simplified for new users.
Advanced users can tune the additional parameters described below.
Advanced Blendshape tuning
Three more parameters can be set for blendshape tuning:
active_poses: Which blendshapes are active: 1 for active, 0 for inactive.
cancel_poses: Which blendshapes cancel each other. Blendshapes that share the same number belong together; -1 means no-op.
symmetry_poses: Which blendshapes are symmetric to one another. Blendshapes that share the same number belong together; -1 means no-op.
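To see how the matching numbers pair blendshapes, the sketch below groups names by their index and skips the -1 entries. The helper name pose_groups is hypothetical, and the sample values are a small excerpt of the symmetry_poses defaults shown in the configuration files. How the service uses these groups internally is not covered here.

```python
from collections import defaultdict


def pose_groups(poses):
    """Group blendshape names that share the same non-negative index; -1 entries are skipped."""
    groups = defaultdict(list)
    for name, index in poses.items():
        if index >= 0:
            groups[index].append(name)
    return dict(groups)


# Excerpt of the default symmetry_poses values
symmetry_poses = {
    "EyeBlinkLeft": 0,
    "EyeBlinkRight": 0,
    "EyeWideLeft": 1,
    "EyeWideRight": 1,
    "JawOpen": -1,
    "MouthSmileLeft": 2,
    "MouthSmileRight": 2,
}

for index, names in sorted(pose_groups(symmetry_poses).items()):
    print(index, names)
```

The same grouping applies to cancel_poses, where every default value is -1 and the result is therefore empty.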
claire_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
amazement: 0.0
anger: 0.0
cheekiness: 0.0
disgust: 0.0
fear: 0.0
grief: 0.0
joy: 0.0
outofbreath: 0.0
pain: 0.0
sadness: 0.0
a2e:
enabled: true
live_transition_time: 0.5
post_processing_params:
emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions
a2f:
# regression / diffusion
inference_type: regression
regression_model:
inference_model_id: claire_v2.3.1
diffusion_model:
inference_model_id: multi_v3.2
identity: claire
# If true, use deterministic noise for diffusion inference (more stable/repeatable results).
# If false, use non-deterministic noise (more variation between runs).
constant_noise: true
# Enable or disable tongue blendshapes output
enable_tongue_blendshapes: false
face_params:
eyelid_offset: 0.0 # Adjusts the default pose of eyelid open-close
face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
input_strength: 1.0 # Controls the magnitude of the input audio
lip_close_offset: 0.0 # Adjusts the default pose of lip close-open
lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
lower_face_strength: 1.25 # Controls the range of motion on the lower regions of the face
skin_strength: 1.0 # Controls the range of motion of the skin
upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
tongue_strength: 1.3 # Controls the range of motion of the tongue
tongue_height_offset: 0.0 # Controls the height of the tongue
tongue_depth_offset: 0.0 # Controls the depth of the tongue
blendshape_params: # Modulates the effect of each blendshape: gain * w + offset
# Clamps blendshape weights to [0.0, 1.0] range. Recommended for production to ensure compatibility with renderers.
enable_clamping_bs_weight: true
weight_multipliers:
EyeBlinkLeft: 1.0
EyeLookDownLeft: 1.0
EyeLookInLeft: 1.0
EyeLookOutLeft: 1.0
EyeLookUpLeft: 1.0
EyeSquintLeft: 1.0
EyeWideLeft: 1.0
EyeBlinkRight: 1.0
EyeLookDownRight: 1.0
EyeLookInRight: 1.0
EyeLookOutRight: 1.0
EyeLookUpRight: 1.0
EyeSquintRight: 1.0
EyeWideRight: 1.0
JawForward: 1.0
JawLeft: 1.0
JawRight: 1.0
JawOpen: 1.0
MouthClose: 1.0
MouthFunnel: 1.0
MouthPucker: 1.0
MouthLeft: 1.0
MouthRight: 1.0
MouthSmileLeft: 1.0
MouthSmileRight: 1.0
MouthFrownLeft: 1.0
MouthFrownRight: 1.0
MouthDimpleLeft: 1.0
MouthDimpleRight: 1.0
MouthStretchLeft: 1.0
MouthStretchRight: 1.0
MouthRollLower: 1.0
MouthRollUpper: 1.0
MouthShrugLower: 1.0
MouthShrugUpper: 1.0
MouthPressLeft: 1.0
MouthPressRight: 1.0
MouthLowerDownLeft: 1.0
MouthLowerDownRight: 1.0
MouthUpperUpLeft: 1.0
MouthUpperUpRight: 1.0
BrowDownLeft: 1.0
BrowDownRight: 1.0
BrowInnerUp: 1.0
BrowOuterUpLeft: 1.0
BrowOuterUpRight: 1.0
CheekPuff: 1.0
CheekSquintLeft: 1.0
CheekSquintRight: 1.0
NoseSneerLeft: 1.0
NoseSneerRight: 1.0
TongueOut: 1.0
TongueTipUp: 1.0
TongueTipDown: 1.0
TongueTipLeft: 1.0
TongueTipRight: 1.0
TongueRollUp: 1.0
TongueRollDown: 1.0
TongueRollLeft: 1.0
TongueRollRight: 1.0
TongueUp: 1.0
TongueDown: 1.0
TongueLeft: 1.0
TongueRight: 1.0
TongueIn: 1.0
TongueStretch: 1.0
TongueWide: 1.0
TongueNarrow: 1.0
weight_offsets:
EyeBlinkLeft: 0.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 0.0
EyeWideLeft: 0.0
EyeBlinkRight: 0.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 0.0
EyeWideRight: 0.0
JawForward: 0.0
JawLeft: 0.0
JawRight: 0.0
JawOpen: 0.0
MouthClose: 0.0
MouthFunnel: 0.0
MouthPucker: 0.0
MouthLeft: 0.0
MouthRight: 0.0
MouthSmileLeft: 0.0
MouthSmileRight: 0.0
MouthFrownLeft: 0.0
MouthFrownRight: 0.0
MouthDimpleLeft: 0.0
MouthDimpleRight: 0.0
MouthStretchLeft: 0.0
MouthStretchRight: 0.0
MouthRollLower: 0.0
MouthRollUpper: 0.0
MouthShrugLower: 0.0
MouthShrugUpper: 0.0
MouthPressLeft: 0.0
MouthPressRight: 0.0
MouthLowerDownLeft: 0.0
MouthLowerDownRight: 0.0
MouthUpperUpLeft: 0.0
MouthUpperUpRight: 0.0
BrowDownLeft: 0.0
BrowDownRight: 0.0
BrowInnerUp: 0.0
BrowOuterUpLeft: 0.0
BrowOuterUpRight: 0.0
CheekPuff: 0.0
CheekSquintLeft: 0.0
CheekSquintRight: 0.0
NoseSneerLeft: 0.0
NoseSneerRight: 0.0
TongueOut: 0.0
TongueTipUp: 0.0
TongueTipDown: 0.0
TongueTipLeft: 0.0
TongueTipRight: 0.0
TongueRollUp: 0.0
TongueRollDown: 0.0
TongueRollLeft: 0.0
TongueRollRight: 0.0
TongueUp: 0.0
TongueDown: 0.0
TongueLeft: 0.0
TongueRight: 0.0
TongueIn: 0.0
TongueStretch: 0.0
TongueWide: 0.0
TongueNarrow: 0.0
active_poses:
EyeBlinkLeft: 1
EyeLookDownLeft: 0
EyeLookInLeft: 0
EyeLookOutLeft: 0
EyeLookUpLeft: 0
EyeSquintLeft: 1
EyeWideLeft: 1
EyeBlinkRight: 1
EyeLookDownRight: 0
EyeLookInRight: 0
EyeLookOutRight: 0
EyeLookUpRight: 0
EyeSquintRight: 1
EyeWideRight: 1
JawForward: 1
JawLeft: 1
JawRight: 1
JawOpen: 1
MouthClose: 1
MouthFunnel: 1
MouthPucker: 1
MouthLeft: 1
MouthRight: 1
MouthSmileLeft: 1
MouthSmileRight: 1
MouthFrownLeft: 1
MouthFrownRight: 1
MouthDimpleLeft: 1
MouthDimpleRight: 1
MouthStretchLeft: 1
MouthStretchRight: 1
MouthRollLower: 1
MouthRollUpper: 1
MouthShrugLower: 1
MouthShrugUpper: 1
MouthPressLeft: 1
MouthPressRight: 1
MouthLowerDownLeft: 1
MouthLowerDownRight: 1
MouthUpperUpLeft: 1
MouthUpperUpRight: 1
BrowDownLeft: 1
BrowDownRight: 1
BrowInnerUp: 1
BrowOuterUpLeft: 1
BrowOuterUpRight: 1
CheekPuff: 1
CheekSquintLeft: 1
CheekSquintRight: 1
NoseSneerLeft: 1
NoseSneerRight: 1
TongueOut: 0
TongueTipUp: 1
TongueTipDown: 1
TongueTipLeft: 1
TongueTipRight: 1
TongueRollUp: 1
TongueRollDown: 1
TongueRollLeft: 1
TongueRollRight: 1
TongueUp: 1
TongueDown: 1
TongueLeft: 1
TongueRight: 1
TongueIn: 1
TongueStretch: 1
TongueWide: 1
TongueNarrow: 1
cancel_poses:
EyeBlinkLeft: -1
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: -1
EyeBlinkRight: -1
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: -1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: -1
MouthSmileRight: -1
MouthFrownLeft: -1
MouthFrownRight: -1
MouthDimpleLeft: -1
MouthDimpleRight: -1
MouthStretchLeft: -1
MouthStretchRight: -1
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: -1
MouthPressRight: -1
MouthLowerDownLeft: -1
MouthLowerDownRight: -1
MouthUpperUpLeft: -1
MouthUpperUpRight: -1
BrowDownLeft: -1
BrowDownRight: -1
BrowInnerUp: -1
BrowOuterUpLeft: -1
BrowOuterUpRight: -1
CheekPuff: -1
CheekSquintLeft: -1
CheekSquintRight: -1
NoseSneerLeft: -1
NoseSneerRight: -1
TongueOut: -1
TongueTipUp: -1
TongueTipDown: -1
TongueTipLeft: -1
TongueTipRight: -1
TongueRollUp: -1
TongueRollDown: -1
TongueRollLeft: -1
TongueRollRight: -1
TongueUp: -1
TongueDown: -1
TongueLeft: -1
TongueRight: -1
TongueIn: -1
TongueStretch: -1
TongueWide: -1
TongueNarrow: -1
symmetry_poses:
EyeBlinkLeft: 0
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: 1
EyeBlinkRight: 0
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: 1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: 2
MouthSmileRight: 2
MouthFrownLeft: 3
MouthFrownRight: 3
MouthDimpleLeft: 4
MouthDimpleRight: 4
MouthStretchLeft: 5
MouthStretchRight: 5
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: 6
MouthPressRight: 6
MouthLowerDownLeft: 7
MouthLowerDownRight: 7
MouthUpperUpLeft: 8
MouthUpperUpRight: 8
BrowDownLeft: 10
BrowDownRight: 10
BrowInnerUp: -1
BrowOuterUpLeft: 9
BrowOuterUpRight: 9
CheekPuff: -1
CheekSquintLeft: 11
CheekSquintRight: 11
NoseSneerLeft: 12
NoseSneerRight: 12
TongueOut: -1
TongueTipUp: -1
TongueTipDown: -1
TongueTipLeft: -1
TongueTipRight: -1
TongueRollUp: -1
TongueRollDown: -1
TongueRollLeft: -1
TongueRollRight: -1
TongueUp: -1
TongueDown: -1
TongueLeft: -1
TongueRight: -1
TongueIn: -1
TongueStretch: -1
TongueWide: -1
TongueNarrow: -1
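The blendshape_params comment in the stylization configuration describes each weight as gain * w + offset, with enable_clamping_bs_weight restricting the result to the [0.0, 1.0] range. The sketch below works through that arithmetic for a single weight. The function name tune_weight is hypothetical, and applying the clamp after the multiplier and offset is an assumption about the order of operations.

```python
def tune_weight(w, multiplier=1.0, offset=0.0, clamp=True):
    """Apply multiplier * w + offset, then optionally clamp the result to [0.0, 1.0]."""
    tuned = multiplier * w + offset
    if clamp:
        tuned = min(1.0, max(0.0, tuned))
    return tuned


print(tune_weight(0.5, multiplier=1.5))               # 0.75
print(tune_weight(0.75, multiplier=2.0))              # 1.0 (clamped from 1.5)
print(tune_weight(0.75, multiplier=2.0, clamp=False)) # 1.5
print(tune_weight(0.25, offset=-0.5))                 # 0.0 (clamped from -0.25)
```

With the defaults shown in the configuration files (multiplier 1.0, offset 0.0), the weight passes through unchanged.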
james_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
amazement: 0.0
anger: 0.0
cheekiness: 0.0
disgust: 0.0
fear: 0.0
grief: 0.0
joy: 0.0
outofbreath: 0.0
pain: 0.0
sadness: 0.0
a2e:
enabled: true
live_transition_time: 0.5
post_processing_params:
emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions
a2f:
# regression / diffusion
inference_type: regression
regression_model:
inference_model_id: james_v2.3.1
diffusion_model:
inference_model_id: multi_v3.2
identity: james
# If true, use deterministic noise for diffusion inference (more stable/repeatable results).
# If false, use non-deterministic noise (more variation between runs).
constant_noise: true
# Enable or disable tongue blendshapes output
enable_tongue_blendshapes: false
face_params:
eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
input_strength: 1.0 # Controls the magnitude of the input audio
lip_close_offset: -0.02 # Adjusts the default pose of lip close-open
lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
lower_face_strength: 1.2 # Controls the range of motion on the lower regions of the face
skin_strength: 1.0 # Controls the range of motion of the skin
upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
tongue_strength: 1.3 # Controls the range of motion of the tongue
tongue_height_offset: 0.0 # Controls the height of the tongue
tongue_depth_offset: 0.0 # Controls the depth of the tongue
blendshape_params: # Modulates the effect of each blendshape: gain * w + offset
# Clamps blendshape weights to [0.0, 1.0] range. Recommended for production to ensure compatibility with renderers.
enable_clamping_bs_weight: true
weight_multipliers:
EyeBlinkLeft: 1.0
EyeLookDownLeft: 1.0
EyeLookInLeft: 1.0
EyeLookOutLeft: 1.0
EyeLookUpLeft: 1.0
EyeSquintLeft: 1.0
EyeWideLeft: 1.0
EyeBlinkRight: 1.0
EyeLookDownRight: 1.0
EyeLookInRight: 1.0
EyeLookOutRight: 1.0
EyeLookUpRight: 1.0
EyeSquintRight: 1.0
EyeWideRight: 1.0
JawForward: 1.0
JawLeft: 1.0
JawRight: 1.0
JawOpen: 1.0
MouthClose: 1.0
MouthFunnel: 1.0
MouthPucker: 1.0
MouthLeft: 1.0
MouthRight: 1.0
MouthSmileLeft: 1.0
MouthSmileRight: 1.0
MouthFrownLeft: 1.0
MouthFrownRight: 1.0
MouthDimpleLeft: 1.0
MouthDimpleRight: 1.0
MouthStretchLeft: 1.0
MouthStretchRight: 1.0
MouthRollLower: 1.0
MouthRollUpper: 1.0
MouthShrugLower: 1.0
MouthShrugUpper: 1.0
MouthPressLeft: 1.0
MouthPressRight: 1.0
MouthLowerDownLeft: 1.0
MouthLowerDownRight: 1.0
MouthUpperUpLeft: 1.0
MouthUpperUpRight: 1.0
BrowDownLeft: 1.0
BrowDownRight: 1.0
BrowInnerUp: 1.0
BrowOuterUpLeft: 1.0
BrowOuterUpRight: 1.0
CheekPuff: 1.0
CheekSquintLeft: 1.0
CheekSquintRight: 1.0
NoseSneerLeft: 1.0
NoseSneerRight: 1.0
TongueOut: 1.0
TongueTipUp: 1.0
TongueTipDown: 1.0
TongueTipLeft: 1.0
TongueTipRight: 1.0
TongueRollUp: 1.0
TongueRollDown: 1.0
TongueRollLeft: 1.0
TongueRollRight: 1.0
TongueUp: 1.0
TongueDown: 1.0
TongueLeft: 1.0
TongueRight: 1.0
TongueIn: 1.0
TongueStretch: 1.0
TongueWide: 1.0
TongueNarrow: 1.0
weight_offsets:
EyeBlinkLeft: 0.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 0.0
EyeWideLeft: 0.0
EyeBlinkRight: 0.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 0.0
EyeWideRight: 0.0
JawForward: 0.0
JawLeft: 0.0
JawRight: 0.0
JawOpen: 0.0
MouthClose: 0.0
MouthFunnel: 0.0
MouthPucker: 0.0
MouthLeft: 0.0
MouthRight: 0.0
MouthSmileLeft: 0.0
MouthSmileRight: 0.0
MouthFrownLeft: 0.0
MouthFrownRight: 0.0
MouthDimpleLeft: 0.0
MouthDimpleRight: 0.0
MouthStretchLeft: 0.0
MouthStretchRight: 0.0
MouthRollLower: 0.0
MouthRollUpper: 0.0
MouthShrugLower: 0.0
MouthShrugUpper: 0.0
MouthPressLeft: 0.0
MouthPressRight: 0.0
MouthLowerDownLeft: 0.0
MouthLowerDownRight: 0.0
MouthUpperUpLeft: 0.0
MouthUpperUpRight: 0.0
BrowDownLeft: 0.0
BrowDownRight: 0.0
BrowInnerUp: 0.0
BrowOuterUpLeft: 0.0
BrowOuterUpRight: 0.0
CheekPuff: 0.0
CheekSquintLeft: 0.0
CheekSquintRight: 0.0
NoseSneerLeft: 0.0
NoseSneerRight: 0.0
TongueOut: 0.0
TongueTipUp: 0.0
TongueTipDown: 0.0
TongueTipLeft: 0.0
TongueTipRight: 0.0
TongueRollUp: 0.0
TongueRollDown: 0.0
TongueRollLeft: 0.0
TongueRollRight: 0.0
TongueUp: 0.0
TongueDown: 0.0
TongueLeft: 0.0
TongueRight: 0.0
TongueIn: 0.0
TongueStretch: 0.0
TongueWide: 0.0
TongueNarrow: 0.0
active_poses:
EyeBlinkLeft: 1
EyeLookDownLeft: 0
EyeLookInLeft: 0
EyeLookOutLeft: 0
EyeLookUpLeft: 0
EyeSquintLeft: 1
EyeWideLeft: 1
EyeBlinkRight: 1
EyeLookDownRight: 0
EyeLookInRight: 0
EyeLookOutRight: 0
EyeLookUpRight: 0
EyeSquintRight: 1
EyeWideRight: 1
JawForward: 1
JawLeft: 1
JawRight: 1
JawOpen: 1
MouthClose: 1
MouthFunnel: 1
MouthPucker: 1
MouthLeft: 1
MouthRight: 1
MouthSmileLeft: 1
MouthSmileRight: 1
MouthFrownLeft: 1
MouthFrownRight: 1
MouthDimpleLeft: 1
MouthDimpleRight: 1
MouthStretchLeft: 1
MouthStretchRight: 1
MouthRollLower: 1
MouthRollUpper: 1
MouthShrugLower: 1
MouthShrugUpper: 1
MouthPressLeft: 1
MouthPressRight: 1
MouthLowerDownLeft: 1
MouthLowerDownRight: 1
MouthUpperUpLeft: 1
MouthUpperUpRight: 1
BrowDownLeft: 1
BrowDownRight: 1
BrowInnerUp: 1
BrowOuterUpLeft: 1
BrowOuterUpRight: 1
CheekPuff: 1
CheekSquintLeft: 1
CheekSquintRight: 1
NoseSneerLeft: 1
NoseSneerRight: 1
TongueOut: 0
TongueTipUp: 1
TongueTipDown: 1
TongueTipLeft: 1
TongueTipRight: 1
TongueRollUp: 1
TongueRollDown: 1
TongueRollLeft: 1
TongueRollRight: 1
TongueUp: 1
TongueDown: 1
TongueLeft: 1
TongueRight: 1
TongueIn: 1
TongueStretch: 1
TongueWide: 1
TongueNarrow: 1
cancel_poses:
EyeBlinkLeft: -1
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: -1
EyeBlinkRight: -1
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: -1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: -1
MouthSmileRight: -1
MouthFrownLeft: -1
MouthFrownRight: -1
MouthDimpleLeft: -1
MouthDimpleRight: -1
MouthStretchLeft: -1
MouthStretchRight: -1
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: -1
MouthPressRight: -1
MouthLowerDownLeft: -1
MouthLowerDownRight: -1
MouthUpperUpLeft: -1
MouthUpperUpRight: -1
BrowDownLeft: -1
BrowDownRight: -1
BrowInnerUp: -1
BrowOuterUpLeft: -1
BrowOuterUpRight: -1
CheekPuff: -1
CheekSquintLeft: -1
CheekSquintRight: -1
NoseSneerLeft: -1
NoseSneerRight: -1
TongueOut: -1
TongueTipUp: -1
TongueTipDown: -1
TongueTipLeft: -1
TongueTipRight: -1
TongueRollUp: -1
TongueRollDown: -1
TongueRollLeft: -1
TongueRollRight: -1
TongueUp: -1
TongueDown: -1
TongueLeft: -1
TongueRight: -1
TongueIn: -1
TongueStretch: -1
TongueWide: -1
TongueNarrow: -1
symmetry_poses:
EyeBlinkLeft: 0
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: 1
EyeBlinkRight: 0
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: 1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: 2
MouthSmileRight: 2
MouthFrownLeft: 3
MouthFrownRight: 3
MouthDimpleLeft: 4
MouthDimpleRight: 4
MouthStretchLeft: 5
MouthStretchRight: 5
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: 6
MouthPressRight: 6
MouthLowerDownLeft: 7
MouthLowerDownRight: 7
MouthUpperUpLeft: 8
MouthUpperUpRight: 8
BrowDownLeft: 10
BrowDownRight: 10
BrowInnerUp: -1
BrowOuterUpLeft: 9
BrowOuterUpRight: 9
CheekPuff: -1
CheekSquintLeft: 11
CheekSquintRight: 11
NoseSneerLeft: 12
NoseSneerRight: 12
TongueOut: -1
TongueTipUp: -1
TongueTipDown: -1
TongueTipLeft: -1
TongueTipRight: -1
TongueRollUp: -1
TongueRollDown: -1
TongueRollLeft: -1
TongueRollRight: -1
TongueUp: -1
TongueDown: -1
TongueLeft: -1
TongueRight: -1
TongueIn: -1
TongueStretch: -1
TongueWide: -1
TongueNarrow: -1
mark_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
amazement: 0.0
anger: 0.0
cheekiness: 0.0
disgust: 0.0
fear: 0.0
grief: 0.0
joy: 0.0
outofbreath: 0.0
pain: 0.0
sadness: 0.0
a2e:
enabled: true
live_transition_time: 0.5
post_processing_params:
emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if loaded) relative to generated emotions
a2f:
# regression / diffusion
inference_type: regression
regression_model:
inference_model_id: mark_v2.3
diffusion_model:
inference_model_id: multi_v3.2
identity: mark
# If true, use deterministic noise for diffusion inference (more stable/repeatable results).
# If false, use non-deterministic noise (more variation between runs).
constant_noise: true
# Enable or disable tongue blendshapes output
enable_tongue_blendshapes: false
face_params:
eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
input_strength: 1.3 # Controls the magnitude of the input audio
lip_close_offset: -0.03 # Adjusts the default pose of lip close-open
lower_face_smoothing: 0.0023 # Applies temporal smoothing to the lower face motion
lower_face_strength: 1.4 # Controls the range of motion on the lower regions of the face
skin_strength: 1.1 # Controls the range of motion of the skin
upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
tongue_strength: 1.3 # Controls the range of motion of the tongue
tongue_height_offset: 0.0 # Controls the height of the tongue
tongue_depth_offset: 0.0 # Controls the depth of the tongue
blendshape_params: # Modulates the effect of each blendshape: gain * w + offset
# Clamps blendshape weights to [0.0, 1.0] range. Recommended for production to ensure compatibility with renderers.
enable_clamping_bs_weight: true
weight_multipliers:
EyeBlinkLeft: 1.0
EyeLookDownLeft: 1.0
EyeLookInLeft: 1.0
EyeLookOutLeft: 1.0
EyeLookUpLeft: 1.0
EyeSquintLeft: 1.0
EyeWideLeft: 1.0
EyeBlinkRight: 1.0
EyeLookDownRight: 1.0
EyeLookInRight: 1.0
EyeLookOutRight: 1.0
EyeLookUpRight: 1.0
EyeSquintRight: 1.0
EyeWideRight: 1.0
JawForward: 1.0
JawLeft: 1.0
JawRight: 1.0
JawOpen: 1.0
MouthClose: 1.0
MouthFunnel: 1.0
MouthPucker: 1.0
MouthLeft: 1.0
MouthRight: 1.0
MouthSmileLeft: 1.0
MouthSmileRight: 1.0
MouthFrownLeft: 1.0
MouthFrownRight: 1.0
MouthDimpleLeft: 1.0
MouthDimpleRight: 1.0
MouthStretchLeft: 1.0
MouthStretchRight: 1.0
MouthRollLower: 1.0
MouthRollUpper: 1.0
MouthShrugLower: 1.0
MouthShrugUpper: 1.0
MouthPressLeft: 1.0
MouthPressRight: 1.0
MouthLowerDownLeft: 1.0
MouthLowerDownRight: 1.0
MouthUpperUpLeft: 1.0
MouthUpperUpRight: 1.0
BrowDownLeft: 1.0
BrowDownRight: 1.0
BrowInnerUp: 1.0
BrowOuterUpLeft: 1.0
BrowOuterUpRight: 1.0
CheekPuff: 1.0
CheekSquintLeft: 1.0
CheekSquintRight: 1.0
NoseSneerLeft: 1.0
NoseSneerRight: 1.0
TongueOut: 1.0
TongueTipUp: 1.0
TongueTipDown: 1.0
TongueTipLeft: 1.0
TongueTipRight: 1.0
TongueRollUp: 1.0
TongueRollDown: 1.0
TongueRollLeft: 1.0
TongueRollRight: 1.0
TongueUp: 1.0
TongueDown: 1.0
TongueLeft: 1.0
TongueRight: 1.0
TongueIn: 1.0
TongueStretch: 1.0
TongueWide: 1.0
TongueNarrow: 1.0
weight_offsets:
EyeBlinkLeft: 0.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 0.0
EyeWideLeft: 0.0
EyeBlinkRight: 0.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 0.0
EyeWideRight: 0.0
JawForward: 0.0
JawLeft: 0.0
JawRight: 0.0
JawOpen: 0.0
MouthClose: 0.0
MouthFunnel: 0.0
MouthPucker: 0.0
MouthLeft: 0.0
MouthRight: 0.0
MouthSmileLeft: 0.0
MouthSmileRight: 0.0
MouthFrownLeft: 0.0
MouthFrownRight: 0.0
MouthDimpleLeft: 0.0
MouthDimpleRight: 0.0
MouthStretchLeft: 0.0
MouthStretchRight: 0.0
MouthRollLower: 0.0
MouthRollUpper: 0.0
MouthShrugLower: 0.0
MouthShrugUpper: 0.0
MouthPressLeft: 0.0
MouthPressRight: 0.0
MouthLowerDownLeft: 0.0
MouthLowerDownRight: 0.0
MouthUpperUpLeft: 0.0
MouthUpperUpRight: 0.0
BrowDownLeft: 0.0
BrowDownRight: 0.0
BrowInnerUp: 0.0
BrowOuterUpLeft: 0.0
BrowOuterUpRight: 0.0
CheekPuff: 0.0
CheekSquintLeft: 0.0
CheekSquintRight: 0.0
NoseSneerLeft: 0.0
NoseSneerRight: 0.0
TongueOut: 0.0
TongueTipUp: 0.0
TongueTipDown: 0.0
TongueTipLeft: 0.0
TongueTipRight: 0.0
TongueRollUp: 0.0
TongueRollDown: 0.0
TongueRollLeft: 0.0
TongueRollRight: 0.0
TongueUp: 0.0
TongueDown: 0.0
TongueLeft: 0.0
TongueRight: 0.0
TongueIn: 0.0
TongueStretch: 0.0
TongueWide: 0.0
TongueNarrow: 0.0
active_poses:
EyeBlinkLeft: 1
EyeLookDownLeft: 0
EyeLookInLeft: 0
EyeLookOutLeft: 0
EyeLookUpLeft: 0
EyeSquintLeft: 1
EyeWideLeft: 1
EyeBlinkRight: 1
EyeLookDownRight: 0
EyeLookInRight: 0
EyeLookOutRight: 0
EyeLookUpRight: 0
EyeSquintRight: 1
EyeWideRight: 1
JawForward: 1
JawLeft: 1
JawRight: 1
JawOpen: 1
MouthClose: 1
MouthFunnel: 1
MouthPucker: 1
MouthLeft: 1
MouthRight: 1
MouthSmileLeft: 1
MouthSmileRight: 1
MouthFrownLeft: 1
MouthFrownRight: 1
MouthDimpleLeft: 1
MouthDimpleRight: 1
MouthStretchLeft: 1
MouthStretchRight: 1
MouthRollLower: 1
MouthRollUpper: 1
MouthShrugLower: 1
MouthShrugUpper: 1
MouthPressLeft: 1
MouthPressRight: 1
MouthLowerDownLeft: 1
MouthLowerDownRight: 1
MouthUpperUpLeft: 1
MouthUpperUpRight: 1
BrowDownLeft: 1
BrowDownRight: 1
BrowInnerUp: 1
BrowOuterUpLeft: 1
BrowOuterUpRight: 1
CheekPuff: 1
CheekSquintLeft: 1
CheekSquintRight: 1
NoseSneerLeft: 1
NoseSneerRight: 1
TongueOut: 0
TongueTipUp: 1
TongueTipDown: 1
TongueTipLeft: 1
TongueTipRight: 1
TongueRollUp: 1
TongueRollDown: 1
TongueRollLeft: 1
TongueRollRight: 1
TongueUp: 1
TongueDown: 1
TongueLeft: 1
TongueRight: 1
TongueIn: 1
TongueStretch: 1
TongueWide: 1
TongueNarrow: 1
cancel_poses:
EyeBlinkLeft: -1
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: -1
EyeBlinkRight: -1
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: -1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: -1
MouthSmileRight: -1
MouthFrownLeft: -1
MouthFrownRight: -1
MouthDimpleLeft: -1
MouthDimpleRight: -1
MouthStretchLeft: -1
MouthStretchRight: -1
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: -1
MouthPressRight: -1
MouthLowerDownLeft: -1
MouthLowerDownRight: -1
MouthUpperUpLeft: -1
MouthUpperUpRight: -1
BrowDownLeft: -1
BrowDownRight: -1
BrowInnerUp: -1
BrowOuterUpLeft: -1
BrowOuterUpRight: -1
CheekPuff: -1
CheekSquintLeft: -1
CheekSquintRight: -1
NoseSneerLeft: -1
NoseSneerRight: -1
TongueOut: -1
TongueTipUp: -1
TongueTipDown: -1
TongueTipLeft: -1
TongueTipRight: -1
TongueRollUp: -1
TongueRollDown: -1
TongueRollLeft: -1
TongueRollRight: -1
TongueUp: -1
TongueDown: -1
TongueLeft: -1
TongueRight: -1
TongueIn: -1
TongueStretch: -1
TongueWide: -1
TongueNarrow: -1
symmetry_poses:
EyeBlinkLeft: 0
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: 1
EyeBlinkRight: 0
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: 1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: 2
MouthSmileRight: 2
MouthFrownLeft: 3
MouthFrownRight: 3
MouthDimpleLeft: 4
MouthDimpleRight: 4
MouthStretchLeft: 5
MouthStretchRight: 5
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: 6
MouthPressRight: 6
MouthLowerDownLeft: 7
MouthLowerDownRight: 7
MouthUpperUpLeft: 8
MouthUpperUpRight: 8
BrowDownLeft: 10
BrowDownRight: 10
BrowInnerUp: -1
BrowOuterUpLeft: 9
BrowOuterUpRight: 9
CheekPuff: -1
CheekSquintLeft: 11
CheekSquintRight: 11
NoseSneerLeft: 12
NoseSneerRight: 12
TongueOut: -1
TongueTipUp: -1
TongueTipDown: -1
TongueTipLeft: -1
TongueTipRight: -1
TongueRollUp: -1
TongueRollDown: -1
TongueRollLeft: -1
TongueRollRight: -1
TongueUp: -1
TongueDown: -1
TongueLeft: -1
TongueRight: -1
TongueIn: -1
TongueStretch: -1
TongueWide: -1
TongueNarrow: -1
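The `a2e.post_processing_params` comments in the files above describe `live_blend_coef` as a coefficient for exponential smoothing and `max_emotions` as a cap on the number of active emotions, with the highest weights prioritized. A minimal Python sketch of those two ideas follows. The function names `smooth_emotions` and `limit_emotions`, the input values, and the smoothing convention (the coefficient weights the previous value) are assumptions for illustration; the microservice may implement these steps differently.

```python
def smooth_emotions(previous, current, blend_coef=0.7):
    """Exponential smoothing of emotion weights.

    Assumed convention: blend_coef weights the previous value,
    (1 - blend_coef) weights the newly generated value.
    """
    return {
        name: blend_coef * previous.get(name, 0.0) + (1.0 - blend_coef) * value
        for name, value in current.items()
    }


def limit_emotions(emotions, max_emotions=3):
    """Keep the max_emotions highest-weighted emotions and zero out the rest."""
    keep = sorted(emotions, key=emotions.get, reverse=True)[:max_emotions]
    return {name: (value if name in keep else 0.0) for name, value in emotions.items()}


# Hypothetical emotion weights for two consecutive frames.
previous = {"joy": 0.0, "anger": 0.0, "fear": 0.0, "sadness": 0.0}
current = {"joy": 1.0, "anger": 0.5, "fear": 0.2, "sadness": 0.1}

smoothed = smooth_emotions(previous, current, blend_coef=0.7)
limited = limit_emotions(smoothed, max_emotions=3)
print(limited)
```

With these inputs, the lowest-weighted emotion (`sadness`) is set to zero while the other three keep their smoothed values.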
Configuration files for Unreal Engine MetaHuman#
If you plan to connect A2F-3D to MetaHuman characters, use the configuration files adapted for them. Compared to the default configuration files, these files change only the blendshape multipliers and offsets.
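As a rough illustration of how the multipliers and offsets act on a blendshape weight, the Python sketch below applies the formula given in the configuration comments (`blendshape_values * weight_multipliers + weight_offsets`) and then optionally clamps the result to the [0.0, 1.0] range described for `enable_clamping_bs_weight`. The function name `apply_blendshape_params`, the raw input weights, and the order of operations (clamping after the multiply-and-offset step) are assumptions for illustration, not the microservice's actual implementation.

```python
def apply_blendshape_params(weights, multipliers, offsets, clamp=True):
    """Return adjusted weights: w * multiplier + offset, optionally clamped to [0, 1]."""
    adjusted = {}
    for name, w in weights.items():
        # Poses missing from the mappings fall back to a neutral gain and offset.
        value = w * multipliers.get(name, 1.0) + offsets.get(name, 0.0)
        if clamp:
            value = min(1.0, max(0.0, value))
        adjusted[name] = value
    return adjusted


# Hypothetical raw weights; multipliers copied from the MetaHuman
# james_stylization_config.yaml listing on this page.
raw = {"JawOpen": 0.5, "MouthSmileLeft": 0.9, "EyeLookDownLeft": 0.4}
multipliers = {"JawOpen": 0.8, "MouthSmileLeft": 1.2, "EyeLookDownLeft": 0.0}
offsets = {"JawOpen": 0.0, "MouthSmileLeft": 0.0, "EyeLookDownLeft": 0.0}

result = apply_blendshape_params(raw, multipliers, offsets)
print(result)
```

In this sketch, a multiplier of 0.0 suppresses a pose entirely, and a multiplier above 1.0 can push a weight past 1.0, which is where clamping takes effect.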
MetaHuman Stylization Configuration Files
claire_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
amazement: 0.0
anger: 0.0
cheekiness: 0.0
disgust: 0.0
fear: 0.0
grief: 0.0
joy: 0.0
outofbreath: 0.0
pain: 0.0
sadness: 0.0
a2e:
enabled: true
live_transition_time: 0.5
post_processing_params:
emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if loaded) relative to generated emotions
a2f:
# regression / diffusion
inference_type: regression
regression_model:
inference_model_id: claire_v2.3.1
diffusion_model:
inference_model_id: multi_v3.2
identity: claire
# If true, use deterministic noise for diffusion inference (more stable/repeatable results).
# If false, use non-deterministic noise (more variation between runs).
constant_noise: true
# Enable or disable tongue blendshapes output
enable_tongue_blendshapes: false
face_params:
eyelid_offset: 0.0 # Adjusts the default pose of eyelid open-close
face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
input_strength: 1.0 # Controls the magnitude of the input audio
lip_close_offset: 0.0 # Adjusts the default pose of lip close-open
lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
lower_face_strength: 1.25 # Controls the range of motion on the lower regions of the face
skin_strength: 1.0 # Controls the range of motion of the skin
upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
tongue_strength: 1.3 # Controls the range of motion of the tongue
tongue_height_offset: 0.0 # Controls the height of the tongue
tongue_depth_offset: 0.0 # Controls the depth of the tongue
blendshape_params: # Modulates the effect of each blendshape: gain * w + offset
# Clamps blendshape weights to [0.0, 1.0] range. Recommended for production to ensure compatibility with renderers.
enable_clamping_bs_weight: true
weight_multipliers:
EyeBlinkLeft: 1.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 1.0
EyeWideLeft: 1.0
EyeBlinkRight: 1.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 1.0
EyeWideRight: 1.0
JawForward: 0.7
JawLeft: 0.2
JawRight: 0.2
JawOpen: 1.0
MouthClose: 1.0
MouthFunnel: 1.2
MouthPucker: 1.2
MouthLeft: 0.2
MouthRight: 0.2
MouthSmileLeft: 0.8
MouthSmileRight: 0.8
MouthFrownLeft: 0.4
MouthFrownRight: 0.4
MouthDimpleLeft: 0.7
MouthDimpleRight: 0.7
MouthStretchLeft: 0.1
MouthStretchRight: 0.1
MouthRollLower: 0.9
MouthRollUpper: 0.5
MouthShrugLower: 0.9
MouthShrugUpper: 0.4
MouthPressLeft: 0.8
MouthPressRight: 0.8
MouthLowerDownLeft: 0.8
MouthLowerDownRight: 0.8
MouthUpperUpLeft: 0.8
MouthUpperUpRight: 0.8
BrowDownLeft: 1.0
BrowDownRight: 1.0
BrowInnerUp: 1.0
BrowOuterUpLeft: 1.0
BrowOuterUpRight: 1.0
CheekPuff: 0.2
CheekSquintLeft: 1.0
CheekSquintRight: 1.0
NoseSneerLeft: 0.8
NoseSneerRight: 0.8
TongueOut: 0.0
TongueTipUp: 1.0
TongueTipDown: 1.0
TongueTipLeft: 1.0
TongueTipRight: 1.0
TongueRollUp: 1.0
TongueRollDown: 1.0
TongueRollLeft: 1.0
TongueRollRight: 1.0
TongueUp: 1.0
TongueDown: 1.0
TongueLeft: 1.0
TongueRight: 1.0
TongueIn: 1.0
TongueStretch: 1.0
TongueWide: 1.0
TongueNarrow: 1.0
weight_offsets: # Modulates the effect of each blendshape: blendshape_values * weight_multipliers + weight_offsets
EyeBlinkLeft: 0.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 0.0
EyeWideLeft: 0.0
EyeBlinkRight: 0.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 0.0
EyeWideRight: 0.0
JawForward: 0.0
JawLeft: 0.0
JawRight: 0.0
JawOpen: 0.0
MouthClose: 0.0
MouthFunnel: 0.0
MouthPucker: 0.0
MouthLeft: 0.0
MouthRight: 0.0
MouthSmileLeft: 0.0
MouthSmileRight: 0.0
MouthFrownLeft: 0.0
MouthFrownRight: 0.0
MouthDimpleLeft: 0.0
MouthDimpleRight: 0.0
MouthStretchLeft: 0.0
MouthStretchRight: 0.0
MouthRollLower: 0.0
MouthRollUpper: 0.0
MouthShrugLower: 0.0
MouthShrugUpper: 0.0
MouthPressLeft: 0.0
MouthPressRight: 0.0
MouthLowerDownLeft: 0.0
MouthLowerDownRight: 0.0
MouthUpperUpLeft: 0.0
MouthUpperUpRight: 0.0
BrowDownLeft: 0.0
BrowDownRight: 0.0
BrowInnerUp: 0.0
BrowOuterUpLeft: 0.0
BrowOuterUpRight: 0.0
CheekPuff: 0.0
CheekSquintLeft: 0.0
CheekSquintRight: 0.0
NoseSneerLeft: 0.0
NoseSneerRight: 0.0
TongueOut: 0.0
TongueTipUp: 0.0
TongueTipDown: 0.0
TongueTipLeft: 0.0
TongueTipRight: 0.0
TongueRollUp: 0.0
TongueRollDown: 0.0
TongueRollLeft: 0.0
TongueRollRight: 0.0
TongueUp: 0.0
TongueDown: 0.0
TongueLeft: 0.0
TongueRight: 0.0
TongueIn: 0.0
TongueStretch: 0.0
TongueWide: 0.0
TongueNarrow: 0.0
active_poses: # Defines which poses are active and which are not
EyeBlinkLeft: 1
EyeLookDownLeft: 0
EyeLookInLeft: 0
EyeLookOutLeft: 0
EyeLookUpLeft: 0
EyeSquintLeft: 1
EyeWideLeft: 1
EyeBlinkRight: 1
EyeLookDownRight: 0
EyeLookInRight: 0
EyeLookOutRight: 0
EyeLookUpRight: 0
EyeSquintRight: 1
EyeWideRight: 1
JawForward: 1
JawLeft: 1
JawRight: 1
JawOpen: 1
MouthClose: 1
MouthFunnel: 1
MouthPucker: 1
MouthLeft: 1
MouthRight: 1
MouthSmileLeft: 1
MouthSmileRight: 1
MouthFrownLeft: 1
MouthFrownRight: 1
MouthDimpleLeft: 1
MouthDimpleRight: 1
MouthStretchLeft: 1
MouthStretchRight: 1
MouthRollLower: 1
MouthRollUpper: 1
MouthShrugLower: 1
MouthShrugUpper: 1
MouthPressLeft: 1
MouthPressRight: 1
MouthLowerDownLeft: 1
MouthLowerDownRight: 1
MouthUpperUpLeft: 1
MouthUpperUpRight: 1
BrowDownLeft: 1
BrowDownRight: 1
BrowInnerUp: 1
BrowOuterUpLeft: 1
BrowOuterUpRight: 1
CheekPuff: 1
CheekSquintLeft: 1
CheekSquintRight: 1
NoseSneerLeft: 1
NoseSneerRight: 1
TongueOut: 0
TongueTipUp: 1
TongueTipDown: 1
TongueTipLeft: 1
TongueTipRight: 1
TongueRollUp: 1
TongueRollDown: 1
TongueRollLeft: 1
TongueRollRight: 1
TongueUp: 1
TongueDown: 1
TongueLeft: 1
TongueRight: 1
TongueIn: 1
TongueStretch: 1
TongueWide: 1
TongueNarrow: 1
cancel_poses: # Define which poses cancel each other
EyeBlinkLeft: -1
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: -1
EyeBlinkRight: -1
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: -1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: -1
MouthSmileRight: -1
MouthFrownLeft: -1
MouthFrownRight: -1
MouthDimpleLeft: -1
MouthDimpleRight: -1
MouthStretchLeft: -1
MouthStretchRight: -1
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: -1
MouthPressRight: -1
MouthLowerDownLeft: -1
MouthLowerDownRight: -1
MouthUpperUpLeft: -1
MouthUpperUpRight: -1
BrowDownLeft: -1
BrowDownRight: -1
BrowInnerUp: -1
BrowOuterUpLeft: -1
BrowOuterUpRight: -1
CheekPuff: -1
CheekSquintLeft: -1
CheekSquintRight: -1
NoseSneerLeft: -1
NoseSneerRight: -1
TongueOut: -1
TongueTipUp: -1
TongueTipDown: -1
TongueTipLeft: -1
TongueTipRight: -1
TongueRollUp: -1
TongueRollDown: -1
TongueRollLeft: -1
TongueRollRight: -1
TongueUp: -1
TongueDown: -1
TongueLeft: -1
TongueRight: -1
TongueIn: -1
TongueStretch: -1
TongueWide: -1
TongueNarrow: -1
symmetry_poses: # Define which poses are symmetric to each other
EyeBlinkLeft: 0
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: 1
EyeBlinkRight: 0
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: 1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: 2
MouthSmileRight: 2
MouthFrownLeft: 3
MouthFrownRight: 3
MouthDimpleLeft: 4
MouthDimpleRight: 4
MouthStretchLeft: 5
MouthStretchRight: 5
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: 6
MouthPressRight: 6
MouthLowerDownLeft: 7
MouthLowerDownRight: 7
MouthUpperUpLeft: 8
MouthUpperUpRight: 8
BrowDownLeft: 10
BrowDownRight: 10
BrowInnerUp: -1
BrowOuterUpLeft: 9
BrowOuterUpRight: 9
CheekPuff: -1
CheekSquintLeft: 11
CheekSquintRight: 11
NoseSneerLeft: 12
NoseSneerRight: 12
TongueOut: -1
TongueTipUp: -1
TongueTipDown: -1
TongueTipLeft: -1
TongueTipRight: -1
TongueRollUp: -1
TongueRollDown: -1
TongueRollLeft: -1
TongueRollRight: -1
TongueUp: -1
TongueDown: -1
TongueLeft: -1
TongueRight: -1
TongueIn: -1
TongueStretch: -1
TongueWide: -1
TongueNarrow: -1
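In the `symmetry_poses` mapping above, left and right variants of a pose share the same non-negative index, while poses without a counterpart carry `-1`. The Python sketch below groups pose names by that index to make the pairs explicit. Reading a shared index as "these poses are symmetric" and `-1` as "no symmetric partner" is an interpretation of the listing rather than documented behavior, and the helper name `symmetry_groups` is hypothetical.

```python
from collections import defaultdict


def symmetry_groups(symmetry_poses):
    """Group pose names by shared non-negative index; -1 is treated as 'no partner'."""
    groups = defaultdict(list)
    for pose, index in symmetry_poses.items():
        if index >= 0:
            groups[index].append(pose)
    return dict(groups)


# Excerpt of the symmetry_poses mapping shown above.
symmetry_poses = {
    "EyeBlinkLeft": 0,
    "EyeBlinkRight": 0,
    "EyeWideLeft": 1,
    "EyeWideRight": 1,
    "JawOpen": -1,
    "MouthSmileLeft": 2,
    "MouthSmileRight": 2,
}

groups = symmetry_groups(symmetry_poses)
for index, poses in sorted(groups.items()):
    print(index, poses)
```

A grouping like this can serve as a quick sanity check after editing a configuration file, for example to confirm that every index is used by exactly two poses.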
james_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
amazement: 0.0
anger: 0.0
cheekiness: 0.0
disgust: 0.0
fear: 0.0
grief: 0.0
joy: 0.0
outofbreath: 0.0
pain: 0.0
sadness: 0.0
a2e:
enabled: true
live_transition_time: 0.5
post_processing_params:
emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if loaded) relative to generated emotions
a2f:
# regression / diffusion
inference_type: regression
regression_model:
inference_model_id: james_v2.3.1
diffusion_model:
inference_model_id: multi_v3.2
identity: james
# If true, use deterministic noise for diffusion inference (more stable/repeatable results).
# If false, use non-deterministic noise (more variation between runs).
constant_noise: true
# Enable or disable tongue blendshapes output
enable_tongue_blendshapes: false
face_params:
eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
input_strength: 1.0 # Controls the magnitude of the input audio
lip_close_offset: -0.02 # Adjusts the default pose of lip close-open
lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
lower_face_strength: 1.2 # Controls the range of motion on the lower regions of the face
skin_strength: 1.0 # Controls the range of motion of the skin
upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
tongue_strength: 1.3 # Controls the range of motion of the tongue
tongue_height_offset: 0.0 # Controls the height of the tongue
tongue_depth_offset: 0.0 # Controls the depth of the tongue
blendshape_params: # Modulates the effect of each blendshape: gain * w + offset
# Clamps blendshape weights to [0.0, 1.0] range. Recommended for production to ensure compatibility with renderers.
enable_clamping_bs_weight: true
weight_multipliers:
EyeBlinkLeft: 1.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 1.0
EyeWideLeft: 1.0
EyeBlinkRight: 1.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 1.0
EyeWideRight: 1.0
JawForward: 0.7
JawLeft: 0.2
JawRight: 0.2
JawOpen: 0.8
MouthClose: 0.3
MouthFunnel: 1.0
MouthPucker: 1.0
MouthLeft: 0.2
MouthRight: 0.2
MouthSmileLeft: 1.2
MouthSmileRight: 1.2
MouthFrownLeft: 0.5
MouthFrownRight: 0.5
MouthDimpleLeft: 0.8
MouthDimpleRight: 0.8
MouthStretchLeft: 0.05
MouthStretchRight: 0.05
MouthRollLower: 0.8
MouthRollUpper: 0.5
MouthShrugLower: 1.0
MouthShrugUpper: 0.4
MouthPressLeft: 0.8
MouthPressRight: 0.8
MouthLowerDownLeft: 0.8
MouthLowerDownRight: 0.8
MouthUpperUpLeft: 0.8
MouthUpperUpRight: 0.8
BrowDownLeft: 1.2
BrowDownRight: 1.2
BrowInnerUp: 1.3
BrowOuterUpLeft: 0.8
BrowOuterUpRight: 0.8
CheekPuff: 0.2
CheekSquintLeft: 1.0
CheekSquintRight: 1.0
NoseSneerLeft: 0.8
NoseSneerRight: 0.8
TongueOut: 0.0
TongueTipUp: 1.0
TongueTipDown: 1.0
TongueTipLeft: 1.0
TongueTipRight: 1.0
TongueRollUp: 1.0
TongueRollDown: 1.0
TongueRollLeft: 1.0
TongueRollRight: 1.0
TongueUp: 1.0
TongueDown: 1.0
TongueLeft: 1.0
TongueRight: 1.0
TongueIn: 1.0
TongueStretch: 1.0
TongueWide: 1.0
TongueNarrow: 1.0
weight_offsets: # Modulates the effect of each blendshape: blendshape_values * weight_multipliers + weight_offsets
EyeBlinkLeft: 0.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 0.0
EyeWideLeft: 0.0
EyeBlinkRight: 0.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 0.0
EyeWideRight: 0.0
JawForward: 0.0
JawLeft: 0.0
JawRight: 0.0
JawOpen: 0.0
MouthClose: 0.0
MouthFunnel: 0.0
MouthPucker: 0.0
MouthLeft: 0.0
MouthRight: 0.0
MouthSmileLeft: 0.0
MouthSmileRight: 0.0
MouthFrownLeft: 0.0
MouthFrownRight: 0.0
MouthDimpleLeft: 0.0
MouthDimpleRight: 0.0
MouthStretchLeft: 0.0
MouthStretchRight: 0.0
MouthRollLower: 0.0
MouthRollUpper: 0.0
MouthShrugLower: 0.0
MouthShrugUpper: 0.0
MouthPressLeft: 0.0
MouthPressRight: 0.0
MouthLowerDownLeft: 0.0
MouthLowerDownRight: 0.0
MouthUpperUpLeft: 0.0
MouthUpperUpRight: 0.0
BrowDownLeft: 0.0
BrowDownRight: 0.0
BrowInnerUp: 0.0
BrowOuterUpLeft: 0.0
BrowOuterUpRight: 0.0
CheekPuff: 0.0
CheekSquintLeft: 0.0
CheekSquintRight: 0.0
NoseSneerLeft: 0.0
NoseSneerRight: 0.0
TongueOut: 0.0
TongueTipUp: 0.0
TongueTipDown: 0.0
TongueTipLeft: 0.0
TongueTipRight: 0.0
TongueRollUp: 0.0
TongueRollDown: 0.0
TongueRollLeft: 0.0
TongueRollRight: 0.0
TongueUp: 0.0
TongueDown: 0.0
TongueLeft: 0.0
TongueRight: 0.0
TongueIn: 0.0
TongueStretch: 0.0
TongueWide: 0.0
TongueNarrow: 0.0
active_poses: # Defines which poses are active and which are not
EyeBlinkLeft: 1
EyeLookDownLeft: 0
EyeLookInLeft: 0
EyeLookOutLeft: 0
EyeLookUpLeft: 0
EyeSquintLeft: 1
EyeWideLeft: 1
EyeBlinkRight: 1
EyeLookDownRight: 0
EyeLookInRight: 0
EyeLookOutRight: 0
EyeLookUpRight: 0
EyeSquintRight: 1
EyeWideRight: 1
JawForward: 1
JawLeft: 1
JawRight: 1
JawOpen: 1
MouthClose: 1
MouthFunnel: 1
MouthPucker: 1
MouthLeft: 1
MouthRight: 1
MouthSmileLeft: 1
MouthSmileRight: 1
MouthFrownLeft: 1
MouthFrownRight: 1
MouthDimpleLeft: 1
MouthDimpleRight: 1
MouthStretchLeft: 1
MouthStretchRight: 1
MouthRollLower: 1
MouthRollUpper: 1
MouthShrugLower: 1
MouthShrugUpper: 1
MouthPressLeft: 1
MouthPressRight: 1
MouthLowerDownLeft: 1
MouthLowerDownRight: 1
MouthUpperUpLeft: 1
MouthUpperUpRight: 1
BrowDownLeft: 1
BrowDownRight: 1
BrowInnerUp: 1
BrowOuterUpLeft: 1
BrowOuterUpRight: 1
CheekPuff: 1
CheekSquintLeft: 1
CheekSquintRight: 1
NoseSneerLeft: 1
NoseSneerRight: 1
TongueOut: 0
TongueTipUp: 1
TongueTipDown: 1
TongueTipLeft: 1
TongueTipRight: 1
TongueRollUp: 1
TongueRollDown: 1
TongueRollLeft: 1
TongueRollRight: 1
TongueUp: 1
TongueDown: 1
TongueLeft: 1
TongueRight: 1
TongueIn: 1
TongueStretch: 1
TongueWide: 1
TongueNarrow: 1
cancel_poses: # Define which poses cancel each other
EyeBlinkLeft: -1
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: -1
EyeBlinkRight: -1
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: -1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: -1
MouthSmileRight: -1
MouthFrownLeft: -1
MouthFrownRight: -1
MouthDimpleLeft: -1
MouthDimpleRight: -1
MouthStretchLeft: -1
MouthStretchRight: -1
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: -1
MouthPressRight: -1
MouthLowerDownLeft: -1
MouthLowerDownRight: -1
MouthUpperUpLeft: -1
MouthUpperUpRight: -1
BrowDownLeft: -1
BrowDownRight: -1
BrowInnerUp: -1
BrowOuterUpLeft: -1
BrowOuterUpRight: -1
CheekPuff: -1
CheekSquintLeft: -1
CheekSquintRight: -1
NoseSneerLeft: -1
NoseSneerRight: -1
TongueOut: -1
TongueTipUp: -1
TongueTipDown: -1
TongueTipLeft: -1
TongueTipRight: -1
TongueRollUp: -1
TongueRollDown: -1
TongueRollLeft: -1
TongueRollRight: -1
TongueUp: -1
TongueDown: -1
TongueLeft: -1
TongueRight: -1
TongueIn: -1
TongueStretch: -1
TongueWide: -1
TongueNarrow: -1
symmetry_poses: # Define which poses are symmetric to each other
EyeBlinkLeft: 0
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: 1
EyeBlinkRight: 0
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: 1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: 2
MouthSmileRight: 2
MouthFrownLeft: 3
MouthFrownRight: 3
MouthDimpleLeft: 4
MouthDimpleRight: 4
MouthStretchLeft: 5
MouthStretchRight: 5
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: 6
MouthPressRight: 6
MouthLowerDownLeft: 7
MouthLowerDownRight: 7
MouthUpperUpLeft: 8
MouthUpperUpRight: 8
BrowDownLeft: 10
BrowDownRight: 10
BrowInnerUp: -1
BrowOuterUpLeft: 9
BrowOuterUpRight: 9
CheekPuff: -1
CheekSquintLeft: 11
CheekSquintRight: 11
NoseSneerLeft: 12
NoseSneerRight: 12
TongueOut: -1
TongueTipUp: -1
TongueTipDown: -1
TongueTipLeft: -1
TongueTipRight: -1
TongueRollUp: -1
TongueRollDown: -1
TongueRollLeft: -1
TongueRollRight: -1
TongueUp: -1
TongueDown: -1
TongueLeft: -1
TongueRight: -1
TongueIn: -1
TongueStretch: -1
TongueWide: -1
TongueNarrow: -1
mark_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
amazement: 0.0
anger: 0.0
cheekiness: 0.0
disgust: 0.0
fear: 0.0
grief: 0.0
joy: 0.0
outofbreath: 0.0
pain: 0.0
sadness: 0.0
a2e:
enabled: true
live_transition_time: 0.5
post_processing_params:
emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if loaded) relative to generated emotions
a2f:
# regression / diffusion
inference_type: regression
regression_model:
inference_model_id: mark_v2.3
diffusion_model:
inference_model_id: multi_v3.2
identity: mark
# If true, use deterministic noise for diffusion inference (more stable/repeatable results).
# If false, use non-deterministic noise (more variation between runs).
constant_noise: true
# Enable or disable tongue blendshapes output
enable_tongue_blendshapes: false
face_params:
eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
input_strength: 1.3 # Controls the magnitude of the input audio
lip_close_offset: -0.03 # Adjusts the default pose of lip close-open
lower_face_smoothing: 0.0023 # Applies temporal smoothing to the lower face motion
lower_face_strength: 1.4 # Controls the range of motion on the lower regions of the face
skin_strength: 1.1 # Controls the range of motion of the skin
upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
tongue_strength: 1.3 # Controls the range of motion of the tongue
tongue_height_offset: 0.0 # Controls the height of the tongue
tongue_depth_offset: 0.0 # Controls the depth of the tongue
blendshape_params: # Modulates the effect of each blendshape: gain * w + offset
# Clamps blendshape weights to [0.0, 1.0] range. Recommended for production to ensure compatibility with renderers.
enable_clamping_bs_weight: true
weight_multipliers:
EyeBlinkLeft: 1.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 1.0
EyeWideLeft: 1.0
EyeBlinkRight: 1.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 1.0
EyeWideRight: 1.0
JawForward: 0.7
JawLeft: 0.2
JawRight: 0.2
JawOpen: 1.0
MouthClose: 0.2
MouthFunnel: 1.2
MouthPucker: 1.2
MouthLeft: 0.2
MouthRight: 0.2
MouthSmileLeft: 0.8
MouthSmileRight: 0.8
MouthFrownLeft: 0.5
MouthFrownRight: 0.5
MouthDimpleLeft: 0.8
MouthDimpleRight: 0.8
MouthStretchLeft: 0.05
MouthStretchRight: 0.05
MouthRollLower: 0.8
MouthRollUpper: 0.5
MouthShrugLower: 0.9
MouthShrugUpper: 0.4
MouthPressLeft: 0.8
MouthPressRight: 0.8
MouthLowerDownLeft: 0.8
MouthLowerDownRight: 0.8
MouthUpperUpLeft: 0.8
MouthUpperUpRight: 0.8
BrowDownLeft: 1.0
BrowDownRight: 1.0
BrowInnerUp: 1.0
BrowOuterUpLeft: 1.0
BrowOuterUpRight: 1.0
CheekPuff: 0.2
CheekSquintLeft: 1.0
CheekSquintRight: 1.0
NoseSneerLeft: 0.8
NoseSneerRight: 0.8
TongueOut: 0.0
TongueTipUp: 1.0
TongueTipDown: 1.0
TongueTipLeft: 1.0
TongueTipRight: 1.0
TongueRollUp: 1.0
TongueRollDown: 1.0
TongueRollLeft: 1.0
TongueRollRight: 1.0
TongueUp: 1.0
TongueDown: 1.0
TongueLeft: 1.0
TongueRight: 1.0
TongueIn: 1.0
TongueStretch: 1.0
TongueWide: 1.0
TongueNarrow: 1.0
weight_offsets: # Modulates the effect of each blendshape: blendshape_values * weight_multipliers + weight_offsets
EyeBlinkLeft: 0.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 0.0
EyeWideLeft: 0.0
EyeBlinkRight: 0.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 0.0
EyeWideRight: 0.0
JawForward: 0.0
JawLeft: 0.0
JawRight: 0.0
JawOpen: 0.0
MouthClose: 0.0
MouthFunnel: 0.0
MouthPucker: 0.0
MouthLeft: 0.0
MouthRight: 0.0
MouthSmileLeft: 0.0
MouthSmileRight: 0.0
MouthFrownLeft: 0.0
MouthFrownRight: 0.0
MouthDimpleLeft: 0.0
MouthDimpleRight: 0.0
MouthStretchLeft: 0.0
MouthStretchRight: 0.0
MouthRollLower: 0.0
MouthRollUpper: 0.0
MouthShrugLower: 0.0
MouthShrugUpper: 0.0
MouthPressLeft: 0.0
MouthPressRight: 0.0
MouthLowerDownLeft: 0.0
MouthLowerDownRight: 0.0
MouthUpperUpLeft: 0.0
MouthUpperUpRight: 0.0
BrowDownLeft: 0.0
BrowDownRight: 0.0
BrowInnerUp: 0.0
BrowOuterUpLeft: 0.0
BrowOuterUpRight: 0.0
CheekPuff: 0.0
CheekSquintLeft: 0.0
CheekSquintRight: 0.0
NoseSneerLeft: 0.0
NoseSneerRight: 0.0
TongueOut: 0.0
TongueTipUp: 0.0
TongueTipDown: 0.0
TongueTipLeft: 0.0
TongueTipRight: 0.0
TongueRollUp: 0.0
TongueRollDown: 0.0
TongueRollLeft: 0.0
TongueRollRight: 0.0
TongueUp: 0.0
TongueDown: 0.0
TongueLeft: 0.0
TongueRight: 0.0
TongueIn: 0.0
TongueStretch: 0.0
TongueWide: 0.0
TongueNarrow: 0.0
active_poses: # Define which poses are active and which ones are not
EyeBlinkLeft: 1
EyeLookDownLeft: 0
EyeLookInLeft: 0
EyeLookOutLeft: 0
EyeLookUpLeft: 0
EyeSquintLeft: 1
EyeWideLeft: 1
EyeBlinkRight: 1
EyeLookDownRight: 0
EyeLookInRight: 0
EyeLookOutRight: 0
EyeLookUpRight: 0
EyeSquintRight: 1
EyeWideRight: 1
JawForward: 1
JawLeft: 1
JawRight: 1
JawOpen: 1
MouthClose: 1
MouthFunnel: 1
MouthPucker: 1
MouthLeft: 1
MouthRight: 1
MouthSmileLeft: 1
MouthSmileRight: 1
MouthFrownLeft: 1
MouthFrownRight: 1
MouthDimpleLeft: 1
MouthDimpleRight: 1
MouthStretchLeft: 1
MouthStretchRight: 1
MouthRollLower: 1
MouthRollUpper: 1
MouthShrugLower: 1
MouthShrugUpper: 1
MouthPressLeft: 1
MouthPressRight: 1
MouthLowerDownLeft: 1
MouthLowerDownRight: 1
MouthUpperUpLeft: 1
MouthUpperUpRight: 1
BrowDownLeft: 1
BrowDownRight: 1
BrowInnerUp: 1
BrowOuterUpLeft: 1
BrowOuterUpRight: 1
CheekPuff: 1
CheekSquintLeft: 1
CheekSquintRight: 1
NoseSneerLeft: 1
NoseSneerRight: 1
TongueOut: 0
TongueTipUp: 1
TongueTipDown: 1
TongueTipLeft: 1
TongueTipRight: 1
TongueRollUp: 1
TongueRollDown: 1
TongueRollLeft: 1
TongueRollRight: 1
TongueUp: 1
TongueDown: 1
TongueLeft: 1
TongueRight: 1
TongueIn: 1
TongueStretch: 1
TongueWide: 1
TongueNarrow: 1
cancel_poses: # Define which poses cancel each other
EyeBlinkLeft: -1
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: -1
EyeBlinkRight: -1
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: -1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: -1
MouthSmileRight: -1
MouthFrownLeft: -1
MouthFrownRight: -1
MouthDimpleLeft: -1
MouthDimpleRight: -1
MouthStretchLeft: -1
MouthStretchRight: -1
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: -1
MouthPressRight: -1
MouthLowerDownLeft: -1
MouthLowerDownRight: -1
MouthUpperUpLeft: -1
MouthUpperUpRight: -1
BrowDownLeft: -1
BrowDownRight: -1
BrowInnerUp: -1
BrowOuterUpLeft: -1
BrowOuterUpRight: -1
CheekPuff: -1
CheekSquintLeft: -1
CheekSquintRight: -1
NoseSneerLeft: -1
NoseSneerRight: -1
TongueOut: -1
TongueTipUp: -1
TongueTipDown: -1
TongueTipLeft: -1
TongueTipRight: -1
TongueRollUp: -1
TongueRollDown: -1
TongueRollLeft: -1
TongueRollRight: -1
TongueUp: -1
TongueDown: -1
TongueLeft: -1
TongueRight: -1
TongueIn: -1
TongueStretch: -1
TongueWide: -1
TongueNarrow: -1
symmetry_poses: # Define which poses are symmetric to each other
EyeBlinkLeft: 0
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: 1
EyeBlinkRight: 0
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: 1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: 2
MouthSmileRight: 2
MouthFrownLeft: 3
MouthFrownRight: 3
MouthDimpleLeft: 4
MouthDimpleRight: 4
MouthStretchLeft: 5
MouthStretchRight: 5
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: 6
MouthPressRight: 6
MouthLowerDownLeft: 7
MouthLowerDownRight: 7
MouthUpperUpLeft: 8
MouthUpperUpRight: 8
BrowDownLeft: 10
BrowDownRight: 10
BrowInnerUp: -1
BrowOuterUpLeft: 9
BrowOuterUpRight: 9
CheekPuff: -1
CheekSquintLeft: 11
CheekSquintRight: 11
NoseSneerLeft: 12
NoseSneerRight: 12
TongueOut: -1
TongueTipUp: -1
TongueTipDown: -1
TongueTipLeft: -1
TongueTipRight: -1
TongueRollUp: -1
TongueRollDown: -1
TongueRollLeft: -1
TongueRollRight: -1
TongueUp: -1
TongueDown: -1
TongueLeft: -1
TongueRight: -1
TongueIn: -1
TongueStretch: -1
TongueWide: -1
TongueNarrow: -1
Parameter Tuning Guide#
Audio2Face-3D imports inference parameters from multiple sources: the inference model SDK, configuration files at deployment time, and runtime input. Generally, deployment-time parameters override the matching defaults in the model files, while runtime parameters override both the deployment-time parameters and the model defaults.
For runtime parameters, see the proto definitions of AudioStreamHeader, FaceParameters, BlendShapeParameters, EmotionParameters, and EmotionPostProcessingParameters.
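The override order described above can be illustrated with a minimal Python sketch. It is not the service's implementation: the function name `resolve_parameters` and the model default values are hypothetical, and the deployment values are borrowed from the Mark stylization config only for illustration.

```python
def resolve_parameters(model_defaults, deployment_config, runtime_params):
    """Merge three parameter layers; later layers override earlier ones."""
    resolved = dict(model_defaults)
    resolved.update(deployment_config)  # deployment-time values override model defaults
    resolved.update(runtime_params)     # runtime values override both
    return resolved


# Hypothetical model defaults (not taken from any real model file).
model_defaults = {
    "lower_face_strength": 1.0,
    "upper_face_strength": 1.0,
    "skin_strength": 1.0,
}
# Deployment-time values, as they appear in the Mark stylization config.
deployment_config = {"lower_face_strength": 1.4, "skin_strength": 1.1}
# A runtime override sent with the request (illustrative value).
runtime_params = {"skin_strength": 0.9}

resolved = resolve_parameters(model_defaults, deployment_config, runtime_params)
print(resolved)
```

In this sketch, `upper_face_strength` keeps the model default, `lower_face_strength` takes the deployment-time value, and `skin_strength` takes the runtime value.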
FaceParameters
Only a subset of FaceParameters is supported for runtime tuning.
See FaceParameters for the list of supported ones.
Audio2Emotion (A2E) execution
The A2E component can run in two modes, controlled by the deployment-time stylization config:
a2e.enabled: true: the service runs Audio2Emotion inference on the incoming audio to produce emotions.
a2e.enabled: false: the service skips audio-based (GPU) emotion inference and runs a post-processing path so that user-provided runtime emotions (preferred/animated emotions) can still be blended and smoothed.
Important
In Audio2Face-3D 2.0, setting a2e.enabled: false does not fully disable emotion processing.
The A2E execution thread continues to run and executes the A2E post-processing path continuously, so you may
still observe non-zero emotions if you provide runtime emotions (preferred/animated emotions) or use non-neutral
defaults.
If your goal is “no emotion drive”, ensure you also:
keep default_beginning_emotions at all zeros
do not send runtime emotion inputs
optionally set a2e.post_processing_params.enable_preferred_emotion: false and/or a2e.post_processing_params.emotion_strength: 0.0
The A2E execution frequency is controlled by a2e.inference_interval in advanced_config.yaml.
Higher values run A2E inference less frequently (lower compute cost), at the expense of temporal fidelity.
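The "no emotion drive" checklist above can be expressed as a small pre-deployment check. This is a sketch under assumptions: the function name `emotion_drive_issues` is hypothetical, and the stylization config is assumed to be already loaded into a nested dictionary that mirrors the dotted keys used in this guide.

```python
def emotion_drive_issues(config, runtime_emotions=None):
    """Return the reasons why non-zero emotions may still be produced."""
    issues = []

    # a2e.enabled: true means audio-based emotion inference is running.
    if config.get("a2e", {}).get("enabled", True):
        issues.append("a2e.enabled is true: audio-based emotion inference runs")

    # Any non-zero default emotion is applied at the beginning of each clip.
    defaults = config.get("default_beginning_emotions", {})
    nonzero = sorted(name for name, value in defaults.items() if value != 0.0)
    if nonzero:
        issues.append("non-zero default_beginning_emotions: " + ", ".join(nonzero))

    # Runtime emotions are still blended and smoothed when a2e.enabled is false.
    if runtime_emotions:
        issues.append("runtime emotion inputs are being sent")

    return issues


config = {
    "default_beginning_emotions": {"joy": 0.0, "anger": 0.2},
    "a2e": {"enabled": False},
}
issues = emotion_drive_issues(config)
print(issues)
```

The optional settings (enable_preferred_emotion and emotion_strength) are not checked here, because the checklist treats them as an extra safeguard rather than a requirement.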
Emotion Post-processing Parameters
The Audio2Emotion SDK automatically parses emotions from the incoming audio and generates emotion vectors that drive the character’s facial animation performance. Use the post-processing parameters below to tailor the performance further. The steps are listed in the order in which they are executed.
Emotion Contrast
Emotion contrast is applied to the inference output and controls the emotion spread using a sigmoid function. It pushes high values higher and low values lower, which widens the range of the generated emotional performance.
Max Emotions
Max emotions sets a hard limit on the number of emotions that the Audio2Emotion SDK will engage. Emotions are prioritized by their strength. Once the maximum number of emotions is reached, only the vectors for these prioritized emotions are engaged, and all other emotions are null. This helps achieve a more accurate read on the correct emotion when the vocal emotional performance is more subtle.
For example, if Joy and Amazement are the strongest predicted emotions and you set the Max Emotions limit to 2, only Joy and Amazement are applied to the performance.
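The Joy and Amazement example above can be reproduced with a minimal Python sketch. The function name `apply_max_emotions` and the predicted values are hypothetical, and "null" is modeled here as 0.0, which is an assumption about how disengaged emotions are represented.

```python
def apply_max_emotions(emotions, max_emotions):
    """Keep the max_emotions strongest entries and zero out all others."""
    strongest = sorted(emotions, key=emotions.get, reverse=True)[:max_emotions]
    return {
        name: (value if name in strongest else 0.0)
        for name, value in emotions.items()
    }


# Illustrative inference output; the values are made up for this example.
predicted = {"joy": 0.7, "amazement": 0.5, "sadness": 0.2, "anger": 0.1}
limited = apply_max_emotions(predicted, max_emotions=2)
print(limited)
```

With a limit of 2, only joy and amazement keep their values; sadness and anger are set to 0.0.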
Emotion index conversion
Emotion index conversion uses the emotion correspondence to remap emotions from the Audio2Emotion SDK to the Audio2Face SDK.
Smoothing
Applies exponential smoothing to the remapped emotions, using the live blend coefficient (live_blend_coef).
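Exponential smoothing in general can be sketched as follows. This is an illustration of the technique, not the service's code: the function name `smooth_emotions` is hypothetical, and the sketch assumes that the coefficient weights the previous value, so a higher coefficient means slower changes. The service may use the opposite convention.

```python
def smooth_emotions(previous, current, blend_coef):
    """One step of exponential smoothing per emotion.

    Assumption: blend_coef is the weight of the previous (smoothed) value.
    """
    return {
        name: blend_coef * previous.get(name, 0.0) + (1.0 - blend_coef) * value
        for name, value in current.items()
    }


state = {"joy": 0.0}
target = {"joy": 1.0}
history = []
for _ in range(3):
    # 0.7 is the live_blend_coef value from the Mark stylization config.
    state = smooth_emotions(state, target, blend_coef=0.7)
    history.append(round(state["joy"], 3))
print(history)
```

Under this convention, the smoothed value approaches the target gradually instead of jumping to it in a single step.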
Blend Preferred Emotion
The preferred emotion (manual emotion) and the inference emotion output are combined to generate a composite final output of all emotion data.
Transition smoothing
Transition smoothing applies exponential smoothing to the final emotion values (the composite of Audio2Emotion and preferred emotion).
Emotion Strength
Emotion strength controls the overall strength of the final emotion composite produced by the previous steps. It is a multiplier applied to the final emotion result (Audio2Emotion and preferred emotion).
Preferred Emotion
Use the emotion sliders to create a preferred (manual) emotion pose as the base emotion for the character animation. The preferred emotion is taken from the current settings in the Emotion widget and is blended with the generated emotions throughout the animation.
Blendshape parameters
Currently, the default blendshape parameters included in the model data are tuned for use with MetaHuman avatars.
For our default avatars (Claire, Mark, Ben), all 52 values of weight_multipliers in the stylization config should be set to 1.0.
Blendshape Clamping
The enable_clamping_bs_weight parameter controls whether blendshape weight values are constrained to the
range [0.0, 1.0]. Blendshape clamping is a post-processing step that ensures blendshape weights stay within
the standard range expected by most animation systems. The A2F neural network can produce values outside this
range, so clamping normalizes them for compatibility with downstream renderers.
The clamping is applied after multipliers and offsets are applied.
| Setting | Behavior |
|---|---|
| enable_clamping_bs_weight: true | Values guaranteed 0.0-1.0. Safe for renderers expecting normalized weights. Recommended for production. |
| enable_clamping_bs_weight: false | Values can exceed range (e.g., 1.2, -0.1). Preserves full model output fidelity. Useful for debugging/analysis. |
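The multiplier, offset, and clamping steps described above can be sketched in Python. The function name `postprocess_blendshapes` and the raw weights are hypothetical; the formula (blendshape_values * weight_multipliers + weight_offsets, then optional clamping to [0.0, 1.0]) follows the comments in the stylization config, and the multipliers are taken from the Mark config.

```python
def postprocess_blendshapes(weights, multipliers, offsets, clamp=True):
    """Apply gain and offset per blendshape, then optionally clamp to [0.0, 1.0]."""
    result = {}
    for name, w in weights.items():
        value = w * multipliers.get(name, 1.0) + offsets.get(name, 0.0)
        if clamp:
            # Clamping happens after multipliers and offsets are applied.
            value = min(1.0, max(0.0, value))
        result[name] = value
    return result


# Illustrative raw network output; values outside [0.0, 1.0] are possible.
raw = {"JawOpen": 0.9, "MouthFunnel": 1.0, "MouthClose": -0.5}
multipliers = {"JawOpen": 1.0, "MouthFunnel": 1.2, "MouthClose": 0.2}
offsets = {"JawOpen": 0.0, "MouthFunnel": 0.0, "MouthClose": 0.0}

clamped = postprocess_blendshapes(raw, multipliers, offsets, clamp=True)
unclamped = postprocess_blendshapes(raw, multipliers, offsets, clamp=False)
print(clamped)
print(unclamped)
```

With clamping disabled, MouthFunnel ends up above 1.0 and MouthClose below 0.0; with clamping enabled, both are pulled back to the range limits.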
Note
Tongue blendshape parameters are available when tongue output is enabled (enable_tongue_blendshapes: true).
In that case, the service outputs 68 blendshape weights (52 face + 16 tongue) and the runtime
BlendShapeParameters maps can include the tongue keys in addition to TongueOut.
Environment variables#
The following table describes the environment variables that can be passed to the Audio2Face-3D NIM with the -e flag of a docker run command:
| Variable | Required | Values | Notes |
|---|---|---|---|
| NGC_API_KEY | No | Any string representing a valid NGC API Key | Required only if you want to download TRT engines from NGC. You must set this variable to the value of your personal NGC API key. |
| NIM_LOGGING_JSONL | No | true / false | Enables (true) or disables (false) JSON Lines format logging to stdout. |
| NIM_MANIFEST_PROFILE | No | Any valid manifest profile string | Choose the manifest profile id from Supported Models for your GPU. |
| NIM_DISABLE_MODEL_DOWNLOAD | No | true / false | Disables (true) or enables (false) automatic TRT engine downloads from NGC. When set to true, automatic downloads are prevented and TRT engines will be generated locally instead. If pre-cached models are mounted, local generation will be skipped. Note that TRT generation fails on RTX 50 series for now. |
| NIM_SKIP_A2F_START | No | true / false | If set to true, the container will not start the A2F-3D service at startup. |
Volumes#
The following table describes the paths inside the container into which the local paths can be mounted. For example, you
can mount a volume with the following docker flag -v {LOCAL_PATH}:{PATH_IN_CONTAINER}.
| Container path | Required | Notes |
|---|---|---|
| /tmp/a2x/ | Not required, but if this volume is not mounted, the container has to download or generate the model again each time it is brought up | Path for AI models. Must have read, write, and execute permissions (for example, 777). |
| /mnt/configs/ | Needed only if you want to override some configuration parameters | Path for files that override configs |
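The -e and -v flags from the two tables above can be assembled programmatically. The following Python sketch only builds the argument list and prints it. The function name `build_docker_run`, the image placeholder, and the local paths are hypothetical, and other flags that a real deployment needs are intentionally omitted.

```python
import shlex


def build_docker_run(image, env=None, volumes=None):
    """Build a docker run argument list with -e and -v flags."""
    args = ["docker", "run"]
    for name, value in (env or {}).items():
        args += ["-e", f"{name}={value}"]
    for local_path, container_path in (volumes or {}).items():
        # Same shape as -v {LOCAL_PATH}:{PATH_IN_CONTAINER}
        args += ["-v", f"{local_path}:{container_path}"]
    args.append(image)
    return args


command = build_docker_run(
    image="IMAGE_PLACEHOLDER",  # placeholder, not a real image name
    env={"NIM_LOGGING_JSONL": "true", "NIM_DISABLE_MODEL_DOWNLOAD": "false"},
    volumes={
        "/path/to/model-cache": "/tmp/a2x/",   # hypothetical local path
        "/path/to/configs": "/mnt/configs/",   # hypothetical local path
    },
)
print(shlex.join(command))
```

Building the command as a list keeps each flag and value separate, which avoids quoting problems if the list is later passed to a process launcher instead of being printed.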
Quick Deployment of Audio2Face-3D Microservices#
Instead of deploying Audio2Face-3D and starting the model manually, you can deploy them together with the docker-compose file by following the quick-start instructions in the NVIDIA Audio2Face-3D Samples repo.