Audio2Face-3D NIM Container Deployment and Configuration Guide#
This guide provides comprehensive instructions for deploying, configuring, and running the Audio2Face-3D NIM Docker container available through the NGC registry.
Before proceeding, please review the Architecture Overview page to understand the core concepts, services, and requirements for running Audio2Face-3D.
Audio2Face-3D offers extensive configuration capabilities through configuration files and environment variables, which can be customized via a custom entrypoint.
Prerequisites#
To run the microservice, you will need:
Access to the NGC Docker registry
Active login to the nvcr.io registry
NVIDIA Container Toolkit configured with Docker
For detailed hardware and software requirements, consult the Support Matrix page.
Configuration files#
Audio2Face-3D utilizes three distinct configuration file types, each targeting specific user roles:
Stylization Configuration (Artist-focused): Parameters typically adjusted by artists for creative control
Deployment Configuration (DevOps-focused): Parameters related to deployment and infrastructure
Advanced Configuration (Expert-focused): Specialized parameters for specific use cases
Warning
These deployment-time configuration files differ from runtime configuration files in both case convention (snake_case vs. camelCase) and structure. For reference, see this runtime configuration example: config_james.yml.
1. Stylization Configuration Files#
The system provides three variant-specific configuration files:
Claire
James
Mark
Each variant corresponds to a specific AI Model with predefined default values. The James configuration serves as the default for the Microservice.
Claire Configuration#
claire_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
amazement: 0.0
anger: 0.0
cheekiness: 0.0
disgust: 0.0
fear: 0.0
grief: 0.0
joy: 0.0
outofbreath: 0.0
pain: 0.0
sadness: 0.0
a2e:
enabled: true # Enable audio2emotion, ai-generated audio-driven emotion
live_transition_time: 0.5 # Controls the smoothness of the output transition toward the target value across frames; higher values result in smoother transitions. Each frame updates at a rate of <frame time length> / <live transition time> (capped at 1.0) toward the raw result.
post_processing_params:
emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions
a2f:
# A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
inference_model_id: claire_v2.3
blendshape_id: claire_topo1_v2.1
tongue_blendshape_id: claire_tongue_v1.0
enable_tongue_blendshapes: true
face_params:
eyelid_offset: 0.0 # Adjusts the default pose of eyelid open-close
face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
input_strength: 1.0 # Controls the magnitude of the input audio
lip_close_offset: 0.0 # Adjusts the default pose of lip close-open
lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
lower_face_strength: 1.25 # Controls the range of motion on the lower regions of the face
skin_strength: 1.0 # Controls the range of motion of the skin
upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
blendshape_params: # Modulates the effect of each blendshape. Gain * w + offset
enable_clamping_bs_weight: false
# Multiplier for each blendshape output. This list depends on the blendshape model.
weight_multipliers:
EyeBlinkLeft: 1.0
EyeLookDownLeft: 1.0
EyeLookInLeft: 1.0
EyeLookOutLeft: 1.0
EyeLookUpLeft: 1.0
EyeSquintLeft: 1.0
EyeWideLeft: 1.0
EyeBlinkRight: 1.0
EyeLookDownRight: 1.0
EyeLookInRight: 1.0
EyeLookOutRight: 1.0
EyeLookUpRight: 1.0
EyeSquintRight: 1.0
EyeWideRight: 1.0
JawForward: 1.0
JawLeft: 1.0
JawRight: 1.0
JawOpen: 1.0
MouthClose: 1.0
MouthFunnel: 1.0
MouthPucker: 1.0
MouthLeft: 1.0
MouthRight: 1.0
MouthSmileLeft: 1.0
MouthSmileRight: 1.0
MouthFrownLeft: 1.0
MouthFrownRight: 1.0
MouthDimpleLeft: 1.0
MouthDimpleRight: 1.0
MouthStretchLeft: 1.0
MouthStretchRight: 1.0
MouthRollLower: 1.0
MouthRollUpper: 1.0
MouthShrugLower: 1.0
MouthShrugUpper: 1.0
MouthPressLeft: 1.0
MouthPressRight: 1.0
MouthLowerDownLeft: 1.0
MouthLowerDownRight: 1.0
MouthUpperUpLeft: 1.0
MouthUpperUpRight: 1.0
BrowDownLeft: 1.0
BrowDownRight: 1.0
BrowInnerUp: 1.0
BrowOuterUpLeft: 1.0
BrowOuterUpRight: 1.0
CheekPuff: 1.0
CheekSquintLeft: 1.0
CheekSquintRight: 1.0
NoseSneerLeft: 1.0
NoseSneerRight: 1.0
TongueOut: 1.0
# Constant offset for each blendshape output. This list depends on the blendshape model.
weight_offsets:
EyeBlinkLeft: 0.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 0.0
EyeWideLeft: 0.0
EyeBlinkRight: 0.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 0.0
EyeWideRight: 0.0
JawForward: 0.0
JawLeft: 0.0
JawRight: 0.0
JawOpen: 0.0
MouthClose: 0.0
MouthFunnel: 0.0
MouthPucker: 0.0
MouthLeft: 0.0
MouthRight: 0.0
MouthSmileLeft: 0.0
MouthSmileRight: 0.0
MouthFrownLeft: 0.0
MouthFrownRight: 0.0
MouthDimpleLeft: 0.0
MouthDimpleRight: 0.0
MouthStretchLeft: 0.0
MouthStretchRight: 0.0
MouthRollLower: 0.0
MouthRollUpper: 0.0
MouthShrugLower: 0.0
MouthShrugUpper: 0.0
MouthPressLeft: 0.0
MouthPressRight: 0.0
MouthLowerDownLeft: 0.0
MouthLowerDownRight: 0.0
MouthUpperUpLeft: 0.0
MouthUpperUpRight: 0.0
BrowDownLeft: 0.0
BrowDownRight: 0.0
BrowInnerUp: 0.0
BrowOuterUpLeft: 0.0
BrowOuterUpRight: 0.0
CheekPuff: 0.0
CheekSquintLeft: 0.0
CheekSquintRight: 0.0
NoseSneerLeft: 0.0
NoseSneerRight: 0.0
TongueOut: 0.0
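The `live_transition_time` comment in the configuration above describes a per-frame update toward the raw result at a rate of frame time divided by transition time, capped at 1.0. The following Python sketch only illustrates that arithmetic; the function name `smooth_toward` and the 30 FPS frame time are assumptions made for the example and are not part of the service.

```python
def smooth_toward(current, target, frame_time, live_transition_time):
    """Move `current` toward `target` by a per-frame rate capped at 1.0."""
    rate = min(1.0, frame_time / live_transition_time)
    return current + rate * (target - current)


value = 0.0
for _ in range(5):
    # Assumed 30 FPS frames with the default live_transition_time of 0.5 s.
    value = smooth_toward(value, 1.0, frame_time=1 / 30, live_transition_time=0.5)
print(round(value, 3))
```

A larger `live_transition_time` yields a smaller per-frame rate, which is why higher values produce smoother transitions.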
James Configuration#
james_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
amazement: 0.0
anger: 0.0
cheekiness: 0.0
disgust: 0.0
fear: 0.0
grief: 0.0
joy: 0.0
outofbreath: 0.0
pain: 0.0
sadness: 0.0
a2e:
enabled: true
live_transition_time: 0.5
post_processing_params:
emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions
a2f:
# A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
inference_model_id: james_v2.3
blendshape_id: james_topo2_v2.2
tongue_blendshape_id: james_tongue_v1.0
enable_tongue_blendshapes: true
face_params:
eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
input_strength: 1.0 # Controls the magnitude of the input audio
lip_close_offset: -0.02 # Adjusts the default pose of lip close-open
lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
lower_face_strength: 1.2 # Controls the range of motion on the lower regions of the face
skin_strength: 1.0 # Controls the range of motion of the skin
upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
blendshape_params: # Modulates the effect of each blendshape. Gain * w + offset
enable_clamping_bs_weight: false
weight_multipliers:
EyeBlinkLeft: 1.0
EyeLookDownLeft: 1.0
EyeLookInLeft: 1.0
EyeLookOutLeft: 1.0
EyeLookUpLeft: 1.0
EyeSquintLeft: 1.0
EyeWideLeft: 1.0
EyeBlinkRight: 1.0
EyeLookDownRight: 1.0
EyeLookInRight: 1.0
EyeLookOutRight: 1.0
EyeLookUpRight: 1.0
EyeSquintRight: 1.0
EyeWideRight: 1.0
JawForward: 1.0
JawLeft: 1.0
JawRight: 1.0
JawOpen: 1.0
MouthClose: 1.0
MouthFunnel: 1.0
MouthPucker: 1.0
MouthLeft: 1.0
MouthRight: 1.0
MouthSmileLeft: 1.0
MouthSmileRight: 1.0
MouthFrownLeft: 1.0
MouthFrownRight: 1.0
MouthDimpleLeft: 1.0
MouthDimpleRight: 1.0
MouthStretchLeft: 1.0
MouthStretchRight: 1.0
MouthRollLower: 1.0
MouthRollUpper: 1.0
MouthShrugLower: 1.0
MouthShrugUpper: 1.0
MouthPressLeft: 1.0
MouthPressRight: 1.0
MouthLowerDownLeft: 1.0
MouthLowerDownRight: 1.0
MouthUpperUpLeft: 1.0
MouthUpperUpRight: 1.0
BrowDownLeft: 1.0
BrowDownRight: 1.0
BrowInnerUp: 1.0
BrowOuterUpLeft: 1.0
BrowOuterUpRight: 1.0
CheekPuff: 1.0
CheekSquintLeft: 1.0
CheekSquintRight: 1.0
NoseSneerLeft: 1.0
NoseSneerRight: 1.0
TongueOut: 1.0
weight_offsets:
EyeBlinkLeft: 0.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 0.0
EyeWideLeft: 0.0
EyeBlinkRight: 0.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 0.0
EyeWideRight: 0.0
JawForward: 0.0
JawLeft: 0.0
JawRight: 0.0
JawOpen: 0.0
MouthClose: 0.0
MouthFunnel: 0.0
MouthPucker: 0.0
MouthLeft: 0.0
MouthRight: 0.0
MouthSmileLeft: 0.0
MouthSmileRight: 0.0
MouthFrownLeft: 0.0
MouthFrownRight: 0.0
MouthDimpleLeft: 0.0
MouthDimpleRight: 0.0
MouthStretchLeft: 0.0
MouthStretchRight: 0.0
MouthRollLower: 0.0
MouthRollUpper: 0.0
MouthShrugLower: 0.0
MouthShrugUpper: 0.0
MouthPressLeft: 0.0
MouthPressRight: 0.0
MouthLowerDownLeft: 0.0
MouthLowerDownRight: 0.0
MouthUpperUpLeft: 0.0
MouthUpperUpRight: 0.0
BrowDownLeft: 0.0
BrowDownRight: 0.0
BrowInnerUp: 0.0
BrowOuterUpLeft: 0.0
BrowOuterUpRight: 0.0
CheekPuff: 0.0
CheekSquintLeft: 0.0
CheekSquintRight: 0.0
NoseSneerLeft: 0.0
NoseSneerRight: 0.0
TongueOut: 0.0
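The `blendshape_params` comment above summarizes the tuning as `Gain * w + offset`, where the gain comes from `weight_multipliers` and the offset from `weight_offsets`. A minimal Python sketch of that formula follows. The function name is hypothetical, and the behavior of `enable_clamping_bs_weight` (clamping to the range 0.0 to 1.0) is an assumption, since the configuration does not state the clamping range.

```python
def apply_blendshape_params(weights, multipliers, offsets, clamp=False):
    """Return adjusted weights: multiplier * w + offset for each blendshape."""
    adjusted = {}
    for name, w in weights.items():
        value = multipliers.get(name, 1.0) * w + offsets.get(name, 0.0)
        if clamp:
            # Assumption: clamping restricts weights to the [0.0, 1.0] range.
            value = min(1.0, max(0.0, value))
        adjusted[name] = value
    return adjusted


raw = {"JawOpen": 0.5, "MouthSmileLeft": 0.9}
result = apply_blendshape_params(
    raw,
    multipliers={"JawOpen": 1.5, "MouthSmileLeft": 1.2},
    offsets={"JawOpen": 0.1},
    clamp=True,
)
print(result)
```

With the default multipliers of 1.0 and offsets of 0.0, the weights pass through unchanged.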
Mark Configuration#
mark_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
amazement: 0.0
anger: 0.0
cheekiness: 0.0
disgust: 0.0
fear: 0.0
grief: 0.0
joy: 0.0
outofbreath: 0.0
pain: 0.0
sadness: 0.0
a2e:
enabled: true
live_transition_time: 0.5
post_processing_params:
emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions
a2f:
# A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
inference_model_id: mark_v2.3
blendshape_id: mark_topo1_v2.1
tongue_blendshape_id: mark_tongue_v1.0
enable_tongue_blendshapes: true
face_params:
eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
input_strength: 1.3 # Controls the magnitude of the input audio
lip_close_offset: -0.03 # Adjusts the default pose of lip close-open
lower_face_smoothing: 0.0023 # Applies temporal smoothing to the lower face motion
lower_face_strength: 1.4 # Controls the range of motion on the lower regions of the face
skin_strength: 1.1 # Controls the range of motion of the skin
upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
blendshape_params: # Modulates the effect of each blendshape. Gain * w + offset
enable_clamping_bs_weight: false
weight_multipliers:
EyeBlinkLeft: 1.0
EyeLookDownLeft: 1.0
EyeLookInLeft: 1.0
EyeLookOutLeft: 1.0
EyeLookUpLeft: 1.0
EyeSquintLeft: 1.0
EyeWideLeft: 1.0
EyeBlinkRight: 1.0
EyeLookDownRight: 1.0
EyeLookInRight: 1.0
EyeLookOutRight: 1.0
EyeLookUpRight: 1.0
EyeSquintRight: 1.0
EyeWideRight: 1.0
JawForward: 1.0
JawLeft: 1.0
JawRight: 1.0
JawOpen: 1.0
MouthClose: 1.0
MouthFunnel: 1.0
MouthPucker: 1.0
MouthLeft: 1.0
MouthRight: 1.0
MouthSmileLeft: 1.0
MouthSmileRight: 1.0
MouthFrownLeft: 1.0
MouthFrownRight: 1.0
MouthDimpleLeft: 1.0
MouthDimpleRight: 1.0
MouthStretchLeft: 1.0
MouthStretchRight: 1.0
MouthRollLower: 1.0
MouthRollUpper: 1.0
MouthShrugLower: 1.0
MouthShrugUpper: 1.0
MouthPressLeft: 1.0
MouthPressRight: 1.0
MouthLowerDownLeft: 1.0
MouthLowerDownRight: 1.0
MouthUpperUpLeft: 1.0
MouthUpperUpRight: 1.0
BrowDownLeft: 1.0
BrowDownRight: 1.0
BrowInnerUp: 1.0
BrowOuterUpLeft: 1.0
BrowOuterUpRight: 1.0
CheekPuff: 1.0
CheekSquintLeft: 1.0
CheekSquintRight: 1.0
NoseSneerLeft: 1.0
NoseSneerRight: 1.0
TongueOut: 1.0
weight_offsets:
EyeBlinkLeft: 0.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 0.0
EyeWideLeft: 0.0
EyeBlinkRight: 0.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 0.0
EyeWideRight: 0.0
JawForward: 0.0
JawLeft: 0.0
JawRight: 0.0
JawOpen: 0.0
MouthClose: 0.0
MouthFunnel: 0.0
MouthPucker: 0.0
MouthLeft: 0.0
MouthRight: 0.0
MouthSmileLeft: 0.0
MouthSmileRight: 0.0
MouthFrownLeft: 0.0
MouthFrownRight: 0.0
MouthDimpleLeft: 0.0
MouthDimpleRight: 0.0
MouthStretchLeft: 0.0
MouthStretchRight: 0.0
MouthRollLower: 0.0
MouthRollUpper: 0.0
MouthShrugLower: 0.0
MouthShrugUpper: 0.0
MouthPressLeft: 0.0
MouthPressRight: 0.0
MouthLowerDownLeft: 0.0
MouthLowerDownRight: 0.0
MouthUpperUpLeft: 0.0
MouthUpperUpRight: 0.0
BrowDownLeft: 0.0
BrowDownRight: 0.0
BrowInnerUp: 0.0
BrowOuterUpLeft: 0.0
BrowOuterUpRight: 0.0
CheekPuff: 0.0
CheekSquintLeft: 0.0
CheekSquintRight: 0.0
NoseSneerLeft: 0.0
NoseSneerRight: 0.0
TongueOut: 0.0
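The `max_emotions` comment above states that only a limited number of emotion sliders are engaged and that the highest-weighted emotions are prioritized. The Python sketch below shows one plausible reading of that rule. It assumes that the remaining emotions are set to 0.0, which the configuration does not confirm, and the function name is hypothetical.

```python
def limit_emotions(emotions, max_emotions=3):
    """Keep the `max_emotions` highest-weighted emotions; zero the others."""
    ranked = sorted(emotions, key=emotions.get, reverse=True)
    keep = set(ranked[:max_emotions])
    # Assumption: emotions outside the top group are reset to 0.0.
    return {name: (w if name in keep else 0.0) for name, w in emotions.items()}


detected = {"joy": 0.8, "anger": 0.1, "fear": 0.3, "grief": 0.2}
print(limit_emotions(detected, max_emotions=3))
```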
2. Deployment Configuration File#
deployment_config.yaml
common:
# Number of streams to use simultaneously
# The recommended value depends on the GPU and your latency constraints
# Higher value means: more concurrent users and higher overall throughput
# Lower value means: fewer concurrent users, higher throughput per stream, lower latencies
stream_number: 10
# Pad each audio file with 1.5 seconds of silent audio
add_silence_padding_after_audio: false
logging:
# Level of log wanted, info is recommended
# Can be one of:
# => trace
# => debug
# => info
# => warn
# => err
# => critical
# => off
log_level: info
# How often should FPS logs be printed per stream
fps_logging_interval_second: 1
endpoints:
# use the bidirectional endpoint instead of 2 connections (server to receive audio + client to send animation data)
use_bidirectional: true
# server to perform the bidirectional streaming connection
# Used only if use_bidirectional==true
bidirectional:
server:
# port to open
url: 0.0.0.0:52000
# secure mode
# Can be one of:
# => disabled
# => tls
# => mtls
secure_mode: "disabled"
# Path to the certificate file
certificate_path: ""
# Path to the key file
key_path: ""
# Path to the CA file
ca_path: ""
unidirectional:
# Server that receives the audio
# Used only if use_bidirectional==false
server:
# port to open
url: 0.0.0.0:50000
# Client that sends the animation data
# Used only if use_bidirectional==false
client:
# url of the server to contact
url: 0.0.0.0:51000
# Configs specific to telemetry
telemetry:
# Name of the service
service_name: audio2face
# Whether to enable metrics
metrics_enabled: false
# Whether to enable traces
traces_enabled: false
# Can be prometheus or otlp
metrics_exporter: prometheus
# Export interval in milliseconds
otel_metric_export_interval: 60000
# Export timeout in milliseconds
otel_metric_export_timeout: 30000
otlp_http_metrics_endpoint: http://localhost:4318/v1/metrics
otlp_http_traces_endpoint: http://localhost:4318/v1/traces
prometheus_endpoint: 0.0.0.0:9464
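The comments in the `endpoints` section indicate that the bidirectional server is used when `use_bidirectional` is true, and that the unidirectional server and client are used otherwise. The Python sketch below illustrates that selection on a plain dictionary holding the default values. The nesting of the keys is inferred from the listing above and the function name is hypothetical, so treat both as assumptions.

```python
def active_endpoints(endpoints):
    """Return the URLs the service would use for the configured mode."""
    if endpoints.get("use_bidirectional", True):
        return {"server": endpoints["bidirectional"]["server"]["url"]}
    uni = endpoints["unidirectional"]
    return {"server": uni["server"]["url"], "client": uni["client"]["url"]}


# Default values copied from deployment_config.yaml (nesting assumed).
defaults = {
    "use_bidirectional": True,
    "bidirectional": {"server": {"url": "0.0.0.0:52000"}},
    "unidirectional": {
        "server": {"url": "0.0.0.0:50000"},
        "client": {"url": "0.0.0.0:51000"},
    },
}
print(active_endpoints(defaults))
print(active_endpoints({**defaults, "use_bidirectional": False}))
```

This kind of check can help decide which ports to publish when `--network=host` is replaced by explicit port mappings.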
3. Advanced Configuration File#
advanced_config.yaml
input_sanitization:
# max size of UUID
max_len_uuid: 50
# Maximum samplerate
max_sample_rate: 144000
# Minimum samplerate
min_sample_rate: 16000
# Maximum processing time, in seconds
# After this timeout the connection to A2F will be cut
max_processing_duration_second: 300
# Maximum size, in seconds, of one audio buffer sent over the gRPC stream
max_audio_buffer_size_second: 10
# Maximum size, in seconds, of the audio clip to process
max_audio_clip_size_second: 300
# Maximum amount of time that A2F Controller will wait when not
# receiving data from A2F, before cutting the connection
max_wait_time_idle_ms: 30000
# Will stop serving a user if their FPS is lower than low_fps
# for more than low_fps_max_duration_second seconds
# For real time application less than 30 FPS means slower than realtime
# So if users provide audio to the service at less than 30 FPS then
# the interactive experience will stutter.
low_fps: 29
low_fps_max_duration_second: 7
garbage_collector:
# enable or disable the garbage collector
# This is only used with bidirectional connection where the service is holding data
# waiting for the client to pick them up.
enabled: true
# how often the garbage collector should run
interval_run_second: 10
# If the garbage collector finds streams holding
# more than N seconds of data, it will delete data
# until the amount falls below this threshold.
# Clients are expected to retrieve data promptly so that
# the service doesn't retain the data excessively.
max_size_stored_data_second: 60
pipeline_parameters:
# Queues between pipeline components
# Can be tweaked:
# Higher values can lead to higher throughput but also to higher latencies
# Lower values lead to lower latencies and potentially lower overall throughput
# Leave these values to default in case of doubt
queue_size_after_a2e: 1
queue_size_after_a2f: 300
queue_size_after_streammux: 1
streammux:
# Do not change this config; this is internal
adaptive_batching: 0
# Minimum FPS for all streams
# Pipeline will not slow down under this value if:
# * compute allows it
# * upload speed of audio allows it
# Here 40 FPS
# Numerator for that config:
overall_min_fps_n: 40
# Denominator for that config:
overall_min_fps_d: 1
a2f:
# Remove temporal smoothing
# used for debugging individual frames generated
temporal_smoothing: true
device_id: 0 # Which gpu id to use
a2e:
inference_interval: 10
device_id: 0 # Which gpu id to use
trt_model_generation:
a2e:
precision: "fp16"
min_shape: 1
optimal_shape: 10
maximum_shape: 128
a2f:
precision: "fp16"
min_shape: 1
optimal_shape: 10
maximum_shape: 128
These configuration files represent the system’s default values. To implement custom configurations, launch A2F-3D NIM with a custom entrypoint and mount your configuration files within the container as detailed in the following sections.
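The `input_sanitization` defaults in advanced_config.yaml suggest limits that a client could check before sending audio. The Python sketch below is a client-side illustration only: the function name is hypothetical, the bounds are assumed to be inclusive, and the service's own validation may differ.

```python
# Default limits copied from the input_sanitization section.
MIN_SAMPLE_RATE = 16000
MAX_SAMPLE_RATE = 144000
MAX_AUDIO_BUFFER_SIZE_SECOND = 10
MAX_AUDIO_CLIP_SIZE_SECOND = 300


def check_audio(sample_rate, buffer_samples, clip_samples):
    """Return a list of problems found; an empty list means no problem."""
    problems = []
    if not MIN_SAMPLE_RATE <= sample_rate <= MAX_SAMPLE_RATE:
        problems.append(f"sample rate {sample_rate} Hz is out of range")
        return problems  # Durations cannot be judged without a valid rate.
    if buffer_samples / sample_rate > MAX_AUDIO_BUFFER_SIZE_SECOND:
        problems.append("audio buffer exceeds the maximum buffer duration")
    if clip_samples / sample_rate > MAX_AUDIO_CLIP_SIZE_SECOND:
        problems.append("audio clip exceeds the maximum clip duration")
    return problems


# One-second buffers from a 60-second clip sampled at 16 kHz.
print(check_audio(16000, buffer_samples=16000, clip_samples=16000 * 60))
```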
Configuration Usage#
To override default configurations, mount your custom configuration files in a Docker volume at /mnt/configs.
For convenience, set up your environment with these commands:
$ mkdir -p ~/.cache/audio2face-3d-configs
$ export LOCAL_CONFIGS=~/.cache/audio2face-3d-configs
Copy the default configurations into your LOCAL_CONFIGS directory. Afterwards, the directory should contain the following files:
$ ls $LOCAL_CONFIGS
advanced_config.yaml
claire_stylization_config.yaml
deployment_config.yaml
james_stylization_config.yaml
mark_stylization_config.yaml
Model Cache Management#
Enable local model caching to speed up subsequent service launches. Configure a cache location with appropriate permissions as shown below. Note that NIM_DISABLE_MODEL_DOWNLOAD must be set to true in the docker run command for cached models to be used properly. This is explained in detail in the Model Caching section of Getting Started.
$ mkdir -p ~/.cache/audio2face-3d
$ chmod 777 ~/.cache/audio2face-3d
$ export LOCAL_NIM_CACHE=~/.cache/audio2face-3d
Launching A2F-3D NIM with Custom Entrypoint#
$ docker run -it --rm --name audio2face-3d \
--gpus all \
--network=host \
--entrypoint /bin/bash -w /opt/nvidia/a2f_pipeline \
-e NIM_SKIP_A2F_START=true \
-e NIM_DISABLE_MODEL_DOWNLOAD=true \
-e NGC_API_KEY=$NGC_API_KEY \
-v "$LOCAL_NIM_CACHE:/tmp/a2x" \
-v "$LOCAL_CONFIGS:/mnt/configs/" \
nvcr.io/nim/nvidia/audio2face-3d:1.3
This command creates a Docker container with GPU support (--gpus all) and host network access (--network=host).
For granular port control, replace --network=host with specific port mappings using -p.
The command mounts volumes for model caching (-v "$LOCAL_NIM_CACHE:/tmp/a2x") and configuration overrides (-v "$LOCAL_CONFIGS:/mnt/configs/"). It also disables the download of TRT engines from NGC (-e NIM_DISABLE_MODEL_DOWNLOAD=true). Omit either mount if the corresponding functionality isn’t needed.
Once inside the container shell:
triton-server@host-name:/opt/nvidia/a2f_pipeline$
Launch the NIM server:
$ /opt/nim/start_server.sh &
The ampersand (&) runs the server as a background process. Press Enter if needed to return to the shell prompt.
Note
The following commands should be executed within the container unless specified otherwise.
TensorRT Engine Generation#
Generate the TensorRT engine for your GPU using the provided Python application:
usage: generate_trt_models.py [-h] [--stylization-config STYLIZATION_CONFIG] [--advanced-config ADVANCED_CONFIG]
Generates TRT models for A2F Service.
options:
-h, --help show this help message and exit
--stylization-config STYLIZATION_CONFIG
file path to the stylization config
--advanced-config ADVANCED_CONFIG
file path to the advanced config
Generate Audio2Emotion and Audio2Face TRT engines with default configurations:
$ ./service/generate_trt_models.py
Note
TRT engines are GPU-specific and must be regenerated when changing deployment hardware. While generated engines can be backed up, they’re only compatible with identical hardware configurations.
Service Initialization#
Launch the Audio2Face-3D Service:
$ a2f_pipeline.run -h
Usage: a2f_pipeline.run [--help] [--version] [--stylization-config] [--deployment-config] [--advanced-config]
Optional arguments:
-h, --help shows help message and exits
-v, --version prints version information and exits
--stylization-config file path to the stylization config
--deployment-config file path to the deployment config
--advanced-config file path to the advanced config
Start with default configuration:
$ /usr/local/bin/a2f_pipeline.run
Successful initialization produces:
[2024-04-23 12:44:33.066] [ global ] [info] Running...
Note
GStreamer-WARNING messages may appear during startup. These warnings relate to missing container libraries not used by Audio2Face-3D and can be safely ignored.
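Besides watching for the log line shown above, one simple way to confirm that the service is listening is to attempt a TCP connection to its endpoint. The Python sketch below assumes the default bidirectional port 52000 from deployment_config.yaml and a service reachable on localhost. The helper name is hypothetical, and a successful connection only shows that the port is open, not that the pipeline is healthy.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Assumed default: bidirectional endpoint on port 52000.
print(is_port_open("localhost", 52000))
```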
Streamlined Configuration Updates#
To switch to the Claire model, execute these commands within the container:
$ ./service/generate_trt_models.py --stylization-config /mnt/configs/claire_stylization_config.yaml \
--advanced-config /mnt/configs/advanced_config.yaml
$ a2f_pipeline.run --stylization-config /mnt/configs/claire_stylization_config.yaml \
--deployment-config /mnt/configs/deployment_config.yaml \
--advanced-config /mnt/configs/advanced_config.yaml
Warning
The current generate_trt_models.py utility doesn’t support cache invalidation. To regenerate models after configuration updates, manually remove the corresponding TRT model from /tmp/a2x/.
Flexible Configuration Management#
The configuration system employs an override mechanism, allowing partial configuration updates without specifying all parameters. In the a2f section of a stylization configuration, specifying an inference_model_id automatically loads the corresponding default face parameters, blendshape_id loads the default blendshape parameters, and tongue_blendshape_id loads the corresponding tongue parameters.
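One way to picture the override mechanism is a recursive merge in which user-supplied keys replace defaults and every other key keeps its default value. The Python sketch below illustrates that idea on plain dictionaries. It is not the service's actual implementation, and `merge_config` is a hypothetical name.

```python
def merge_config(defaults, overrides):
    """Return a new dict where `overrides` replaces matching default keys."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Descend so that sibling keys keep their default values.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


defaults = {
    "endpoints": {
        "use_bidirectional": True,
        "bidirectional": {"server": {"url": "0.0.0.0:52000"}},
    }
}
overrides = {"endpoints": {"use_bidirectional": False}}
merged = merge_config(defaults, overrides)
print(merged)
```

In this sketch the override file only names `use_bidirectional`, yet the merged result still carries the default server URL.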
Example 1: Using Mark Stylization#
Create short_mark_stylization_config.yaml in $LOCAL_CONFIGS with:
a2f:
  inference_model_id: mark_v2.3
  blendshape_id: mark_topo1_v2.1
  tongue_blendshape_id: mark_tongue_v1.0 # optional; only needed for tongue animation
  enable_tongue_blendshapes: true # optional; only needed for tongue animation
Execute within the container:
$ ./service/generate_trt_models.py --stylization-config /mnt/configs/short_mark_stylization_config.yaml
$ a2f_pipeline.run --stylization-config /mnt/configs/short_mark_stylization_config.yaml
Warning
The current generate_trt_models.py utility doesn’t support cache invalidation. To regenerate models after configuration updates, manually remove the corresponding TRT model from /tmp/a2x/.
These commands produce the same result as using the complete Mark configuration file, since the system automatically loads default parameters based on the specified inference_model_id and blendshape_id.
Example 2: Updating the type of Endpoint to Unidirectional#
This example demonstrates how to modify the endpoint communication mode in deployment_config.yaml.
Create a file named unidirectional_deployment_config.yaml in your $LOCAL_CONFIGS directory with:
endpoints:
  use_bidirectional: false
Execute these commands in the container:
$ ./service/generate_trt_models.py
$ a2f_pipeline.run --deployment-config /mnt/configs/unidirectional_deployment_config.yaml
This configuration switches the service from the single bidirectional endpoint to two unidirectional connections: a server that receives audio and a client that sends animation data. The override mechanism demonstrated here works for any valid configuration key in the YAML files.
Warning
Use the appropriate configuration flag for your file type:
--stylization-config # for the <any>_stylization_config.yaml
--deployment-config # for the deployment_config.yaml
--advanced-config # for the advanced_config.yaml
Advanced Stylization#
The stylization configurations shown above use a simplified form of blendshape tuning intended for new users.
Advanced users can tune the additional parameters described below.
Advanced Blendshape tuning
Three more parameters can be set for blendshape tuning:
active_poses: Which blendshapes are active: 1 for active, 0 for inactive
cancel_poses: Which blendshapes cancel each other. Blendshapes that share the same number are matched; -1 means no-op
symmetry_poses: Which blendshapes are symmetric to one another. Blendshapes that share the same number are matched; -1 means no-op
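To make the matching-number convention concrete, the Python sketch below groups blendshape names by the number assigned to them and skips entries set to -1. The sample values are a subset of the `symmetry_poses` entries in the Claire configuration, and the function name `group_poses` is hypothetical.

```python
from collections import defaultdict


def group_poses(pose_ids):
    """Group blendshape names by shared id, ignoring -1 (no-op) entries."""
    groups = defaultdict(list)
    for name, group_id in pose_ids.items():
        if group_id != -1:
            groups[group_id].append(name)
    return dict(groups)


symmetry_poses = {
    "EyeBlinkLeft": 0,
    "EyeBlinkRight": 0,
    "EyeWideLeft": 1,
    "EyeWideRight": 1,
    "JawOpen": -1,
    "MouthSmileLeft": 2,
    "MouthSmileRight": 2,
}
groups = group_poses(symmetry_poses)
print(groups)
```

Each resulting group lists the blendshapes that the configuration treats as matched, for example the left and right eye blinks.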
claire_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
amazement: 0.0
anger: 0.0
cheekiness: 0.0
disgust: 0.0
fear: 0.0
grief: 0.0
joy: 0.0
outofbreath: 0.0
pain: 0.0
sadness: 0.0
a2e:
enabled: true
live_transition_time: 0.5
post_processing_params:
emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions
a2f:
# A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
inference_model_id: claire_v2.3
blendshape_id: claire_topo1_v2.1
tongue_blendshape_id: claire_tongue_v1.0
enable_tongue_blendshapes: true
face_params:
eyelid_offset: 0.0 # Adjusts the default pose of eyelid open-close
face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
input_strength: 1.0 # Controls the magnitude of the input audio
lip_close_offset: 0.0 # Adjusts the default pose of lip close-open
lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
lower_face_strength: 1.25 # Controls the range of motion on the lower regions of the face
skin_strength: 1.0 # Controls the range of motion of the skin
upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
blendshape_params: # Modulates the effect of each blendshape. Gain * w + offset
enable_clamping_bs_weight: false
weight_multipliers:
EyeBlinkLeft: 1.0
EyeLookDownLeft: 1.0
EyeLookInLeft: 1.0
EyeLookOutLeft: 1.0
EyeLookUpLeft: 1.0
EyeSquintLeft: 1.0
EyeWideLeft: 1.0
EyeBlinkRight: 1.0
EyeLookDownRight: 1.0
EyeLookInRight: 1.0
EyeLookOutRight: 1.0
EyeLookUpRight: 1.0
EyeSquintRight: 1.0
EyeWideRight: 1.0
JawForward: 1.0
JawLeft: 1.0
JawRight: 1.0
JawOpen: 1.0
MouthClose: 1.0
MouthFunnel: 1.0
MouthPucker: 1.0
MouthLeft: 1.0
MouthRight: 1.0
MouthSmileLeft: 1.0
MouthSmileRight: 1.0
MouthFrownLeft: 1.0
MouthFrownRight: 1.0
MouthDimpleLeft: 1.0
MouthDimpleRight: 1.0
MouthStretchLeft: 1.0
MouthStretchRight: 1.0
MouthRollLower: 1.0
MouthRollUpper: 1.0
MouthShrugLower: 1.0
MouthShrugUpper: 1.0
MouthPressLeft: 1.0
MouthPressRight: 1.0
MouthLowerDownLeft: 1.0
MouthLowerDownRight: 1.0
MouthUpperUpLeft: 1.0
MouthUpperUpRight: 1.0
BrowDownLeft: 1.0
BrowDownRight: 1.0
BrowInnerUp: 1.0
BrowOuterUpLeft: 1.0
BrowOuterUpRight: 1.0
CheekPuff: 1.0
CheekSquintLeft: 1.0
CheekSquintRight: 1.0
NoseSneerLeft: 1.0
NoseSneerRight: 1.0
TongueOut: 1.0
weight_offsets:
EyeBlinkLeft: 0.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 0.0
EyeWideLeft: 0.0
EyeBlinkRight: 0.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 0.0
EyeWideRight: 0.0
JawForward: 0.0
JawLeft: 0.0
JawRight: 0.0
JawOpen: 0.0
MouthClose: 0.0
MouthFunnel: 0.0
MouthPucker: 0.0
MouthLeft: 0.0
MouthRight: 0.0
MouthSmileLeft: 0.0
MouthSmileRight: 0.0
MouthFrownLeft: 0.0
MouthFrownRight: 0.0
MouthDimpleLeft: 0.0
MouthDimpleRight: 0.0
MouthStretchLeft: 0.0
MouthStretchRight: 0.0
MouthRollLower: 0.0
MouthRollUpper: 0.0
MouthShrugLower: 0.0
MouthShrugUpper: 0.0
MouthPressLeft: 0.0
MouthPressRight: 0.0
MouthLowerDownLeft: 0.0
MouthLowerDownRight: 0.0
MouthUpperUpLeft: 0.0
MouthUpperUpRight: 0.0
BrowDownLeft: 0.0
BrowDownRight: 0.0
BrowInnerUp: 0.0
BrowOuterUpLeft: 0.0
BrowOuterUpRight: 0.0
CheekPuff: 0.0
CheekSquintLeft: 0.0
CheekSquintRight: 0.0
NoseSneerLeft: 0.0
NoseSneerRight: 0.0
TongueOut: 0.0
active_poses:
EyeBlinkLeft: 1
EyeLookDownLeft: 0
EyeLookInLeft: 0
EyeLookOutLeft: 0
EyeLookUpLeft: 0
EyeSquintLeft: 1
EyeWideLeft: 1
EyeBlinkRight: 1
EyeLookDownRight: 0
EyeLookInRight: 0
EyeLookOutRight: 0
EyeLookUpRight: 0
EyeSquintRight: 1
EyeWideRight: 1
JawForward: 1
JawLeft: 1
JawRight: 1
JawOpen: 1
MouthClose: 1
MouthFunnel: 1
MouthPucker: 1
MouthLeft: 1
MouthRight: 1
MouthSmileLeft: 1
MouthSmileRight: 1
MouthFrownLeft: 1
MouthFrownRight: 1
MouthDimpleLeft: 1
MouthDimpleRight: 1
MouthStretchLeft: 1
MouthStretchRight: 1
MouthRollLower: 1
MouthRollUpper: 1
MouthShrugLower: 1
MouthShrugUpper: 1
MouthPressLeft: 1
MouthPressRight: 1
MouthLowerDownLeft: 1
MouthLowerDownRight: 1
MouthUpperUpLeft: 1
MouthUpperUpRight: 1
BrowDownLeft: 1
BrowDownRight: 1
BrowInnerUp: 1
BrowOuterUpLeft: 1
BrowOuterUpRight: 1
CheekPuff: 1
CheekSquintLeft: 1
CheekSquintRight: 1
NoseSneerLeft: 1
NoseSneerRight: 1
TongueOut: 0
cancel_poses:
EyeBlinkLeft: -1
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: -1
EyeBlinkRight: -1
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: -1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: -1
MouthSmileRight: -1
MouthFrownLeft: -1
MouthFrownRight: -1
MouthDimpleLeft: -1
MouthDimpleRight: -1
MouthStretchLeft: -1
MouthStretchRight: -1
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: -1
MouthPressRight: -1
MouthLowerDownLeft: -1
MouthLowerDownRight: -1
MouthUpperUpLeft: -1
MouthUpperUpRight: -1
BrowDownLeft: -1
BrowDownRight: -1
BrowInnerUp: -1
BrowOuterUpLeft: -1
BrowOuterUpRight: -1
CheekPuff: -1
CheekSquintLeft: -1
CheekSquintRight: -1
NoseSneerLeft: -1
NoseSneerRight: -1
TongueOut: -1
symmetry_poses:
EyeBlinkLeft: 0
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: 1
EyeBlinkRight: 0
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: 1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: 2
MouthSmileRight: 2
MouthFrownLeft: 3
MouthFrownRight: 3
MouthDimpleLeft: 4
MouthDimpleRight: 4
MouthStretchLeft: 5
MouthStretchRight: 5
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: 6
MouthPressRight: 6
MouthLowerDownLeft: 7
MouthLowerDownRight: 7
MouthUpperUpLeft: 8
MouthUpperUpRight: 8
BrowDownLeft: 10
BrowDownRight: 10
BrowInnerUp: -1
BrowOuterUpLeft: 9
BrowOuterUpRight: 9
CheekPuff: -1
CheekSquintLeft: 11
CheekSquintRight: 11
NoseSneerLeft: 12
NoseSneerRight: 12
TongueOut: -1
james_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
amazement: 0.0
anger: 0.0
cheekiness: 0.0
disgust: 0.0
fear: 0.0
grief: 0.0
joy: 0.0
outofbreath: 0.0
pain: 0.0
sadness: 0.0
a2e:
enabled: true
live_transition_time: 0.5
post_processing_params:
emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
enable_preferred_emotion: true # Enables blending of the preferred emotion with the auto-emotion
live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions
a2f:
# A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
inference_model_id: james_v2.3
blendshape_id: james_topo2_v2.2
tongue_blendshape_id: james_tongue_v1.0
enable_tongue_blendshapes: true
face_params:
eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
input_strength: 1.0 # Controls the magnitude of the input audio
lip_close_offset: -0.02 # Adjusts the default pose of lip close-open
lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
lower_face_strength: 1.2 # Controls the range of motion on the lower regions of the face
skin_strength: 1.0 # Controls the range of motion of the skin
upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
blendshape_params: # Modulates the effect of each blendshape: Gain * w + offset
enable_clamping_bs_weight: false
weight_multipliers:
EyeBlinkLeft: 1.0
EyeLookDownLeft: 1.0
EyeLookInLeft: 1.0
EyeLookOutLeft: 1.0
EyeLookUpLeft: 1.0
EyeSquintLeft: 1.0
EyeWideLeft: 1.0
EyeBlinkRight: 1.0
EyeLookDownRight: 1.0
EyeLookInRight: 1.0
EyeLookOutRight: 1.0
EyeLookUpRight: 1.0
EyeSquintRight: 1.0
EyeWideRight: 1.0
JawForward: 1.0
JawLeft: 1.0
JawRight: 1.0
JawOpen: 1.0
MouthClose: 1.0
MouthFunnel: 1.0
MouthPucker: 1.0
MouthLeft: 1.0
MouthRight: 1.0
MouthSmileLeft: 1.0
MouthSmileRight: 1.0
MouthFrownLeft: 1.0
MouthFrownRight: 1.0
MouthDimpleLeft: 1.0
MouthDimpleRight: 1.0
MouthStretchLeft: 1.0
MouthStretchRight: 1.0
MouthRollLower: 1.0
MouthRollUpper: 1.0
MouthShrugLower: 1.0
MouthShrugUpper: 1.0
MouthPressLeft: 1.0
MouthPressRight: 1.0
MouthLowerDownLeft: 1.0
MouthLowerDownRight: 1.0
MouthUpperUpLeft: 1.0
MouthUpperUpRight: 1.0
BrowDownLeft: 1.0
BrowDownRight: 1.0
BrowInnerUp: 1.0
BrowOuterUpLeft: 1.0
BrowOuterUpRight: 1.0
CheekPuff: 1.0
CheekSquintLeft: 1.0
CheekSquintRight: 1.0
NoseSneerLeft: 1.0
NoseSneerRight: 1.0
TongueOut: 1.0
weight_offsets:
EyeBlinkLeft: 0.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 0.0
EyeWideLeft: 0.0
EyeBlinkRight: 0.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 0.0
EyeWideRight: 0.0
JawForward: 0.0
JawLeft: 0.0
JawRight: 0.0
JawOpen: 0.0
MouthClose: 0.0
MouthFunnel: 0.0
MouthPucker: 0.0
MouthLeft: 0.0
MouthRight: 0.0
MouthSmileLeft: 0.0
MouthSmileRight: 0.0
MouthFrownLeft: 0.0
MouthFrownRight: 0.0
MouthDimpleLeft: 0.0
MouthDimpleRight: 0.0
MouthStretchLeft: 0.0
MouthStretchRight: 0.0
MouthRollLower: 0.0
MouthRollUpper: 0.0
MouthShrugLower: 0.0
MouthShrugUpper: 0.0
MouthPressLeft: 0.0
MouthPressRight: 0.0
MouthLowerDownLeft: 0.0
MouthLowerDownRight: 0.0
MouthUpperUpLeft: 0.0
MouthUpperUpRight: 0.0
BrowDownLeft: 0.0
BrowDownRight: 0.0
BrowInnerUp: 0.0
BrowOuterUpLeft: 0.0
BrowOuterUpRight: 0.0
CheekPuff: 0.0
CheekSquintLeft: 0.0
CheekSquintRight: 0.0
NoseSneerLeft: 0.0
NoseSneerRight: 0.0
TongueOut: 0.0
active_poses:
EyeBlinkLeft: 1
EyeLookDownLeft: 0
EyeLookInLeft: 0
EyeLookOutLeft: 0
EyeLookUpLeft: 0
EyeSquintLeft: 1
EyeWideLeft: 1
EyeBlinkRight: 1
EyeLookDownRight: 0
EyeLookInRight: 0
EyeLookOutRight: 0
EyeLookUpRight: 0
EyeSquintRight: 1
EyeWideRight: 1
JawForward: 1
JawLeft: 1
JawRight: 1
JawOpen: 1
MouthClose: 1
MouthFunnel: 1
MouthPucker: 1
MouthLeft: 1
MouthRight: 1
MouthSmileLeft: 1
MouthSmileRight: 1
MouthFrownLeft: 1
MouthFrownRight: 1
MouthDimpleLeft: 1
MouthDimpleRight: 1
MouthStretchLeft: 1
MouthStretchRight: 1
MouthRollLower: 1
MouthRollUpper: 1
MouthShrugLower: 1
MouthShrugUpper: 1
MouthPressLeft: 1
MouthPressRight: 1
MouthLowerDownLeft: 1
MouthLowerDownRight: 1
MouthUpperUpLeft: 1
MouthUpperUpRight: 1
BrowDownLeft: 1
BrowDownRight: 1
BrowInnerUp: 1
BrowOuterUpLeft: 1
BrowOuterUpRight: 1
CheekPuff: 1
CheekSquintLeft: 1
CheekSquintRight: 1
NoseSneerLeft: 1
NoseSneerRight: 1
TongueOut: 0
cancel_poses:
EyeBlinkLeft: -1
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: -1
EyeBlinkRight: -1
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: -1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: -1
MouthSmileRight: -1
MouthFrownLeft: -1
MouthFrownRight: -1
MouthDimpleLeft: -1
MouthDimpleRight: -1
MouthStretchLeft: -1
MouthStretchRight: -1
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: -1
MouthPressRight: -1
MouthLowerDownLeft: -1
MouthLowerDownRight: -1
MouthUpperUpLeft: -1
MouthUpperUpRight: -1
BrowDownLeft: -1
BrowDownRight: -1
BrowInnerUp: -1
BrowOuterUpLeft: -1
BrowOuterUpRight: -1
CheekPuff: -1
CheekSquintLeft: -1
CheekSquintRight: -1
NoseSneerLeft: -1
NoseSneerRight: -1
TongueOut: -1
symmetry_poses:
EyeBlinkLeft: 0
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: 1
EyeBlinkRight: 0
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: 1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: 2
MouthSmileRight: 2
MouthFrownLeft: 3
MouthFrownRight: 3
MouthDimpleLeft: 4
MouthDimpleRight: 4
MouthStretchLeft: 5
MouthStretchRight: 5
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: 6
MouthPressRight: 6
MouthLowerDownLeft: 7
MouthLowerDownRight: 7
MouthUpperUpLeft: 8
MouthUpperUpRight: 8
BrowDownLeft: 10
BrowDownRight: 10
BrowInnerUp: -1
BrowOuterUpLeft: 9
BrowOuterUpRight: 9
CheekPuff: -1
CheekSquintLeft: 11
CheekSquintRight: 11
NoseSneerLeft: 12
NoseSneerRight: 12
TongueOut: -1
mark_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
amazement: 0.0
anger: 0.0
cheekiness: 0.0
disgust: 0.0
fear: 0.0
grief: 0.0
joy: 0.0
outofbreath: 0.0
pain: 0.0
sadness: 0.0
a2e:
enabled: true
live_transition_time: 0.5
post_processing_params:
emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
enable_preferred_emotion: true # Enables blending of the preferred emotion with the auto-emotion
live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions
a2f:
# A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
inference_model_id: mark_v2.3
blendshape_id: mark_topo1_v2.1
tongue_blendshape_id: mark_tongue_v1.0
enable_tongue_blendshapes: true
face_params:
eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
input_strength: 1.3 # Controls the magnitude of the input audio
lip_close_offset: -0.03 # Adjusts the default pose of lip close-open
lower_face_smoothing: 0.0023 # Applies temporal smoothing to the lower face motion
lower_face_strength: 1.4 # Controls the range of motion on the lower regions of the face
skin_strength: 1.1 # Controls the range of motion of the skin
upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
blendshape_params: # Modulates the effect of each blendshape: Gain * w + offset
enable_clamping_bs_weight: false
weight_multipliers:
EyeBlinkLeft: 1.0
EyeLookDownLeft: 1.0
EyeLookInLeft: 1.0
EyeLookOutLeft: 1.0
EyeLookUpLeft: 1.0
EyeSquintLeft: 1.0
EyeWideLeft: 1.0
EyeBlinkRight: 1.0
EyeLookDownRight: 1.0
EyeLookInRight: 1.0
EyeLookOutRight: 1.0
EyeLookUpRight: 1.0
EyeSquintRight: 1.0
EyeWideRight: 1.0
JawForward: 1.0
JawLeft: 1.0
JawRight: 1.0
JawOpen: 1.0
MouthClose: 1.0
MouthFunnel: 1.0
MouthPucker: 1.0
MouthLeft: 1.0
MouthRight: 1.0
MouthSmileLeft: 1.0
MouthSmileRight: 1.0
MouthFrownLeft: 1.0
MouthFrownRight: 1.0
MouthDimpleLeft: 1.0
MouthDimpleRight: 1.0
MouthStretchLeft: 1.0
MouthStretchRight: 1.0
MouthRollLower: 1.0
MouthRollUpper: 1.0
MouthShrugLower: 1.0
MouthShrugUpper: 1.0
MouthPressLeft: 1.0
MouthPressRight: 1.0
MouthLowerDownLeft: 1.0
MouthLowerDownRight: 1.0
MouthUpperUpLeft: 1.0
MouthUpperUpRight: 1.0
BrowDownLeft: 1.0
BrowDownRight: 1.0
BrowInnerUp: 1.0
BrowOuterUpLeft: 1.0
BrowOuterUpRight: 1.0
CheekPuff: 1.0
CheekSquintLeft: 1.0
CheekSquintRight: 1.0
NoseSneerLeft: 1.0
NoseSneerRight: 1.0
TongueOut: 1.0
weight_offsets:
EyeBlinkLeft: 0.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 0.0
EyeWideLeft: 0.0
EyeBlinkRight: 0.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 0.0
EyeWideRight: 0.0
JawForward: 0.0
JawLeft: 0.0
JawRight: 0.0
JawOpen: 0.0
MouthClose: 0.0
MouthFunnel: 0.0
MouthPucker: 0.0
MouthLeft: 0.0
MouthRight: 0.0
MouthSmileLeft: 0.0
MouthSmileRight: 0.0
MouthFrownLeft: 0.0
MouthFrownRight: 0.0
MouthDimpleLeft: 0.0
MouthDimpleRight: 0.0
MouthStretchLeft: 0.0
MouthStretchRight: 0.0
MouthRollLower: 0.0
MouthRollUpper: 0.0
MouthShrugLower: 0.0
MouthShrugUpper: 0.0
MouthPressLeft: 0.0
MouthPressRight: 0.0
MouthLowerDownLeft: 0.0
MouthLowerDownRight: 0.0
MouthUpperUpLeft: 0.0
MouthUpperUpRight: 0.0
BrowDownLeft: 0.0
BrowDownRight: 0.0
BrowInnerUp: 0.0
BrowOuterUpLeft: 0.0
BrowOuterUpRight: 0.0
CheekPuff: 0.0
CheekSquintLeft: 0.0
CheekSquintRight: 0.0
NoseSneerLeft: 0.0
NoseSneerRight: 0.0
TongueOut: 0.0
active_poses:
EyeBlinkLeft: 1
EyeLookDownLeft: 0
EyeLookInLeft: 0
EyeLookOutLeft: 0
EyeLookUpLeft: 0
EyeSquintLeft: 1
EyeWideLeft: 1
EyeBlinkRight: 1
EyeLookDownRight: 0
EyeLookInRight: 0
EyeLookOutRight: 0
EyeLookUpRight: 0
EyeSquintRight: 1
EyeWideRight: 1
JawForward: 1
JawLeft: 1
JawRight: 1
JawOpen: 1
MouthClose: 1
MouthFunnel: 1
MouthPucker: 1
MouthLeft: 1
MouthRight: 1
MouthSmileLeft: 1
MouthSmileRight: 1
MouthFrownLeft: 1
MouthFrownRight: 1
MouthDimpleLeft: 1
MouthDimpleRight: 1
MouthStretchLeft: 1
MouthStretchRight: 1
MouthRollLower: 1
MouthRollUpper: 1
MouthShrugLower: 1
MouthShrugUpper: 1
MouthPressLeft: 1
MouthPressRight: 1
MouthLowerDownLeft: 1
MouthLowerDownRight: 1
MouthUpperUpLeft: 1
MouthUpperUpRight: 1
BrowDownLeft: 1
BrowDownRight: 1
BrowInnerUp: 1
BrowOuterUpLeft: 1
BrowOuterUpRight: 1
CheekPuff: 1
CheekSquintLeft: 1
CheekSquintRight: 1
NoseSneerLeft: 1
NoseSneerRight: 1
TongueOut: 0
cancel_poses:
EyeBlinkLeft: -1
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: -1
EyeBlinkRight: -1
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: -1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: -1
MouthSmileRight: -1
MouthFrownLeft: -1
MouthFrownRight: -1
MouthDimpleLeft: -1
MouthDimpleRight: -1
MouthStretchLeft: -1
MouthStretchRight: -1
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: -1
MouthPressRight: -1
MouthLowerDownLeft: -1
MouthLowerDownRight: -1
MouthUpperUpLeft: -1
MouthUpperUpRight: -1
BrowDownLeft: -1
BrowDownRight: -1
BrowInnerUp: -1
BrowOuterUpLeft: -1
BrowOuterUpRight: -1
CheekPuff: -1
CheekSquintLeft: -1
CheekSquintRight: -1
NoseSneerLeft: -1
NoseSneerRight: -1
TongueOut: -1
symmetry_poses:
EyeBlinkLeft: 0
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: 1
EyeBlinkRight: 0
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: 1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: 2
MouthSmileRight: 2
MouthFrownLeft: 3
MouthFrownRight: 3
MouthDimpleLeft: 4
MouthDimpleRight: 4
MouthStretchLeft: 5
MouthStretchRight: 5
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: 6
MouthPressRight: 6
MouthLowerDownLeft: 7
MouthLowerDownRight: 7
MouthUpperUpLeft: 8
MouthUpperUpRight: 8
BrowDownLeft: 10
BrowDownRight: 10
BrowInnerUp: -1
BrowOuterUpLeft: 9
BrowOuterUpRight: 9
CheekPuff: -1
CheekSquintLeft: 11
CheekSquintRight: 11
NoseSneerLeft: 12
NoseSneerRight: 12
TongueOut: -1
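The `blendshape_params` comment in the configuration files above gives the per-blendshape adjustment as `Gain * w + offset`. The following is a minimal Python sketch of that arithmetic, not the service's actual implementation. It assumes that `weight_multipliers` supplies the gain and `weight_offsets` the offset, and that `enable_clamping_bs_weight`, when true, clamps the result to the range 0.0 to 1.0 (the clamping range is an assumption, it is not stated in this guide). The function name and the sample values are hypothetical.

```python
def adjust_blendshape_weights(raw, multipliers, offsets, clamp=False):
    """Apply multiplier * w + offset to each raw blendshape weight.

    Poses missing from `multipliers` or `offsets` fall back to the
    neutral values 1.0 and 0.0 (an assumption made for this sketch).
    """
    adjusted = {}
    for pose, w in raw.items():
        value = multipliers.get(pose, 1.0) * w + offsets.get(pose, 0.0)
        if clamp:
            # Assumed clamping range [0.0, 1.0]; not confirmed by the guide.
            value = min(1.0, max(0.0, value))
        adjusted[pose] = value
    return adjusted


# Illustrative raw weights for two poses.
raw = {"JawOpen": 0.5, "MouthSmileLeft": 0.9}
multipliers = {"JawOpen": 0.8, "MouthSmileLeft": 1.2}
offsets = {"JawOpen": 0.0, "MouthSmileLeft": 0.0}

result = adjust_blendshape_weights(raw, multipliers, offsets)
clamped = adjust_blendshape_weights(raw, multipliers, offsets, clamp=True)
print(result)
print(clamped)
```

With the default files shown above, every multiplier is 1.0 and every offset is 0.0, so the adjustment leaves the weights unchanged.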
Configuration files for Unreal Engine MetaHuman#
If you plan to connect A2F-3D to MetaHuman characters, you need to use configuration files adapted for them. These files differ from the default configuration files only in the blendshape multipliers and offsets.
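To see which blendshape values change between a default file and its MetaHuman counterpart, you can compare the two `weight_multipliers` mappings. The sketch below is a hedged illustration: the helper name `diff_multipliers` is hypothetical, and the dictionaries hold only a handful of entries copied by hand from the two `james_stylization_config.yaml` listings in this guide rather than being loaded from the files.

```python
def diff_multipliers(default, adapted):
    """Return {pose: (default_value, adapted_value)} for poses whose values differ."""
    return {
        pose: (default[pose], adapted[pose])
        for pose in default
        if pose in adapted and default[pose] != adapted[pose]
    }


# Subset of weight_multipliers from the default James configuration.
default_james = {
    "EyeBlinkLeft": 1.0,
    "EyeLookDownLeft": 1.0,
    "JawOpen": 1.0,
    "MouthClose": 1.0,
    "MouthSmileLeft": 1.0,
}

# The same poses from the MetaHuman James configuration.
metahuman_james = {
    "EyeBlinkLeft": 1.0,
    "EyeLookDownLeft": 0.0,
    "JawOpen": 0.8,
    "MouthClose": 0.3,
    "MouthSmileLeft": 1.2,
}

changed = diff_multipliers(default_james, metahuman_james)
for pose, (before, after) in changed.items():
    print(f"{pose}: {before} -> {after}")
```

The same comparison can be applied to `weight_offsets` by passing those mappings instead.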
MetaHuman Stylization Configuration Files
claire_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
amazement: 0.0
anger: 0.0
cheekiness: 0.0
disgust: 0.0
fear: 0.0
grief: 0.0
joy: 0.0
outofbreath: 0.0
pain: 0.0
sadness: 0.0
a2e:
enabled: true
live_transition_time: 0.5
post_processing_params:
emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
enable_preferred_emotion: true # Enables blending of the preferred emotion with the auto-emotion
live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions
a2f:
# A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
inference_model_id: claire_v2.3
blendshape_id: claire_topo1_v2.1
tongue_blendshape_id: claire_tongue_v1.0
enable_tongue_blendshapes: true
face_params:
eyelid_offset: 0.0 # Adjusts the default pose of eyelid open-close
face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
input_strength: 1.0 # Controls the magnitude of the input audio
lip_close_offset: 0.0 # Adjusts the default pose of lip close-open
lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
lower_face_strength: 1.25 # Controls the range of motion on the lower regions of the face
skin_strength: 1.0 # Controls the range of motion of the skin
upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
blendshape_params: # Modulates the effect of each blendshape: Gain * w + offset
enable_clamping_bs_weight: false
weight_multipliers:
EyeBlinkLeft: 1.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 1.0
EyeWideLeft: 1.0
EyeBlinkRight: 1.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 1.0
EyeWideRight: 1.0
JawForward: 0.7
JawLeft: 0.2
JawRight: 0.2
JawOpen: 1.0
MouthClose: 1.0
MouthFunnel: 1.2
MouthPucker: 1.2
MouthLeft: 0.2
MouthRight: 0.2
MouthSmileLeft: 0.8
MouthSmileRight: 0.8
MouthFrownLeft: 0.4
MouthFrownRight: 0.4
MouthDimpleLeft: 0.7
MouthDimpleRight: 0.7
MouthStretchLeft: 0.1
MouthStretchRight: 0.1
MouthRollLower: 0.9
MouthRollUpper: 0.5
MouthShrugLower: 0.9
MouthShrugUpper: 0.4
MouthPressLeft: 0.8
MouthPressRight: 0.8
MouthLowerDownLeft: 0.8
MouthLowerDownRight: 0.8
MouthUpperUpLeft: 0.8
MouthUpperUpRight: 0.8
BrowDownLeft: 1.0
BrowDownRight: 1.0
BrowInnerUp: 1.0
BrowOuterUpLeft: 1.0
BrowOuterUpRight: 1.0
CheekPuff: 0.2
CheekSquintLeft: 1.0
CheekSquintRight: 1.0
NoseSneerLeft: 0.8
NoseSneerRight: 0.8
TongueOut: 0.0
weight_offsets: # Modulates the effect of each blendshape: blendshape_values * weight_multipliers + weight_offsets
EyeBlinkLeft: 0.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 0.0
EyeWideLeft: 0.0
EyeBlinkRight: 0.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 0.0
EyeWideRight: 0.0
JawForward: 0.0
JawLeft: 0.0
JawRight: 0.0
JawOpen: 0.0
MouthClose: 0.0
MouthFunnel: 0.0
MouthPucker: 0.0
MouthLeft: 0.0
MouthRight: 0.0
MouthSmileLeft: 0.0
MouthSmileRight: 0.0
MouthFrownLeft: 0.0
MouthFrownRight: 0.0
MouthDimpleLeft: 0.0
MouthDimpleRight: 0.0
MouthStretchLeft: 0.0
MouthStretchRight: 0.0
MouthRollLower: 0.0
MouthRollUpper: 0.0
MouthShrugLower: 0.0
MouthShrugUpper: 0.0
MouthPressLeft: 0.0
MouthPressRight: 0.0
MouthLowerDownLeft: 0.0
MouthLowerDownRight: 0.0
MouthUpperUpLeft: 0.0
MouthUpperUpRight: 0.0
BrowDownLeft: 0.0
BrowDownRight: 0.0
BrowInnerUp: 0.0
BrowOuterUpLeft: 0.0
BrowOuterUpRight: 0.0
CheekPuff: 0.0
CheekSquintLeft: 0.0
CheekSquintRight: 0.0
NoseSneerLeft: 0.0
NoseSneerRight: 0.0
TongueOut: 0.0
active_poses: # Defines which poses are active and which are not
EyeBlinkLeft: 1
EyeLookDownLeft: 0
EyeLookInLeft: 0
EyeLookOutLeft: 0
EyeLookUpLeft: 0
EyeSquintLeft: 1
EyeWideLeft: 1
EyeBlinkRight: 1
EyeLookDownRight: 0
EyeLookInRight: 0
EyeLookOutRight: 0
EyeLookUpRight: 0
EyeSquintRight: 1
EyeWideRight: 1
JawForward: 1
JawLeft: 1
JawRight: 1
JawOpen: 1
MouthClose: 1
MouthFunnel: 1
MouthPucker: 1
MouthLeft: 1
MouthRight: 1
MouthSmileLeft: 1
MouthSmileRight: 1
MouthFrownLeft: 1
MouthFrownRight: 1
MouthDimpleLeft: 1
MouthDimpleRight: 1
MouthStretchLeft: 1
MouthStretchRight: 1
MouthRollLower: 1
MouthRollUpper: 1
MouthShrugLower: 1
MouthShrugUpper: 1
MouthPressLeft: 1
MouthPressRight: 1
MouthLowerDownLeft: 1
MouthLowerDownRight: 1
MouthUpperUpLeft: 1
MouthUpperUpRight: 1
BrowDownLeft: 1
BrowDownRight: 1
BrowInnerUp: 1
BrowOuterUpLeft: 1
BrowOuterUpRight: 1
CheekPuff: 1
CheekSquintLeft: 1
CheekSquintRight: 1
NoseSneerLeft: 1
NoseSneerRight: 1
TongueOut: 0
cancel_poses: # Define which poses cancel each other
EyeBlinkLeft: -1
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: -1
EyeBlinkRight: -1
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: -1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: -1
MouthSmileRight: -1
MouthFrownLeft: -1
MouthFrownRight: -1
MouthDimpleLeft: -1
MouthDimpleRight: -1
MouthStretchLeft: -1
MouthStretchRight: -1
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: -1
MouthPressRight: -1
MouthLowerDownLeft: -1
MouthLowerDownRight: -1
MouthUpperUpLeft: -1
MouthUpperUpRight: -1
BrowDownLeft: -1
BrowDownRight: -1
BrowInnerUp: -1
BrowOuterUpLeft: -1
BrowOuterUpRight: -1
CheekPuff: -1
CheekSquintLeft: -1
CheekSquintRight: -1
NoseSneerLeft: -1
NoseSneerRight: -1
TongueOut: -1
symmetry_poses: # Define which poses are symmetric to each other
EyeBlinkLeft: 0
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: 1
EyeBlinkRight: 0
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: 1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: 2
MouthSmileRight: 2
MouthFrownLeft: 3
MouthFrownRight: 3
MouthDimpleLeft: 4
MouthDimpleRight: 4
MouthStretchLeft: 5
MouthStretchRight: 5
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: 6
MouthPressRight: 6
MouthLowerDownLeft: 7
MouthLowerDownRight: 7
MouthUpperUpLeft: 8
MouthUpperUpRight: 8
BrowDownLeft: 10
BrowDownRight: 10
BrowInnerUp: -1
BrowOuterUpLeft: 9
BrowOuterUpRight: 9
CheekPuff: -1
CheekSquintLeft: 11
CheekSquintRight: 11
NoseSneerLeft: 12
NoseSneerRight: 12
TongueOut: -1
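The `symmetry_poses` block in the file above assigns an integer to each pose. Judging from the values alone, poses that share the same non-negative integer appear to form a symmetric pair (for example, `MouthSmileLeft` and `MouthSmileRight` are both 2), and -1 appears to mean that the pose has no symmetric partner. This reading is an inference from the listed values, not something this guide states. Under that assumption, a small Python sketch with a hypothetical helper can group the poses by index to check a hand-edited file for unpaired entries:

```python
from collections import defaultdict


def symmetry_groups(symmetry_poses):
    """Group pose names by their symmetry index, skipping negative indices."""
    groups = defaultdict(list)
    for pose, index in symmetry_poses.items():
        if index >= 0:  # Assumed: -1 marks a pose without a partner.
            groups[index].append(pose)
    return dict(groups)


# A few entries copied from the symmetry_poses block above.
sample = {
    "EyeBlinkLeft": 0,
    "EyeBlinkRight": 0,
    "JawOpen": -1,
    "MouthSmileLeft": 2,
    "MouthSmileRight": 2,
}

groups = symmetry_groups(sample)
unpaired = [poses for poses in groups.values() if len(poses) != 2]
print(groups)
print(unpaired)
```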
james_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
amazement: 0.0
anger: 0.0
cheekiness: 0.0
disgust: 0.0
fear: 0.0
grief: 0.0
joy: 0.0
outofbreath: 0.0
pain: 0.0
sadness: 0.0
a2e:
enabled: true
live_transition_time: 0.5
post_processing_params:
emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
enable_preferred_emotion: true # Enables blending of the preferred emotion with the auto-emotion
live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions
a2f:
# A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
inference_model_id: james_v2.3
blendshape_id: james_topo2_v2.2
tongue_blendshape_id: james_tongue_v1.0
enable_tongue_blendshapes: true
face_params:
eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
input_strength: 1.0 # Controls the magnitude of the input audio
lip_close_offset: -0.02 # Adjusts the default pose of lip close-open
lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
lower_face_strength: 1.2 # Controls the range of motion on the lower regions of the face
skin_strength: 1.0 # Controls the range of motion of the skin
upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
blendshape_params: # Modulates the effect of each blendshape: Gain * w + offset
enable_clamping_bs_weight: false
weight_multipliers:
EyeBlinkLeft: 1.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 1.0
EyeWideLeft: 1.0
EyeBlinkRight: 1.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 1.0
EyeWideRight: 1.0
JawForward: 0.7
JawLeft: 0.2
JawRight: 0.2
JawOpen: 0.8
MouthClose: 0.3
MouthFunnel: 1.0
MouthPucker: 1.0
MouthLeft: 0.2
MouthRight: 0.2
MouthSmileLeft: 1.2
MouthSmileRight: 1.2
MouthFrownLeft: 0.5
MouthFrownRight: 0.5
MouthDimpleLeft: 0.8
MouthDimpleRight: 0.8
MouthStretchLeft: 0.05
MouthStretchRight: 0.05
MouthRollLower: 0.8
MouthRollUpper: 0.5
MouthShrugLower: 1.0
MouthShrugUpper: 0.4
MouthPressLeft: 0.8
MouthPressRight: 0.8
MouthLowerDownLeft: 0.8
MouthLowerDownRight: 0.8
MouthUpperUpLeft: 0.8
MouthUpperUpRight: 0.8
BrowDownLeft: 1.2
BrowDownRight: 1.2
BrowInnerUp: 1.3
BrowOuterUpLeft: 0.8
BrowOuterUpRight: 0.8
CheekPuff: 0.2
CheekSquintLeft: 1.0
CheekSquintRight: 1.0
NoseSneerLeft: 0.8
NoseSneerRight: 0.8
TongueOut: 0.0
weight_offsets: # Modulates the effect of each blendshape: blendshape_values * weight_multipliers + weight_offsets
EyeBlinkLeft: 0.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 0.0
EyeWideLeft: 0.0
EyeBlinkRight: 0.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 0.0
EyeWideRight: 0.0
JawForward: 0.0
JawLeft: 0.0
JawRight: 0.0
JawOpen: 0.0
MouthClose: 0.0
MouthFunnel: 0.0
MouthPucker: 0.0
MouthLeft: 0.0
MouthRight: 0.0
MouthSmileLeft: 0.0
MouthSmileRight: 0.0
MouthFrownLeft: 0.0
MouthFrownRight: 0.0
MouthDimpleLeft: 0.0
MouthDimpleRight: 0.0
MouthStretchLeft: 0.0
MouthStretchRight: 0.0
MouthRollLower: 0.0
MouthRollUpper: 0.0
MouthShrugLower: 0.0
MouthShrugUpper: 0.0
MouthPressLeft: 0.0
MouthPressRight: 0.0
MouthLowerDownLeft: 0.0
MouthLowerDownRight: 0.0
MouthUpperUpLeft: 0.0
MouthUpperUpRight: 0.0
BrowDownLeft: 0.0
BrowDownRight: 0.0
BrowInnerUp: 0.0
BrowOuterUpLeft: 0.0
BrowOuterUpRight: 0.0
CheekPuff: 0.0
CheekSquintLeft: 0.0
CheekSquintRight: 0.0
NoseSneerLeft: 0.0
NoseSneerRight: 0.0
TongueOut: 0.0
active_poses: # Defines which poses are active and which are not
EyeBlinkLeft: 1
EyeLookDownLeft: 0
EyeLookInLeft: 0
EyeLookOutLeft: 0
EyeLookUpLeft: 0
EyeSquintLeft: 1
EyeWideLeft: 1
EyeBlinkRight: 1
EyeLookDownRight: 0
EyeLookInRight: 0
EyeLookOutRight: 0
EyeLookUpRight: 0
EyeSquintRight: 1
EyeWideRight: 1
JawForward: 1
JawLeft: 1
JawRight: 1
JawOpen: 1
MouthClose: 1
MouthFunnel: 1
MouthPucker: 1
MouthLeft: 1
MouthRight: 1
MouthSmileLeft: 1
MouthSmileRight: 1
MouthFrownLeft: 1
MouthFrownRight: 1
MouthDimpleLeft: 1
MouthDimpleRight: 1
MouthStretchLeft: 1
MouthStretchRight: 1
MouthRollLower: 1
MouthRollUpper: 1
MouthShrugLower: 1
MouthShrugUpper: 1
MouthPressLeft: 1
MouthPressRight: 1
MouthLowerDownLeft: 1
MouthLowerDownRight: 1
MouthUpperUpLeft: 1
MouthUpperUpRight: 1
BrowDownLeft: 1
BrowDownRight: 1
BrowInnerUp: 1
BrowOuterUpLeft: 1
BrowOuterUpRight: 1
CheekPuff: 1
CheekSquintLeft: 1
CheekSquintRight: 1
NoseSneerLeft: 1
NoseSneerRight: 1
TongueOut: 0
cancel_poses: # Define which poses cancel each other
EyeBlinkLeft: -1
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: -1
EyeBlinkRight: -1
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: -1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: -1
MouthSmileRight: -1
MouthFrownLeft: -1
MouthFrownRight: -1
MouthDimpleLeft: -1
MouthDimpleRight: -1
MouthStretchLeft: -1
MouthStretchRight: -1
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: -1
MouthPressRight: -1
MouthLowerDownLeft: -1
MouthLowerDownRight: -1
MouthUpperUpLeft: -1
MouthUpperUpRight: -1
BrowDownLeft: -1
BrowDownRight: -1
BrowInnerUp: -1
BrowOuterUpLeft: -1
BrowOuterUpRight: -1
CheekPuff: -1
CheekSquintLeft: -1
CheekSquintRight: -1
NoseSneerLeft: -1
NoseSneerRight: -1
TongueOut: -1
symmetry_poses: # Define which poses are symmetric to each other
EyeBlinkLeft: 0
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: 1
EyeBlinkRight: 0
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: 1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: 2
MouthSmileRight: 2
MouthFrownLeft: 3
MouthFrownRight: 3
MouthDimpleLeft: 4
MouthDimpleRight: 4
MouthStretchLeft: 5
MouthStretchRight: 5
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: 6
MouthPressRight: 6
MouthLowerDownLeft: 7
MouthLowerDownRight: 7
MouthUpperUpLeft: 8
MouthUpperUpRight: 8
BrowDownLeft: 10
BrowDownRight: 10
BrowInnerUp: -1
BrowOuterUpLeft: 9
BrowOuterUpRight: 9
CheekPuff: -1
CheekSquintLeft: 11
CheekSquintRight: 11
NoseSneerLeft: 12
NoseSneerRight: 12
TongueOut: -1
mark_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
amazement: 0.0
anger: 0.0
cheekiness: 0.0
disgust: 0.0
fear: 0.0
grief: 0.0
joy: 0.0
outofbreath: 0.0
pain: 0.0
sadness: 0.0
a2e:
enabled: true
live_transition_time: 0.5
post_processing_params:
emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
enable_preferred_emotion: true # Enables blending of the preferred emotion with the auto-emotion
live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions
a2f:
# A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
inference_model_id: mark_v2.3
blendshape_id: mark_topo1_v2.1
tongue_blendshape_id: mark_tongue_v1.0
enable_tongue_blendshapes: true
face_params:
eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
input_strength: 1.3 # Controls the magnitude of the input audio
lip_close_offset: -0.03 # Adjusts the default pose of lip close-open
lower_face_smoothing: 0.0023 # Applies temporal smoothing to the lower face motion
lower_face_strength: 1.4 # Controls the range of motion on the lower regions of the face
skin_strength: 1.1 # Controls the range of motion of the skin
upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
blendshape_params: # Modulates the effect of each blendshape: gain * w + offset
enable_clamping_bs_weight: false
weight_multipliers:
EyeBlinkLeft: 1.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 1.0
EyeWideLeft: 1.0
EyeBlinkRight: 1.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 1.0
EyeWideRight: 1.0
JawForward: 0.7
JawLeft: 0.2
JawRight: 0.2
JawOpen: 1.0
MouthClose: 0.2
MouthFunnel: 1.2
MouthPucker: 1.2
MouthLeft: 0.2
MouthRight: 0.2
MouthSmileLeft: 0.8
MouthSmileRight: 0.8
MouthFrownLeft: 0.5
MouthFrownRight: 0.5
MouthDimpleLeft: 0.8
MouthDimpleRight: 0.8
MouthStretchLeft: 0.05
MouthStretchRight: 0.05
MouthRollLower: 0.8
MouthRollUpper: 0.5
MouthShrugLower: 0.9
MouthShrugUpper: 0.4
MouthPressLeft: 0.8
MouthPressRight: 0.8
MouthLowerDownLeft: 0.8
MouthLowerDownRight: 0.8
MouthUpperUpLeft: 0.8
MouthUpperUpRight: 0.8
BrowDownLeft: 1.0
BrowDownRight: 1.0
BrowInnerUp: 1.0
BrowOuterUpLeft: 1.0
BrowOuterUpRight: 1.0
CheekPuff: 0.2
CheekSquintLeft: 1.0
CheekSquintRight: 1.0
NoseSneerLeft: 0.8
NoseSneerRight: 0.8
TongueOut: 0.0
weight_offsets: # Modulates the effect of each blendshape: blendshape_values * weight_multipliers + weight_offsets
EyeBlinkLeft: 0.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 0.0
EyeWideLeft: 0.0
EyeBlinkRight: 0.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 0.0
EyeWideRight: 0.0
JawForward: 0.0
JawLeft: 0.0
JawRight: 0.0
JawOpen: 0.0
MouthClose: 0.0
MouthFunnel: 0.0
MouthPucker: 0.0
MouthLeft: 0.0
MouthRight: 0.0
MouthSmileLeft: 0.0
MouthSmileRight: 0.0
MouthFrownLeft: 0.0
MouthFrownRight: 0.0
MouthDimpleLeft: 0.0
MouthDimpleRight: 0.0
MouthStretchLeft: 0.0
MouthStretchRight: 0.0
MouthRollLower: 0.0
MouthRollUpper: 0.0
MouthShrugLower: 0.0
MouthShrugUpper: 0.0
MouthPressLeft: 0.0
MouthPressRight: 0.0
MouthLowerDownLeft: 0.0
MouthLowerDownRight: 0.0
MouthUpperUpLeft: 0.0
MouthUpperUpRight: 0.0
BrowDownLeft: 0.0
BrowDownRight: 0.0
BrowInnerUp: 0.0
BrowOuterUpLeft: 0.0
BrowOuterUpRight: 0.0
CheekPuff: 0.0
CheekSquintLeft: 0.0
CheekSquintRight: 0.0
NoseSneerLeft: 0.0
NoseSneerRight: 0.0
TongueOut: 0.0
active_poses: # Defines which poses are active and which ones are not
EyeBlinkLeft: 1
EyeLookDownLeft: 0
EyeLookInLeft: 0
EyeLookOutLeft: 0
EyeLookUpLeft: 0
EyeSquintLeft: 1
EyeWideLeft: 1
EyeBlinkRight: 1
EyeLookDownRight: 0
EyeLookInRight: 0
EyeLookOutRight: 0
EyeLookUpRight: 0
EyeSquintRight: 1
EyeWideRight: 1
JawForward: 1
JawLeft: 1
JawRight: 1
JawOpen: 1
MouthClose: 1
MouthFunnel: 1
MouthPucker: 1
MouthLeft: 1
MouthRight: 1
MouthSmileLeft: 1
MouthSmileRight: 1
MouthFrownLeft: 1
MouthFrownRight: 1
MouthDimpleLeft: 1
MouthDimpleRight: 1
MouthStretchLeft: 1
MouthStretchRight: 1
MouthRollLower: 1
MouthRollUpper: 1
MouthShrugLower: 1
MouthShrugUpper: 1
MouthPressLeft: 1
MouthPressRight: 1
MouthLowerDownLeft: 1
MouthLowerDownRight: 1
MouthUpperUpLeft: 1
MouthUpperUpRight: 1
BrowDownLeft: 1
BrowDownRight: 1
BrowInnerUp: 1
BrowOuterUpLeft: 1
BrowOuterUpRight: 1
CheekPuff: 1
CheekSquintLeft: 1
CheekSquintRight: 1
NoseSneerLeft: 1
NoseSneerRight: 1
TongueOut: 0
cancel_poses: # Define which poses cancel each other
EyeBlinkLeft: -1
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: -1
EyeBlinkRight: -1
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: -1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: -1
MouthSmileRight: -1
MouthFrownLeft: -1
MouthFrownRight: -1
MouthDimpleLeft: -1
MouthDimpleRight: -1
MouthStretchLeft: -1
MouthStretchRight: -1
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: -1
MouthPressRight: -1
MouthLowerDownLeft: -1
MouthLowerDownRight: -1
MouthUpperUpLeft: -1
MouthUpperUpRight: -1
BrowDownLeft: -1
BrowDownRight: -1
BrowInnerUp: -1
BrowOuterUpLeft: -1
BrowOuterUpRight: -1
CheekPuff: -1
CheekSquintLeft: -1
CheekSquintRight: -1
NoseSneerLeft: -1
NoseSneerRight: -1
TongueOut: -1
symmetry_poses: # Define which poses are symmetric to each other
EyeBlinkLeft: 0
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: 1
EyeBlinkRight: 0
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: 1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: 2
MouthSmileRight: 2
MouthFrownLeft: 3
MouthFrownRight: 3
MouthDimpleLeft: 4
MouthDimpleRight: 4
MouthStretchLeft: 5
MouthStretchRight: 5
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: 6
MouthPressRight: 6
MouthLowerDownLeft: 7
MouthLowerDownRight: 7
MouthUpperUpLeft: 8
MouthUpperUpRight: 8
BrowDownLeft: 10
BrowDownRight: 10
BrowInnerUp: -1
BrowOuterUpLeft: 9
BrowOuterUpRight: 9
CheekPuff: -1
CheekSquintLeft: 11
CheekSquintRight: 11
NoseSneerLeft: 12
NoseSneerRight: 12
TongueOut: -1
Parameter Tuning Guide#
Audio2Face-3D reads inference parameters from three sources: the inference model SDK, the deployment-time configuration files, and runtime input. Generally, deployment-time parameters override the matching defaults in the model files, while runtime parameters override both deployment-time values and model defaults.
For the runtime parameters, see the proto definitions of AudioStreamHeader, FaceParameters, BlendShapeParameters, EmotionParameters, and EmotionPostProcessingParameters.
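The override order described above can be illustrated with a minimal Python sketch. This is not how the service resolves parameters internally; the three dictionaries and the `resolve_parameters` helper are hypothetical and only show which value wins when the same key appears in several sources.

```python
from collections import ChainMap

# Hypothetical values for illustration only; real defaults come from the
# model files, the stylization config, and the runtime request.
model_defaults = {"input_strength": 1.0, "lower_face_strength": 1.0, "skin_strength": 1.0}
deployment_config = {"input_strength": 1.3, "lower_face_strength": 1.4}
runtime_params = {"lower_face_strength": 1.2}


def resolve_parameters(model, deployment, runtime):
    """Return the effective parameters.

    ChainMap looks keys up from left to right, so runtime values win over
    deployment-time values, which in turn win over model defaults.
    """
    return dict(ChainMap(runtime, deployment, model))


effective = resolve_parameters(model_defaults, deployment_config, runtime_params)
print(effective)
```

In this sketch `lower_face_strength` takes the runtime value, `input_strength` takes the deployment-time value, and `skin_strength` falls back to the model default.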
FaceParameters
Only a subset of FaceParameters is supported for runtime tuning.
See FaceParameters for the list of supported parameters.
Emotion Post-processing Parameters
The Audio2Emotion SDK automatically infers emotions from the incoming audio and generates emotion vectors that drive the character's facial animation performance. Use the post-processing parameters below to tailor the performance to your needs. The steps are listed in the order in which they are executed.
Emotion Contrast
Emotion contrast is applied to the inference output and controls the emotion spread using a sigmoid function. It pushes high values higher and low values lower, which widens the range of the generated emotional performance.
Max Emotions
Max emotions sets a hard limit on the number of emotions that the Audio2Emotion SDK engages. Emotions are prioritized by strength: only the strongest emotions, up to the limit, are engaged, and all other emotions are nulled. This helps achieve a more accurate read on the correct emotion when the vocal emotional performance is subtle.
For example, if Joy and Amazement are the strongest predicted emotions and Max Emotions is set to 2, only Joy and Amazement are applied to the performance.
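The Joy and Amazement example can be sketched in Python as follows. This is only an illustration of the behaviour described here, not the SDK implementation: the `apply_max_emotions` helper is hypothetical, nulled emotions are represented as 0.0, and ties are broken by input order, both of which are assumptions.

```python
def apply_max_emotions(emotions, max_emotions):
    """Keep the `max_emotions` strongest emotions and zero out the rest.

    `emotions` maps emotion names to predicted strengths.
    """
    if max_emotions <= 0:
        return {name: 0.0 for name in emotions}
    # Sort names by strength, strongest first, and keep the top entries.
    strongest = sorted(emotions, key=emotions.get, reverse=True)[:max_emotions]
    keep = set(strongest)
    return {name: (value if name in keep else 0.0) for name, value in emotions.items()}


# Hypothetical prediction for illustration.
predicted = {"joy": 0.62, "amazement": 0.48, "sadness": 0.10, "anger": 0.05}
limited = apply_max_emotions(predicted, max_emotions=2)
print(limited)
```

With a limit of 2, only `joy` and `amazement` keep their values in this sketch.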
Emotion index conversion
Emotion index conversion uses the emotion correspondence to remap emotions from the Audio2Emotion SDK to the Audio2Face SDK.
Smoothing
Smoothing applies exponential smoothing to the remapped emotions, using the live blend coefficient.
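As a rough illustration of exponential smoothing, the sketch below blends the previous smoothed emotions with the newly remapped ones. The exact formula used by the service is not given here, so the convention that the coefficient weights the previous state is an assumption, and `smooth_emotions` is a hypothetical helper.

```python
def smooth_emotions(previous, current, live_blend_coef):
    """Perform one exponential smoothing step per emotion.

    Assumed convention (not confirmed by the documentation):
    smoothed = coef * previous + (1 - coef) * current
    so a larger coefficient reacts more slowly to new values.
    """
    return {
        name: live_blend_coef * previous.get(name, 0.0) + (1.0 - live_blend_coef) * value
        for name, value in current.items()
    }


# Hypothetical frames: joy jumps from 0.0 to 1.0 and stays there.
state = {"joy": 0.0}
for _ in range(2):
    state = smooth_emotions(state, {"joy": 1.0}, live_blend_coef=0.7)
print(state)
```

Under this assumed convention, the smoothed value approaches the new target gradually instead of jumping to it in a single frame.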
Blend Preferred Emotion
The preferred emotion (manual emotion) and the inference emotion output are combined to generate a composite final output of all emotion data.
Transition smoothing
Transition smoothing applies exponential smoothing to the final emotion values (the composite of Audio2Emotion and preferred emotion).
Emotion Strength
Emotion strength controls the overall strength of the final emotion composite produced by the previous steps. It acts as a multiplier on the final emotion result (Audio2Emotion + preferred).
Preferred Emotion
Use the emotion sliders to create a preferred (manual) emotion pose as the base emotion for the character animation. The preferred emotion is taken from the current settings in the Emotion widget and is blended with the generated emotions throughout the animation.
Blendshape parameters
Currently, the default blendshape parameters included in the model data are tuned for use with Metahuman avatars.
For our default avatars (Claire, Mark, Ben), all 52 values of weight_multipliers in the stylization config should be set to 1.0.
Note
Tongue blendshape parameters are not available at the moment.
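The stylization config comments describe the blendshape modulation as blendshape_values * weight_multipliers + weight_offsets. A minimal Python sketch of that formula is shown below. The `modulate_blendshapes` helper is hypothetical, and clamping to the range [0.0, 1.0] when clamping is enabled is an assumption about what enable_clamping_bs_weight does.

```python
def modulate_blendshapes(values, multipliers, offsets, clamp=False):
    """Apply value * multiplier + offset to each blendshape weight.

    Missing multipliers default to 1.0 and missing offsets to 0.0.
    The [0.0, 1.0] clamp range is an assumption for illustration.
    """
    result = {}
    for name, value in values.items():
        weight = value * multipliers.get(name, 1.0) + offsets.get(name, 0.0)
        if clamp:
            weight = min(1.0, max(0.0, weight))
        result[name] = weight
    return result


# Hypothetical raw weights; multipliers taken from the Mark config.
raw = {"JawOpen": 0.5, "MouthFunnel": 0.9, "MouthStretchLeft": 0.4}
multipliers = {"JawOpen": 1.0, "MouthFunnel": 1.2, "MouthStretchLeft": 0.05}
offsets = {"JawOpen": 0.0, "MouthFunnel": 0.0, "MouthStretchLeft": 0.0}

print(modulate_blendshapes(raw, multipliers, offsets))
print(modulate_blendshapes(raw, multipliers, offsets, clamp=True))
```

Setting every multiplier to 1.0 and every offset to 0.0, as recommended for the default avatars, leaves the weights unchanged.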
Environment variables#
The following table describes the environment variables that can be passed to the Audio2Face-3D NIM with the -e argument of a docker run command:
Variable | Required | Values | Notes
---|---|---|---
NGC_API_KEY | No | Any string representing a valid NGC API key | Required only if you want to download TRT engines from NGC. Set this variable to your personal NGC API key.
NIM_LOGGING_JSONL | No | true / false | Enables (true) or disables (false) JSON Lines format logging to stdout.
NIM_MANIFEST_PROFILE | No | Any valid manifest profile string | Choose the manifest profile ID for your GPU from Supported Models.
NIM_DISABLE_MODEL_DOWNLOAD | No | true / false | Disables (true) or enables (false) automatic TRT engine downloads from NGC. When set to true, automatic downloads are prevented and TRT engines are generated locally instead. If pre-cached models are mounted, local generation is skipped. Note that TRT generation currently fails on RTX 50 series GPUs.
NIM_SKIP_A2F_START | No | true / false | If set to true, the container does not start the A2F-3D service at startup.
Volumes#
The following table describes the paths inside the container to which local paths can be mounted. For example, you can mount a volume with the docker flag -v {LOCAL_PATH}:{PATH_IN_CONTAINER}.
Container path | Required | Notes
---|---|---
/tmp/a2x/ | Not required, but if this volume is not mounted, the container has to download or generate the model afresh each time it is brought up | Path for AI models. Must have read, write, and execute permissions (or 777).
/mnt/configs/ | Needed only if you want to override some configuration parameters | Path for files that override configs
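To show how the -e and -v flags from the two tables combine, the following Python sketch assembles a docker run command line. It only builds and prints the command. The image name is a placeholder, the local paths are hypothetical, and the use of --rm and --gpus all is an assumption about a typical setup rather than a documented requirement.

```python
import shlex


def build_docker_run(image, env=None, volumes=None):
    """Build a `docker run` argument list from env vars and volume mounts.

    `env` maps variable names to values (passed with -e).
    `volumes` maps local paths to container paths (passed with -v).
    """
    args = ["docker", "run", "--rm", "--gpus", "all"]  # assumed base flags
    for name, value in (env or {}).items():
        args += ["-e", f"{name}={value}"]
    for local_path, container_path in (volumes or {}).items():
        args += ["-v", f"{local_path}:{container_path}"]
    args.append(image)
    return args


command = build_docker_run(
    image="IMAGE_PLACEHOLDER",  # replace with the actual NIM image reference
    env={"NIM_LOGGING_JSONL": "true", "NIM_DISABLE_MODEL_DOWNLOAD": "false"},
    volumes={
        "/home/user/a2f-cache": "/tmp/a2x/",        # hypothetical local model cache
        "/home/user/a2f-configs": "/mnt/configs/",  # hypothetical config overrides
    },
)
print(shlex.join(command))
```

Keeping the command as a list of arguments and quoting it only for display avoids shell-escaping mistakes if a path or value contains spaces.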
Quick Deployment of Audio2Face-3D Microservices#
Instead of deploying Audio2Face-3D and starting the model manually, you can deploy both together with the docker-compose file by following the quick-start instructions in the NVIDIA Audio2Face-3D Samples repo.