A2F-3D NIM Manual Container Deployment and Configuration#
We offer a Docker container via the NGC registry for deployment purposes. This guide demonstrates how to deploy, configure, and run the Docker image for the Audio2Face-3D NIM.
Before proceeding, it is essential to familiarize yourself with the concepts, services, and requirements necessary to run Audio2Face-3D by reading the Architecture Overview page.
Audio2Face-3D is highly configurable through configuration files and environment variables. To configure Audio2Face-3D you will need to use a custom entrypoint.
Prerequisites#
To run the microservice, you need access to the NGC Docker registry.
Make sure you have NVAIE access and your personal key, and that you are logged in to the nvcr.io registry.
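For reference, logging in to nvcr.io typically uses $oauthtoken as the username and your personal key as the password. A minimal sketch, assuming your key is stored in an NGC_API_KEY environment variable (placeholder name):
$ export NGC_API_KEY=<your-personal-key>
$ echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin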
You will also need the NVIDIA Container Toolkit configured with Docker.
For more information about hardware and software requirements, visit the Support Matrix page.
Configuration files#
There are 3 kinds of A2F-3D configuration files. Each of these configuration files corresponds to a specific type of user.
An Artist: The stylization config contains the configuration parameters an artist would typically tweak.
A DevOps engineer: The deployment config contains the configuration parameters a DevOps engineer needs to think about.
An Advanced User: The advanced config contains the remaining configuration parameters. They are rarely updated, but are needed for specific scenarios.
Warning
These configuration files are deployment-time configuration files. Although they look similar to the runtime ones, runtime and deployment-time configuration files should not be confused: they differ in case convention (snake_case vs. camelCase) and in structure. Here is an example of a runtime configuration: config_james.yml.
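As a purely illustrative sketch of the case difference (the runtime key name below is a hypothetical rendering, not taken from an actual runtime file):
lower_face_strength: 1.25   # deployment-time style (snake_case), as used throughout this guide
lowerFaceStrength: 1.25     # runtime style (camelCase)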
1. The Stylization configuration file#
There are 3 variants of this configuration file:
Claire
James
Mark
They each correspond to a specific AI model and contain the default values that will be used. By default, the James configuration is used by the microservice.
Claire Config#
claire_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip, and they also define the default preferred emotion.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
amazement: 0.0
anger: 0.0
cheekiness: 0.0
disgust: 0.0
fear: 0.0
grief: 0.0
joy: 0.0
outofbreath: 0.0
pain: 0.0
sadness: 0.0
a2e:
enabled: true # Enable audio2emotion, ai-generated audio-driven emotion
live_transition_time: 0.5 # Controls the smoothness of the output transition toward the target value across frames; higher values result in smoother transitions. Each frame updates at a rate of <frame time length> / <live transition time> (capped at 1.0) toward the raw result.
post_processing_params:
emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions
a2f:
# A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
inference_model_id: claire_v2.3
blendshape_id: claire_topo1_v2.1
face_params:
eyelid_offset: 0.0 # Adjusts the default pose of eyelid open-close
face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
input_strength: 1.0 # Controls the magnitude of the input audio
lip_close_offset: 0.0 # Adjusts the default pose of lip close-open
lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
lower_face_strength: 1.25 # Controls the range of motion on the lower regions of the face
skin_strength: 1.0 # Controls the range of motion of the skin
upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
blendshape_params: # Modulates the effect of each blendshape. Gain * w + offset
enable_clamping_bs_weight: false
# Multiplier for each blendshape output. This list depends on the blendshape model.
weight_multipliers:
EyeBlinkLeft: 1.0
EyeLookDownLeft: 1.0
EyeLookInLeft: 1.0
EyeLookOutLeft: 1.0
EyeLookUpLeft: 1.0
EyeSquintLeft: 1.0
EyeWideLeft: 1.0
EyeBlinkRight: 1.0
EyeLookDownRight: 1.0
EyeLookInRight: 1.0
EyeLookOutRight: 1.0
EyeLookUpRight: 1.0
EyeSquintRight: 1.0
EyeWideRight: 1.0
JawForward: 1.0
JawLeft: 1.0
JawRight: 1.0
JawOpen: 1.0
MouthClose: 1.0
MouthFunnel: 1.0
MouthPucker: 1.0
MouthLeft: 1.0
MouthRight: 1.0
MouthSmileLeft: 1.0
MouthSmileRight: 1.0
MouthFrownLeft: 1.0
MouthFrownRight: 1.0
MouthDimpleLeft: 1.0
MouthDimpleRight: 1.0
MouthStretchLeft: 1.0
MouthStretchRight: 1.0
MouthRollLower: 1.0
MouthRollUpper: 1.0
MouthShrugLower: 1.0
MouthShrugUpper: 1.0
MouthPressLeft: 1.0
MouthPressRight: 1.0
MouthLowerDownLeft: 1.0
MouthLowerDownRight: 1.0
MouthUpperUpLeft: 1.0
MouthUpperUpRight: 1.0
BrowDownLeft: 1.0
BrowDownRight: 1.0
BrowInnerUp: 1.0
BrowOuterUpLeft: 1.0
BrowOuterUpRight: 1.0
CheekPuff: 1.0
CheekSquintLeft: 1.0
CheekSquintRight: 1.0
NoseSneerLeft: 1.0
NoseSneerRight: 1.0
TongueOut: 1.0
# Constant offset for each blendshape output. This list depends on the blendshape model.
weight_offsets:
EyeBlinkLeft: 0.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 0.0
EyeWideLeft: 0.0
EyeBlinkRight: 0.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 0.0
EyeWideRight: 0.0
JawForward: 0.0
JawLeft: 0.0
JawRight: 0.0
JawOpen: 0.0
MouthClose: 0.0
MouthFunnel: 0.0
MouthPucker: 0.0
MouthLeft: 0.0
MouthRight: 0.0
MouthSmileLeft: 0.0
MouthSmileRight: 0.0
MouthFrownLeft: 0.0
MouthFrownRight: 0.0
MouthDimpleLeft: 0.0
MouthDimpleRight: 0.0
MouthStretchLeft: 0.0
MouthStretchRight: 0.0
MouthRollLower: 0.0
MouthRollUpper: 0.0
MouthShrugLower: 0.0
MouthShrugUpper: 0.0
MouthPressLeft: 0.0
MouthPressRight: 0.0
MouthLowerDownLeft: 0.0
MouthLowerDownRight: 0.0
MouthUpperUpLeft: 0.0
MouthUpperUpRight: 0.0
BrowDownLeft: 0.0
BrowDownRight: 0.0
BrowInnerUp: 0.0
BrowOuterUpLeft: 0.0
BrowOuterUpRight: 0.0
CheekPuff: 0.0
CheekSquintLeft: 0.0
CheekSquintRight: 0.0
NoseSneerLeft: 0.0
NoseSneerRight: 0.0
TongueOut: 0.0
James Config#
james_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
amazement: 0.0
anger: 0.0
cheekiness: 0.0
disgust: 0.0
fear: 0.0
grief: 0.0
joy: 0.0
outofbreath: 0.0
pain: 0.0
sadness: 0.0
a2e:
enabled: true
live_transition_time: 0.5
post_processing_params:
emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions
a2f:
# A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
inference_model_id: james_v2.3
blendshape_id: james_topo2_v2.2
face_params:
eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
input_strength: 1.0 # Controls the magnitude of the input audio
lip_close_offset: -0.02 # Adjusts the default pose of lip close-open
lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
lower_face_strength: 1.2 # Controls the range of motion on the lower regions of the face
skin_strength: 1.0 # Controls the range of motion of the skin
upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
blendshape_params: # Modulates the effect of each blendshape. Gain * w + offset
enable_clamping_bs_weight: false
weight_multipliers:
EyeBlinkLeft: 1.0
EyeLookDownLeft: 1.0
EyeLookInLeft: 1.0
EyeLookOutLeft: 1.0
EyeLookUpLeft: 1.0
EyeSquintLeft: 1.0
EyeWideLeft: 1.0
EyeBlinkRight: 1.0
EyeLookDownRight: 1.0
EyeLookInRight: 1.0
EyeLookOutRight: 1.0
EyeLookUpRight: 1.0
EyeSquintRight: 1.0
EyeWideRight: 1.0
JawForward: 1.0
JawLeft: 1.0
JawRight: 1.0
JawOpen: 1.0
MouthClose: 1.0
MouthFunnel: 1.0
MouthPucker: 1.0
MouthLeft: 1.0
MouthRight: 1.0
MouthSmileLeft: 1.0
MouthSmileRight: 1.0
MouthFrownLeft: 1.0
MouthFrownRight: 1.0
MouthDimpleLeft: 1.0
MouthDimpleRight: 1.0
MouthStretchLeft: 1.0
MouthStretchRight: 1.0
MouthRollLower: 1.0
MouthRollUpper: 1.0
MouthShrugLower: 1.0
MouthShrugUpper: 1.0
MouthPressLeft: 1.0
MouthPressRight: 1.0
MouthLowerDownLeft: 1.0
MouthLowerDownRight: 1.0
MouthUpperUpLeft: 1.0
MouthUpperUpRight: 1.0
BrowDownLeft: 1.0
BrowDownRight: 1.0
BrowInnerUp: 1.0
BrowOuterUpLeft: 1.0
BrowOuterUpRight: 1.0
CheekPuff: 1.0
CheekSquintLeft: 1.0
CheekSquintRight: 1.0
NoseSneerLeft: 1.0
NoseSneerRight: 1.0
TongueOut: 1.0
weight_offsets:
EyeBlinkLeft: 0.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 0.0
EyeWideLeft: 0.0
EyeBlinkRight: 0.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 0.0
EyeWideRight: 0.0
JawForward: 0.0
JawLeft: 0.0
JawRight: 0.0
JawOpen: 0.0
MouthClose: 0.0
MouthFunnel: 0.0
MouthPucker: 0.0
MouthLeft: 0.0
MouthRight: 0.0
MouthSmileLeft: 0.0
MouthSmileRight: 0.0
MouthFrownLeft: 0.0
MouthFrownRight: 0.0
MouthDimpleLeft: 0.0
MouthDimpleRight: 0.0
MouthStretchLeft: 0.0
MouthStretchRight: 0.0
MouthRollLower: 0.0
MouthRollUpper: 0.0
MouthShrugLower: 0.0
MouthShrugUpper: 0.0
MouthPressLeft: 0.0
MouthPressRight: 0.0
MouthLowerDownLeft: 0.0
MouthLowerDownRight: 0.0
MouthUpperUpLeft: 0.0
MouthUpperUpRight: 0.0
BrowDownLeft: 0.0
BrowDownRight: 0.0
BrowInnerUp: 0.0
BrowOuterUpLeft: 0.0
BrowOuterUpRight: 0.0
CheekPuff: 0.0
CheekSquintLeft: 0.0
CheekSquintRight: 0.0
NoseSneerLeft: 0.0
NoseSneerRight: 0.0
TongueOut: 0.0
Mark Config#
mark_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
amazement: 0.0
anger: 0.0
cheekiness: 0.0
disgust: 0.0
fear: 0.0
grief: 0.0
joy: 0.0
outofbreath: 0.0
pain: 0.0
sadness: 0.0
a2e:
enabled: true
live_transition_time: 0.5
post_processing_params:
emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions
a2f:
# A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
inference_model_id: mark_v2.3
blendshape_id: mark_topo1_v2.1
face_params:
eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
input_strength: 1.3 # Controls the magnitude of the input audio
lip_close_offset: -0.03 # Adjusts the default pose of lip close-open
lower_face_smoothing: 0.0023 # Applies temporal smoothing to the lower face motion
lower_face_strength: 1.4 # Controls the range of motion on the lower regions of the face
skin_strength: 1.1 # Controls the range of motion of the skin
upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
blendshape_params: # Modulates the effect of each blendshape. Gain * w + offset
enable_clamping_bs_weight: false
weight_multipliers:
EyeBlinkLeft: 1.0
EyeLookDownLeft: 1.0
EyeLookInLeft: 1.0
EyeLookOutLeft: 1.0
EyeLookUpLeft: 1.0
EyeSquintLeft: 1.0
EyeWideLeft: 1.0
EyeBlinkRight: 1.0
EyeLookDownRight: 1.0
EyeLookInRight: 1.0
EyeLookOutRight: 1.0
EyeLookUpRight: 1.0
EyeSquintRight: 1.0
EyeWideRight: 1.0
JawForward: 1.0
JawLeft: 1.0
JawRight: 1.0
JawOpen: 1.0
MouthClose: 1.0
MouthFunnel: 1.0
MouthPucker: 1.0
MouthLeft: 1.0
MouthRight: 1.0
MouthSmileLeft: 1.0
MouthSmileRight: 1.0
MouthFrownLeft: 1.0
MouthFrownRight: 1.0
MouthDimpleLeft: 1.0
MouthDimpleRight: 1.0
MouthStretchLeft: 1.0
MouthStretchRight: 1.0
MouthRollLower: 1.0
MouthRollUpper: 1.0
MouthShrugLower: 1.0
MouthShrugUpper: 1.0
MouthPressLeft: 1.0
MouthPressRight: 1.0
MouthLowerDownLeft: 1.0
MouthLowerDownRight: 1.0
MouthUpperUpLeft: 1.0
MouthUpperUpRight: 1.0
BrowDownLeft: 1.0
BrowDownRight: 1.0
BrowInnerUp: 1.0
BrowOuterUpLeft: 1.0
BrowOuterUpRight: 1.0
CheekPuff: 1.0
CheekSquintLeft: 1.0
CheekSquintRight: 1.0
NoseSneerLeft: 1.0
NoseSneerRight: 1.0
TongueOut: 1.0
weight_offsets:
EyeBlinkLeft: 0.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 0.0
EyeWideLeft: 0.0
EyeBlinkRight: 0.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 0.0
EyeWideRight: 0.0
JawForward: 0.0
JawLeft: 0.0
JawRight: 0.0
JawOpen: 0.0
MouthClose: 0.0
MouthFunnel: 0.0
MouthPucker: 0.0
MouthLeft: 0.0
MouthRight: 0.0
MouthSmileLeft: 0.0
MouthSmileRight: 0.0
MouthFrownLeft: 0.0
MouthFrownRight: 0.0
MouthDimpleLeft: 0.0
MouthDimpleRight: 0.0
MouthStretchLeft: 0.0
MouthStretchRight: 0.0
MouthRollLower: 0.0
MouthRollUpper: 0.0
MouthShrugLower: 0.0
MouthShrugUpper: 0.0
MouthPressLeft: 0.0
MouthPressRight: 0.0
MouthLowerDownLeft: 0.0
MouthLowerDownRight: 0.0
MouthUpperUpLeft: 0.0
MouthUpperUpRight: 0.0
BrowDownLeft: 0.0
BrowDownRight: 0.0
BrowInnerUp: 0.0
BrowOuterUpLeft: 0.0
BrowOuterUpRight: 0.0
CheekPuff: 0.0
CheekSquintLeft: 0.0
CheekSquintRight: 0.0
NoseSneerLeft: 0.0
NoseSneerRight: 0.0
TongueOut: 0.0
2. The Deployment configuration file#
deployment_config.yaml
common:
# Number of streams to use simultaneously
# The recommended value depends on the gpu and your latency constraints
# Higher value means: more concurrent users and higher overall throughput
# Lower value means: fewer concurrent users, higher throughput per stream, lower latencies
stream_number: 10
# Pad each audio file with roughly 1.5 seconds of silent audio
add_silence_padding_after_audio: false
logging:
# Level of log wanted, info is recommended
# Can be one of:
# => trace
# => debug
# => info
# => warn
# => err
# => critical
# => off
log_level: info
# How often should FPS logs be printed per stream
fps_logging_interval_second: 1
endpoints:
# use the bidirectional endpoint instead of 2 connections (server to receive audio + client to send animation data)
use_bidirectional: true
# server to perform the bidirectional streaming connection
# Used only if use_bidirectional_endpoint==true
bidirectional:
server:
# port to open
url: 0.0.0.0:52000
unidirectional:
# Server that receives the audio
# Used only if use_bidirectional_endpoint==false
server:
# port to open
url: 0.0.0.0:50000
# Client that sends the animation data
# Used only if use_bidirectional_endpoint==false
client:
# url of the server to contact
url: 0.0.0.0:51000
# Configs specific to telemetry
telemetry:
# Name of the service
service_name: audio2face
# Whether to enable metrics
metrics_enabled: false
# Whether to enable traces
traces_enabled: false
# Can be prometheus or otlp
metrics_exporter: prometheus
# Export interval in milliseconds
otel_metric_export_interval: 60000
# Export timeout in milliseconds
otel_metric_export_timeout: 30000
otlp_http_metrics_endpoint: http://localhost:4318/v1/metrics
otlp_http_traces_endpoint: http://localhost:4318/v1/traces
prometheus_endpoint: 0.0.0.0:9464
3. The Advanced configuration file#
advanced_config.yaml
input_sanitization:
# max size of UUID
max_len_uuid: 50
# Maximum samplerate
max_sample_rate: 144000
# Minimum samplerate
min_sample_rate: 16000
# Maximum processing time, in seconds
# After this timeout the connection to A2F will be cut
max_processing_duration_second: 300
# Maximum size of 1 audio buffer sent over the grpc stream
max_audio_buffer_size_second: 10
# Maximum size of the audio clip to process
max_audio_clip_size_second: 300
# Maximum amount of time that A2F Controller will wait when not
# receiving data from A2F, before cutting the connection
max_wait_time_idle_ms: 30000
# Will stop serving a user if their FPS is lower than low_fps
# for more than low_fps_max_duration_second seconds
# For real-time applications, less than 30 FPS means slower than real time,
# so if users provide audio to the service at less than 30 FPS,
# the interactive experience will stutter.
low_fps: 29
low_fps_max_duration_second: 7
garbage_collector:
# enable or disable the garbage collector
# This is only used with the bidirectional connection, where the service holds data
# waiting for the client to pick it up.
enabled: true
# how often the garbage collector should run
interval_run_second: 10
# If the garbage collector finds streams holding
# more than N seconds of data, it will delete data
# until the amount falls below this threshold.
# Clients are expected to retrieve data promptly so that
# the service doesn't retain the data excessively.
max_size_stored_data_second: 60
pipeline_parameters:
# Queues between pipeline components
# Can be tweaked:
# Higher values can lead to higher throughput but lead to higher latencies
# Lower values lead to lower latencies, and potentially lower overall throughput
# Leave these values at their defaults in case of doubt
queue_size_after_a2e: 1
queue_size_after_a2f: 300
queue_size_after_streammux: 1
streammux:
# Do not change this config; this is internal
adaptive_batching: 0
# Minimum FPS for all streams
# The pipeline will not drop below this value if:
# * compute allows it
# * upload speed of audio allows it
# Here 40 FPS
# Numerator for that config:
overall_min_fps_n: 40
# Denominator for that config:
overall_min_fps_d: 1
a2f:
# Disable to remove temporal smoothing;
# useful for debugging individual generated frames
temporal_smoothing: true
device_id: 0 # Which gpu id to use
a2e:
inference_interval: 10
device_id: 0 # Which gpu id to use
trt_model_generation:
a2e:
precision: "fp16"
min_shape: 1
optimal_shape: 10
maximum_shape: 128
a2f:
precision: "fp16"
min_shape: 1
optimal_shape: 10
maximum_shape: 128
The above configuration files represent the default values used by the microservice.
To apply your own configuration, start the A2F-3D NIM with a custom entrypoint and mount your configuration files inside the container. Refer to the next section of the guide for detailed instructions.
How to use configuration files#
To override any of the configurations, you need to mount the files in a Docker volume at the /mnt/configs path.
For convenience, export an environment variable called LOCAL_CONFIGS with the path where the overriding configurations are stored.
To do so, you can follow the example instructions below:
$ mkdir -p ~/.cache/audio2face-3d-configs
$ export LOCAL_CONFIGS=~/.cache/audio2face-3d-configs
You need to make a copy of the above configs and place them in the LOCAL_CONFIGS directory.
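For example, assuming you saved the listings above to a local folder named a2f3d-configs (hypothetical path), copying them over looks like this:
$ cp ./a2f3d-configs/*.yaml "$LOCAL_CONFIGS"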
Then you will have:
$ ls $LOCAL_CONFIGS
advanced_config.yaml
claire_stylization_config.yaml
deployment_config.yaml
james_stylization_config.yaml
mark_stylization_config.yaml
Model caching#
You can cache the model locally so that the next time you run the service you don't have to generate or download it. To cache the model,
use a Docker volume mount. Make sure the local path has read, write, and execute permissions (777).
You can use the following instructions to set up a local path for caching models with the right permissions:
$ mkdir -p ~/.cache/audio2face-3d
$ chmod 777 ~/.cache/audio2face-3d
$ export LOCAL_NIM_CACHE=~/.cache/audio2face-3d
When you run the container a second time, if you want to use the NIM entrypoint instead of a custom one, make sure to disable the model
download by setting the environment variable NIM_DISABLE_MODEL_DOWNLOAD=true.
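As a sketch of such a second run, assuming the image's default entrypoint is used when --entrypoint is omitted and the cache volume from the Model caching section is reused:
$ docker run -it --rm --name audio2face-3d \
    --gpus all \
    --network=host \
    -e NIM_DISABLE_MODEL_DOWNLOAD=true \
    -v "$LOCAL_NIM_CACHE:/tmp/a2x" \
    nvcr.io/nim/nvidia/audio2face-3d:1.2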
Starting the A2F-3D NIM with custom entrypoint#
$ docker run -it --rm --name audio2face-3d \
--gpus all \
--network=host \
--entrypoint /bin/bash -w /opt/nvidia/a2f_pipeline \
-e NIM_DISABLE_MODEL_DOWNLOAD=true \
-e NIM_SKIP_A2F_START=true \
-v "$LOCAL_NIM_CACHE:/tmp/a2x" \
-v "$LOCAL_CONFIGS:/mnt/configs/" \
nvcr.io/nim/nvidia/audio2face-3d:1.2
The above command creates a Docker container running the Audio2Face-3D NIM. Notice that --gpus all
is specified to expose the GPUs to the Docker container. You can customize this option to match your setup.
We also use --network=host
to bind all ports on the local network. If you want finer-grained control over port binding, use the
-p
option instead with the appropriate ports.
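For example, with the default deployment configuration, a sketch of explicit port mappings could look like the following; the exact ports depend on your deployment config:
-p 52000:52000   # default bidirectional gRPC endpoint
-p 9464:9464     # Prometheus metrics endpoint, if metrics are enabled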
Notice the two volume mounts: -v "$LOCAL_NIM_CACHE:/tmp/a2x"
for caching the model and -v "$LOCAL_CONFIGS:/mnt/configs/"
for overriding configurations. Skip either volume mount if you don't want to cache the model
or change the default configuration, respectively.
You should then land in a shell inside the container:
triton-server@host-name:/opt/nvidia/a2f_pipeline$
Inside the container, start the NIM server by running:
$ /opt/nim/start_server.sh &
The & operator starts the server as a background process, enabling you to run additional commands within the container.
If you do not immediately return to the shell prompt, press Enter
to regain access and continue executing commands.
The commands below are run inside the container unless stated otherwise.
Generating the TRT engine#
The first step is to generate the TRT engine for the AI model (specific to the GPU of your machine) with the provided Python app, whose help output looks like this:
usage: generate_trt_models.py [-h] [--stylization-config STYLIZATION_CONFIG] [--advanced-config ADVANCED_CONFIG]
Generates TRT models for A2F Service.
options:
-h, --help show this help message and exit
--stylization-config STYLIZATION_CONFIG
file path to the stylization config
--advanced-config ADVANCED_CONFIG
file path to the advanced config
If you want to stick with these default values, you don't need to specify anything.
Note
You can back up the generated TRT engines to skip model generation on NIM startup, but be aware that every model is
specific to each GPU. The generated model is located in the /tmp/a2x
directory inside the Docker container.
Generate the Audio2Emotion and Audio2Face TRT engines with default configs:
$ ./service/generate_trt_models.py
This TRT engine needs to be regenerated when the deployment environment changes. This is especially the case when the GPU changes to one with a different architecture or compute capability. The generated TRT engine can potentially be reused on machines with the exact same controlled configuration (same hardware and same Docker setup). It is recommended to always regenerate the TRT engine whenever hardware changes are made.
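As a minimal sketch of backing up the generated engines from the host, assuming you used the $LOCAL_NIM_CACHE volume mount from the docker run command above (the archive is only valid for this exact GPU and container version):
$ tar -czf ~/a2f3d-trt-engines-backup.tgz -C "$LOCAL_NIM_CACHE" .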
Starting the service#
Second step is to start the service. The Audio2Face-3D Service help menu looks like this:
$ a2f_pipeline.run -h
Usage: a2f_pipeline.run [--help] [--version] [--stylization-config] [--deployment-config] [--advanced-config]
Optional arguments:
-h, --help shows help message and exits
-v, --version prints version information and exits
--stylization-config file path to the stylization config
--deployment-config file path to the deployment config
--advanced-config file path to the advanced config
To use the default configuration you can just run inside the container:
$ /usr/local/bin/a2f_pipeline.run
You should see a log like this one, signaling that the A2F-3D service started properly.
[2024-04-23 12:44:33.066] [ global ] [info] Running...
Note
When you start the service, you might encounter warnings labeled as GStreamer-WARNING. These warnings occur because some libraries are missing from the container. However, they are safe to ignore, as these libraries are not used by Audio2Face-3D.
Changing Configuration - The Shortest Way#
The commands below are run inside the container.
Assuming you decide to use the Claire model, you can run the following commands:
$ ./service/generate_trt_models.py --stylization-config /mnt/configs/claire_stylization_config.yaml \
--advanced-config /mnt/configs/advanced_config.yaml
$ a2f_pipeline.run --stylization-config /mnt/configs/claire_stylization_config.yaml \
--deployment-config /mnt/configs/deployment_config.yaml \
--advanced-config /mnt/configs/advanced_config.yaml
Warning
The current ./service/generate_trt_models.py doesn’t support cache invalidation. If you update the configuration file and want to regenerate the model, you need to remove the corresponding TRT model in the cache folder located in /tmp/a2x/
You will then have a container running with the provided custom parameters.
Changing Configuration - The Flexible Way#
Specifying a configuration file works by overriding values, which means you don't have to repeat default values in your configuration files. Consequently, your configuration file only needs to contain a subset of the default configuration file.
Moreover, for the a2f section of the stylization configuration, specifying an inference_model_id automatically loads the default face parameters matching that ID, and specifying a blendshape_id automatically loads the default blendshape parameters.
A couple of examples should make this clear:
Example 1: Setting the Stylization config to use Mark#
On the host machine, create a file called short_mark_stylization_config.yaml in the $LOCAL_CONFIGS directory and add the following lines:
a2f:
inference_model_id: mark_v2.3
blendshape_id: mark_topo1_v2.1
Then, inside the container, run:
$ ./service/generate_trt_models.py --stylization-config /mnt/configs/short_mark_stylization_config.yaml
$ a2f_pipeline.run --stylization-config /mnt/configs/short_mark_stylization_config.yaml
Warning
The current ./service/generate_trt_models.py doesn’t support cache invalidation. If you update the configuration file and want to regenerate the model, you need to remove the corresponding TRT model in the cache folder located in /tmp/a2x/
This command has exactly the same effect as providing the full default Mark configuration file, because under the hood inference_model_id and blendshape_id are used to load these defaults.
Example 2: Updating the type of Endpoint to Unidirectional#
This example covers settings that are part of deployment_config.yaml.
On the host machine, create a file called unidirectional_deployment_config.yaml in the $LOCAL_CONFIGS directory and add the following lines:
endpoints:
use_bidirectional: false
Then, inside the container, run:
$ ./service/generate_trt_models.py
$ a2f_pipeline.run --deployment-config /mnt/configs/unidirectional_deployment_config.yaml
This overrides the endpoint type from bidirectional to unidirectional.
This approach works for any key in the provided YAML files.
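For instance, to run with fewer simultaneous streams than the default, a hypothetical my_deployment_overrides.yaml placed in $LOCAL_CONFIGS only needs to contain:
common:
  stream_number: 5
and is applied inside the container with:
$ a2f_pipeline.run --deployment-config /mnt/configs/my_deployment_overrides.yaml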
Warning
Make sure to use the option matching your configuration file.
--stylization-config # for the <any>_stylization_config.yaml
--deployment-config # for the deployment_config.yaml
--advanced-config # for the advanced_config.yaml
Advanced Stylization#
The blendshape tuning in the stylization configuration above was simplified for new users.
A section for advanced users is available below.
Advanced Blendshape tuning
3 more parameters can be set for blendshape tuning:
active_poses: Which blendshapes should be active. 1 for active; 0 for inactive
cancel_poses: Which blendshapes cancel each other; matching numbers indicate which poses cancel one another; -1 means no cancellation
symmetry_poses: Which blendshapes are symmetric to one another; matching numbers indicate symmetric pairs (for example, MouthSmileLeft and MouthSmileRight both set to 2 form a pair); -1 means no symmetry
claire_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
amazement: 0.0
anger: 0.0
cheekiness: 0.0
disgust: 0.0
fear: 0.0
grief: 0.0
joy: 0.0
outofbreath: 0.0
pain: 0.0
sadness: 0.0
a2e:
enabled: true
live_transition_time: 0.5
post_processing_params:
emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions
a2f:
# A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
inference_model_id: claire_v2.3
blendshape_id: claire_topo1_v2.1
face_params:
eyelid_offset: 0.0 # Adjusts the default pose of eyelid open-close
face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
input_strength: 1.0 # Controls the magnitude of the input audio
lip_close_offset: 0.0 # Adjusts the default pose of lip close-open
lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
lower_face_strength: 1.25 # Controls the range of motion on the lower regions of the face
skin_strength: 1.0 # Controls the range of motion of the skin
upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
blendshape_params: # Modulates the effect of each blendshape. Gain * w + offset
enable_clamping_bs_weight: false
weight_multipliers:
EyeBlinkLeft: 1.0
EyeLookDownLeft: 1.0
EyeLookInLeft: 1.0
EyeLookOutLeft: 1.0
EyeLookUpLeft: 1.0
EyeSquintLeft: 1.0
EyeWideLeft: 1.0
EyeBlinkRight: 1.0
EyeLookDownRight: 1.0
EyeLookInRight: 1.0
EyeLookOutRight: 1.0
EyeLookUpRight: 1.0
EyeSquintRight: 1.0
EyeWideRight: 1.0
JawForward: 1.0
JawLeft: 1.0
JawRight: 1.0
JawOpen: 1.0
MouthClose: 1.0
MouthFunnel: 1.0
MouthPucker: 1.0
MouthLeft: 1.0
MouthRight: 1.0
MouthSmileLeft: 1.0
MouthSmileRight: 1.0
MouthFrownLeft: 1.0
MouthFrownRight: 1.0
MouthDimpleLeft: 1.0
MouthDimpleRight: 1.0
MouthStretchLeft: 1.0
MouthStretchRight: 1.0
MouthRollLower: 1.0
MouthRollUpper: 1.0
MouthShrugLower: 1.0
MouthShrugUpper: 1.0
MouthPressLeft: 1.0
MouthPressRight: 1.0
MouthLowerDownLeft: 1.0
MouthLowerDownRight: 1.0
MouthUpperUpLeft: 1.0
MouthUpperUpRight: 1.0
BrowDownLeft: 1.0
BrowDownRight: 1.0
BrowInnerUp: 1.0
BrowOuterUpLeft: 1.0
BrowOuterUpRight: 1.0
CheekPuff: 1.0
CheekSquintLeft: 1.0
CheekSquintRight: 1.0
NoseSneerLeft: 1.0
NoseSneerRight: 1.0
TongueOut: 1.0
weight_offsets:
EyeBlinkLeft: 0.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 0.0
EyeWideLeft: 0.0
EyeBlinkRight: 0.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 0.0
EyeWideRight: 0.0
JawForward: 0.0
JawLeft: 0.0
JawRight: 0.0
JawOpen: 0.0
MouthClose: 0.0
MouthFunnel: 0.0
MouthPucker: 0.0
MouthLeft: 0.0
MouthRight: 0.0
MouthSmileLeft: 0.0
MouthSmileRight: 0.0
MouthFrownLeft: 0.0
MouthFrownRight: 0.0
MouthDimpleLeft: 0.0
MouthDimpleRight: 0.0
MouthStretchLeft: 0.0
MouthStretchRight: 0.0
MouthRollLower: 0.0
MouthRollUpper: 0.0
MouthShrugLower: 0.0
MouthShrugUpper: 0.0
MouthPressLeft: 0.0
MouthPressRight: 0.0
MouthLowerDownLeft: 0.0
MouthLowerDownRight: 0.0
MouthUpperUpLeft: 0.0
MouthUpperUpRight: 0.0
BrowDownLeft: 0.0
BrowDownRight: 0.0
BrowInnerUp: 0.0
BrowOuterUpLeft: 0.0
BrowOuterUpRight: 0.0
CheekPuff: 0.0
CheekSquintLeft: 0.0
CheekSquintRight: 0.0
NoseSneerLeft: 0.0
NoseSneerRight: 0.0
TongueOut: 0.0
active_poses:
EyeBlinkLeft: 1
EyeLookDownLeft: 0
EyeLookInLeft: 0
EyeLookOutLeft: 0
EyeLookUpLeft: 0
EyeSquintLeft: 1
EyeWideLeft: 1
EyeBlinkRight: 1
EyeLookDownRight: 0
EyeLookInRight: 0
EyeLookOutRight: 0
EyeLookUpRight: 0
EyeSquintRight: 1
EyeWideRight: 1
JawForward: 1
JawLeft: 1
JawRight: 1
JawOpen: 1
MouthClose: 1
MouthFunnel: 1
MouthPucker: 1
MouthLeft: 1
MouthRight: 1
MouthSmileLeft: 1
MouthSmileRight: 1
MouthFrownLeft: 1
MouthFrownRight: 1
MouthDimpleLeft: 1
MouthDimpleRight: 1
MouthStretchLeft: 1
MouthStretchRight: 1
MouthRollLower: 1
MouthRollUpper: 1
MouthShrugLower: 1
MouthShrugUpper: 1
MouthPressLeft: 1
MouthPressRight: 1
MouthLowerDownLeft: 1
MouthLowerDownRight: 1
MouthUpperUpLeft: 1
MouthUpperUpRight: 1
BrowDownLeft: 1
BrowDownRight: 1
BrowInnerUp: 1
BrowOuterUpLeft: 1
BrowOuterUpRight: 1
CheekPuff: 1
CheekSquintLeft: 1
CheekSquintRight: 1
NoseSneerLeft: 1
NoseSneerRight: 1
TongueOut: 0
cancel_poses:
EyeBlinkLeft: -1
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: -1
EyeBlinkRight: -1
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: -1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: -1
MouthSmileRight: -1
MouthFrownLeft: -1
MouthFrownRight: -1
MouthDimpleLeft: -1
MouthDimpleRight: -1
MouthStretchLeft: -1
MouthStretchRight: -1
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: -1
MouthPressRight: -1
MouthLowerDownLeft: -1
MouthLowerDownRight: -1
MouthUpperUpLeft: -1
MouthUpperUpRight: -1
BrowDownLeft: -1
BrowDownRight: -1
BrowInnerUp: -1
BrowOuterUpLeft: -1
BrowOuterUpRight: -1
CheekPuff: -1
CheekSquintLeft: -1
CheekSquintRight: -1
NoseSneerLeft: -1
NoseSneerRight: -1
TongueOut: -1
symmetry_poses:
EyeBlinkLeft: 0
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: 1
EyeBlinkRight: 0
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: 1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: 2
MouthSmileRight: 2
MouthFrownLeft: 3
MouthFrownRight: 3
MouthDimpleLeft: 4
MouthDimpleRight: 4
MouthStretchLeft: 5
MouthStretchRight: 5
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: 6
MouthPressRight: 6
MouthLowerDownLeft: 7
MouthLowerDownRight: 7
MouthUpperUpLeft: 8
MouthUpperUpRight: 8
BrowDownLeft: 10
BrowDownRight: 10
BrowInnerUp: -1
BrowOuterUpLeft: 9
BrowOuterUpRight: 9
CheekPuff: -1
CheekSquintLeft: 11
CheekSquintRight: 11
NoseSneerLeft: 12
NoseSneerRight: 12
TongueOut: -1
james_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
amazement: 0.0
anger: 0.0
cheekiness: 0.0
disgust: 0.0
fear: 0.0
grief: 0.0
joy: 0.0
outofbreath: 0.0
pain: 0.0
sadness: 0.0
a2e:
enabled: true
live_transition_time: 0.5
post_processing_params:
emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions
a2f:
# A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
inference_model_id: james_v2.3
blendshape_id: james_topo2_v2.2
face_params:
eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
input_strength: 1.0 # Controls the magnitude of the input audio
lip_close_offset: -0.02 # Adjusts the default pose of lip close-open
lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
lower_face_strength: 1.2 # Controls the range of motion on the lower regions of the face
skin_strength: 1.0 # Controls the range of motion of the skin
upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
blendshape_params: # Modulates the effect of each blendshape. Gain * w + offset
enable_clamping_bs_weight: false
weight_multipliers:
EyeBlinkLeft: 1.0
EyeLookDownLeft: 1.0
EyeLookInLeft: 1.0
EyeLookOutLeft: 1.0
EyeLookUpLeft: 1.0
EyeSquintLeft: 1.0
EyeWideLeft: 1.0
EyeBlinkRight: 1.0
EyeLookDownRight: 1.0
EyeLookInRight: 1.0
EyeLookOutRight: 1.0
EyeLookUpRight: 1.0
EyeSquintRight: 1.0
EyeWideRight: 1.0
JawForward: 1.0
JawLeft: 1.0
JawRight: 1.0
JawOpen: 1.0
MouthClose: 1.0
MouthFunnel: 1.0
MouthPucker: 1.0
MouthLeft: 1.0
MouthRight: 1.0
MouthSmileLeft: 1.0
MouthSmileRight: 1.0
MouthFrownLeft: 1.0
MouthFrownRight: 1.0
MouthDimpleLeft: 1.0
MouthDimpleRight: 1.0
MouthStretchLeft: 1.0
MouthStretchRight: 1.0
MouthRollLower: 1.0
MouthRollUpper: 1.0
MouthShrugLower: 1.0
MouthShrugUpper: 1.0
MouthPressLeft: 1.0
MouthPressRight: 1.0
MouthLowerDownLeft: 1.0
MouthLowerDownRight: 1.0
MouthUpperUpLeft: 1.0
MouthUpperUpRight: 1.0
BrowDownLeft: 1.0
BrowDownRight: 1.0
BrowInnerUp: 1.0
BrowOuterUpLeft: 1.0
BrowOuterUpRight: 1.0
CheekPuff: 1.0
CheekSquintLeft: 1.0
CheekSquintRight: 1.0
NoseSneerLeft: 1.0
NoseSneerRight: 1.0
TongueOut: 1.0
weight_offsets:
EyeBlinkLeft: 0.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 0.0
EyeWideLeft: 0.0
EyeBlinkRight: 0.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 0.0
EyeWideRight: 0.0
JawForward: 0.0
JawLeft: 0.0
JawRight: 0.0
JawOpen: 0.0
MouthClose: 0.0
MouthFunnel: 0.0
MouthPucker: 0.0
MouthLeft: 0.0
MouthRight: 0.0
MouthSmileLeft: 0.0
MouthSmileRight: 0.0
MouthFrownLeft: 0.0
MouthFrownRight: 0.0
MouthDimpleLeft: 0.0
MouthDimpleRight: 0.0
MouthStretchLeft: 0.0
MouthStretchRight: 0.0
MouthRollLower: 0.0
MouthRollUpper: 0.0
MouthShrugLower: 0.0
MouthShrugUpper: 0.0
MouthPressLeft: 0.0
MouthPressRight: 0.0
MouthLowerDownLeft: 0.0
MouthLowerDownRight: 0.0
MouthUpperUpLeft: 0.0
MouthUpperUpRight: 0.0
BrowDownLeft: 0.0
BrowDownRight: 0.0
BrowInnerUp: 0.0
BrowOuterUpLeft: 0.0
BrowOuterUpRight: 0.0
CheekPuff: 0.0
CheekSquintLeft: 0.0
CheekSquintRight: 0.0
NoseSneerLeft: 0.0
NoseSneerRight: 0.0
TongueOut: 0.0
active_poses:
EyeBlinkLeft: 1
EyeLookDownLeft: 0
EyeLookInLeft: 0
EyeLookOutLeft: 0
EyeLookUpLeft: 0
EyeSquintLeft: 1
EyeWideLeft: 1
EyeBlinkRight: 1
EyeLookDownRight: 0
EyeLookInRight: 0
EyeLookOutRight: 0
EyeLookUpRight: 0
EyeSquintRight: 1
EyeWideRight: 1
JawForward: 1
JawLeft: 1
JawRight: 1
JawOpen: 1
MouthClose: 1
MouthFunnel: 1
MouthPucker: 1
MouthLeft: 1
MouthRight: 1
MouthSmileLeft: 1
MouthSmileRight: 1
MouthFrownLeft: 1
MouthFrownRight: 1
MouthDimpleLeft: 1
MouthDimpleRight: 1
MouthStretchLeft: 1
MouthStretchRight: 1
MouthRollLower: 1
MouthRollUpper: 1
MouthShrugLower: 1
MouthShrugUpper: 1
MouthPressLeft: 1
MouthPressRight: 1
MouthLowerDownLeft: 1
MouthLowerDownRight: 1
MouthUpperUpLeft: 1
MouthUpperUpRight: 1
BrowDownLeft: 1
BrowDownRight: 1
BrowInnerUp: 1
BrowOuterUpLeft: 1
BrowOuterUpRight: 1
CheekPuff: 1
CheekSquintLeft: 1
CheekSquintRight: 1
NoseSneerLeft: 1
NoseSneerRight: 1
TongueOut: 0
cancel_poses:
EyeBlinkLeft: -1
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: -1
EyeBlinkRight: -1
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: -1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: -1
MouthSmileRight: -1
MouthFrownLeft: -1
MouthFrownRight: -1
MouthDimpleLeft: -1
MouthDimpleRight: -1
MouthStretchLeft: -1
MouthStretchRight: -1
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: -1
MouthPressRight: -1
MouthLowerDownLeft: -1
MouthLowerDownRight: -1
MouthUpperUpLeft: -1
MouthUpperUpRight: -1
BrowDownLeft: -1
BrowDownRight: -1
BrowInnerUp: -1
BrowOuterUpLeft: -1
BrowOuterUpRight: -1
CheekPuff: -1
CheekSquintLeft: -1
CheekSquintRight: -1
NoseSneerLeft: -1
NoseSneerRight: -1
TongueOut: -1
symmetry_poses:
EyeBlinkLeft: 0
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: 1
EyeBlinkRight: 0
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: 1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: 2
MouthSmileRight: 2
MouthFrownLeft: 3
MouthFrownRight: 3
MouthDimpleLeft: 4
MouthDimpleRight: 4
MouthStretchLeft: 5
MouthStretchRight: 5
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: 6
MouthPressRight: 6
MouthLowerDownLeft: 7
MouthLowerDownRight: 7
MouthUpperUpLeft: 8
MouthUpperUpRight: 8
BrowDownLeft: 10
BrowDownRight: 10
BrowInnerUp: -1
BrowOuterUpLeft: 9
BrowOuterUpRight: 9
CheekPuff: -1
CheekSquintLeft: 11
CheekSquintRight: 11
NoseSneerLeft: 12
NoseSneerRight: 12
TongueOut: -1
mark_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
amazement: 0.0
anger: 0.0
cheekiness: 0.0
disgust: 0.0
fear: 0.0
grief: 0.0
joy: 0.0
outofbreath: 0.0
pain: 0.0
sadness: 0.0
a2e:
enabled: true
live_transition_time: 0.5
post_processing_params:
emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions
a2f:
# A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
inference_model_id: mark_v2.3
blendshape_id: mark_topo1_v2.1
face_params:
eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
input_strength: 1.3 # Controls the magnitude of the input audio
lip_close_offset: -0.03 # Adjusts the default pose of lip close-open
lower_face_smoothing: 0.0023 # Applies temporal smoothing to the lower face motion
lower_face_strength: 1.4 # Controls the range of motion on the lower regions of the face
skin_strength: 1.1 # Controls the range of motion of the skin
upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
blendshape_params: # Modulates the effect of each blendshape. Gain * w + offset
enable_clamping_bs_weight: false
weight_multipliers:
EyeBlinkLeft: 1.0
EyeLookDownLeft: 1.0
EyeLookInLeft: 1.0
EyeLookOutLeft: 1.0
EyeLookUpLeft: 1.0
EyeSquintLeft: 1.0
EyeWideLeft: 1.0
EyeBlinkRight: 1.0
EyeLookDownRight: 1.0
EyeLookInRight: 1.0
EyeLookOutRight: 1.0
EyeLookUpRight: 1.0
EyeSquintRight: 1.0
EyeWideRight: 1.0
JawForward: 1.0
JawLeft: 1.0
JawRight: 1.0
JawOpen: 1.0
MouthClose: 1.0
MouthFunnel: 1.0
MouthPucker: 1.0
MouthLeft: 1.0
MouthRight: 1.0
MouthSmileLeft: 1.0
MouthSmileRight: 1.0
MouthFrownLeft: 1.0
MouthFrownRight: 1.0
MouthDimpleLeft: 1.0
MouthDimpleRight: 1.0
MouthStretchLeft: 1.0
MouthStretchRight: 1.0
MouthRollLower: 1.0
MouthRollUpper: 1.0
MouthShrugLower: 1.0
MouthShrugUpper: 1.0
MouthPressLeft: 1.0
MouthPressRight: 1.0
MouthLowerDownLeft: 1.0
MouthLowerDownRight: 1.0
MouthUpperUpLeft: 1.0
MouthUpperUpRight: 1.0
BrowDownLeft: 1.0
BrowDownRight: 1.0
BrowInnerUp: 1.0
BrowOuterUpLeft: 1.0
BrowOuterUpRight: 1.0
CheekPuff: 1.0
CheekSquintLeft: 1.0
CheekSquintRight: 1.0
NoseSneerLeft: 1.0
NoseSneerRight: 1.0
TongueOut: 1.0
weight_offsets:
EyeBlinkLeft: 0.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 0.0
EyeWideLeft: 0.0
EyeBlinkRight: 0.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 0.0
EyeWideRight: 0.0
JawForward: 0.0
JawLeft: 0.0
JawRight: 0.0
JawOpen: 0.0
MouthClose: 0.0
MouthFunnel: 0.0
MouthPucker: 0.0
MouthLeft: 0.0
MouthRight: 0.0
MouthSmileLeft: 0.0
MouthSmileRight: 0.0
MouthFrownLeft: 0.0
MouthFrownRight: 0.0
MouthDimpleLeft: 0.0
MouthDimpleRight: 0.0
MouthStretchLeft: 0.0
MouthStretchRight: 0.0
MouthRollLower: 0.0
MouthRollUpper: 0.0
MouthShrugLower: 0.0
MouthShrugUpper: 0.0
MouthPressLeft: 0.0
MouthPressRight: 0.0
MouthLowerDownLeft: 0.0
MouthLowerDownRight: 0.0
MouthUpperUpLeft: 0.0
MouthUpperUpRight: 0.0
BrowDownLeft: 0.0
BrowDownRight: 0.0
BrowInnerUp: 0.0
BrowOuterUpLeft: 0.0
BrowOuterUpRight: 0.0
CheekPuff: 0.0
CheekSquintLeft: 0.0
CheekSquintRight: 0.0
NoseSneerLeft: 0.0
NoseSneerRight: 0.0
TongueOut: 0.0
active_poses:
EyeBlinkLeft: 1
EyeLookDownLeft: 0
EyeLookInLeft: 0
EyeLookOutLeft: 0
EyeLookUpLeft: 0
EyeSquintLeft: 1
EyeWideLeft: 1
EyeBlinkRight: 1
EyeLookDownRight: 0
EyeLookInRight: 0
EyeLookOutRight: 0
EyeLookUpRight: 0
EyeSquintRight: 1
EyeWideRight: 1
JawForward: 1
JawLeft: 1
JawRight: 1
JawOpen: 1
MouthClose: 1
MouthFunnel: 1
MouthPucker: 1
MouthLeft: 1
MouthRight: 1
MouthSmileLeft: 1
MouthSmileRight: 1
MouthFrownLeft: 1
MouthFrownRight: 1
MouthDimpleLeft: 1
MouthDimpleRight: 1
MouthStretchLeft: 1
MouthStretchRight: 1
MouthRollLower: 1
MouthRollUpper: 1
MouthShrugLower: 1
MouthShrugUpper: 1
MouthPressLeft: 1
MouthPressRight: 1
MouthLowerDownLeft: 1
MouthLowerDownRight: 1
MouthUpperUpLeft: 1
MouthUpperUpRight: 1
BrowDownLeft: 1
BrowDownRight: 1
BrowInnerUp: 1
BrowOuterUpLeft: 1
BrowOuterUpRight: 1
CheekPuff: 1
CheekSquintLeft: 1
CheekSquintRight: 1
NoseSneerLeft: 1
NoseSneerRight: 1
TongueOut: 0
cancel_poses:
EyeBlinkLeft: -1
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: -1
EyeBlinkRight: -1
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: -1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: -1
MouthSmileRight: -1
MouthFrownLeft: -1
MouthFrownRight: -1
MouthDimpleLeft: -1
MouthDimpleRight: -1
MouthStretchLeft: -1
MouthStretchRight: -1
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: -1
MouthPressRight: -1
MouthLowerDownLeft: -1
MouthLowerDownRight: -1
MouthUpperUpLeft: -1
MouthUpperUpRight: -1
BrowDownLeft: -1
BrowDownRight: -1
BrowInnerUp: -1
BrowOuterUpLeft: -1
BrowOuterUpRight: -1
CheekPuff: -1
CheekSquintLeft: -1
CheekSquintRight: -1
NoseSneerLeft: -1
NoseSneerRight: -1
TongueOut: -1
symmetry_poses:
EyeBlinkLeft: 0
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: 1
EyeBlinkRight: 0
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: 1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: 2
MouthSmileRight: 2
MouthFrownLeft: 3
MouthFrownRight: 3
MouthDimpleLeft: 4
MouthDimpleRight: 4
MouthStretchLeft: 5
MouthStretchRight: 5
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: 6
MouthPressRight: 6
MouthLowerDownLeft: 7
MouthLowerDownRight: 7
MouthUpperUpLeft: 8
MouthUpperUpRight: 8
BrowDownLeft: 10
BrowDownRight: 10
BrowInnerUp: -1
BrowOuterUpLeft: 9
BrowOuterUpRight: 9
CheekPuff: -1
CheekSquintLeft: 11
CheekSquintRight: 11
NoseSneerLeft: 12
NoseSneerRight: 12
TongueOut: -1
Configuration files for Unreal Engine Metahuman#
If you plan to connect A2F-3D with MetaHuman characters, you will need to use configuration files adapted for them. The only changes in these configuration files compared to the default ones are the blendshape multipliers and offsets.
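Once you save one of the MetaHuman configuration files below into $LOCAL_CONFIGS (for example under the hypothetical name claire_metahuman_stylization_config.yaml), using it follows the same pattern as before, inside the container:
$ ./service/generate_trt_models.py --stylization-config /mnt/configs/claire_metahuman_stylization_config.yaml
$ a2f_pipeline.run --stylization-config /mnt/configs/claire_metahuman_stylization_config.yaml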
MetaHuman Stylization Configuration Files
claire_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
amazement: 0.0
anger: 0.0
cheekiness: 0.0
disgust: 0.0
fear: 0.0
grief: 0.0
joy: 0.0
outofbreath: 0.0
pain: 0.0
sadness: 0.0
a2e:
enabled: true
live_transition_time: 0.5
post_processing_params:
emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions
a2f:
# A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
inference_model_id: claire_v2.3
blendshape_id: claire_topo1_v2.1
face_params:
eyelid_offset: 0.0 # Adjusts the default pose of eyelid open-close
face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
input_strength: 1.0 # Controls the magnitude of the input audio
lip_close_offset: 0.0 # Adjusts the default pose of lip close-open
lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
lower_face_strength: 1.25 # Controls the range of motion on the lower regions of the face
skin_strength: 1.0 # Controls the range of motion of the skin
upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
blendshape_params: # Modulates the effect of each blendshape. Gain * w + offset
enable_clamping_bs_weight: false
weight_multipliers:
EyeBlinkLeft: 1.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 1.0
EyeWideLeft: 1.0
EyeBlinkRight: 1.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 1.0
EyeWideRight: 1.0
JawForward: 0.7
JawLeft: 0.2
JawRight: 0.2
JawOpen: 1.0
MouthClose: 1.0
MouthFunnel: 1.2
MouthPucker: 1.2
MouthLeft: 0.2
MouthRight: 0.2
MouthSmileLeft: 0.8
MouthSmileRight: 0.8
MouthFrownLeft: 0.4
MouthFrownRight: 0.4
MouthDimpleLeft: 0.7
MouthDimpleRight: 0.7
MouthStretchLeft: 0.1
MouthStretchRight: 0.1
MouthRollLower: 0.9
MouthRollUpper: 0.5
MouthShrugLower: 0.9
MouthShrugUpper: 0.4
MouthPressLeft: 0.8
MouthPressRight: 0.8
MouthLowerDownLeft: 0.8
MouthLowerDownRight: 0.8
MouthUpperUpLeft: 0.8
MouthUpperUpRight: 0.8
BrowDownLeft: 1.0
BrowDownRight: 1.0
BrowInnerUp: 1.0
BrowOuterUpLeft: 1.0
BrowOuterUpRight: 1.0
CheekPuff: 0.2
CheekSquintLeft: 1.0
CheekSquintRight: 1.0
NoseSneerLeft: 0.8
NoseSneerRight: 0.8
TongueOut: 0.0
weight_offsets: # Modulates the effect of each blendshape. blendshape_values * weight_multipliers + weight_offsets
EyeBlinkLeft: 0.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 0.0
EyeWideLeft: 0.0
EyeBlinkRight: 0.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 0.0
EyeWideRight: 0.0
JawForward: 0.0
JawLeft: 0.0
JawRight: 0.0
JawOpen: 0.0
MouthClose: 0.0
MouthFunnel: 0.0
MouthPucker: 0.0
MouthLeft: 0.0
MouthRight: 0.0
MouthSmileLeft: 0.0
MouthSmileRight: 0.0
MouthFrownLeft: 0.0
MouthFrownRight: 0.0
MouthDimpleLeft: 0.0
MouthDimpleRight: 0.0
MouthStretchLeft: 0.0
MouthStretchRight: 0.0
MouthRollLower: 0.0
MouthRollUpper: 0.0
MouthShrugLower: 0.0
MouthShrugUpper: 0.0
MouthPressLeft: 0.0
MouthPressRight: 0.0
MouthLowerDownLeft: 0.0
MouthLowerDownRight: 0.0
MouthUpperUpLeft: 0.0
MouthUpperUpRight: 0.0
BrowDownLeft: 0.0
BrowDownRight: 0.0
BrowInnerUp: 0.0
BrowOuterUpLeft: 0.0
BrowOuterUpRight: 0.0
CheekPuff: 0.0
CheekSquintLeft: 0.0
CheekSquintRight: 0.0
NoseSneerLeft: 0.0
NoseSneerRight: 0.0
TongueOut: 0.0
active_poses: # Define which poses are active and which ones are not
EyeBlinkLeft: 1
EyeLookDownLeft: 0
EyeLookInLeft: 0
EyeLookOutLeft: 0
EyeLookUpLeft: 0
EyeSquintLeft: 1
EyeWideLeft: 1
EyeBlinkRight: 1
EyeLookDownRight: 0
EyeLookInRight: 0
EyeLookOutRight: 0
EyeLookUpRight: 0
EyeSquintRight: 1
EyeWideRight: 1
JawForward: 1
JawLeft: 1
JawRight: 1
JawOpen: 1
MouthClose: 1
MouthFunnel: 1
MouthPucker: 1
MouthLeft: 1
MouthRight: 1
MouthSmileLeft: 1
MouthSmileRight: 1
MouthFrownLeft: 1
MouthFrownRight: 1
MouthDimpleLeft: 1
MouthDimpleRight: 1
MouthStretchLeft: 1
MouthStretchRight: 1
MouthRollLower: 1
MouthRollUpper: 1
MouthShrugLower: 1
MouthShrugUpper: 1
MouthPressLeft: 1
MouthPressRight: 1
MouthLowerDownLeft: 1
MouthLowerDownRight: 1
MouthUpperUpLeft: 1
MouthUpperUpRight: 1
BrowDownLeft: 1
BrowDownRight: 1
BrowInnerUp: 1
BrowOuterUpLeft: 1
BrowOuterUpRight: 1
CheekPuff: 1
CheekSquintLeft: 1
CheekSquintRight: 1
NoseSneerLeft: 1
NoseSneerRight: 1
TongueOut: 0
cancel_poses: # Define which poses cancel each other
EyeBlinkLeft: -1
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: -1
EyeBlinkRight: -1
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: -1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: -1
MouthSmileRight: -1
MouthFrownLeft: -1
MouthFrownRight: -1
MouthDimpleLeft: -1
MouthDimpleRight: -1
MouthStretchLeft: -1
MouthStretchRight: -1
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: -1
MouthPressRight: -1
MouthLowerDownLeft: -1
MouthLowerDownRight: -1
MouthUpperUpLeft: -1
MouthUpperUpRight: -1
BrowDownLeft: -1
BrowDownRight: -1
BrowInnerUp: -1
BrowOuterUpLeft: -1
BrowOuterUpRight: -1
CheekPuff: -1
CheekSquintLeft: -1
CheekSquintRight: -1
NoseSneerLeft: -1
NoseSneerRight: -1
TongueOut: -1
symmetry_poses: # Define which poses are symmetric to each other
EyeBlinkLeft: 0
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: 1
EyeBlinkRight: 0
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: 1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: 2
MouthSmileRight: 2
MouthFrownLeft: 3
MouthFrownRight: 3
MouthDimpleLeft: 4
MouthDimpleRight: 4
MouthStretchLeft: 5
MouthStretchRight: 5
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: 6
MouthPressRight: 6
MouthLowerDownLeft: 7
MouthLowerDownRight: 7
MouthUpperUpLeft: 8
MouthUpperUpRight: 8
BrowDownLeft: 10
BrowDownRight: 10
BrowInnerUp: -1
BrowOuterUpLeft: 9
BrowOuterUpRight: 9
CheekPuff: -1
CheekSquintLeft: 11
CheekSquintRight: 11
NoseSneerLeft: 12
NoseSneerRight: 12
TongueOut: -1
James Config#
james_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
amazement: 0.0
anger: 0.0
cheekiness: 0.0
disgust: 0.0
fear: 0.0
grief: 0.0
joy: 0.0
outofbreath: 0.0
pain: 0.0
sadness: 0.0
a2e:
enabled: true
live_transition_time: 0.5
post_processing_params:
emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if it is loaded) relative to generated emotions
a2f:
# A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
inference_model_id: james_v2.3
blendshape_id: james_topo2_v2.2
face_params:
eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
input_strength: 1.0 # Controls the magnitude of the input audio
lip_close_offset: -0.02 # Adjusts the default pose of lip close-open
lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
lower_face_strength: 1.2 # Controls the range of motion on the lower regions of the face
skin_strength: 1.0 # Controls the range of motion of the skin
upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
blendshape_params: # Modulates the effect of each blendshape. Gain * w + offset
enable_clamping_bs_weight: false
weight_multipliers:
EyeBlinkLeft: 1.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 1.0
EyeWideLeft: 1.0
EyeBlinkRight: 1.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 1.0
EyeWideRight: 1.0
JawForward: 0.7
JawLeft: 0.2
JawRight: 0.2
JawOpen: 0.8
MouthClose: 0.3
MouthFunnel: 1.0
MouthPucker: 1.0
MouthLeft: 0.2
MouthRight: 0.2
MouthSmileLeft: 1.2
MouthSmileRight: 1.2
MouthFrownLeft: 0.5
MouthFrownRight: 0.5
MouthDimpleLeft: 0.8
MouthDimpleRight: 0.8
MouthStretchLeft: 0.05
MouthStretchRight: 0.05
MouthRollLower: 0.8
MouthRollUpper: 0.5
MouthShrugLower: 1.0
MouthShrugUpper: 0.4
MouthPressLeft: 0.8
MouthPressRight: 0.8
MouthLowerDownLeft: 0.8
MouthLowerDownRight: 0.8
MouthUpperUpLeft: 0.8
MouthUpperUpRight: 0.8
BrowDownLeft: 1.2
BrowDownRight: 1.2
BrowInnerUp: 1.3
BrowOuterUpLeft: 0.8
BrowOuterUpRight: 0.8
CheekPuff: 0.2
CheekSquintLeft: 1.0
CheekSquintRight: 1.0
NoseSneerLeft: 0.8
NoseSneerRight: 0.8
TongueOut: 0.0
weight_offsets: # Modulates the effect of each blendshape. blendshape_values * weight_multipliers + weight_offsets
EyeBlinkLeft: 0.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 0.0
EyeWideLeft: 0.0
EyeBlinkRight: 0.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 0.0
EyeWideRight: 0.0
JawForward: 0.0
JawLeft: 0.0
JawRight: 0.0
JawOpen: 0.0
MouthClose: 0.0
MouthFunnel: 0.0
MouthPucker: 0.0
MouthLeft: 0.0
MouthRight: 0.0
MouthSmileLeft: 0.0
MouthSmileRight: 0.0
MouthFrownLeft: 0.0
MouthFrownRight: 0.0
MouthDimpleLeft: 0.0
MouthDimpleRight: 0.0
MouthStretchLeft: 0.0
MouthStretchRight: 0.0
MouthRollLower: 0.0
MouthRollUpper: 0.0
MouthShrugLower: 0.0
MouthShrugUpper: 0.0
MouthPressLeft: 0.0
MouthPressRight: 0.0
MouthLowerDownLeft: 0.0
MouthLowerDownRight: 0.0
MouthUpperUpLeft: 0.0
MouthUpperUpRight: 0.0
BrowDownLeft: 0.0
BrowDownRight: 0.0
BrowInnerUp: 0.0
BrowOuterUpLeft: 0.0
BrowOuterUpRight: 0.0
CheekPuff: 0.0
CheekSquintLeft: 0.0
CheekSquintRight: 0.0
NoseSneerLeft: 0.0
NoseSneerRight: 0.0
TongueOut: 0.0
active_poses: # Define which poses are active and which ones are not
EyeBlinkLeft: 1
EyeLookDownLeft: 0
EyeLookInLeft: 0
EyeLookOutLeft: 0
EyeLookUpLeft: 0
EyeSquintLeft: 1
EyeWideLeft: 1
EyeBlinkRight: 1
EyeLookDownRight: 0
EyeLookInRight: 0
EyeLookOutRight: 0
EyeLookUpRight: 0
EyeSquintRight: 1
EyeWideRight: 1
JawForward: 1
JawLeft: 1
JawRight: 1
JawOpen: 1
MouthClose: 1
MouthFunnel: 1
MouthPucker: 1
MouthLeft: 1
MouthRight: 1
MouthSmileLeft: 1
MouthSmileRight: 1
MouthFrownLeft: 1
MouthFrownRight: 1
MouthDimpleLeft: 1
MouthDimpleRight: 1
MouthStretchLeft: 1
MouthStretchRight: 1
MouthRollLower: 1
MouthRollUpper: 1
MouthShrugLower: 1
MouthShrugUpper: 1
MouthPressLeft: 1
MouthPressRight: 1
MouthLowerDownLeft: 1
MouthLowerDownRight: 1
MouthUpperUpLeft: 1
MouthUpperUpRight: 1
BrowDownLeft: 1
BrowDownRight: 1
BrowInnerUp: 1
BrowOuterUpLeft: 1
BrowOuterUpRight: 1
CheekPuff: 1
CheekSquintLeft: 1
CheekSquintRight: 1
NoseSneerLeft: 1
NoseSneerRight: 1
TongueOut: 0
cancel_poses: # Define which poses cancel each other
EyeBlinkLeft: -1
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: -1
EyeBlinkRight: -1
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: -1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: -1
MouthSmileRight: -1
MouthFrownLeft: -1
MouthFrownRight: -1
MouthDimpleLeft: -1
MouthDimpleRight: -1
MouthStretchLeft: -1
MouthStretchRight: -1
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: -1
MouthPressRight: -1
MouthLowerDownLeft: -1
MouthLowerDownRight: -1
MouthUpperUpLeft: -1
MouthUpperUpRight: -1
BrowDownLeft: -1
BrowDownRight: -1
BrowInnerUp: -1
BrowOuterUpLeft: -1
BrowOuterUpRight: -1
CheekPuff: -1
CheekSquintLeft: -1
CheekSquintRight: -1
NoseSneerLeft: -1
NoseSneerRight: -1
TongueOut: -1
symmetry_poses: # Define which poses are symmetric to each other
EyeBlinkLeft: 0
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: 1
EyeBlinkRight: 0
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: 1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: 2
MouthSmileRight: 2
MouthFrownLeft: 3
MouthFrownRight: 3
MouthDimpleLeft: 4
MouthDimpleRight: 4
MouthStretchLeft: 5
MouthStretchRight: 5
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: 6
MouthPressRight: 6
MouthLowerDownLeft: 7
MouthLowerDownRight: 7
MouthUpperUpLeft: 8
MouthUpperUpRight: 8
BrowDownLeft: 10
BrowDownRight: 10
BrowInnerUp: -1
BrowOuterUpLeft: 9
BrowOuterUpRight: 9
CheekPuff: -1
CheekSquintLeft: 11
CheekSquintRight: 11
NoseSneerLeft: 12
NoseSneerRight: 12
TongueOut: -1
Mark Config#
mark_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
amazement: 0.0
anger: 0.0
cheekiness: 0.0
disgust: 0.0
fear: 0.0
grief: 0.0
joy: 0.0
outofbreath: 0.0
pain: 0.0
sadness: 0.0
a2e:
enabled: true
live_transition_time: 0.5
post_processing_params:
emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if it is loaded) relative to generated emotions
a2f:
# A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
inference_model_id: mark_v2.3
blendshape_id: mark_topo1_v2.1
face_params:
eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
input_strength: 1.3 # Controls the magnitude of the input audio
lip_close_offset: -0.03 # Adjusts the default pose of lip close-open
lower_face_smoothing: 0.0023 # Applies temporal smoothing to the lower face motion
lower_face_strength: 1.4 # Controls the range of motion on the lower regions of the face
skin_strength: 1.1 # Controls the range of motion of the skin
upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face
blendshape_params: # Modulates the effect of each blendshape. Gain * w + offset
enable_clamping_bs_weight: false
weight_multipliers:
EyeBlinkLeft: 1.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 1.0
EyeWideLeft: 1.0
EyeBlinkRight: 1.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 1.0
EyeWideRight: 1.0
JawForward: 0.7
JawLeft: 0.2
JawRight: 0.2
JawOpen: 1.0
MouthClose: 0.2
MouthFunnel: 1.2
MouthPucker: 1.2
MouthLeft: 0.2
MouthRight: 0.2
MouthSmileLeft: 0.8
MouthSmileRight: 0.8
MouthFrownLeft: 0.5
MouthFrownRight: 0.5
MouthDimpleLeft: 0.8
MouthDimpleRight: 0.8
MouthStretchLeft: 0.05
MouthStretchRight: 0.05
MouthRollLower: 0.8
MouthRollUpper: 0.5
MouthShrugLower: 0.9
MouthShrugUpper: 0.4
MouthPressLeft: 0.8
MouthPressRight: 0.8
MouthLowerDownLeft: 0.8
MouthLowerDownRight: 0.8
MouthUpperUpLeft: 0.8
MouthUpperUpRight: 0.8
BrowDownLeft: 1.0
BrowDownRight: 1.0
BrowInnerUp: 1.0
BrowOuterUpLeft: 1.0
BrowOuterUpRight: 1.0
CheekPuff: 0.2
CheekSquintLeft: 1.0
CheekSquintRight: 1.0
NoseSneerLeft: 0.8
NoseSneerRight: 0.8
TongueOut: 0.0
weight_offsets: # Modulates the effect of each blendshape. blendshape_values * weight_multipliers + weight_offsets
EyeBlinkLeft: 0.0
EyeLookDownLeft: 0.0
EyeLookInLeft: 0.0
EyeLookOutLeft: 0.0
EyeLookUpLeft: 0.0
EyeSquintLeft: 0.0
EyeWideLeft: 0.0
EyeBlinkRight: 0.0
EyeLookDownRight: 0.0
EyeLookInRight: 0.0
EyeLookOutRight: 0.0
EyeLookUpRight: 0.0
EyeSquintRight: 0.0
EyeWideRight: 0.0
JawForward: 0.0
JawLeft: 0.0
JawRight: 0.0
JawOpen: 0.0
MouthClose: 0.0
MouthFunnel: 0.0
MouthPucker: 0.0
MouthLeft: 0.0
MouthRight: 0.0
MouthSmileLeft: 0.0
MouthSmileRight: 0.0
MouthFrownLeft: 0.0
MouthFrownRight: 0.0
MouthDimpleLeft: 0.0
MouthDimpleRight: 0.0
MouthStretchLeft: 0.0
MouthStretchRight: 0.0
MouthRollLower: 0.0
MouthRollUpper: 0.0
MouthShrugLower: 0.0
MouthShrugUpper: 0.0
MouthPressLeft: 0.0
MouthPressRight: 0.0
MouthLowerDownLeft: 0.0
MouthLowerDownRight: 0.0
MouthUpperUpLeft: 0.0
MouthUpperUpRight: 0.0
BrowDownLeft: 0.0
BrowDownRight: 0.0
BrowInnerUp: 0.0
BrowOuterUpLeft: 0.0
BrowOuterUpRight: 0.0
CheekPuff: 0.0
CheekSquintLeft: 0.0
CheekSquintRight: 0.0
NoseSneerLeft: 0.0
NoseSneerRight: 0.0
TongueOut: 0.0
active_poses: # Define which poses are active and which ones are not
EyeBlinkLeft: 1
EyeLookDownLeft: 0
EyeLookInLeft: 0
EyeLookOutLeft: 0
EyeLookUpLeft: 0
EyeSquintLeft: 1
EyeWideLeft: 1
EyeBlinkRight: 1
EyeLookDownRight: 0
EyeLookInRight: 0
EyeLookOutRight: 0
EyeLookUpRight: 0
EyeSquintRight: 1
EyeWideRight: 1
JawForward: 1
JawLeft: 1
JawRight: 1
JawOpen: 1
MouthClose: 1
MouthFunnel: 1
MouthPucker: 1
MouthLeft: 1
MouthRight: 1
MouthSmileLeft: 1
MouthSmileRight: 1
MouthFrownLeft: 1
MouthFrownRight: 1
MouthDimpleLeft: 1
MouthDimpleRight: 1
MouthStretchLeft: 1
MouthStretchRight: 1
MouthRollLower: 1
MouthRollUpper: 1
MouthShrugLower: 1
MouthShrugUpper: 1
MouthPressLeft: 1
MouthPressRight: 1
MouthLowerDownLeft: 1
MouthLowerDownRight: 1
MouthUpperUpLeft: 1
MouthUpperUpRight: 1
BrowDownLeft: 1
BrowDownRight: 1
BrowInnerUp: 1
BrowOuterUpLeft: 1
BrowOuterUpRight: 1
CheekPuff: 1
CheekSquintLeft: 1
CheekSquintRight: 1
NoseSneerLeft: 1
NoseSneerRight: 1
TongueOut: 0
cancel_poses: # Define which poses cancel each other
EyeBlinkLeft: -1
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: -1
EyeBlinkRight: -1
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: -1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: -1
MouthSmileRight: -1
MouthFrownLeft: -1
MouthFrownRight: -1
MouthDimpleLeft: -1
MouthDimpleRight: -1
MouthStretchLeft: -1
MouthStretchRight: -1
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: -1
MouthPressRight: -1
MouthLowerDownLeft: -1
MouthLowerDownRight: -1
MouthUpperUpLeft: -1
MouthUpperUpRight: -1
BrowDownLeft: -1
BrowDownRight: -1
BrowInnerUp: -1
BrowOuterUpLeft: -1
BrowOuterUpRight: -1
CheekPuff: -1
CheekSquintLeft: -1
CheekSquintRight: -1
NoseSneerLeft: -1
NoseSneerRight: -1
TongueOut: -1
symmetry_poses: # Define which poses are symmetric to each other
EyeBlinkLeft: 0
EyeLookDownLeft: -1
EyeLookInLeft: -1
EyeLookOutLeft: -1
EyeLookUpLeft: -1
EyeSquintLeft: -1
EyeWideLeft: 1
EyeBlinkRight: 0
EyeLookDownRight: -1
EyeLookInRight: -1
EyeLookOutRight: -1
EyeLookUpRight: -1
EyeSquintRight: -1
EyeWideRight: 1
JawForward: -1
JawLeft: -1
JawRight: -1
JawOpen: -1
MouthClose: -1
MouthFunnel: -1
MouthPucker: -1
MouthLeft: -1
MouthRight: -1
MouthSmileLeft: 2
MouthSmileRight: 2
MouthFrownLeft: 3
MouthFrownRight: 3
MouthDimpleLeft: 4
MouthDimpleRight: 4
MouthStretchLeft: 5
MouthStretchRight: 5
MouthRollLower: -1
MouthRollUpper: -1
MouthShrugLower: -1
MouthShrugUpper: -1
MouthPressLeft: 6
MouthPressRight: 6
MouthLowerDownLeft: 7
MouthLowerDownRight: 7
MouthUpperUpLeft: 8
MouthUpperUpRight: 8
BrowDownLeft: 10
BrowDownRight: 10
BrowInnerUp: -1
BrowOuterUpLeft: 9
BrowOuterUpRight: 9
CheekPuff: -1
CheekSquintLeft: 11
CheekSquintRight: 11
NoseSneerLeft: 12
NoseSneerRight: 12
TongueOut: -1
Parameter Tuning Guide#
Audio2Face-3D imports inference parameters from multiple sources: the inference model SDK, deployment-time configuration files, and runtime input. Generally, deployment-time parameters override the matching parameters in the model files, and runtime parameters override both deployment-time and model default parameters.
For runtime parameters, see AudioStreamHeader, FaceParameters, BlendShapeParameters, EmotionParameters, and EmotionPostProcessingParameters for the proto definitions.
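As a rough illustration of this precedence (the dictionaries and values below are purely illustrative, not the real model data or API), the effective value of a parameter can be pictured as a layered merge in which later layers win:

```python
# Minimal sketch of the precedence: model defaults < deployment-time config < runtime input.
model_defaults = {"lower_face_strength": 1.2, "skin_strength": 1.0}
deployment_config = {"lower_face_strength": 1.3}   # e.g. from a mounted stylization config
runtime_request = {"skin_strength": 0.9}           # e.g. FaceParameters sent by the client

effective = {**model_defaults, **deployment_config, **runtime_request}
print(effective)  # {'lower_face_strength': 1.3, 'skin_strength': 0.9}
```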
FaceParameters
Only a subset of FaceParameters is supported for runtime tuning. See FaceParameters for the list of supported parameters.
Emotion Post-processing Parameters
The Audio2Emotion SDK automatically parses emotions from the incoming audio and generates emotion vectors to drive the character’s facial animation performance. Use the post-processing parameters below to further tailor the performance to your desired specifications. Note that the order of operations listed below is the sequence in which the processes are executed in the technology stack.
Emotion Contrast
Emotion contrast is applied to the inference output, controlling the emotion spread using the sigmoid function. This adjustment pushes higher values higher and lower values lower, allowing for a wider range in the generated emotional performance.
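As a hedged sketch only (the exact formula the SDK uses is not documented here), a sigmoid-based contrast adjustment typically behaves like this:

```python
import math

# Illustrative only: a sigmoid centered at 0.5 pushes values above 0.5 up and
# values below 0.5 down; a higher contrast value steepens the curve.
def emotion_contrast(value, contrast=1.0):
    return 1.0 / (1.0 + math.exp(-10.0 * contrast * (value - 0.5)))

print(round(emotion_contrast(0.7), 3))  # ~0.881, pushed higher
print(round(emotion_contrast(0.3), 3))  # ~0.119, pushed lower
```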
Max Emotions
Max emotions lets you set a hard limit on the number of emotions that the Audio2Emotion SDK will engage. Emotions are prioritized by their strength. Once the maximum number of emotions is reached, only the vectors for these prioritized emotions are engaged, and all other emotion vectors are null. This helps achieve a more accurate read on the correct emotion when the vocal emotional performance is more subtle.
For example, if Joy and Amazement are the strongest predicted emotions and you set the Max Emotions limit to 2, only Joy and Amazement are applied to the performance.
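A minimal sketch of this behavior (keep the strongest emotions and null the rest); the emotion names and values below are illustrative:

```python
def apply_max_emotions(emotions, max_emotions=3):
    # Keep the max_emotions strongest entries and zero out all others.
    strongest = sorted(emotions, key=emotions.get, reverse=True)[:max_emotions]
    return {name: (value if name in strongest else 0.0) for name, value in emotions.items()}

emotions = {"joy": 0.8, "amazement": 0.5, "sadness": 0.1, "anger": 0.05}
print(apply_max_emotions(emotions, max_emotions=2))
# {'joy': 0.8, 'amazement': 0.5, 'sadness': 0.0, 'anger': 0.0}
```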
Emotion index conversion
Emotion index conversion uses emotion correspondence to remap emotions from the Audio2Emotion SDK to the Audio2Face SDK.
Smoothing
Smoothing uses the live blend coefficient to apply exponential smoothing to the remapped emotions.
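Exponential smoothing with a blend coefficient generally has the shape below; how live_blend_coef weights the previous versus the current value is an assumption made for illustration, not a documented detail:

```python
def smooth(previous, current, live_blend_coef=0.7):
    # Illustrative per-frame exponential smoothing of one emotion value.
    return live_blend_coef * previous + (1.0 - live_blend_coef) * current

value = 0.0
for frame_target in [0.9, 0.9, 0.9, 0.9]:
    value = smooth(value, frame_target)
    print(round(value, 3))  # gradually approaches 0.9
```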
Blend Preferred Emotion
The preferred emotion (manual emotion) and the inference emotion output are combined to generate a composite final output of all emotion data.
Transition smoothing
Transition smoothing applies exponential smoothing to the final emotion values (the composite of the Audio2Emotion output and the preferred emotion).
Emotion Strength
Emotion strength controls the overall strength of the final emotion composite produced by the previous processes; it acts as a multiplier on the final emotion result (Audio2Emotion + preferred emotion).
Preferred Emotion
Use the emotion sliders to create a preferred (manual) emotion pose as the base emotion for the character animation. The preferred emotion is taken from the current settings in the Emotion widget and is blended with the generated emotions throughout the animation.
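Putting the last few stages together, here is a hedged sketch of how the preferred emotion, the generated emotions, and the emotion strength multiplier could combine; the blending formula is an assumption for illustration only:

```python
def final_emotions(generated, preferred, preferred_emotion_strength=0.5, emotion_strength=0.6):
    # Illustrative composite: add the scaled preferred emotion to the generated
    # emotions, then scale the result by the overall emotion strength.
    composite = {}
    for name, value in generated.items():
        blended = value + preferred_emotion_strength * preferred.get(name, 0.0)
        composite[name] = round(min(1.0, emotion_strength * blended), 3)
    return composite

generated = {"joy": 0.6, "sadness": 0.0}
preferred = {"joy": 0.2, "sadness": 0.1}
print(final_emotions(generated, preferred))  # approximately {'joy': 0.42, 'sadness': 0.03}
```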
Blendshape parameters
Currently, the default blendshape parameters included in the model data are tuned for use with MetaHuman avatars.
For our default avatars (Claire, Mark, Ben), all 52 values of weight_multipliers in the stylization config should be set to 1.0.
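For reference, the modulation described by the blendshape_params comments above (blendshape_values * weight_multipliers + weight_offsets) can be sketched as follows; whether enable_clamping_bs_weight clamps the result to [0, 1] exactly is an assumption:

```python
def modulate_blendshape(value, multiplier=1.0, offset=0.0, clamp=False):
    # Per-blendshape gain and offset, with optional clamping of the result.
    result = value * multiplier + offset
    return min(1.0, max(0.0, result)) if clamp else result

print(modulate_blendshape(0.5, multiplier=1.2))              # 0.6
print(modulate_blendshape(0.9, multiplier=1.3, clamp=True))  # clamped to 1.0
```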
Environment variables#
The following table describes the environment variables that can be passed to Audio2Face-3D NIM as a -e
argument
added to a docker run command:
| Variable | Required | Values | Notes |
|---|---|---|---|
| NGC_API_KEY | No | Any string representing a valid NGC API Key | Required only if you want to download TRT engines from NGC. You must set this variable to the value of your personal NGC API key. |
| NIM_LOGGING_JSONL | No | true / false | Enables (true) or disables (false) JSON Lines format logging to stdout. |
| NIM_MANIFEST_PROFILE | No | Any valid manifest profile string | Choose the manifest profile id from Supported Models for your GPU. |
| NIM_DISABLE_MODEL_DOWNLOAD | No | true / false | Disables (true) or enables (false) automatic TRT engine downloads from NGC. |
| NIM_SKIP_A2F_START | No | true / false | If set to true, the container will not start the A2F-3D service at startup. |
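For example, a docker run invocation passing a few of these variables could look like the following; the image name and tag are placeholders and must be replaced with the actual Audio2Face-3D NIM image you pulled from NGC:

```bash
docker run -it --rm --gpus all \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e NIM_LOGGING_JSONL=true \
  -e NIM_MANIFEST_PROFILE=<your-manifest-profile-id> \
  nvcr.io/<org>/<audio2face-3d-nim-image>:<tag>
```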
Volumes#
The following table describes the paths inside the container into which local paths can be mounted. For example, you can mount a volume with the docker flag -v {LOCAL_PATH}:{PATH_IN_CONTAINER}.
| Container path | Required | Notes |
|---|---|---|
| /tmp/a2x/ | Not required, but if this volume is not mounted, the container has to download or generate the model again each time it is brought up | Path for AI models. Must have execute, read, and write permissions (e.g. 777). |
| /mnt/configs/ | Needed only if you want to override some configuration parameters | Path for files that override the default configs |
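For example, both container paths could be mounted in a single docker run command; the host paths and image name below are placeholders:

```bash
docker run -it --rm --gpus all \
  -v /path/on/host/a2x:/tmp/a2x/ \
  -v /path/on/host/configs:/mnt/configs/ \
  nvcr.io/<org>/<audio2face-3d-nim-image>:<tag>
```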
Quick Deployment of Audio2Face-3D Microservices#
Instead of deploying Audio2Face-3D and starting the model manually, you can quickly deploy them together using the docker-compose file, following the quick-start instructions provided in the NVIDIA Audio2Face-3D Samples repo.
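As a rough illustration only (the repository URL and compose file location are placeholders, so follow the instructions in the samples repo itself), the quick start typically amounts to cloning the repository and starting the stack with Docker Compose:

```bash
git clone <NVIDIA-Audio2Face-3D-Samples-repo-URL>
cd <samples-repo>/<quick-start-directory>
docker compose up
```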