Parameter Tuning Guide
Inference Parameters
Audio2Face imports inference parameters from multiple sources: the inference model SDK, deployment configs, and runtime input. In general, deployment parameters override the matching parameters in the model JSON files, and runtime parameters override both deployment and model default parameters.
Model default parameters
The Audio2Face inference SDK defines a set of parameters for inference and engine initialization in each model’s JSON file. It is recommended not to change these parameters.
Deployment parameters
Deployment parameters override the model’s default parameters and remain in effect unless temporarily superseded by runtime parameters provided with each audio input. Parameters are configured in values.yaml for Kubernetes deployments, or in a2f_config.yaml / ac_a2f_config.yaml for container deployments.
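For example, the same override keys described in the sections below can be placed in either file. The fragment below is a minimal, hypothetical sketch that sets only two emotion post-processing parameters; the exact nesting of these keys and which parameters you choose to override depend on your deployment.
'''
# values.yaml (Kubernetes) or a2f_config.yaml / ac_a2f_config.yaml (container)
# Hypothetical fragment: only the keys listed here are overridden.
a2eEmotionContrast: "1.0"
a2eMaxEmotions: "6"
'''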
Face parameters
Users can configure the full set of face parameters for the Audio2Face cluster by setting the faceParams field with a formatted string, as shown below.
'''
{
  "face_params": {
    "input_strength": 1.0,
    "prediction_delay": 0.15,
    "upper_face_smoothing": 0.001,
    "lower_face_smoothing": 0.006,
    "upper_face_strength": 1.0,
    "lower_face_strength": 1.25,
    "face_mask_level": 0.6,
    "face_mask_softness": 0.0085,
    "emotion": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "source_shot": "cp1_neutral",
    "source_frame": 10,
    "skin_strength": 1.0,
    "blink_strength": 1.0,
    "lower_teeth_strength": 1.25,
    "lower_teeth_height_offset": 0.0,
    "lower_teeth_depth_offset": 0.0,
    "lip_close_offset": 0.0,
    "tongue_strength": 1.3,
    "tongue_height_offset": 0.0,
    "tongue_depth_offset": 0.0,
    "eyeballs_strength": 1.0,
    "saccade_strength": 0.6,
    "right_eye_rot_x_offset": 0.0,
    "right_eye_rot_y_offset": 0.0,
    "left_eye_rot_x_offset": 0.0,
    "left_eye_rot_y_offset": 0.0,
    "eyelid_offset": 0.0,
    "blink_interval": 3.0,
    "eye_saccade_seed": 0,
    "keyframer_fps": 60.0,
    "a2e_window_size": 1.4,
    "a2e_stride": 1.0,
    "a2e_emotion_strength": 0.6,
    "a2e_smoothing_kernel_radius": 0,
    "a2e_max_emotions": 6,
    "a2e_contrast": 1.0,
    "a2e_preferred_emotion_strength": 0.5,
    "a2e_auto_generate": false,
    "a2e_force_set_keys": false
  }
}
'''
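In values.yaml, the block above is supplied as the value of the faceParams key, formatted as a single JSON string. A sketch of what that might look like, with the string truncated here for brevity (include the full face_params block shown above):
'''
faceParams: '{"face_params": {"input_strength": 1.0, "prediction_delay": 0.15}}'
'''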
Emotion parameters
The following are the configurable emotion post-processing parameters, explained in the Emotion Post-processing Parameters section below.
a2eEmotionContrast: "1.0"
a2eLiveBlendCoef: "0.0"
a2eEnablePreferredEmotion: "False"
a2ePreferredEmotionStrength: "0.0"
a2eEmotionStrength: "1.0"
a2eMaxEmotions: "6"
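For instance, to blend in a manually chosen base emotion, limit the number of active inferred emotions, and scale down the final result, you might set the same keys as follows (the values are illustrative, not recommendations):
'''
a2eEmotionContrast: "1.0"
a2eLiveBlendCoef: "0.5"              # smoothing coefficient for the remapped emotions
a2eEnablePreferredEmotion: "True"    # blend the preferred (manual) emotion into the result
a2ePreferredEmotionStrength: "0.5"
a2eEmotionStrength: "0.8"            # multiplier on the final emotion composite
a2eMaxEmotions: "3"                  # keep only the three strongest inferred emotions
'''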
Blendshape parameters
Blendshape weight multipliers for the cluster can be configured by setting the bsWeightMultipliers field with 52 floating-point numbers, as shown below.
bsWeightMultipliers: [1.0, 1.0, …, 1.0, 1.0]
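Each multiplier scales the corresponding solved blendshape weight, so a value below 1.0 damps that blendshape and a value above 1.0 exaggerates it. The sketch below scales a single entry to 0.8 and leaves the rest at 1.0; which index corresponds to which blendshape depends on the blendshape list of the avatar you deploy, so the index shown is purely illustrative.
'''
# 52 entries; one entry (index chosen for illustration only) scaled to 0.8
bsWeightMultipliers: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.8,
                      1.0, 1.0, …, 1.0]
'''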
Runtime parameters
These parameters can be included in the audio stream header to override the existing parameters. See AudioStreamHeader, FaceParameters, BlendShapeParameters, and EmotionPostProcessingParameters for the proto definitions.
Note
Only the float_params field is used for FaceParameters.
Note
Only a subset of FaceParameters is supported for runtime tuning. See FaceParameters for the list of supported ones.
Emotion Post-processing Parameters
Audio2Emotion automatically parses emotions from the incoming audio and generates emotion vectors to drive the character’s facial animation performance. Use the post-processing parameters below to further tailor the performance. The operations are listed below in the order in which they are executed in the processing stack.
Emotion Contrast
Emotion contrast is applied first, to the inference output, controlling the emotion spread using a sigmoid function. This adjustment pushes higher values up and lower values down, allowing for a wider range in the generated emotional performance.
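The exact curve is internal to the service; as a purely illustrative sketch, a sigmoid-style contrast with steepness c (a2eEmotionContrast) would remap each emotion value e_i in [0, 1] roughly as
\[
e_i' = \frac{1}{1 + \exp\left(-c\,(e_i - 0.5)\right)}
\]
so that increasing c pushes values above the midpoint toward 1 and values below it toward 0.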
Max Emotions
Max emotions allows the user to set a hard limit on the number of emotions that Audio2Emotion will engage. Emotions are prioritized by their strength. Once the maximum number of emotions is reached, only vectors for these prioritized emotions will be engaged, and all other emotions will be null. This helps achieve a more accurate read on the correct emotion when the vocal emotional performance is more subtle.
For example, if Joy and Amazement are the strongest predicted emotions and you set the Max Emotions limit to 2, only Joy and Amazement are applied to the performance.
Emotion index conversion
Emotion index conversion uses emotion correspondence to remap emotions from Audio2Emotion to Audio2Face.
Smoothing
Smoothing uses a live blend coefficient to apply exponential smoothing to the remapped emotions.
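For reference, standard exponential smoothing with a blend coefficient alpha (here, a2eLiveBlendCoef) over the remapped emotion vector e_t at update t has the form
\[
\hat{e}_t = \alpha\,\hat{e}_{t-1} + (1 - \alpha)\,e_t
\]
where e-hat is the smoothed output; whether the service applies the coefficient in exactly this orientation is an assumption of this sketch.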
Blend Preferred Emotion
The preferred emotion (manual emotion) and the inference emotion output are combined to generate a composite final output of all emotion data.
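A plausible sketch of this combination, assuming a simple per-emotion linear interpolation weighted by the preferred emotion strength s (a2ePreferredEmotionStrength); the service’s actual blending rule may differ:
\[
e_{\text{blend}} = (1 - s)\,\hat{e}_{\text{inferred}} + s\,e_{\text{preferred}}
\]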
Transition smoothing
Transition smoothing applies exponential smoothing to the final emotion values (the composite of the Audio2Emotion output and the preferred emotion).
Emotion Strength
Emotion strength controls the overall strength of the final emotion composite produced by the previous steps. It acts as a multiplier on the final emotion result (Audio2Emotion output plus preferred emotion).
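In other words, with overall strength k (a2eEmotionStrength), the final output is simply the composite (after transition smoothing) scaled by k:
\[
e_{\text{final}} = k \cdot e_{\text{composite}}
\]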
Preferred Emotion
Use the emotion sliders to create a preferred (manual) emotion pose as the base emotion for the character animation. The preferred emotion is taken from the current settings in the Emotion widget and is blended with the generated emotions throughout the animation.
Notes
Blendshape parameters
Currently, the default blendshape parameters included in the model data are tuned for use with MetaHuman avatars.
For our default avatars (Claire, Mark, Ben), all 52 values of bsWeightMultipliers should be set to 1.0.