Parameter Tuning Guide

Inference Parameters

Audio2Face imports inference parameters from multiple sources: the inference model SDK, deployment configs, and runtime input. Generally, deployment parameters override the matching defaults in the model JSON files, while runtime parameters override both the deployment and model default parameters.
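
The precedence can be pictured as a layered merge, with later layers winning. Below is a minimal illustrative sketch (the dictionaries and the merge are not actual SDK code; the parameter names are taken from the face parameter example later in this guide):

'''
# Illustrative only: models the override order described above, not actual SDK code.
model_defaults = {"input_strength": 1.0, "lower_face_strength": 1.25}  # model JSON
deployment_params = {"lower_face_strength": 1.4}                       # values.yaml / a2f_config.yaml
runtime_params = {"input_strength": 1.2}                               # audio stream header

# Later layers win: runtime > deployment > model defaults.
effective = {**model_defaults, **deployment_params, **runtime_params}
print(effective)  # {'input_strength': 1.2, 'lower_face_strength': 1.4}
'''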

Model default parameters

The Audio2Face inference SDK defines a set of parameters for inference and engine initialization in each model’s JSON file. It is recommended not to change these parameters.

Deployment parameters

Deployment parameters override the model’s default parameters and remain in effect unless temporarily superseded by runtime parameters provided with each audio input. Parameters are configured in values.yaml for Kubernetes deployment, or in a2f_config.yaml / ac_a2f_config.yaml for container deployment.

Face parameters

Users can configure the full set of face parameters for the Audio2Face cluster by setting the faceParams field to a formatted string, as shown below.

'''
{
    "face_params": {
        "input_strength": 1.0,
        "prediction_delay": 0.15,
        "upper_face_smoothing": 0.001,
        "lower_face_smoothing": 0.006,
        "upper_face_strength": 1.0,
        "lower_face_strength": 1.25,
        "face_mask_level": 0.6,
        "face_mask_softness": 0.0085,
        "emotion": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        "source_shot": "cp1_neutral",
        "source_frame": 10,
        "skin_strength": 1.0,
        "blink_strength": 1.0,
        "lower_teeth_strength": 1.25,
        "lower_teeth_height_offset": 0.0,
        "lower_teeth_depth_offset": 0.0,
        "lip_close_offset": 0.0,
        "tongue_strength": 1.3,
        "tongue_height_offset": 0.0,
        "tongue_depth_offset": 0.0,
        "eyeballs_strength": 1.0,
        "saccade_strength": 0.6,
        "right_eye_rot_x_offset": 0.0,
        "right_eye_rot_y_offset": 0.0,
        "left_eye_rot_x_offset": 0.0,
        "left_eye_rot_y_offset": 0.0,
        "eyelid_offset": 0.0,
        "blink_interval": 3.0,
        "eye_saccade_seed": 0,
        "keyframer_fps": 60.0,
        "a2e_window_size": 1.4,
        "a2e_stride": 1.0,
        "a2e_emotion_strength": 0.6,
        "a2e_smoothing_kernel_radius": 0,
        "a2e_max_emotions": 6,
        "a2e_contrast": 1.0,
        "a2e_preferred_emotion_strength": 0.5,
        "a2e_auto_generate": false,
        "a2e_force_set_keys": false
    }
}
'''
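
If you generate the deployment configuration programmatically, one way to produce the formatted string is to serialize a Python dictionary with json.dumps. This is a sketch only; how the resulting string is templated into values.yaml or a2f_config.yaml is up to your own tooling:

'''
import json

# Sketch: build the faceParams string from a dictionary.
# Only a few keys are shown; use the full set from the example above.
face_params = {
    "face_params": {
        "input_strength": 1.0,
        "prediction_delay": 0.15,
        "lower_face_strength": 1.25,
    }
}

face_params_string = json.dumps(face_params)
print(face_params_string)  # value for the faceParams field
'''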

Emotion parameters

The following are the configurable emotion post-processing parameters, which are explained in the Emotion Post-processing Parameters section below.

a2eEmotionContrast: "1.0"
a2eLiveBlendCoef: "0.0"
a2eEnablePreferredEmotion: "False"
a2ePreferredEmotionStrength: "0.0"
a2eEmotionStrength: "1.0"
a2eMaxEmotions: "6"

Blendshape parameters

Blendshape weight multipliers for the cluster can be configured by setting the bsWeightMultipliers field to a list of 52 floating-point numbers, as shown below.

bsWeightMultipliers: [1.0, 1.0, …, 1.0, 1.0]
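
As a hedged illustration, the list can also be generated in Python; the boosted index below is hypothetical, and the actual blendshape ordering depends on the avatar's rig:

'''
# Sketch: start from neutral multipliers and boost one blendshape.
# Index 10 is hypothetical; the real ordering depends on the rig.
bs_weight_multipliers = [1.0] * 52
bs_weight_multipliers[10] = 1.2

# Render the value to paste into the bsWeightMultipliers field.
print("bsWeightMultipliers: [" + ", ".join(str(w) for w in bs_weight_multipliers) + "]")
'''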

Runtime parameters

These parameters can be included in the audio stream header to override the existing parameters. See AudioStreamHeader, FaceParameters, BlendShapeParameters, and EmotionPostProcessingParameters for the proto definitions.

Note

Only the float_params field is used for FaceParameters.

Note

Only a subset of FaceParameters is supported for runtime tuning. See FaceParameters for the list of supported ones.
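
The sketch below shows how runtime overrides might be populated. The import path and several field names are assumptions based on the proto message names referenced above, so verify them against your generated stubs:

'''
# Hedged sketch: runtime overrides in the audio stream header.
# The import path and the fields marked "assumed" are guesses based on the
# proto message names above; check the generated Python stubs for exact names.
from nvidia_ace_pb2 import (  # hypothetical module name
    AudioStreamHeader,
    FaceParameters,
    EmotionPostProcessingParameters,
)

face_params = FaceParameters()
# Per the notes above, only float_params is honored for FaceParameters,
# and only a subset of keys is supported at runtime.
face_params.float_params["lower_face_strength"] = 1.3
face_params.float_params["blink_strength"] = 1.0

emotion_pp = EmotionPostProcessingParameters(
    emotion_strength=0.8,  # assumed field name
    max_emotions=3,        # assumed field name
)

header = AudioStreamHeader(
    face_params=face_params,                   # assumed field name
    emotion_postprocessing_params=emotion_pp,  # assumed field name
)
# Send the header as the first message of the audio stream.
'''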

Emotion Post-processing Parameters

Audio2Emotion automatically parses emotions from the incoming audio and generates emotion vectors that drive the character’s facial animation performance. Use the post-processing parameters below to further tailor the performance to your desired specifications. Note that the parameters are listed in the specific sequence in which the corresponding processes are executed in the technology stack.

Emotion Contrast

Emotion contrast is applied to the inference output, controlling the emotion spread using a sigmoid function. This adjustment pushes higher values higher and lower values lower, allowing for a wider range in the generated emotional performance.
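
The exact contrast formula is internal to Audio2Emotion; the sketch below only illustrates how a sigmoid around a midpoint pushes values toward the extremes:

'''
import math

# Illustrative only: a sigmoid-style contrast curve around a 0.5 midpoint,
# not the shipped Audio2Emotion formula.
def apply_contrast(value, contrast):
    return 1.0 / (1.0 + math.exp(-contrast * 10.0 * (value - 0.5)))

emotions = [0.2, 0.5, 0.8]
print([round(apply_contrast(v, 1.0), 3) for v in emotions])  # [0.047, 0.5, 0.953]
'''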

Max Emotions

Max emotions allows the user to set a hard limit on the number of emotions that Audio2Emotion will engage. Emotions are prioritized by their strength. Once the maximum number of emotions is reached, only the vectors for these prioritized emotions are engaged, and all other emotions are zeroed. This helps achieve a more accurate read on the correct emotion when the vocal emotional performance is more subtle.

For example, if Joy and Amazement are the strongest predicted emotions and you set the Max Emotions limit to 2, only Joy and Amazement are applied to the performance.
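
A minimal sketch of the idea, keeping the strongest values and zeroing the rest (illustrative only, not the shipped implementation):

'''
# Illustrative only: keep the strongest max_emotions values and zero the rest.
def limit_emotions(emotions, max_emotions):
    top = sorted(range(len(emotions)), key=lambda i: emotions[i], reverse=True)[:max_emotions]
    return [v if i in top else 0.0 for i, v in enumerate(emotions)]

# e.g. Joy=0.7 and Amazement=0.5 survive with a limit of 2; the rest are zeroed.
print(limit_emotions([0.7, 0.5, 0.2, 0.1], 2))  # [0.7, 0.5, 0.0, 0.0]
'''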

Emotion index conversion

Emotion index conversion uses emotion correspondence to remap emotions from Audio2Emotion to Audio2Face.
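
A hedged sketch of the remapping step; the mapping table below is hypothetical, not the correspondence table that ships with the models:

'''
# Illustrative only: remap Audio2Emotion indices to Audio2Face indices.
# The mapping is hypothetical; the real correspondence ships with the models.
A2E_TO_A2F_INDEX = {0: 3, 1: 0, 2: 5}

def remap_emotions(a2e_emotions, num_a2f_emotions=10):
    a2f_emotions = [0.0] * num_a2f_emotions
    for src, dst in A2E_TO_A2F_INDEX.items():
        a2f_emotions[dst] = a2e_emotions[src]
    return a2f_emotions
'''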

Smoothing

Smoothing uses the live blend coefficient to apply exponential smoothing to the remapped emotions.
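
A minimal sketch, assuming the coefficient weights the previous frame (the exact convention for a2eLiveBlendCoef is an assumption here):

'''
# Illustrative only: exponential smoothing driven by the live blend coefficient.
# Whether the coefficient weights the previous or the current frame is an assumption.
def smooth(previous, current, live_blend_coef):
    return [live_blend_coef * p + (1.0 - live_blend_coef) * c
            for p, c in zip(previous, current)]
'''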

Blend Preferred Emotion

The preferred emotion (manual emotion) and the inference emotion output are combined to generate a composite final output of all emotion data.
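
A minimal sketch, assuming the preferred emotion strength acts as a linear blend weight (an assumption, not a documented formula):

'''
# Illustrative only: blend the preferred (manual) emotion with the inferred emotions.
def blend_preferred(inferred, preferred, preferred_emotion_strength):
    s = preferred_emotion_strength
    return [(1.0 - s) * i + s * p for i, p in zip(inferred, preferred)]
'''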

Transition smoothing

Transition smoothing applies exponential smoothing to the final emotion values (the composite of the Audio2Emotion output and the preferred emotion).

Emotion Strength

Emotion strength controls the overall strength of the final emotion composite produced by the previous processes. It acts as a multiplier on the final emotion result (Audio2Emotion output plus preferred emotion).
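
A minimal sketch of the multiplier:

'''
# Illustrative only: scale the final composite by the overall emotion strength.
def apply_emotion_strength(composite, emotion_strength):
    return [emotion_strength * v for v in composite]
'''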

Preferred Emotion

Use the emotion sliders to create a preferred (manual) emotion pose as the base emotion for the character animation. The preferred emotion is taken from the current settings in the Emotion widget and is blended with the generated emotions throughout the animation.

Notes

Blendshape parameters

Currently, the default blendshape parameters included in the model data are tuned for use with MetaHuman avatars. For our default avatars (Claire, Mark, Ben), all 52 values of bsWeightMultipliers should be set to 1.0.