A2F-3D NIM Manual Container Deployment and Configuration#

We offer a Docker container via the NGC registry for deployment purposes. This guide will demonstrate how to deploy, configure and run the Docker image for Audio2Face-3D NIM.

Before proceeding, it is essential to familiarize yourself with the concepts, services, and requirements necessary to run Audio2Face-3D by reading the Architecture Overview page.

Audio2Face-3D is highly configurable through configuration files and environment variables. To configure Audio2Face-3D you will need to use a custom entrypoint.

Prerequisites#

In order to run the microservice you will need access to the NGC Docker registry.

Make sure you have NVAIE access and a Personal Key, and that you are logged in to the nvcr.io registry.

You will also need the NVIDIA Container Toolkit configured with Docker.

For more information about hardware and software requirements, visit the Support Matrix page.

Configuration files#

There are 3 kinds of A2F-3D configuration files. Each of these configuration files corresponds to a specific type of user.

  1. An Artist: The stylization config contains only the parameters that an Artist would tweak.

  2. A DevOps engineer: The deployment config contains only the parameters that a DevOps engineer would need to think about.

  3. An Advanced User: The remaining configuration parameters are updated more rarely, but they are necessary for specific scenarios.

Warning

These configuration files are deployment-time configuration files. Although they look similar to the runtime ones, runtime and deployment-time configuration files should not be confused. The two kinds differ in key casing (snake_case vs. camelCase) and in structure. Here is an example of runtime configuration: config_james.yml.

1. The Stylization configuration file#

There are 3 variants of this configuration file:

  1. Claire

  2. James

  3. Mark

They each correspond to a specific AI Model and contain the default values that are going to be used. By default James’s configuration is used by the Microservice.

Claire Config#

claire_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip, and it also defines the default preferred emotion.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true # Enable audio2emotion, ai-generated audio-driven emotion
  live_transition_time: 0.5 # Controls the smoothness of the output transition toward the target value across frames; higher values result in smoother transitions. Each frame updates at a rate of <frame time length> / <live transition time> (capped at 1.0) toward the raw result.
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions

a2f:
  # A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
  inference_model_id: claire_v2.3
  blendshape_id: claire_topo1_v2.1

  face_params:
    eyelid_offset: 0.0 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.0 # Controls the magnitude of the input audio
    lip_close_offset: 0.0 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.25 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.0 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face

  blendshape_params: # Modulates the effect of each blendshape: gain * w + offset
    enable_clamping_bs_weight: false

    # Multiplier for each blendshape output. This list depends on the blendshape model.
    weight_multipliers:
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 1.0
      EyeLookInLeft: 1.0
      EyeLookOutLeft: 1.0
      EyeLookUpLeft: 1.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 1.0
      EyeLookInRight: 1.0
      EyeLookOutRight: 1.0
      EyeLookUpRight: 1.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 1.0
      JawLeft: 1.0
      JawRight: 1.0
      JawOpen: 1.0
      MouthClose: 1.0
      MouthFunnel: 1.0
      MouthPucker: 1.0
      MouthLeft: 1.0
      MouthRight: 1.0
      MouthSmileLeft: 1.0
      MouthSmileRight: 1.0
      MouthFrownLeft: 1.0
      MouthFrownRight: 1.0
      MouthDimpleLeft: 1.0
      MouthDimpleRight: 1.0
      MouthStretchLeft: 1.0
      MouthStretchRight: 1.0
      MouthRollLower: 1.0
      MouthRollUpper: 1.0
      MouthShrugLower: 1.0
      MouthShrugUpper: 1.0
      MouthPressLeft: 1.0
      MouthPressRight: 1.0
      MouthLowerDownLeft: 1.0
      MouthLowerDownRight: 1.0
      MouthUpperUpLeft: 1.0
      MouthUpperUpRight: 1.0
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 1.0
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 1.0
      NoseSneerRight: 1.0
      TongueOut: 1.0

    # Constant offset for each blendshape output. This list depends on the blendshape model.
    weight_offsets:
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0
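
The live_transition_time comment in the config above describes a per-frame update toward the raw Audio2Emotion result. As a rough illustration only, and not the service's actual implementation, that rule could be sketched in Python as follows. The function name, the handling of a non-positive transition time, and the 30 FPS frame length are assumptions made for this example.

```python
def step_toward(current: float, raw: float, frame_time: float,
                live_transition_time: float) -> float:
    """Move `current` one frame toward `raw`.

    The step fraction is frame_time / live_transition_time, capped at 1.0,
    as described in the config comment. Hypothetical helper, not part of A2F-3D.
    """
    if live_transition_time <= 0.0:
        # Assumption: a non-positive transition time means no smoothing at all.
        return raw
    rate = min(frame_time / live_transition_time, 1.0)
    return current + rate * (raw - current)


# Three frames at an assumed 30 FPS with the default live_transition_time of 0.5 s.
value = 0.0
for _ in range(3):
    value = step_toward(value, raw=1.0, frame_time=1.0 / 30.0,
                        live_transition_time=0.5)
```

With these numbers each frame closes a small fraction of the remaining gap, which is why a higher live_transition_time gives a smoother but slower transition.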

James Config#

james_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true
  live_transition_time: 0.5
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions

a2f:
  # A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
  inference_model_id: james_v2.3
  blendshape_id: james_topo2_v2.2

  face_params:
    eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.0 # Controls the magnitude of the input audio
    lip_close_offset: -0.02 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.2 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.0 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face

  blendshape_params: # Modulates the effect of each blendshape: gain * w + offset
    enable_clamping_bs_weight: false

    weight_multipliers:
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 1.0
      EyeLookInLeft: 1.0
      EyeLookOutLeft: 1.0
      EyeLookUpLeft: 1.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 1.0
      EyeLookInRight: 1.0
      EyeLookOutRight: 1.0
      EyeLookUpRight: 1.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 1.0
      JawLeft: 1.0
      JawRight: 1.0
      JawOpen: 1.0
      MouthClose: 1.0
      MouthFunnel: 1.0
      MouthPucker: 1.0
      MouthLeft: 1.0
      MouthRight: 1.0
      MouthSmileLeft: 1.0
      MouthSmileRight: 1.0
      MouthFrownLeft: 1.0
      MouthFrownRight: 1.0
      MouthDimpleLeft: 1.0
      MouthDimpleRight: 1.0
      MouthStretchLeft: 1.0
      MouthStretchRight: 1.0
      MouthRollLower: 1.0
      MouthRollUpper: 1.0
      MouthShrugLower: 1.0
      MouthShrugUpper: 1.0
      MouthPressLeft: 1.0
      MouthPressRight: 1.0
      MouthLowerDownLeft: 1.0
      MouthLowerDownRight: 1.0
      MouthUpperUpLeft: 1.0
      MouthUpperUpRight: 1.0
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 1.0
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 1.0
      NoseSneerRight: 1.0
      TongueOut: 1.0

    weight_offsets:
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0
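
The blendshape_params comment in the configs above states that each blendshape weight is modulated as gain times weight plus offset. A minimal Python sketch of that formula is shown below. The function name is hypothetical, and the [0.0, 1.0] clamping range used when enable_clamping_bs_weight is true is an assumption; the source does not state the range.

```python
def tune_weight(weight: float, multiplier: float = 1.0, offset: float = 0.0,
                clamp: bool = False) -> float:
    """Apply multiplier * weight + offset to one blendshape weight.

    Hypothetical helper for illustration; not the service's code.
    """
    tuned = multiplier * weight + offset
    if clamp:
        # Assumption: clamping restricts the result to the [0.0, 1.0] range.
        tuned = max(0.0, min(1.0, tuned))
    return tuned


# Example with made-up values for a single blendshape such as JawOpen.
jaw_open = tune_weight(0.25, multiplier=2.0, offset=0.25)
```

With the default multiplier of 1.0 and offset of 0.0, the weight passes through unchanged, which matches the default configuration files.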

Mark Config#

mark_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true
  live_transition_time: 0.5
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions

a2f:
  # A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
  inference_model_id: mark_v2.3
  blendshape_id: mark_topo1_v2.1

  face_params:
    eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.3 # Controls the magnitude of the input audio
    lip_close_offset: -0.03 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.0023 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.4 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.1 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face

  blendshape_params: # Modulates the effect of each blendshape: gain * w + offset
    enable_clamping_bs_weight: false

    weight_multipliers:
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 1.0
      EyeLookInLeft: 1.0
      EyeLookOutLeft: 1.0
      EyeLookUpLeft: 1.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 1.0
      EyeLookInRight: 1.0
      EyeLookOutRight: 1.0
      EyeLookUpRight: 1.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 1.0
      JawLeft: 1.0
      JawRight: 1.0
      JawOpen: 1.0
      MouthClose: 1.0
      MouthFunnel: 1.0
      MouthPucker: 1.0
      MouthLeft: 1.0
      MouthRight: 1.0
      MouthSmileLeft: 1.0
      MouthSmileRight: 1.0
      MouthFrownLeft: 1.0
      MouthFrownRight: 1.0
      MouthDimpleLeft: 1.0
      MouthDimpleRight: 1.0
      MouthStretchLeft: 1.0
      MouthStretchRight: 1.0
      MouthRollLower: 1.0
      MouthRollUpper: 1.0
      MouthShrugLower: 1.0
      MouthShrugUpper: 1.0
      MouthPressLeft: 1.0
      MouthPressRight: 1.0
      MouthLowerDownLeft: 1.0
      MouthLowerDownRight: 1.0
      MouthUpperUpLeft: 1.0
      MouthUpperUpRight: 1.0
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 1.0
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 1.0
      NoseSneerRight: 1.0
      TongueOut: 1.0

    weight_offsets:
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0

2. The Deployment configuration file#

deployment_config.yaml
common:
  # Number of streams to use simultaneously
  # The recommended value depends on the GPU and your latency constraints
  # Higher value means: more concurrent users and higher overall throughput
  # Lower value means: fewer concurrent users, higher throughput per stream, lower latencies
  stream_number: 10

  # Pad each audio file with some 1.5 seconds of silent audio
  add_silence_padding_after_audio: false

logging:
  # Level of log wanted, info is recommended
  # Can be one of:
  # => trace
  # => debug
  # => info
  # => warn
  # => err
  # => critical
  # => off
  log_level: info
  # How often should FPS logs be printed per stream
  fps_logging_interval_second: 1

endpoints:
  # use the bidirectional endpoint instead of 2 connections (server to receive audio + client to send animation data)
  use_bidirectional: true

  # server to perform the bidirectional streaming connection
  # Used only if use_bidirectional == true
  bidirectional:
    server:
      # port to open
      url: 0.0.0.0:52000

  unidirectional:
    # Server that receives the audio
    # Used only if use_bidirectional == false
    server:
      # port to open
      url: 0.0.0.0:50000

    # Client that sends the animation data
    # Used only if use_bidirectional == false
    client:
      # url of the server to contact
      url: 0.0.0.0:51000

# Configs specific to telemetry
telemetry:
  # Name of the service
  service_name: audio2face
  # Whether to enable metrics
  metrics_enabled: false
  # Whether to enable traces
  traces_enabled: false
  # Can be prometheus or otlp
  metrics_exporter: prometheus
  # Export interval in milliseconds
  otel_metric_export_interval: 60000
  # Export timeout in milliseconds
  otel_metric_export_timeout: 30000

  otlp_http_metrics_endpoint: http://localhost:4318/v1/metrics
  
  otlp_http_traces_endpoint: http://localhost:4318/v1/traces
  
  prometheus_endpoint: 0.0.0.0:9464
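
According to the comments in the endpoints section above, the bidirectional server is used only when use_bidirectional is true, and the unidirectional server and client are used only when it is false. The Python sketch below illustrates that selection on an already-parsed endpoints section (a plain dict). The function name and the returned key names are hypothetical and only serve this example.

```python
def active_endpoints(endpoints: dict) -> dict:
    """Return the URLs that are in use for a parsed `endpoints` section.

    Hypothetical helper; the default of True mirrors deployment_config.yaml.
    """
    if endpoints.get("use_bidirectional", True):
        return {"bidirectional_server": endpoints["bidirectional"]["server"]["url"]}
    uni = endpoints["unidirectional"]
    return {
        "audio_server": uni["server"]["url"],      # receives the audio
        "animation_client": uni["client"]["url"],  # sends the animation data
    }


# The default values from deployment_config.yaml, written as a Python dict.
default_endpoints = {
    "use_bidirectional": True,
    "bidirectional": {"server": {"url": "0.0.0.0:52000"}},
    "unidirectional": {
        "server": {"url": "0.0.0.0:50000"},
        "client": {"url": "0.0.0.0:51000"},
    },
}
in_use = active_endpoints(default_endpoints)
```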

3. The Advanced configuration file#

advanced_config.yaml
input_sanitization:
  # max size of UUID
  max_len_uuid: 50
  # Maximum samplerate
  max_sample_rate: 144000
  # Minimum samplerate
  min_sample_rate: 16000
  # Maximum amount in second for the processing time
  # After this timeout the connection to A2F will be cut
  max_processing_duration_second: 300
  # Maximum size of 1 audio buffer sent over the grpc stream
  max_audio_buffer_size_second: 10
  # Maximum size of the audio clip to process
  max_audio_clip_size_second: 300
  # Maximum amount of time that A2F Controller will wait when not
  # receiving data from A2F, before cutting the connection
  max_wait_time_idle_ms: 30000
  # Will stop serving a user if their FPS is lower than low_fps
  # for more than low_fps_max_duration_second seconds
  # For real time application less than 30 FPS means slower than realtime
  # So if users provide audio to the service at less than 30 FPS then
  # the interactive experience will stutter.
  low_fps: 29
  low_fps_max_duration_second: 7

garbage_collector:
  # enable or disable the garbage collector
  # This is only used with bidirectional connection where the service is holding data
  # waiting for the client to pick them up.
  enabled: true
  # how often the garbage collector should run
  interval_run_second: 10
  # If the garbage collector finds streams holding
  # more than N seconds of data, it will delete data
  # until the amount falls below this threshold.
  # Clients are expected to retrieve data promptly so that
  # the service doesn't retain the data excessively.
  max_size_stored_data_second: 60


pipeline_parameters:
  # Queues between pipeline components
  # Can be tweaked:
  # Higher values can lead to higher throughput but also to higher latencies
  # Lower values lead to lower latencies and potentially lower overall throughput
  # Leave these values to default in case of doubt
  queue_size_after_a2e: 1
  queue_size_after_a2f: 300
  queue_size_after_streammux: 1

  streammux:
    # Do not change this config; this is internal
    adaptive_batching: 0
    # Minimum FPS for all streams
    # Pipeline will not slow down under this value if:
    # * compute allows it
    # * upload speed of audio allows it
    # Here 40 FPS
    # Numerator for that config:
    overall_min_fps_n: 40
    # Denominator for that config:
    overall_min_fps_d: 1

a2f:
  # Remove temporal smoothing
  # used for debugging individual frames generated
  temporal_smoothing: true
  device_id: 0 # Which gpu id to use

a2e:
  inference_interval: 10
  device_id: 0 # Which gpu id to use


trt_model_generation:
  a2e:
    precision: "fp16"
    min_shape: 1
    optimal_shape: 10
    maximum_shape: 128
  a2f:
    precision: "fp16"
    min_shape: 1
    optimal_shape: 10
    maximum_shape: 128
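
The low_fps and low_fps_max_duration_second comments above describe when a stream stops being served. One way to picture that rule is the Python sketch below. It is an illustration under assumptions, not the service's implementation: the function name, the (timestamp, fps) sample format, and the choice to reset the timer as soon as the FPS recovers are all made up for this example.

```python
def should_stop_stream(fps_samples, low_fps=29.0, low_fps_max_duration_second=7.0):
    """Return True if FPS stayed below `low_fps` for longer than the allowed duration.

    fps_samples: iterable of (timestamp_seconds, fps) pairs ordered by time.
    Hypothetical helper for illustration only.
    """
    low_since = None  # timestamp at which the current low-FPS period started
    for timestamp, fps in fps_samples:
        if fps < low_fps:
            if low_since is None:
                low_since = timestamp
            if timestamp - low_since > low_fps_max_duration_second:
                return True
        else:
            # Assumption: any sample at or above the threshold resets the timer.
            low_since = None
    return False


# One sample per second at 25 FPS for 9 seconds: below 29 FPS for more than 7 s.
slow_stream = [(t, 25.0) for t in range(9)]
stop = should_stop_stream(slow_stream)
```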

The above configuration files represent the default values that are being used by the Microservice.

To apply your own configuration, start the A2F-3D NIM with a custom entrypoint and mount your configuration files inside the container. Refer to the next section of the guide for detailed instructions.

How to use configuration files#

To override any of the configurations, mount the files into the container as a Docker volume at the /mnt/configs path. For convenience, export an environment variable called LOCAL_CONFIGS that holds the host path of your overridden configurations. To do so, you can follow the example instructions below:

$ mkdir -p ~/.cache/audio2face-3d-configs
$ export LOCAL_CONFIGS=~/.cache/audio2face-3d-configs

Make a copy of the above configs and place them in the LOCAL_CONFIGS directory.

Then you will have:

$ ls $LOCAL_CONFIGS
advanced_config.yaml
claire_stylization_config.yaml
deployment_config.yaml
james_stylization_config.yaml
mark_stylization_config.yaml

Model caching#

You can cache the model locally so that the next time you run the service you don’t have to generate or download it again. To cache the model, use a Docker volume mount. Make sure the local path has read, write and execute permissions (777 permissions). You can use these instructions to set up a local path for caching models with the right permissions:

$ mkdir -p ~/.cache/audio2face-3d
$ chmod 777 ~/.cache/audio2face-3d
$ export LOCAL_NIM_CACHE=~/.cache/audio2face-3d

When you run the service a second time and want to use the NIM entrypoint instead of a custom one, make sure to disable the model download by setting the environment variable NIM_DISABLE_MODEL_DOWNLOAD=true.

Starting the A2F-3D NIM with custom entrypoint#

$ docker run -it --rm --name audio2face-3d \
 --gpus all \
 --network=host \
 --entrypoint /bin/bash -w /opt/nvidia/a2f_pipeline \
 -e NIM_DISABLE_MODEL_DOWNLOAD=true \
 -e NIM_SKIP_A2F_START=true \
 -v "$LOCAL_NIM_CACHE:/tmp/a2x" \
 -v "$LOCAL_CONFIGS:/mnt/configs/" \
 nvcr.io/nim/nvidia/audio2face-3d:1.2

The above command creates a Docker container running the Audio2Face-3D NIM. Note that --gpus all is specified to expose the host GPUs to the Docker container. You can customize this option to suit your setup.

We also use --network=host so that the container shares the host network and all of its ports are reachable locally. If you want finer control over port binding, use the -p option with the appropriate ports instead.

Notice the two volume mounts: -v "$LOCAL_NIM_CACHE:/tmp/a2x" for caching the model and -v "$LOCAL_CONFIGS:/mnt/configs/" for overriding configurations. Skip either volume mount if you don’t want to cache the model or change the default configuration, respectively.
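
If you launch the container from a script, the optional volume mounts described above can be added conditionally. The Python sketch below only assembles the argument list of the docker run command shown earlier; it uses no flags beyond those in that command. The function name is hypothetical, and reading LOCAL_NIM_CACHE and LOCAL_CONFIGS from the environment is an assumption based on the exports in this guide.

```python
import os
import shlex
from typing import List, Optional

IMAGE = "nvcr.io/nim/nvidia/audio2face-3d:1.2"


def build_docker_run(cache_dir: Optional[str] = None,
                     configs_dir: Optional[str] = None) -> List[str]:
    """Assemble the docker run arguments, adding each mount only if a path is given."""
    cmd = [
        "docker", "run", "-it", "--rm", "--name", "audio2face-3d",
        "--gpus", "all",
        "--network=host",
        "--entrypoint", "/bin/bash", "-w", "/opt/nvidia/a2f_pipeline",
        "-e", "NIM_DISABLE_MODEL_DOWNLOAD=true",
        "-e", "NIM_SKIP_A2F_START=true",
    ]
    if cache_dir:
        cmd += ["-v", f"{cache_dir}:/tmp/a2x"]          # model cache mount
    if configs_dir:
        cmd += ["-v", f"{configs_dir}:/mnt/configs/"]   # configuration override mount
    cmd.append(IMAGE)
    return cmd


# Print the command instead of running it, so it can be reviewed first.
command = build_docker_run(os.environ.get("LOCAL_NIM_CACHE"),
                           os.environ.get("LOCAL_CONFIGS"))
print(shlex.join(command))
```

The printed line can be pasted into a terminal, or the list can be passed to subprocess.run once you are satisfied with it.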

You should then be dropped into a shell inside the container:

triton-server@host-name:/opt/nvidia/a2f_pipeline$

Inside the container, start the NIM server by running:

$ /opt/nim/start_server.sh &

The & operator starts the server as a background process, enabling you to run additional commands within the container. If you do not immediately return to the shell prompt, press Enter to regain access and continue executing commands.

The commands below are run inside the container unless stated otherwise.

Generating the TRT engine#

The first step is to generate the TRT engine for the AI model (specific to the GPU used by your machine) with the provided Python script, whose usage is:

usage: generate_trt_models.py [-h] [--stylization-config STYLIZATION_CONFIG] [--advanced-config ADVANCED_CONFIG]

Generates TRT models for A2F Service.

options:
  -h, --help            show this help message and exit
  --stylization-config  STYLIZATION_CONFIG
                        file path to the stylization config
  --advanced-config     ADVANCED_CONFIG
                        file path to the advanced config

If you want to keep the default configuration values, you don’t need to pass any arguments.

Note

You can back up the generated TRT engines to skip model generation on NIM startup, but be aware that each generated engine is specific to the GPU it was built on. The generated models are located in the /tmp/a2x directory inside the Docker container.

Generate the Audio2Emotion and Audio2Face TRT engines with default configs:

$ ./service/generate_trt_models.py

The TRT engine needs to be regenerated whenever the deployment environment changes. This is especially true when the GPU changes to one with a different architecture or compute capability. The generated TRT engine can potentially be reused on machines with exactly the same controlled configuration (same hardware and Docker setup). It is recommended to always regenerate the TRT engine whenever hardware changes are made.

Starting the service#

The second step is to start the service. The Audio2Face-3D Service help menu looks like this:

$ a2f_pipeline.run -h
Usage: a2f_pipeline.run [--help] [--version] [--stylization-config] [--deployment-config] [--advanced-config]

Optional arguments:
  -h, --help                     shows help message and exits
  -v, --version                  prints version information and exits
  --stylization-config           file path to the stylization config
  --deployment-config            file path to the deployment config
  --advanced-config              file path to the advanced config

To use the default configuration you can just run inside the container:

$ /usr/local/bin/a2f_pipeline.run

You should see a log line like this one, signalling that the A2F-3D service started properly:

[2024-04-23 12:44:33.066] [  global  ] [info] Running...

Note

When you start the service, you might encounter warnings labeled as GStreamer-WARNING. These warnings occur because some libraries are missing from the container. However, they are safe to ignore, as these libraries are not used by Audio2Face-3D.

Changing Configuration - The Shortest Way#

The commands below are run inside the container.

Assuming you decide to use the Claire model, you can run the following commands:

$ ./service/generate_trt_models.py --stylization-config /mnt/configs/claire_stylization_config.yaml \
   --advanced-config /mnt/configs/advanced_config.yaml
$ a2f_pipeline.run --stylization-config /mnt/configs/claire_stylization_config.yaml \
                   --deployment-config /mnt/configs/deployment_config.yaml \
                   --advanced-config /mnt/configs/advanced_config.yaml

Warning

The current ./service/generate_trt_models.py doesn’t support cache invalidation. If you update the configuration file and want to regenerate the model, you need to remove the corresponding TRT model in the cache folder located in /tmp/a2x/

Then you will have a container running with the custom provided parameters.

Changing Configuration - The Flexible Way#

Specifying a configuration file works by overriding values, which means you don’t have to repeat default values in your configuration files. Consequently, your configuration file only needs to contain the subset of the default configuration that you want to change.

Moreover, in the a2f section of the stylization configuration, specifying an inference_model_id automatically loads the default face parameters matching that ID, and specifying a blendshape_id automatically loads the default blendshape parameters.

The following examples illustrate this:

Example 1: Setting the Stylization config to use Mark#

On the host machine, create a file called short_mark_stylization_config.yaml in $LOCAL_CONFIGS directory and add the following lines:

a2f:
  inference_model_id: mark_v2.3
  blendshape_id: mark_topo1_v2.1

Then, inside the container, run:

$ ./service/generate_trt_models.py --stylization-config  /mnt/configs/short_mark_stylization_config.yaml
$ a2f_pipeline.run --stylization-config /mnt/configs/short_mark_stylization_config.yaml

Warning

The current ./service/generate_trt_models.py doesn’t support cache invalidation. If you update the configuration file and want to regenerate the model, you need to remove the corresponding TRT model in the cache folder located in /tmp/a2x/

These commands have exactly the same effect as providing the full default Mark configuration file, because under the hood inference_model_id and blendshape_id are used to load these defaults.

Example 2: Updating the type of Endpoint to Unidirectional#

This example covers settings that are part of deployment_config.yaml.

On the host machine, create a file called unidirectional_deployment_config.yaml in $LOCAL_CONFIGS directory and add the following lines:

endpoints:
  use_bidirectional: false

Then, inside the container, run:

$ ./service/generate_trt_models.py
$ a2f_pipeline.run --deployment-config /mnt/configs/unidirectional_deployment_config.yaml

This overrides the endpoint type from bidirectional to unidirectional.

This approach works for any key of the yaml files provided.
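
The override behaviour used in both examples can be pictured as a recursive merge of your partial file over the defaults. The Python sketch below shows that idea on plain dicts. It is a conceptual illustration, not the service's actual loading code; the function name is hypothetical, and it does not model the automatic loading of defaults from inference_model_id or blendshape_id.

```python
import copy


def apply_overrides(defaults: dict, overrides: dict) -> dict:
    """Return a copy of `defaults` with the values from `overrides` applied on top.

    Nested sections are merged key by key; any other value replaces the default.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# A fragment of the default deployment configuration, written as a Python dict.
defaults = {
    "common": {"stream_number": 10},
    "endpoints": {
        "use_bidirectional": True,
        "bidirectional": {"server": {"url": "0.0.0.0:52000"}},
    },
}
# The content of unidirectional_deployment_config.yaml from Example 2.
overrides = {"endpoints": {"use_bidirectional": False}}

effective = apply_overrides(defaults, overrides)
```

Only use_bidirectional changes in the result; every key that the override file does not mention keeps its default value.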

Warning

Make sure to use the option matching your configuration file.

--stylization-config # for the <any>_stylization_config.yaml
--deployment-config # for the deployment_config.yaml
--advanced-config # for the advanced_config.yaml
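
When scripting the launch, a small helper can reduce the risk of passing a file to the wrong option. The Python sketch below picks the option from the file name. It assumes your files keep the name suffixes used in this guide (for example short_mark_stylization_config.yaml); the service itself is not known to require these names, and the function name is hypothetical.

```python
import os


def option_for(config_path: str) -> str:
    """Map a configuration file name to its matching command-line option."""
    name = os.path.basename(config_path)
    if name.endswith("stylization_config.yaml"):
        return "--stylization-config"
    if name.endswith("deployment_config.yaml"):
        return "--deployment-config"
    if name.endswith("advanced_config.yaml"):
        return "--advanced-config"
    raise ValueError(f"Cannot tell which option matches {name!r}")


# Build the argument list for the service from a set of override files.
files = [
    "/mnt/configs/short_mark_stylization_config.yaml",
    "/mnt/configs/unidirectional_deployment_config.yaml",
]
args = ["a2f_pipeline.run"]
for path in files:
    args += [option_for(path), path]
```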

Advanced Stylization#

The above stylization configuration blendshape tuning was simplified for new users.

For advanced users, a section is available below.

Advanced Blendshape tuning

Three more parameters can be set for blendshape tuning:

  • active_poses: Which blendshapes are active: 1 for active, 0 for inactive.

  • cancel_poses: Which blendshapes cancel each other. Entries with the same number are matched with one another; -1 means no-op.

  • symmetry_poses: Which blendshapes are symmetric to one another. Entries with the same number are matched with one another; -1 means no-op.
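
To make the "matching number" convention more concrete, the Python sketch below groups blendshape names by the number assigned to them and skips entries set to -1. It is only one reading of the description above: the function name, the three-blendshape list, and the example values are made up, and the exact layout of these parameters in the configuration file is not shown here.

```python
from collections import defaultdict


def group_by_pair_id(names, pair_ids):
    """Group blendshape names that share the same non-negative pair id.

    Entries with -1 are treated as no-op and left out. Hypothetical helper.
    """
    groups = defaultdict(list)
    for name, pair_id in zip(names, pair_ids):
        if pair_id >= 0:
            groups[pair_id].append(name)
    return dict(groups)


# Made-up example: the two eye blinks are marked as symmetric, JawOpen is not paired.
names = ["EyeBlinkLeft", "EyeBlinkRight", "JawOpen"]
symmetry_poses = [0, 0, -1]
pairs = group_by_pair_id(names, symmetry_poses)
```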

claire_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true
  live_transition_time: 0.5
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if is loaded) relative to generated emotions

a2f:
  # A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
  inference_model_id: claire_v2.3
  blendshape_id: claire_topo1_v2.1

  face_params:
    eyelid_offset: 0.0 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.0 # Controls the magnitude of the input audio
    lip_close_offset: 0.0 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.25 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.0 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face

  blendshape_params: # Modulates the effect of each blendshapes. Gain * w + offset
    enable_clamping_bs_weight: false

    weight_multipliers:
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 1.0
      EyeLookInLeft: 1.0
      EyeLookOutLeft: 1.0
      EyeLookUpLeft: 1.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 1.0
      EyeLookInRight: 1.0
      EyeLookOutRight: 1.0
      EyeLookUpRight: 1.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 1.0
      JawLeft: 1.0
      JawRight: 1.0
      JawOpen: 1.0
      MouthClose: 1.0
      MouthFunnel: 1.0
      MouthPucker: 1.0
      MouthLeft: 1.0
      MouthRight: 1.0
      MouthSmileLeft: 1.0
      MouthSmileRight: 1.0
      MouthFrownLeft: 1.0
      MouthFrownRight: 1.0
      MouthDimpleLeft: 1.0
      MouthDimpleRight: 1.0
      MouthStretchLeft: 1.0
      MouthStretchRight: 1.0
      MouthRollLower: 1.0
      MouthRollUpper: 1.0
      MouthShrugLower: 1.0
      MouthShrugUpper: 1.0
      MouthPressLeft: 1.0
      MouthPressRight: 1.0
      MouthLowerDownLeft: 1.0
      MouthLowerDownRight: 1.0
      MouthUpperUpLeft: 1.0
      MouthUpperUpRight: 1.0
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 1.0
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 1.0
      NoseSneerRight: 1.0
      TongueOut: 1.0

    weight_offsets:
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0

    active_poses:
      EyeBlinkLeft: 1
      EyeLookDownLeft: 0
      EyeLookInLeft: 0
      EyeLookOutLeft: 0
      EyeLookUpLeft: 0
      EyeSquintLeft: 1
      EyeWideLeft: 1
      EyeBlinkRight: 1
      EyeLookDownRight: 0
      EyeLookInRight: 0
      EyeLookOutRight: 0
      EyeLookUpRight: 0
      EyeSquintRight: 1
      EyeWideRight: 1
      JawForward: 1
      JawLeft: 1
      JawRight: 1
      JawOpen: 1
      MouthClose: 1
      MouthFunnel: 1
      MouthPucker: 1
      MouthLeft: 1
      MouthRight: 1
      MouthSmileLeft: 1
      MouthSmileRight: 1
      MouthFrownLeft: 1
      MouthFrownRight: 1
      MouthDimpleLeft: 1
      MouthDimpleRight: 1
      MouthStretchLeft: 1
      MouthStretchRight: 1
      MouthRollLower: 1
      MouthRollUpper: 1
      MouthShrugLower: 1
      MouthShrugUpper: 1
      MouthPressLeft: 1
      MouthPressRight: 1
      MouthLowerDownLeft: 1
      MouthLowerDownRight: 1
      MouthUpperUpLeft: 1
      MouthUpperUpRight: 1
      BrowDownLeft: 1
      BrowDownRight: 1
      BrowInnerUp: 1
      BrowOuterUpLeft: 1
      BrowOuterUpRight: 1
      CheekPuff: 1
      CheekSquintLeft: 1
      CheekSquintRight: 1
      NoseSneerLeft: 1
      NoseSneerRight: 1
      TongueOut: 0

    cancel_poses:
      EyeBlinkLeft: -1
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: -1
      EyeBlinkRight: -1
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: -1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: -1
      MouthSmileRight: -1
      MouthFrownLeft: -1
      MouthFrownRight: -1
      MouthDimpleLeft: -1
      MouthDimpleRight: -1
      MouthStretchLeft: -1
      MouthStretchRight: -1
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: -1
      MouthPressRight: -1
      MouthLowerDownLeft: -1
      MouthLowerDownRight: -1
      MouthUpperUpLeft: -1
      MouthUpperUpRight: -1
      BrowDownLeft: -1
      BrowDownRight: -1
      BrowInnerUp: -1
      BrowOuterUpLeft: -1
      BrowOuterUpRight: -1
      CheekPuff: -1
      CheekSquintLeft: -1
      CheekSquintRight: -1
      NoseSneerLeft: -1
      NoseSneerRight: -1
      TongueOut: -1

    symmetry_poses:
      EyeBlinkLeft: 0
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: 1
      EyeBlinkRight: 0
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: 1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: 2
      MouthSmileRight: 2
      MouthFrownLeft: 3
      MouthFrownRight: 3
      MouthDimpleLeft: 4
      MouthDimpleRight: 4
      MouthStretchLeft: 5
      MouthStretchRight: 5
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: 6
      MouthPressRight: 6
      MouthLowerDownLeft: 7
      MouthLowerDownRight: 7
      MouthUpperUpLeft: 8
      MouthUpperUpRight: 8
      BrowDownLeft: 10
      BrowDownRight: 10
      BrowInnerUp: -1
      BrowOuterUpLeft: 9
      BrowOuterUpRight: 9
      CheekPuff: -1
      CheekSquintLeft: 11
      CheekSquintRight: 11
      NoseSneerLeft: 12
      NoseSneerRight: 12
      TongueOut: -1
james_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true
  live_transition_time: 0.5
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if is loaded) relative to generated emotions

a2f:
  # A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
  inference_model_id: james_v2.3
  blendshape_id: james_topo2_v2.2

  face_params:
    eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.0 # Controls the magnitude of the input audio
    lip_close_offset: -0.02 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.2 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.0 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face

  blendshape_params: # Modulates the effect of each blendshapes. Gain * w + offset
    enable_clamping_bs_weight: false

    weight_multipliers:
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 1.0
      EyeLookInLeft: 1.0
      EyeLookOutLeft: 1.0
      EyeLookUpLeft: 1.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 1.0
      EyeLookInRight: 1.0
      EyeLookOutRight: 1.0
      EyeLookUpRight: 1.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 1.0
      JawLeft: 1.0
      JawRight: 1.0
      JawOpen: 1.0
      MouthClose: 1.0
      MouthFunnel: 1.0
      MouthPucker: 1.0
      MouthLeft: 1.0
      MouthRight: 1.0
      MouthSmileLeft: 1.0
      MouthSmileRight: 1.0
      MouthFrownLeft: 1.0
      MouthFrownRight: 1.0
      MouthDimpleLeft: 1.0
      MouthDimpleRight: 1.0
      MouthStretchLeft: 1.0
      MouthStretchRight: 1.0
      MouthRollLower: 1.0
      MouthRollUpper: 1.0
      MouthShrugLower: 1.0
      MouthShrugUpper: 1.0
      MouthPressLeft: 1.0
      MouthPressRight: 1.0
      MouthLowerDownLeft: 1.0
      MouthLowerDownRight: 1.0
      MouthUpperUpLeft: 1.0
      MouthUpperUpRight: 1.0
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 1.0
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 1.0
      NoseSneerRight: 1.0
      TongueOut: 1.0

    weight_offsets:
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0

    active_poses:
      EyeBlinkLeft: 1
      EyeLookDownLeft: 0
      EyeLookInLeft: 0
      EyeLookOutLeft: 0
      EyeLookUpLeft: 0
      EyeSquintLeft: 1
      EyeWideLeft: 1
      EyeBlinkRight: 1
      EyeLookDownRight: 0
      EyeLookInRight: 0
      EyeLookOutRight: 0
      EyeLookUpRight: 0
      EyeSquintRight: 1
      EyeWideRight: 1
      JawForward: 1
      JawLeft: 1
      JawRight: 1
      JawOpen: 1
      MouthClose: 1
      MouthFunnel: 1
      MouthPucker: 1
      MouthLeft: 1
      MouthRight: 1
      MouthSmileLeft: 1
      MouthSmileRight: 1
      MouthFrownLeft: 1
      MouthFrownRight: 1
      MouthDimpleLeft: 1
      MouthDimpleRight: 1
      MouthStretchLeft: 1
      MouthStretchRight: 1
      MouthRollLower: 1
      MouthRollUpper: 1
      MouthShrugLower: 1
      MouthShrugUpper: 1
      MouthPressLeft: 1
      MouthPressRight: 1
      MouthLowerDownLeft: 1
      MouthLowerDownRight: 1
      MouthUpperUpLeft: 1
      MouthUpperUpRight: 1
      BrowDownLeft: 1
      BrowDownRight: 1
      BrowInnerUp: 1
      BrowOuterUpLeft: 1
      BrowOuterUpRight: 1
      CheekPuff: 1
      CheekSquintLeft: 1
      CheekSquintRight: 1
      NoseSneerLeft: 1
      NoseSneerRight: 1
      TongueOut: 0

    cancel_poses:
      EyeBlinkLeft: -1
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: -1
      EyeBlinkRight: -1
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: -1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: -1
      MouthSmileRight: -1
      MouthFrownLeft: -1
      MouthFrownRight: -1
      MouthDimpleLeft: -1
      MouthDimpleRight: -1
      MouthStretchLeft: -1
      MouthStretchRight: -1
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: -1
      MouthPressRight: -1
      MouthLowerDownLeft: -1
      MouthLowerDownRight: -1
      MouthUpperUpLeft: -1
      MouthUpperUpRight: -1
      BrowDownLeft: -1
      BrowDownRight: -1
      BrowInnerUp: -1
      BrowOuterUpLeft: -1
      BrowOuterUpRight: -1
      CheekPuff: -1
      CheekSquintLeft: -1
      CheekSquintRight: -1
      NoseSneerLeft: -1
      NoseSneerRight: -1
      TongueOut: -1

    symmetry_poses:
      EyeBlinkLeft: 0
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: 1
      EyeBlinkRight: 0
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: 1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: 2
      MouthSmileRight: 2
      MouthFrownLeft: 3
      MouthFrownRight: 3
      MouthDimpleLeft: 4
      MouthDimpleRight: 4
      MouthStretchLeft: 5
      MouthStretchRight: 5
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: 6
      MouthPressRight: 6
      MouthLowerDownLeft: 7
      MouthLowerDownRight: 7
      MouthUpperUpLeft: 8
      MouthUpperUpRight: 8
      BrowDownLeft: 10
      BrowDownRight: 10
      BrowInnerUp: -1
      BrowOuterUpLeft: 9
      BrowOuterUpRight: 9
      CheekPuff: -1
      CheekSquintLeft: 11
      CheekSquintRight: 11
      NoseSneerLeft: 12
      NoseSneerRight: 12
      TongueOut: -1
mark_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true
  live_transition_time: 0.5
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if is loaded) relative to generated emotions

a2f:
  # A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
  inference_model_id: mark_v2.3
  blendshape_id: mark_topo1_v2.1

  face_params:
    eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.3 # Controls the magnitude of the input audio
    lip_close_offset: -0.03 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.0023 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.4 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.1 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face

  blendshape_params: # Modulates the effect of each blendshapes. Gain * w + offset
    enable_clamping_bs_weight: false

    weight_multipliers:
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 1.0
      EyeLookInLeft: 1.0
      EyeLookOutLeft: 1.0
      EyeLookUpLeft: 1.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 1.0
      EyeLookInRight: 1.0
      EyeLookOutRight: 1.0
      EyeLookUpRight: 1.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 1.0
      JawLeft: 1.0
      JawRight: 1.0
      JawOpen: 1.0
      MouthClose: 1.0
      MouthFunnel: 1.0
      MouthPucker: 1.0
      MouthLeft: 1.0
      MouthRight: 1.0
      MouthSmileLeft: 1.0
      MouthSmileRight: 1.0
      MouthFrownLeft: 1.0
      MouthFrownRight: 1.0
      MouthDimpleLeft: 1.0
      MouthDimpleRight: 1.0
      MouthStretchLeft: 1.0
      MouthStretchRight: 1.0
      MouthRollLower: 1.0
      MouthRollUpper: 1.0
      MouthShrugLower: 1.0
      MouthShrugUpper: 1.0
      MouthPressLeft: 1.0
      MouthPressRight: 1.0
      MouthLowerDownLeft: 1.0
      MouthLowerDownRight: 1.0
      MouthUpperUpLeft: 1.0
      MouthUpperUpRight: 1.0
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 1.0
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 1.0
      NoseSneerRight: 1.0
      TongueOut: 1.0

    weight_offsets:
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0

    active_poses:
      EyeBlinkLeft: 1
      EyeLookDownLeft: 0
      EyeLookInLeft: 0
      EyeLookOutLeft: 0
      EyeLookUpLeft: 0
      EyeSquintLeft: 1
      EyeWideLeft: 1
      EyeBlinkRight: 1
      EyeLookDownRight: 0
      EyeLookInRight: 0
      EyeLookOutRight: 0
      EyeLookUpRight: 0
      EyeSquintRight: 1
      EyeWideRight: 1
      JawForward: 1
      JawLeft: 1
      JawRight: 1
      JawOpen: 1
      MouthClose: 1
      MouthFunnel: 1
      MouthPucker: 1
      MouthLeft: 1
      MouthRight: 1
      MouthSmileLeft: 1
      MouthSmileRight: 1
      MouthFrownLeft: 1
      MouthFrownRight: 1
      MouthDimpleLeft: 1
      MouthDimpleRight: 1
      MouthStretchLeft: 1
      MouthStretchRight: 1
      MouthRollLower: 1
      MouthRollUpper: 1
      MouthShrugLower: 1
      MouthShrugUpper: 1
      MouthPressLeft: 1
      MouthPressRight: 1
      MouthLowerDownLeft: 1
      MouthLowerDownRight: 1
      MouthUpperUpLeft: 1
      MouthUpperUpRight: 1
      BrowDownLeft: 1
      BrowDownRight: 1
      BrowInnerUp: 1
      BrowOuterUpLeft: 1
      BrowOuterUpRight: 1
      CheekPuff: 1
      CheekSquintLeft: 1
      CheekSquintRight: 1
      NoseSneerLeft: 1
      NoseSneerRight: 1
      TongueOut: 0

    cancel_poses:
      EyeBlinkLeft: -1
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: -1
      EyeBlinkRight: -1
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: -1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: -1
      MouthSmileRight: -1
      MouthFrownLeft: -1
      MouthFrownRight: -1
      MouthDimpleLeft: -1
      MouthDimpleRight: -1
      MouthStretchLeft: -1
      MouthStretchRight: -1
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: -1
      MouthPressRight: -1
      MouthLowerDownLeft: -1
      MouthLowerDownRight: -1
      MouthUpperUpLeft: -1
      MouthUpperUpRight: -1
      BrowDownLeft: -1
      BrowDownRight: -1
      BrowInnerUp: -1
      BrowOuterUpLeft: -1
      BrowOuterUpRight: -1
      CheekPuff: -1
      CheekSquintLeft: -1
      CheekSquintRight: -1
      NoseSneerLeft: -1
      NoseSneerRight: -1
      TongueOut: -1

    symmetry_poses:
      EyeBlinkLeft: 0
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: 1
      EyeBlinkRight: 0
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: 1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: 2
      MouthSmileRight: 2
      MouthFrownLeft: 3
      MouthFrownRight: 3
      MouthDimpleLeft: 4
      MouthDimpleRight: 4
      MouthStretchLeft: 5
      MouthStretchRight: 5
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: 6
      MouthPressRight: 6
      MouthLowerDownLeft: 7
      MouthLowerDownRight: 7
      MouthUpperUpLeft: 8
      MouthUpperUpRight: 8
      BrowDownLeft: 10
      BrowDownRight: 10
      BrowInnerUp: -1
      BrowOuterUpLeft: 9
      BrowOuterUpRight: 9
      CheekPuff: -1
      CheekSquintLeft: 11
      CheekSquintRight: 11
      NoseSneerLeft: 12
      NoseSneerRight: 12
      TongueOut: -1

Configuration files for Unreal Engine MetaHuman#

If you plan to connect A2F-3D with MetaHuman characters, you will need to use configuration files adapted for them. Compared to the default configuration files, the only changes in these files are the blendshape multipliers and offsets.
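The configuration comments describe the tuning as blendshape_values * weight_multipliers + weight_offsets. The sketch below applies that formula to a few poses using multipliers from the MetaHuman Claire file. The raw weights are invented for illustration, `tune_weights` is a hypothetical name, and the [0, 1] clamping range used for enable_clamping_bs_weight is an assumption rather than documented behavior.

```python
def tune_weights(weights, multipliers, offsets, clamp=False):
    """Apply multiplier * weight + offset per pose.

    `clamp` mimics enable_clamping_bs_weight; clamping to [0, 1] is an
    assumed range, not one stated in this guide.
    """
    tuned = {}
    for pose, weight in weights.items():
        value = multipliers.get(pose, 1.0) * weight + offsets.get(pose, 0.0)
        if clamp:
            value = min(1.0, max(0.0, value))
        tuned[pose] = value
    return tuned


# Multipliers copied from the MetaHuman Claire file; offsets are all 0.0 there.
multipliers = {"JawOpen": 1.0, "MouthFunnel": 1.2, "JawLeft": 0.2, "TongueOut": 0.0}
offsets = {"JawOpen": 0.0, "MouthFunnel": 0.0, "JawLeft": 0.0, "TongueOut": 0.0}

# Invented raw weights, for illustration only.
raw = {"JawOpen": 0.5, "MouthFunnel": 0.9, "JawLeft": 0.5, "TongueOut": 0.3}

unclamped = tune_weights(raw, multipliers, offsets)
clamped = tune_weights(raw, multipliers, offsets, clamp=True)
```

A multiplier of 0.0, as used for TongueOut, silences a pose regardless of the raw weight, while a multiplier above 1.0, as used for MouthFunnel, can push a weight past 1.0 unless clamping is applied.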

MetaHuman Stylization Configuration Files
claire_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true
  live_transition_time: 0.5
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if is loaded) relative to generated emotions

a2f:
  # A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
  inference_model_id: claire_v2.3
  blendshape_id: claire_topo1_v2.1

  face_params:
    eyelid_offset: 0.0 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.0 # Controls the magnitude of the input audio
    lip_close_offset: 0.0 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.25 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.0 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face

  blendshape_params: # Modulates the effect of each blendshapes. Gain * w + offset
    enable_clamping_bs_weight: false

    weight_multipliers:
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 0.7
      JawLeft: 0.2
      JawRight: 0.2
      JawOpen: 1.0
      MouthClose: 1.0
      MouthFunnel: 1.2
      MouthPucker: 1.2
      MouthLeft: 0.2
      MouthRight: 0.2
      MouthSmileLeft: 0.8
      MouthSmileRight: 0.8
      MouthFrownLeft: 0.4
      MouthFrownRight: 0.4
      MouthDimpleLeft: 0.7
      MouthDimpleRight: 0.7
      MouthStretchLeft: 0.1
      MouthStretchRight: 0.1
      MouthRollLower: 0.9
      MouthRollUpper: 0.5
      MouthShrugLower: 0.9
      MouthShrugUpper: 0.4
      MouthPressLeft: 0.8
      MouthPressRight: 0.8
      MouthLowerDownLeft: 0.8
      MouthLowerDownRight: 0.8
      MouthUpperUpLeft: 0.8
      MouthUpperUpRight: 0.8
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 0.2
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 0.8
      NoseSneerRight: 0.8
      TongueOut: 0.0

    weight_offsets:  # Modulates the effect of each blendshapes. blendshape_values * weight_multipliers + weight_offsets
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0

    active_poses: # Define which poses are active and which one are not
      EyeBlinkLeft: 1
      EyeLookDownLeft: 0
      EyeLookInLeft: 0
      EyeLookOutLeft: 0
      EyeLookUpLeft: 0
      EyeSquintLeft: 1
      EyeWideLeft: 1
      EyeBlinkRight: 1
      EyeLookDownRight: 0
      EyeLookInRight: 0
      EyeLookOutRight: 0
      EyeLookUpRight: 0
      EyeSquintRight: 1
      EyeWideRight: 1
      JawForward: 1
      JawLeft: 1
      JawRight: 1
      JawOpen: 1
      MouthClose: 1
      MouthFunnel: 1
      MouthPucker: 1
      MouthLeft: 1
      MouthRight: 1
      MouthSmileLeft: 1
      MouthSmileRight: 1
      MouthFrownLeft: 1
      MouthFrownRight: 1
      MouthDimpleLeft: 1
      MouthDimpleRight: 1
      MouthStretchLeft: 1
      MouthStretchRight: 1
      MouthRollLower: 1
      MouthRollUpper: 1
      MouthShrugLower: 1
      MouthShrugUpper: 1
      MouthPressLeft: 1
      MouthPressRight: 1
      MouthLowerDownLeft: 1
      MouthLowerDownRight: 1
      MouthUpperUpLeft: 1
      MouthUpperUpRight: 1
      BrowDownLeft: 1
      BrowDownRight: 1
      BrowInnerUp: 1
      BrowOuterUpLeft: 1
      BrowOuterUpRight: 1
      CheekPuff: 1
      CheekSquintLeft: 1
      CheekSquintRight: 1
      NoseSneerLeft: 1
      NoseSneerRight: 1
      TongueOut: 0

    cancel_poses: # Define which poses cancel each other
      EyeBlinkLeft: -1
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: -1
      EyeBlinkRight: -1
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: -1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: -1
      MouthSmileRight: -1
      MouthFrownLeft: -1
      MouthFrownRight: -1
      MouthDimpleLeft: -1
      MouthDimpleRight: -1
      MouthStretchLeft: -1
      MouthStretchRight: -1
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: -1
      MouthPressRight: -1
      MouthLowerDownLeft: -1
      MouthLowerDownRight: -1
      MouthUpperUpLeft: -1
      MouthUpperUpRight: -1
      BrowDownLeft: -1
      BrowDownRight: -1
      BrowInnerUp: -1
      BrowOuterUpLeft: -1
      BrowOuterUpRight: -1
      CheekPuff: -1
      CheekSquintLeft: -1
      CheekSquintRight: -1
      NoseSneerLeft: -1
      NoseSneerRight: -1
      TongueOut: -1

    symmetry_poses: # Define which poses are symmetric to each other
      EyeBlinkLeft: 0
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: 1
      EyeBlinkRight: 0
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: 1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: 2
      MouthSmileRight: 2
      MouthFrownLeft: 3
      MouthFrownRight: 3
      MouthDimpleLeft: 4
      MouthDimpleRight: 4
      MouthStretchLeft: 5
      MouthStretchRight: 5
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: 6
      MouthPressRight: 6
      MouthLowerDownLeft: 7
      MouthLowerDownRight: 7
      MouthUpperUpLeft: 8
      MouthUpperUpRight: 8
      BrowDownLeft: 10
      BrowDownRight: 10
      BrowInnerUp: -1
      BrowOuterUpLeft: 9
      BrowOuterUpRight: 9
      CheekPuff: -1
      CheekSquintLeft: 11
      CheekSquintRight: 11
      NoseSneerLeft: 12
      NoseSneerRight: 12
      TongueOut: -1
james_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true
  live_transition_time: 0.5
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions

a2f:
  # A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
  inference_model_id: james_v2.3
  blendshape_id: james_topo2_v2.2

  face_params:
    eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.0 # Controls the magnitude of the input audio
    lip_close_offset: -0.02 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.2 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.0 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face

  blendshape_params: # Modulates the effect of each blendshape. Gain * w + offset
    enable_clamping_bs_weight: false

    weight_multipliers:
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 0.7
      JawLeft: 0.2
      JawRight: 0.2
      JawOpen: 0.8
      MouthClose: 0.3
      MouthFunnel: 1.0
      MouthPucker: 1.0
      MouthLeft: 0.2
      MouthRight: 0.2
      MouthSmileLeft: 1.2
      MouthSmileRight: 1.2
      MouthFrownLeft: 0.5
      MouthFrownRight: 0.5
      MouthDimpleLeft: 0.8
      MouthDimpleRight: 0.8
      MouthStretchLeft: 0.05
      MouthStretchRight: 0.05
      MouthRollLower: 0.8
      MouthRollUpper: 0.5
      MouthShrugLower: 1.0
      MouthShrugUpper: 0.4
      MouthPressLeft: 0.8
      MouthPressRight: 0.8
      MouthLowerDownLeft: 0.8
      MouthLowerDownRight: 0.8
      MouthUpperUpLeft: 0.8
      MouthUpperUpRight: 0.8
      BrowDownLeft: 1.2
      BrowDownRight: 1.2
      BrowInnerUp: 1.3
      BrowOuterUpLeft: 0.8
      BrowOuterUpRight: 0.8
      CheekPuff: 0.2
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 0.8
      NoseSneerRight: 0.8
      TongueOut: 0.0

    weight_offsets:  # Modulates the effect of each blendshape. blendshape_values * weight_multipliers + weight_offsets
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0

    active_poses: # Define which poses are active and which ones are not
      EyeBlinkLeft: 1
      EyeLookDownLeft: 0
      EyeLookInLeft: 0
      EyeLookOutLeft: 0
      EyeLookUpLeft: 0
      EyeSquintLeft: 1
      EyeWideLeft: 1
      EyeBlinkRight: 1
      EyeLookDownRight: 0
      EyeLookInRight: 0
      EyeLookOutRight: 0
      EyeLookUpRight: 0
      EyeSquintRight: 1
      EyeWideRight: 1
      JawForward: 1
      JawLeft: 1
      JawRight: 1
      JawOpen: 1
      MouthClose: 1
      MouthFunnel: 1
      MouthPucker: 1
      MouthLeft: 1
      MouthRight: 1
      MouthSmileLeft: 1
      MouthSmileRight: 1
      MouthFrownLeft: 1
      MouthFrownRight: 1
      MouthDimpleLeft: 1
      MouthDimpleRight: 1
      MouthStretchLeft: 1
      MouthStretchRight: 1
      MouthRollLower: 1
      MouthRollUpper: 1
      MouthShrugLower: 1
      MouthShrugUpper: 1
      MouthPressLeft: 1
      MouthPressRight: 1
      MouthLowerDownLeft: 1
      MouthLowerDownRight: 1
      MouthUpperUpLeft: 1
      MouthUpperUpRight: 1
      BrowDownLeft: 1
      BrowDownRight: 1
      BrowInnerUp: 1
      BrowOuterUpLeft: 1
      BrowOuterUpRight: 1
      CheekPuff: 1
      CheekSquintLeft: 1
      CheekSquintRight: 1
      NoseSneerLeft: 1
      NoseSneerRight: 1
      TongueOut: 0

    cancel_poses: # Define which poses cancel each other
      EyeBlinkLeft: -1
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: -1
      EyeBlinkRight: -1
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: -1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: -1
      MouthSmileRight: -1
      MouthFrownLeft: -1
      MouthFrownRight: -1
      MouthDimpleLeft: -1
      MouthDimpleRight: -1
      MouthStretchLeft: -1
      MouthStretchRight: -1
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: -1
      MouthPressRight: -1
      MouthLowerDownLeft: -1
      MouthLowerDownRight: -1
      MouthUpperUpLeft: -1
      MouthUpperUpRight: -1
      BrowDownLeft: -1
      BrowDownRight: -1
      BrowInnerUp: -1
      BrowOuterUpLeft: -1
      BrowOuterUpRight: -1
      CheekPuff: -1
      CheekSquintLeft: -1
      CheekSquintRight: -1
      NoseSneerLeft: -1
      NoseSneerRight: -1
      TongueOut: -1

    symmetry_poses: # Define which poses are symmetric to each other
      EyeBlinkLeft: 0
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: 1
      EyeBlinkRight: 0
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: 1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: 2
      MouthSmileRight: 2
      MouthFrownLeft: 3
      MouthFrownRight: 3
      MouthDimpleLeft: 4
      MouthDimpleRight: 4
      MouthStretchLeft: 5
      MouthStretchRight: 5
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: 6
      MouthPressRight: 6
      MouthLowerDownLeft: 7
      MouthLowerDownRight: 7
      MouthUpperUpLeft: 8
      MouthUpperUpRight: 8
      BrowDownLeft: 10
      BrowDownRight: 10
      BrowInnerUp: -1
      BrowOuterUpLeft: 9
      BrowOuterUpRight: 9
      CheekPuff: -1
      CheekSquintLeft: 11
      CheekSquintRight: 11
      NoseSneerLeft: 12
      NoseSneerRight: 12
      TongueOut: -1
mark_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true
  live_transition_time: 0.5
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions

a2f:
  # A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
  inference_model_id: mark_v2.3
  blendshape_id: mark_topo1_v2.1

  face_params:
    eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.3 # Controls the magnitude of the input audio
    lip_close_offset: -0.03 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.0023 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.4 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.1 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face

  blendshape_params: # Modulates the effect of each blendshape. Gain * w + offset
    enable_clamping_bs_weight: false

    weight_multipliers:
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 0.7
      JawLeft: 0.2
      JawRight: 0.2
      JawOpen: 1.0
      MouthClose: 0.2
      MouthFunnel: 1.2
      MouthPucker: 1.2
      MouthLeft: 0.2
      MouthRight: 0.2
      MouthSmileLeft: 0.8
      MouthSmileRight: 0.8
      MouthFrownLeft: 0.5
      MouthFrownRight: 0.5
      MouthDimpleLeft: 0.8
      MouthDimpleRight: 0.8
      MouthStretchLeft: 0.05
      MouthStretchRight: 0.05
      MouthRollLower: 0.8
      MouthRollUpper: 0.5
      MouthShrugLower: 0.9
      MouthShrugUpper: 0.4
      MouthPressLeft: 0.8
      MouthPressRight: 0.8
      MouthLowerDownLeft: 0.8
      MouthLowerDownRight: 0.8
      MouthUpperUpLeft: 0.8
      MouthUpperUpRight: 0.8
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 0.2
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 0.8
      NoseSneerRight: 0.8
      TongueOut: 0.0

    weight_offsets:  # Modulates the effect of each blendshape. blendshape_values * weight_multipliers + weight_offsets
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0

    active_poses: # Define which poses are active and which ones are not
      EyeBlinkLeft: 1
      EyeLookDownLeft: 0
      EyeLookInLeft: 0
      EyeLookOutLeft: 0
      EyeLookUpLeft: 0
      EyeSquintLeft: 1
      EyeWideLeft: 1
      EyeBlinkRight: 1
      EyeLookDownRight: 0
      EyeLookInRight: 0
      EyeLookOutRight: 0
      EyeLookUpRight: 0
      EyeSquintRight: 1
      EyeWideRight: 1
      JawForward: 1
      JawLeft: 1
      JawRight: 1
      JawOpen: 1
      MouthClose: 1
      MouthFunnel: 1
      MouthPucker: 1
      MouthLeft: 1
      MouthRight: 1
      MouthSmileLeft: 1
      MouthSmileRight: 1
      MouthFrownLeft: 1
      MouthFrownRight: 1
      MouthDimpleLeft: 1
      MouthDimpleRight: 1
      MouthStretchLeft: 1
      MouthStretchRight: 1
      MouthRollLower: 1
      MouthRollUpper: 1
      MouthShrugLower: 1
      MouthShrugUpper: 1
      MouthPressLeft: 1
      MouthPressRight: 1
      MouthLowerDownLeft: 1
      MouthLowerDownRight: 1
      MouthUpperUpLeft: 1
      MouthUpperUpRight: 1
      BrowDownLeft: 1
      BrowDownRight: 1
      BrowInnerUp: 1
      BrowOuterUpLeft: 1
      BrowOuterUpRight: 1
      CheekPuff: 1
      CheekSquintLeft: 1
      CheekSquintRight: 1
      NoseSneerLeft: 1
      NoseSneerRight: 1
      TongueOut: 0

    cancel_poses: # Define which poses cancel each other
      EyeBlinkLeft: -1
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: -1
      EyeBlinkRight: -1
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: -1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: -1
      MouthSmileRight: -1
      MouthFrownLeft: -1
      MouthFrownRight: -1
      MouthDimpleLeft: -1
      MouthDimpleRight: -1
      MouthStretchLeft: -1
      MouthStretchRight: -1
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: -1
      MouthPressRight: -1
      MouthLowerDownLeft: -1
      MouthLowerDownRight: -1
      MouthUpperUpLeft: -1
      MouthUpperUpRight: -1
      BrowDownLeft: -1
      BrowDownRight: -1
      BrowInnerUp: -1
      BrowOuterUpLeft: -1
      BrowOuterUpRight: -1
      CheekPuff: -1
      CheekSquintLeft: -1
      CheekSquintRight: -1
      NoseSneerLeft: -1
      NoseSneerRight: -1
      TongueOut: -1

    symmetry_poses: # Define which poses are symmetric to each other
      EyeBlinkLeft: 0
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: 1
      EyeBlinkRight: 0
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: 1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: 2
      MouthSmileRight: 2
      MouthFrownLeft: 3
      MouthFrownRight: 3
      MouthDimpleLeft: 4
      MouthDimpleRight: 4
      MouthStretchLeft: 5
      MouthStretchRight: 5
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: 6
      MouthPressRight: 6
      MouthLowerDownLeft: 7
      MouthLowerDownRight: 7
      MouthUpperUpLeft: 8
      MouthUpperUpRight: 8
      BrowDownLeft: 10
      BrowDownRight: 10
      BrowInnerUp: -1
      BrowOuterUpLeft: 9
      BrowOuterUpRight: 9
      CheekPuff: -1
      CheekSquintLeft: 11
      CheekSquintRight: 11
      NoseSneerLeft: 12
      NoseSneerRight: 12
      TongueOut: -1

Parameter Tuning Guide#

Audio2Face-3D reads inference parameters from three sources: the inference model SDK, the deployment-time configuration files, and runtime input. In general, deployment-time parameters override the matching defaults in the model files, and runtime parameters override both the deployment-time and the model default parameters.
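The precedence described above can be pictured as a layered lookup. The following is only an illustrative sketch, not the service's actual implementation: the function name `resolve_parameters` and the sample values are hypothetical, and only the ordering (model defaults, then deployment-time, then runtime) comes from this guide.

```python
from collections import ChainMap


def resolve_parameters(model_defaults, deployment_config, runtime_params):
    """Return the effective parameters; higher-priority sources win."""
    # ChainMap looks keys up from left to right, so the highest-priority
    # source (runtime) goes first and the model defaults go last.
    return dict(ChainMap(runtime_params, deployment_config, model_defaults))


# Hypothetical values, chosen only to show the override order.
model_defaults = {"lower_face_strength": 1.0, "skin_strength": 1.0, "input_strength": 1.0}
deployment_config = {"lower_face_strength": 1.2}  # e.g. from a stylization config
runtime_params = {"skin_strength": 0.9}           # e.g. sent with the audio stream

effective = resolve_parameters(model_defaults, deployment_config, runtime_params)
print(effective)
```

Here `lower_face_strength` comes from the deployment-time layer, `skin_strength` from the runtime layer, and `input_strength` falls back to the model default.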

For the runtime parameters, see the proto definitions of AudioStreamHeader, FaceParameters, BlendShapeParameters, EmotionParameters, and EmotionPostProcessingParameters.

FaceParameters

Only a subset of FaceParameters is supported for runtime tuning. See FaceParameters for the list of supported ones.

Emotion Post-processing Parameters

The Audio2Emotion SDK automatically parses emotions from the incoming audio and generates emotion vectors that drive the character’s facial animation performance. Use the post-processing parameters below to tailor the performance further. The operations are listed below in the order in which they are executed in the technology stack.

Emotion Contrast

Emotion contrast is applied to the inference output and controls the emotion spread using a sigmoid function. It pushes high values higher and low values lower, allowing for a wider range in the generated emotional performance.

Max Emotions

Max emotions lets the user set a hard limit on the number of emotions that the Audio2Emotion SDK will engage. Emotions are prioritized by their strength. Once the maximum number of emotions is reached, only the vectors for these prioritized emotions are engaged, and all other emotions are nulled. This helps achieve a more accurate read on the correct emotion when the vocal emotional performance is more subtle.

For example, if Joy and Amazement are the strongest predicted emotions and you set the Max Emotions limit to 2, only Joy and Amazement will be applied to the performance.
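A minimal sketch of this top-k selection is shown here. It assumes that "null" means a weight of 0.0 and that emotions are held in a plain dictionary; the function name `limit_emotions` and the sample weights are hypothetical and are not part of the Audio2Emotion SDK.

```python
def limit_emotions(emotions, max_emotions):
    """Keep the max_emotions strongest emotions and zero out the rest."""
    # Rank emotion names by weight, strongest first.
    ranked = sorted(emotions, key=emotions.get, reverse=True)
    kept = set(ranked[:max_emotions])
    # Assumption: a "null" emotion is represented by a weight of 0.0.
    return {name: (value if name in kept else 0.0) for name, value in emotions.items()}


# Hypothetical predicted weights.
predicted = {"joy": 0.7, "amazement": 0.5, "sadness": 0.2, "anger": 0.1}
limited = limit_emotions(predicted, max_emotions=2)
print(limited)
```

With a limit of 2, only `joy` and `amazement` keep their weights, which matches the example above.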

Emotion index conversion

Emotion index conversion uses the emotion correspondence to remap emotions from the Audio2Emotion SDK to the Audio2Face SDK.

Smoothing

Smoothing applies exponential smoothing to the remapped emotions, using the live blend coefficient (live_blend_coef).
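As a rough illustration of exponential smoothing, the sketch below blends each new emotion value with the previous smoothed value. It assumes that the coefficient weights the previous value (so a higher coefficient means smoother, slower changes); this guide does not state which side the coefficient applies to, and the function name `smooth_emotions` is hypothetical.

```python
def smooth_emotions(previous, current, live_blend_coef):
    """One step of exponential smoothing over a dictionary of emotion weights."""
    # Assumption: live_blend_coef weights the previous smoothed value,
    # and (1 - live_blend_coef) weights the newly inferred value.
    return {
        name: live_blend_coef * previous.get(name, 0.0) + (1.0 - live_blend_coef) * value
        for name, value in current.items()
    }


# Starting from a neutral state, joy approaches its target over several frames.
state = {"joy": 0.0}
target = {"joy": 1.0}
for _ in range(3):
    state = smooth_emotions(state, target, live_blend_coef=0.7)
    print(state)
```

Under this assumption, each step moves the smoothed value 30% of the remaining distance toward the target when the coefficient is 0.7.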

Blend Preferred Emotion

The preferred emotion (manual emotion) and the inference emotion output are combined to generate a composite final output of all emotion data.

Transition smoothing

Transition smoothing applies exponential smoothing to the final emotion values (the composite of the Audio2Emotion output and the preferred emotion).

Emotion Strength

Emotion strength controls the overall strength of the final emotion composite produced by the previous steps. It acts as a multiplier on the final emotion result (the Audio2Emotion output plus the preferred emotion).
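The sketch below combines two of the steps above: blending the preferred emotion with the generated emotions, and scaling the result by the emotion strength. The blend is written as a linear interpolation, which is an assumption made for illustration only; the actual blending formula is not given in this guide. The function name `blend_and_scale` is hypothetical, and transition smoothing is left out for brevity.

```python
def blend_and_scale(generated, preferred, preferred_emotion_strength, emotion_strength):
    """Blend preferred and generated emotions, then apply the strength multiplier."""
    result = {}
    for name in sorted(generated.keys() | preferred.keys()):
        g = generated.get(name, 0.0)
        p = preferred.get(name, 0.0)
        # Assumption: a linear interpolation between generated and preferred values.
        blended = (1.0 - preferred_emotion_strength) * g + preferred_emotion_strength * p
        # Emotion strength is a plain multiplier on the final composite.
        result[name] = emotion_strength * blended
    return result


# Hypothetical inputs, using the default strengths from the stylization configs.
generated = {"joy": 0.8, "anger": 0.0}
preferred = {"joy": 0.4, "anger": 0.2}
final = blend_and_scale(generated, preferred, preferred_emotion_strength=0.5, emotion_strength=0.6)
print(final)
```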

Preferred Emotion

Use the emotion sliders to create a preferred (manual) emotion pose as the base emotion for the character animation. The preferred emotion is taken from the current settings in the Emotion widget and is blended with the generated emotions throughout the animation.

Blendshape parameters

Currently, the default blendshape parameters included in the model data are tuned for use with Metahuman avatars. For our default avatars (Claire, Mark, Ben), all 52 values of weight_multipliers in the stylization config should be set to 1.0.
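The stylization configs describe the per-blendshape adjustment as blendshape_values * weight_multipliers + weight_offsets. A minimal sketch of that formula follows. The function name `apply_blendshape_params` is hypothetical, and the clamping range of 0.0 to 1.0 is an assumption about what enable_clamping_bs_weight does, not something this guide confirms.

```python
def apply_blendshape_params(weights, multipliers, offsets, clamp=False):
    """Apply weight * multiplier + offset to each blendshape weight."""
    adjusted = {}
    for name, weight in weights.items():
        value = weight * multipliers.get(name, 1.0) + offsets.get(name, 0.0)
        if clamp:
            # Assumption: clamping restricts the result to the range [0.0, 1.0].
            value = min(1.0, max(0.0, value))
        adjusted[name] = value
    return adjusted


# Two of the 52 blendshapes, with multipliers taken from the James config.
weights = {"JawOpen": 0.5, "MouthSmileLeft": 1.0}
multipliers = {"JawOpen": 0.8, "MouthSmileLeft": 1.2}
offsets = {"JawOpen": 0.0, "MouthSmileLeft": 0.0}

unclamped = apply_blendshape_params(weights, multipliers, offsets)
clamped = apply_blendshape_params(weights, multipliers, offsets, clamp=True)
print(unclamped)
print(clamped)
```

Setting every multiplier to 1.0 and every offset to 0.0, as recommended above for the default avatars, leaves the weights unchanged.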

Environment variables#

The following table describes the environment variables that can be passed to the Audio2Face-3D NIM with the -e argument of the docker run command:

| Variable | Required | Values | Notes |
| --- | --- | --- | --- |
| NGC_API_KEY | No | Any string representing a valid NGC API Key | Required only if you want to download TRT engines from NGC. You must set this variable to the value of your personal NGC API key. |
| NIM_LOGGING_JSONL | No | true / false | Enables (true) or disables (false) JSON Lines format logging to stdout. |
| NIM_MANIFEST_PROFILE | No | Any valid manifest profile string | Choose the manifest profile id from Supported Models for your GPU. |
| NIM_DISABLE_MODEL_DOWNLOAD | No | true / false | Disables (true) or enables (false) automatic TRT engine downloads from NGC. |
| NIM_SKIP_A2F_START | No | true / false | If set to true, the container will not start the A2F-3D service at startup. |

Volumes#

The following table describes the paths inside the container onto which local paths can be mounted. For example, you can mount a volume with the docker flag -v {LOCAL_PATH}:{PATH_IN_CONTAINER}.

| Container path | Required | Notes |
| --- | --- | --- |
| /tmp/a2x/ | Not required, but if this volume is not mounted, the container has to download or generate the model afresh each time it is brought up. | Path for AI models. Must have read, write, and execute permissions (777). |
| /mnt/configs/ | Needed only if you want to override some configuration parameters. | Path for the files that override the configs. |
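To show how the -e and -v flags from the two tables fit together, the sketch below assembles a docker run command line from a dictionary of environment variables and a dictionary of volume mounts. It is illustrative only: the helper `build_docker_run`, the image placeholder `<A2F_3D_NIM_IMAGE>`, and the local paths are hypothetical, and a real deployment may need further flags (ports, a custom entrypoint) that are not covered here.

```python
import shlex


def build_docker_run(image, env=None, volumes=None):
    """Build a `docker run` argument list with -e and -v flags."""
    # --gpus all relies on the NVIDIA Container Toolkit being configured.
    args = ["docker", "run", "--rm", "--gpus", "all"]
    for name, value in (env or {}).items():
        args += ["-e", f"{name}={value}"]
    for local_path, container_path in (volumes or {}).items():
        args += ["-v", f"{local_path}:{container_path}"]
    args.append(image)
    return args


command = build_docker_run(
    "<A2F_3D_NIM_IMAGE>",  # placeholder: substitute the image name from NGC
    env={"NIM_LOGGING_JSONL": "true", "NIM_DISABLE_MODEL_DOWNLOAD": "false"},
    volumes={
        "/path/to/model-cache": "/tmp/a2x/",   # hypothetical local path
        "/path/to/configs": "/mnt/configs/",   # hypothetical local path
    },
)
print(shlex.join(command))
```

Building the command as a list and joining it with `shlex.join` keeps values with spaces or special characters correctly quoted when the command is printed or logged.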

Quick Deployment of Audio2Face-3D Microservices#

Instead of deploying Audio2Face-3D and starting the model manually, you can deploy both together with the docker-compose file by following the quick-start instructions in the NVIDIA Audio2Face-3D Samples repo.