Migration Guide#
Migrating from 1.3 to 2.0#
Audio2Face-3D NIM v2.0 introduces significant changes to the deployment-time configuration file schemas and adds support for diffusion-based animation models. Configuration files from v1.3 are not directly compatible with v2.0 and require manual migration.
For the updated default configuration files and examples, refer to Audio2Face-3D NIM Container Deployment and Configuration Guide.
Stylization Configuration Changes#
The `a2f` section has been restructured to support multiple inference types:
Old format (v1.3):

```yaml
a2f:
  inference_model_id: claire_v2.3
  blendshape_id: claire_topo1_v2.1
  tongue_blendshape_id: claire_tongue_v1.0
  enable_tongue_blendshapes: true
```
New format (v2.0):

```yaml
a2f:
  # regression / diffusion
  inference_type: regression
  regression_model:
    inference_model_id: claire_v2.3.1
  diffusion_model:
    inference_model_id: multi_v3.2
    identity: claire
    constant_noise: true
  enable_tongue_blendshapes: true
```
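To see how the new schema fits together, the sketch below (illustrative Python, not part of the NIM) loads a dict mirroring the v2.0 YAML above and resolves which model the `inference_type` selector activates:

```python
# Dict mirroring the v2.0 YAML example above.
config = {
    "a2f": {
        "inference_type": "regression",
        "regression_model": {"inference_model_id": "claire_v2.3.1"},
        "diffusion_model": {
            "inference_model_id": "multi_v3.2",
            "identity": "claire",
            "constant_noise": True,
        },
        "enable_tongue_blendshapes": True,
    }
}

def active_model_id(cfg: dict) -> str:
    """Return the model ID selected by the `inference_type` field."""
    a2f = cfg["a2f"]
    # "regression" -> "regression_model", "diffusion" -> "diffusion_model"
    section = f"{a2f['inference_type']}_model"
    return a2f[section]["inference_model_id"]

print(active_model_id(config))  # claire_v2.3.1
```

Switching `inference_type` to `diffusion` makes the same lookup resolve to `multi_v3.2`; the unused section is simply ignored.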
Key Changes#
- New inference type selector: The `inference_type` field selects between `regression` (fast, deterministic) and `diffusion` (higher quality, more expressive) animation modes.

- Nested model configuration: Model IDs are now specified under the `regression_model` and `diffusion_model` sections instead of at the top level of `a2f`.

- Removed fields: `blendshape_id` and `tongue_blendshape_id` are no longer used.

- Updated model versions: Regression models are updated to `claire_v2.3.1`, `james_v2.3.1`, and `mark_v2.3`. The new diffusion model is `multi_v3.2`.

- New tongue parameters in `face_params`: Added `tongue_strength`, `tongue_height_offset`, and `tongue_depth_offset` to control tongue animation.

  ```yaml
  face_params:
    # ... existing params ...
    tongue_strength: 1.3
    tongue_height_offset: 0.0
    tongue_depth_offset: 0.0
  ```

- Extended tongue blendshapes: Added 16 new tongue blendshapes to the `blendshape_params` sections (`weight_multipliers`, `weight_offsets`, `active_poses`, `cancel_poses`, `symmetry_poses`):

  - TongueTipUp, TongueTipDown, TongueTipLeft, TongueTipRight
  - TongueRollUp, TongueRollDown, TongueRollLeft, TongueRollRight
  - TongueUp, TongueDown, TongueLeft, TongueRight
  - TongueIn, TongueStretch, TongueWide, TongueNarrow

- New blendshape streaming controls in `advanced_config.yaml`: Added `pipeline_parameters.burst_mode` and `pipeline_parameters.blendshape_streaming_fps` to control how blendshapes are sent to the client.

  ```yaml
  pipeline_parameters:
    # Burst Mode: Send all frames as fast as possible (~20-30ms total)
    # WARNING: May cause AnimGraph buffer overflow and lip sync issues in Tokkio
    # Values: false = Rate-limited streaming (RECOMMENDED), true = Burst mode
    burst_mode: false

    # Streaming Frame Rate (only used when burst_mode = false)
    # Delay per frame = 1000 / blendshape_streaming_fps milliseconds
    # Recommended: 90 (Tokkio/Production), 120-240 (Low latency), 30-60 (Bandwidth-constrained)
    blendshape_streaming_fps: 90
  ```

- GPU blendshape solver in `advanced_config.yaml`: Added the `a2f.use_gpu_solver` option (default: `true`). When enabled, blendshape solving runs entirely on the GPU, improving performance by avoiding CPU-GPU data transfers.
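As a quick sanity check on the streaming settings, the per-frame delay implied by `blendshape_streaming_fps` follows directly from the formula stated in the config comment (illustrative Python, not part of the NIM):

```python
def frame_delay_ms(blendshape_streaming_fps: float) -> float:
    """Delay inserted between frames when burst_mode is false.

    Per the advanced_config.yaml comment:
    delay per frame = 1000 / blendshape_streaming_fps milliseconds.
    """
    return 1000.0 / blendshape_streaming_fps

# 90 fps (recommended for Tokkio/production) -> about 11.1 ms between frames;
# 30 fps (bandwidth-constrained) -> about 33.3 ms between frames.
print(f"{frame_delay_ms(90):.1f} ms")
print(f"{frame_delay_ms(30):.1f} ms")
```

Higher FPS values reduce end-to-end latency at the cost of more frequent network sends, which is why the low-latency recommendation (120-240) sits above the Tokkio default of 90.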
Migration Steps#
1. Replace the top-level `inference_model_id`, `blendshape_id`, and `tongue_blendshape_id` fields with the new nested `regression_model` and `diffusion_model` sections.
2. Add the `inference_type` field set to `regression` (or `diffusion` if using the new diffusion model).
3. Update the model version in `regression_model.inference_model_id` (e.g., `claire_v2.3` → `claire_v2.3.1`).
4. Add the `diffusion_model` section with `inference_model_id: multi_v3.2` and the appropriate `identity`.
5. Add tongue parameters to `face_params` if tongue animation control is desired.
6. Add the 16 new tongue blendshapes to all `blendshape_params` subsections if custom blendshape tuning is used.
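The steps above can be sketched as a small transform on the parsed `a2f` section. This is an unofficial, illustrative helper: the version bumps and the identity guess are taken from the examples in this guide, so verify both against your own deployment:

```python
def migrate_a2f_section(old: dict) -> dict:
    """Rewrite a v1.3 `a2f` section into the v2.0 nested layout (unofficial sketch)."""
    # Step 3: known regression-model version bumps from this guide's examples.
    version_bumps = {"claire_v2.3": "claire_v2.3.1", "james_v2.3": "james_v2.3.1"}
    old_id = old["inference_model_id"]
    new = {
        # Steps 1-2: nested model sections plus the new mode selector.
        "inference_type": "regression",
        "regression_model": {
            "inference_model_id": version_bumps.get(old_id, old_id),
        },
        # Step 4: new diffusion section. The identity is guessed from the old
        # model name here -- set it explicitly for your character.
        "diffusion_model": {
            "inference_model_id": "multi_v3.2",
            "identity": old_id.split("_")[0],
            "constant_noise": True,
        },
    }
    # Keep fields that survive the migration; blendshape_id and
    # tongue_blendshape_id are dropped (removed in v2.0).
    if "enable_tongue_blendshapes" in old:
        new["enable_tongue_blendshapes"] = old["enable_tongue_blendshapes"]
    return new

old_a2f = {
    "inference_model_id": "claire_v2.3",
    "blendshape_id": "claire_topo1_v2.1",          # removed in v2.0
    "tongue_blendshape_id": "claire_tongue_v1.0",  # removed in v2.0
    "enable_tongue_blendshapes": True,
}
print(migrate_a2f_section(old_a2f))
```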
Migrating from 1.2 to 1.3#
No action is needed. The Audio2Face-3D NIM configuration files are backwards compatible between versions 1.2 and 1.3.
Migrating from 1.0 to 1.2#
In version 1.0, the Audio2Face-3D NIM was delivered as a suite of two microservices: the Audio2Face Microservice and the Audio2Face Controller. Version 1.2 merges them into a single service. This page guides you through what has changed between the two versions.
Audio2Face Controller#
The Audio2Face Controller’s functionality has been integrated into Audio2Face-3D Microservice. The gRPC proto service remains the same.
Service Interface#
```protobuf
service A2FControllerService {
  rpc ProcessAudioStream(stream nvidia_ace.controller.v1.AudioStream)
      returns (stream nvidia_ace.controller.v1.AnimationDataStream) {}
}
```
Audio2Face-3D Microservice#
Audio stream header#
The new `emotion_params` field in the `AudioStreamHeader` message controls temporal smoothing in the Audio2Emotion SDK.
```protobuf
message AudioStreamHeader {
  nvidia_ace.audio.v1.AudioHeader audio_header = 1;
  nvidia_ace.a2f.v1.FaceParameters face_params = 2;
  nvidia_ace.a2f.v1.EmotionPostProcessingParameters emotion_post_processing_params = 3;
  nvidia_ace.a2f.v1.BlendShapeParameters blendshape_params = 4;
  nvidia_ace.a2f.v1.EmotionParameters emotion_params = 5;
}
```
The new `EmotionParameters` message introduces control over emotion smoothing across time. The `live_transition_time` field defines the duration over which the emotion should be smoothed. The `beginning_emotion` field provides the initial set of emotions to the smoothing algorithm.
```protobuf
message EmotionParameters {
  optional float live_transition_time = 1;
  map<string, float> beginning_emotion = 2;
}
```
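To illustrate the intent of these two fields, the sketch below linearly blends from the `beginning_emotion` values toward a target emotion set over `live_transition_time` seconds. This is a conceptual illustration of the described behavior, not the actual smoothing algorithm inside the Audio2Emotion SDK:

```python
def smoothed_emotions(beginning: dict, target: dict,
                      live_transition_time: float, t: float) -> dict:
    """Blend linearly from `beginning` to `target` over `live_transition_time` seconds.

    Conceptual sketch only -- the real Audio2Emotion smoothing is not
    documented here. Emotions absent from a dict are treated as 0.0.
    """
    if live_transition_time <= 0.0:
        return dict(target)
    # Blend factor: 0 at the start, 1 once the transition time has elapsed.
    alpha = min(t / live_transition_time, 1.0)
    keys = set(beginning) | set(target)
    return {k: (1 - alpha) * beginning.get(k, 0.0) + alpha * target.get(k, 0.0)
            for k in keys}

begin = {"joy": 1.0}
target = {"joy": 0.0, "sadness": 1.0}
# Halfway through a 0.5 s transition: joy and sadness both sit at 0.5.
print(smoothed_emotions(begin, target, live_transition_time=0.5, t=0.25))
```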
Blendshape parameters#
The new `enable_clamping_bs_weight` field in the `BlendShapeParameters` message controls whether the returned blendshape values are clamped between 0 and 1. Clamping is applied after multipliers and offsets.
Blendshape clamping is a post-processing step that ensures blendshape weights stay within the standard [0.0, 1.0] range expected by most animation systems. The A2F neural network can produce values outside this range, so clamping normalizes them for compatibility with downstream renderers.
- Clamping ON (`true`): values are guaranteed to stay within 0.0-1.0, safe for renderers expecting normalized weights. Recommended for production.
- Clamping OFF (`false`): values can exceed the range (e.g., 1.2 or -0.1), preserving the full model output fidelity. Useful for debugging and analysis.
```protobuf
message BlendShapeParameters {
  map<string, float> bs_weight_multipliers = 1;
  map<string, float> bs_weight_offsets = 2;
  optional bool enable_clamping_bs_weight = 3;
}
```
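The post-processing order described above (multiplier, then offset, then optional clamp) can be sketched per weight as follows (illustrative only, not the service's implementation):

```python
def postprocess_weight(raw: float, multiplier: float = 1.0, offset: float = 0.0,
                       enable_clamping_bs_weight: bool = True) -> float:
    """Apply multiplier and offset, then optionally clamp to [0.0, 1.0].

    Mirrors the order described for BlendShapeParameters: clamping is
    applied after multipliers and offsets.
    """
    value = raw * multiplier + offset
    if enable_clamping_bs_weight:
        value = min(max(value, 0.0), 1.0)
    return value

# 0.9 * 1.5 = 1.35: clamped down to 1.0 for renderers expecting normalized weights.
print(postprocess_weight(0.9, multiplier=1.5))
# With clamping off, the out-of-range value is passed through unchanged.
print(postprocess_weight(0.9, multiplier=1.5, enable_clamping_bs_weight=False))
```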
Migrating Configuration files#
To make migrating the configuration files easier, a conversion tool is provided:

1. Clone the repository: NVIDIA/Audio2Face-3D-Samples.git
2. Check out the v1.2 tag: `git checkout tags/v1.2`
3. Go to the migration/deployment_configuration_files_from_v1.0_to_v1.2/ subfolder.
4. Follow the setup instructions below.
Configuration file migration guide
This sample python app allows you to migrate your A2F-3D config files from v1.0 to v1.2.
Prerequisite
Install:
python3
python3-venv
Set up a virtual environment and install the needed packages:

```shell
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -r requirements.txt
```
Steps
There are two possibilities, depending on whether you want to migrate the A2F-3D config files used for:
running the docker container
deploying the UCS app
Updating docker container configs
Update:
docker_container_configs/a2f_config.yaml
docker_container_configs/ac_a2f_config.yaml
with your own config files.
Then run:
```shell
$ python3 convert_configuration_files.py docker_config
```
This will generate new config files compatible with A2F-3D v1.2 and print the folder name.
Updating the UCS app configs
Update:
ucs_app_configs/a2f_config.yaml
with your own config file.
Then run:
```shell
$ python3 convert_configuration_files.py ucs
```
This will generate new config files compatible with A2F-3D v1.2 and print the folder name.
Migrating Kubernetes deployment#
The quick deployment resource for Audio2Face-3D via NGC is no longer available. For a straightforward Kubernetes deployment, refer to the detailed steps in this guide: Kubernetes Deployment.