Migration Guide#
Migrating from 1.3 to 2.0#
Audio2Face-3D NIM v2.0 introduces significant changes to the deployment-time configuration file schemas and adds support for diffusion-based animation models. Configuration files from v1.3 are not directly compatible with v2.0 and require manual migration.
For the updated default configuration files and examples, refer to Audio2Face-3D NIM Container Deployment and Configuration Guide.
Stylization Configuration Changes#
The `a2f` section has been restructured to support multiple inference types:
Old format (v1.3):

```yaml
a2f:
  inference_model_id: claire_v2.3
  blendshape_id: claire_topo1_v2.1
  tongue_blendshape_id: claire_tongue_v1.0
  enable_tongue_blendshapes: true
```
New format (v2.0):

```yaml
a2f:
  # regression / diffusion
  inference_type: regression
  regression_model:
    inference_model_id: claire_v2.3.1
  diffusion_model:
    inference_model_id: multi_v3.2
    identity: claire
    constant_noise: true
  enable_tongue_blendshapes: true
```
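To see how the new schema fits together, the sketch below (illustrative Python, not part of the NIM) loads a dict mirroring the v2.0 YAML above and resolves which model the `inference_type` selector activates:

```python
# Dict mirroring the v2.0 YAML example above.
config = {
    "a2f": {
        "inference_type": "regression",
        "regression_model": {"inference_model_id": "claire_v2.3.1"},
        "diffusion_model": {
            "inference_model_id": "multi_v3.2",
            "identity": "claire",
            "constant_noise": True,
        },
        "enable_tongue_blendshapes": True,
    }
}

def active_model_id(cfg: dict) -> str:
    """Return the model ID selected by the `inference_type` field."""
    a2f = cfg["a2f"]
    # "regression" -> "regression_model", "diffusion" -> "diffusion_model"
    section = f"{a2f['inference_type']}_model"
    return a2f[section]["inference_model_id"]

print(active_model_id(config))  # claire_v2.3.1
```

Switching `inference_type` to `diffusion` makes the same lookup resolve to `multi_v3.2`; the unused section is simply ignored.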
Key Changes#
- New inference type selector: The `inference_type` field selects between `regression` (fast, deterministic) and `diffusion` (higher quality, more expressive) animation modes.

- Nested model configuration: Model IDs are now specified under the `regression_model` and `diffusion_model` sections instead of at the top level of `a2f`.

- Removed fields: `blendshape_id` and `tongue_blendshape_id` are no longer used.

- Updated model versions: Regression models are updated to `claire_v2.3.1`, `james_v2.3.1`, and `mark_v2.3`. The new diffusion model is `multi_v3.2`.

- New tongue parameters in `face_params`: Added `tongue_strength`, `tongue_height_offset`, and `tongue_depth_offset` to control tongue animation.

  ```yaml
  face_params:
    # ... existing params ...
    tongue_strength: 1.3
    tongue_height_offset: 0.0
    tongue_depth_offset: 0.0
  ```

- Extended tongue blendshapes: Added 16 new tongue blendshapes to the `blendshape_params` sections (`weight_multipliers`, `weight_offsets`, `active_poses`, `cancel_poses`, `symmetry_poses`):

  - TongueTipUp, TongueTipDown, TongueTipLeft, TongueTipRight
  - TongueRollUp, TongueRollDown, TongueRollLeft, TongueRollRight
  - TongueUp, TongueDown, TongueLeft, TongueRight
  - TongueIn, TongueStretch, TongueWide, TongueNarrow

- New blendshape streaming controls in `advanced_config.yaml`: Added `pipeline_parameters.burst_mode` and `pipeline_parameters.blendshape_streaming_fps` to control how blendshapes are sent to the client.

  ```yaml
  pipeline_parameters:
    # Burst Mode: Send all frames as fast as possible (~20-30ms total)
    # WARNING: May cause AnimGraph buffer overflow and lip sync issues in Tokkio
    # Values: false = Rate-limited streaming (RECOMMENDED), true = Burst mode
    burst_mode: false

    # Streaming Frame Rate (only used when burst_mode = false)
    # Delay per frame = 1000 / blendshape_streaming_fps milliseconds
    # Recommended: 90 (Tokkio/Production), 120-240 (Low latency), 30-60 (Bandwidth-constrained)
    blendshape_streaming_fps: 90
  ```

- GPU blendshape solver in `advanced_config.yaml`: Added the `a2f.use_gpu_solver` option (default: `true`). When enabled, blendshape solving runs entirely on the GPU, improving performance by avoiding CPU-GPU data transfers.
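As a quick sanity check on the streaming settings, the per-frame delay implied by `blendshape_streaming_fps` follows directly from the formula stated in the config comment (illustrative Python, not part of the NIM):

```python
def frame_delay_ms(blendshape_streaming_fps: float) -> float:
    """Delay inserted between frames when burst_mode is false.

    Per the advanced_config.yaml comment:
    delay per frame = 1000 / blendshape_streaming_fps milliseconds.
    """
    return 1000.0 / blendshape_streaming_fps

# 90 fps (recommended for Tokkio/production) -> about 11.1 ms between frames;
# 30 fps (bandwidth-constrained) -> about 33.3 ms between frames.
print(f"{frame_delay_ms(90):.1f} ms")
print(f"{frame_delay_ms(30):.1f} ms")
```

Higher FPS values reduce end-to-end latency at the cost of more frequent network sends, which is why the low-latency recommendation (120-240) sits above the Tokkio default of 90.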
Migration Steps#
1. Replace the top-level `inference_model_id`, `blendshape_id`, and `tongue_blendshape_id` fields with the new nested `regression_model` and `diffusion_model` sections.
2. Add the `inference_type` field set to `regression` (or `diffusion` if using the new diffusion model).
3. Update the model version in `regression_model.inference_model_id` (e.g., `claire_v2.3` → `claire_v2.3.1`).
4. Add the `diffusion_model` section with `inference_model_id: multi_v3.2` and the appropriate `identity`.
5. Add tongue parameters to `face_params` if tongue animation control is desired.
6. Add the 16 new tongue blendshapes to all `blendshape_params` subsections if custom blendshape tuning is used.
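The steps above can be sketched as a small transform on the parsed `a2f` section. This is an unofficial, illustrative helper: the version bumps and the identity guess are taken from the examples in this guide, so verify both against your own deployment:

```python
def migrate_a2f_section(old: dict) -> dict:
    """Rewrite a v1.3 `a2f` section into the v2.0 nested layout (unofficial sketch)."""
    # Step 3: known regression-model version bumps from this guide's examples.
    version_bumps = {"claire_v2.3": "claire_v2.3.1", "james_v2.3": "james_v2.3.1"}
    old_id = old["inference_model_id"]
    new = {
        # Steps 1-2: nested model sections plus the new mode selector.
        "inference_type": "regression",
        "regression_model": {
            "inference_model_id": version_bumps.get(old_id, old_id),
        },
        # Step 4: new diffusion section. The identity is guessed from the old
        # model name here -- set it explicitly for your character.
        "diffusion_model": {
            "inference_model_id": "multi_v3.2",
            "identity": old_id.split("_")[0],
            "constant_noise": True,
        },
    }
    # Keep fields that survive the migration; blendshape_id and
    # tongue_blendshape_id are dropped (removed in v2.0).
    if "enable_tongue_blendshapes" in old:
        new["enable_tongue_blendshapes"] = old["enable_tongue_blendshapes"]
    return new

old_a2f = {
    "inference_model_id": "claire_v2.3",
    "blendshape_id": "claire_topo1_v2.1",          # removed in v2.0
    "tongue_blendshape_id": "claire_tongue_v1.0",  # removed in v2.0
    "enable_tongue_blendshapes": True,
}
print(migrate_a2f_section(old_a2f))
```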
Migrating from 1.2 to 1.3#
No action is needed. The Audio2Face-3D NIM configuration files are backwards compatible between versions 1.2 and 1.3.
Migrating from 1.0 to 1.2#
In version 1.0, the Audio2Face-3D NIM was delivered as a suite of two microservices: the Audio2Face Microservice and the Audio2Face Controller. Version 1.2 merges them into a single service. This page guides you through what has changed between the two versions.
Audio2Face Controller#
The Audio2Face Controller’s functionality has been integrated into Audio2Face-3D Microservice. The gRPC proto service remains the same.
Service Interface#
```protobuf
service A2FControllerService {
  rpc ProcessAudioStream(stream nvidia_ace.controller.v1.AudioStream)
      returns (stream nvidia_ace.controller.v1.AnimationDataStream) {}
}
```
Audio2Face-3D Microservice#
Audio stream header#
The new `emotion_params` field in the `AudioStreamHeader` message controls temporal smoothing in the Audio2Emotion SDK.
```protobuf
message AudioStreamHeader {
  nvidia_ace.audio.v1.AudioHeader audio_header = 1;
  nvidia_ace.a2f.v1.FaceParameters face_params = 2;
  nvidia_ace.a2f.v1.EmotionPostProcessingParameters emotion_post_processing_params = 3;
  nvidia_ace.a2f.v1.BlendShapeParameters blendshape_params = 4;
  nvidia_ace.a2f.v1.EmotionParameters emotion_params = 5;
}
```
The new `EmotionParameters` message introduces control over emotion smoothing across time. The `live_transition_time` field defines the duration over which the emotion should be smoothed. The `beginning_emotion` field provides the initial set of emotions to the smoothing algorithm.
```protobuf
message EmotionParameters {
  optional float live_transition_time = 1;
  map<string, float> beginning_emotion = 2;
}
```
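To illustrate the intent of these two fields, the sketch below linearly blends from the `beginning_emotion` values toward a target emotion set over `live_transition_time` seconds. This is a conceptual illustration of the described behavior, not the actual smoothing algorithm inside the Audio2Emotion SDK:

```python
def smoothed_emotions(beginning: dict, target: dict,
                      live_transition_time: float, t: float) -> dict:
    """Blend linearly from `beginning` to `target` over `live_transition_time` seconds.

    Conceptual sketch only -- the real Audio2Emotion smoothing is not
    documented here. Emotions absent from a dict are treated as 0.0.
    """
    if live_transition_time <= 0.0:
        return dict(target)
    # Blend factor: 0 at the start, 1 once the transition time has elapsed.
    alpha = min(t / live_transition_time, 1.0)
    keys = set(beginning) | set(target)
    return {k: (1 - alpha) * beginning.get(k, 0.0) + alpha * target.get(k, 0.0)
            for k in keys}

begin = {"joy": 1.0}
target = {"joy": 0.0, "sadness": 1.0}
# Halfway through a 0.5 s transition: joy and sadness both sit at 0.5.
print(smoothed_emotions(begin, target, live_transition_time=0.5, t=0.25))
```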
Blendshape parameters#
The new `enable_clamping_bs_weight` field in the `BlendShapeParameters` message controls whether the returned blendshape values are clamped between 0 and 1. Clamping is applied after multipliers and offsets.
Blendshape clamping is a post-processing step that ensures blendshape weights stay within the standard [0.0, 1.0] range expected by most animation systems. The A2F neural network can produce values outside this range, so clamping normalizes them for compatibility with downstream renderers.
- Clamping ON (`true`): values are guaranteed to stay within 0.0-1.0, safe for renderers expecting normalized weights. Recommended for production.
- Clamping OFF (`false`): values can exceed the range (e.g., 1.2 or -0.1), preserving the full model output fidelity. Useful for debugging and analysis.
```protobuf
message BlendShapeParameters {
  map<string, float> bs_weight_multipliers = 1;
  map<string, float> bs_weight_offsets = 2;
  optional bool enable_clamping_bs_weight = 3;
}
```
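The post-processing order described above (multiplier, then offset, then optional clamp) can be sketched per weight as follows (illustrative only, not the service's implementation):

```python
def postprocess_weight(raw: float, multiplier: float = 1.0, offset: float = 0.0,
                       enable_clamping_bs_weight: bool = True) -> float:
    """Apply multiplier and offset, then optionally clamp to [0.0, 1.0].

    Mirrors the order described for BlendShapeParameters: clamping is
    applied after multipliers and offsets.
    """
    value = raw * multiplier + offset
    if enable_clamping_bs_weight:
        value = min(max(value, 0.0), 1.0)
    return value

# 0.9 * 1.5 = 1.35: clamped down to 1.0 for renderers expecting normalized weights.
print(postprocess_weight(0.9, multiplier=1.5))
# With clamping off, the out-of-range value is passed through unchanged.
print(postprocess_weight(0.9, multiplier=1.5, enable_clamping_bs_weight=False))
```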
Migrating Configuration files#
To make migrating the configuration files easier, a conversion tool is provided:

1. Clone the repository: NVIDIA/Audio2Face-3D-Samples.git
2. Check out the v1.2 tag: `git checkout tags/v1.2`
3. Go to the migration/deployment_configuration_files_from_v1.0_to_v1.2/ subfolder.
4. Follow the setup instructions below.
Configuration file migration guide
This sample python app allows you to migrate your A2F-3D config files from v1.0 to v1.2.
Prerequisite
Install:
python3
python3-venv
Set up a virtual environment and install the needed packages:

```shell
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -r requirements.txt
```
Steps
There are two possibilities, depending on whether you want to migrate the A2F-3D config files used for:
running the docker container
deploying the UCS app
Updating docker container configs
Update:
docker_container_configs/a2f_config.yaml
docker_container_configs/ac_a2f_config.yaml
with your own config files.
Then run:
```shell
$ python3 convert_configuration_files.py docker_config
```
This will generate new config files compatible with A2F-3D v1.2 and print the folder name.
Updating the UCS app configs
Update:
ucs_app_configs/a2f_config.yaml
with your own config file.
Then run:
```shell
$ python3 convert_configuration_files.py ucs
```
This will generate new config files compatible with A2F-3D v1.2 and print the folder name.
Migrating Kubernetes deployment#
The quick deployment resource for Audio2Face-3D via NGC is no longer available. For a straightforward Kubernetes deployment, refer to the detailed steps in this guide: Kubernetes Deployment.