Migrating from EA3
Audio2Face Controller
Service Interface
The old interface exposed the following proto:
service A2XServiceInterface {
  rpc ConvertAudioToAnimData(stream A2XAudioStream) returns (stream A2XAnimDataStream) {}
}
This has been renamed to:
service A2FControllerService {
  rpc ProcessAudioStream(stream nvidia_ace.controller.v1.AudioStream)
      returns (stream nvidia_ace.controller.v1.AnimationDataStream) {}
}
It is still a bidirectional stream, so little changes in that respect.
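As a rough orientation, here is a minimal Python sketch of how a client might open the new bidirectional stream. The generated module name (a2f_controller_pb2_grpc), the endpoint, and the request_generator helper (sketched later under "End of audio") are assumptions that depend on how you compile and deploy the protos.

import grpc

# Assumed name of the module generated from the A2F Controller proto;
# adjust to your own protoc / grpcio-tools output.
import a2f_controller_pb2_grpc

channel = grpc.insecure_channel("localhost:50051")  # hypothetical endpoint
stub = a2f_controller_pb2_grpc.A2FControllerServiceStub(channel)

# request_generator() yields nvidia_ace.controller.v1.AudioStream messages;
# the call returns a stream of AnimationDataStream messages.
for animation_data_stream in stub.ProcessAudioStream(request_generator()):
    print(animation_data_stream)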
Audio data
Old protobuf audio data serialization:
message A2XAudioStream {
  bytes audio_chunk = 1;
  map<string, float> emotion_map = 2;
  string posture_var = 3;
  PacketType type = 4;
}
The A2XAudioStream message has been replaced by the AudioStream message:
message AudioStream {
  message EndOfAudio {}
  oneof stream_part {
    AudioStreamHeader audio_stream_header = 1;
    nvidia_ace.a2f.v1.AudioWithEmotion audio_with_emotion = 2;
    EndOfAudio end_of_audio = 3;
  }
}
Now, instead of a PacketType, you send either AudioStreamHeader, AudioWithEmotion, or EndOfAudio content.
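As an illustration, each AudioStream message sets exactly one field of the stream_part oneof. The import paths below assume the protos were compiled into Python modules named after their packages; adjust them to your build.

from nvidia_ace.controller.v1_pb2 import AudioStream, AudioStreamHeader  # assumed module names
from nvidia_ace.a2f.v1_pb2 import AudioWithEmotion

# Exactly one stream_part field is set per message.
header_msg = AudioStream(audio_stream_header=AudioStreamHeader())
audio_msg = AudioStream(audio_with_emotion=AudioWithEmotion(audio_buffer=b"..."))
end_msg = AudioStream(end_of_audio=AudioStream.EndOfAudio())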
Audio header
The PacketType.BEGIN has been replaced by a proper AudioHeader, which has the following fields:
message AudioHeader {
  enum AudioFormat { AUDIO_FORMAT_PCM = 0; }
  AudioFormat audio_format = 1;
  // Currently only mono sound must be supported.
  uint32 channel_count = 2;
  // Defines the sample rate of the provided audio data
  uint32 samples_per_second = 3;
  // Currently only 16 bits per sample must be supported.
  uint32 bits_per_sample = 4;
}
You need to fill in these fields according to the characteristics of your audio content. Note that only 16-bit mono PCM audio is currently supported.
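As a sketch, the header fields can be read straight from a 16-bit mono PCM WAV file using Python's wave module; the nvidia_ace.audio.v1_pb2 module name is an assumption about how the protos were compiled.

import wave

from nvidia_ace.audio.v1_pb2 import AudioHeader  # assumed module name

with wave.open("speech.wav", "rb") as wav:
    # Only 16-bit mono PCM is currently supported.
    assert wav.getnchannels() == 1 and wav.getsampwidth() == 2
    audio_header = AudioHeader(
        audio_format=AudioHeader.AUDIO_FORMAT_PCM,
        channel_count=wav.getnchannels(),
        samples_per_second=wav.getframerate(),
        bits_per_sample=wav.getsampwidth() * 8,
    )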
The AudioStreamHeader adds further fields that parametrize the output of A2F:
message AudioStreamHeader {
  nvidia_ace.audio.v1.AudioHeader audio_header = 1;
  // New additional parameters, see documentation.
  nvidia_ace.a2f.v1.FaceParameters face_params = 2;
  nvidia_ace.a2f.v1.EmotionPostProcessingParameters emotion_post_processing_params = 3;
  nvidia_ace.a2f.v1.BlendShapeParameters blendshape_params = 4;
}
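A minimal sketch of wrapping the AudioHeader built above: the parameter messages are simply left at their defaults here, and the module names are assumptions; see the parameter documentation for the fields you can tune.

from nvidia_ace.a2f.v1_pb2 import (  # assumed module names
    BlendShapeParameters,
    EmotionPostProcessingParameters,
    FaceParameters,
)
from nvidia_ace.controller.v1_pb2 import AudioStreamHeader

# Default-constructed parameter messages; tune their fields as needed.
stream_header = AudioStreamHeader(
    audio_header=audio_header,
    face_params=FaceParameters(),
    emotion_post_processing_params=EmotionPostProcessingParameters(),
    blendshape_params=BlendShapeParameters(),
)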
Audio content
The audio_chunk and emotion_map fields have been moved into the AudioWithEmotion message and renamed.
message AudioWithEmotion {
  bytes audio_buffer = 1;
  repeated nvidia_ace.emotion_with_timecode.v1.EmotionWithTimeCode emotions = 2;
}
The new version of A2F allows the emotion to change during processing, so instead of a simple emotion_map you now send one or more EmotionWithTimeCode objects, each containing the following fields:
message EmotionWithTimeCode {
  double time_code = 1;
  map<string, float> emotion = 2;
}
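For example, a single audio chunk can carry several emotion changes, each tagged with a time code relative to the audio. The emotion keys and module names below are illustrative assumptions.

from nvidia_ace.a2f.v1_pb2 import AudioWithEmotion  # assumed module names
from nvidia_ace.emotion_with_timecode.v1_pb2 import EmotionWithTimeCode

chunk = AudioWithEmotion(
    audio_buffer=pcm_bytes,  # raw 16-bit mono PCM samples for this chunk
    emotions=[
        EmotionWithTimeCode(time_code=0.0, emotion={"joy": 0.8}),
        EmotionWithTimeCode(time_code=1.5, emotion={"sadness": 0.6}),
    ],
)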
End of audio
The end of the audio is now marked with an empty EndOfAudio packet instead of the PacketType enum.
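Putting the pieces together, a request generator might send the header first, then the audio chunks, and finally the empty end-of-audio marker. This is a sketch built on the earlier snippets; read_pcm_chunks is a hypothetical helper that yields raw PCM byte chunks.

def request_generator():
    # 1. Header with the audio format and A2F parameters.
    yield AudioStream(audio_stream_header=stream_header)
    # 2. Audio chunks, optionally carrying emotions.
    for pcm_bytes in read_pcm_chunks("speech.wav"):  # hypothetical helper
        yield AudioStream(
            audio_with_emotion=AudioWithEmotion(audio_buffer=pcm_bytes)
        )
    # 3. Empty marker closing the audio.
    yield AudioStream(end_of_audio=AudioStream.EndOfAudio())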
Animation data
The old A2XAnimDataStream prototype contained the following three messages:
message A2XAnimDataStreamHeader {
  bool success = 1;
  string message = 2;
}

message A2XAnimDataStreamInformation {
  int32 code = 1;
  string message = 2;
}

message A2XAnimDataStreamContent {
  string usda = 1;
  map<string, bytes> files = 2;
}
The animation data has migrated from a USD string format to USD represented by gRPC messages, which allows better data compression. The new AnimationDataStream message contains the following fields:
message AnimationDataStream {
  oneof stream_part {
    AnimationDataStreamHeader animation_data_stream_header = 1;
    nvidia_ace.animation_data.v1.AnimationData animation_data = 2;
    Event event = 3;
    nvidia_ace.status.v1.Status status = 4;
  }
}

message AnimationData {
  optional SkelAnimation skel_animation = 1;
  optional AudioWithTimeCode audio = 2;
  optional Camera camera = 3;
  // Metadata such as emotion aggregates, etc...
  map<string, google.protobuf.Any> metadata = 4;
}
Animation data is now carried in SkelAnimation objects inside the AnimationData message. See the gRPC prototypes documentation for further details.
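A rough sketch of consuming the response stream by dispatching on the stream_part oneof, following the stub and request_generator from the earlier sketches:

for msg in stub.ProcessAudioStream(request_generator()):
    part = msg.WhichOneof("stream_part")
    if part == "animation_data_stream_header":
        print("stream header:", msg.animation_data_stream_header)
    elif part == "animation_data":
        skel = msg.animation_data.skel_animation  # blendshape / joint animation
        audio = msg.animation_data.audio          # time-coded audio, if present
    elif part == "event":
        print("event:", msg.event)
    elif part == "status":
        print("status:", msg.status)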
Audio2Face without controller
The A2F prototypes are fairly similar to those of the A2F Controller, except for the AudioStreamHeader message, which adds extra metadata called animation_ids for request handling.
message AudioStreamHeader {
  // IDs of the current stream
  nvidia_ace.animation_id.v1.AnimationIds animation_ids = 1;
  nvidia_ace.audio.v1.AudioHeader audio_header = 2;
  // Parameters for updating the facial characteristics of an avatar
  // See the documentation for more information
  FaceParameters face_params = 3;
  // Parameters relative to the emotion blending and processing
  // before using it to generate blendshapes
  // See the documentation for more information
  EmotionPostProcessingParameters emotion_post_processing_params = 4;
  // Multipliers and offsets to apply to the generated blendshape values
  BlendShapeParameters blendshape_params = 5;
}
Following the same pattern, the AnimationDataStreamHeader coming out of A2F also contains this metadata.
message AnimationDataStreamHeader {
  nvidia_ace.animation_id.v1.AnimationIds animation_ids = 1;
  optional string source_service_id = 2;
  optional nvidia_ace.audio.v1.AudioHeader audio_header = 3;
  optional nvidia_ace.animation_data.v1.SkelAnimationHeader skel_animation_header = 4;
  double start_time_code_since_epoch = 5;
}
These IDs allow internal tracking of:

* the audio clips being processed, with request_id
* the current 3D model to which the audio clips are applied, with stream_id
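A minimal sketch of filling these IDs when talking to A2F directly, assuming AnimationIds exposes request_id and stream_id fields as the list above suggests, and that the generated modules are named after their proto packages:

from nvidia_ace.a2f.v1_pb2 import AudioStreamHeader  # assumed module names
from nvidia_ace.animation_id.v1_pb2 import AnimationIds

header = AudioStreamHeader(
    animation_ids=AnimationIds(
        request_id="clip-0001",   # identifies this audio clip
        stream_id="avatar-42",    # identifies the target 3D model's stream
    ),
    audio_header=audio_header,    # same AudioHeader as in the controller case
)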