Basic Inference#
Perform a health check on the gRPC endpoint.
Install grpcurl from github.com/fullstorydev/grpcurl/releases. Example commands to run on Ubuntu:
wget https://github.com/fullstorydev/grpcurl/releases/download/v1.9.1/grpcurl_1.9.1_linux_amd64.deb
sudo dpkg -i grpcurl_1.9.1_linux_amd64.deb
Download the health checking proto:
wget https://raw.githubusercontent.com/grpc/grpc/master/src/proto/grpc/health/v1/health.proto
Run the health check:
grpcurl --plaintext --proto health.proto localhost:8001 grpc.health.v1.Health/Check
If the service is ready, you get a response similar to the following:
{ "status": "SERVING" }
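Before reaching for grpcurl, you can script a basic reachability check. The following is a minimal sketch using only the Python standard library; it verifies only that the TCP port accepts connections, not that the gRPC Health service reports SERVING (the host and port below match the local endpoint used above):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: check the default local gRPC endpoint before running grpcurl.
if port_open("localhost", 8001):
    print("port 8001 is reachable; run the grpcurl health check next")
else:
    print("port 8001 is not reachable; is the NIM container running?")
```

A True result only means something is listening; the grpcurl health check above remains the authoritative readiness test.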
Note
For using grpcurl with an SSL-enabled server, avoid the --plaintext argument and instead use --cacert with a CA certificate, --key with a private key, or --cert with a certificate file. For more details, refer to grpcurl --help.
Download the Audio2Face-2D client code by cloning the gRPC client repository:
git clone https://github.com/NVIDIA-Maxine/nim-clients.git
# Go to the 'audio2face-2d' folder
cd nim-clients/audio2face-2d/
Install the required dependencies:
For the Python client:
# Install pip on Linux
sudo apt-get install python3-pip
pip install -r python/requirements.txt
For the Node.js client:
# Add the repo and install the latest stable Node.js on Linux
curl -fsSL https://deb.nodesource.com/setup_current.x | sudo -E bash -
sudo apt install nodejs
# Install the required packages using the package.json file in the nodejs directory
npm install --prefix nodejs/
Compile the Protos (Optional)#
If you want to use the client code provided in the GitHub client repository, you can skip this step.
The proto files are available in the audio2face-2d/protos
folder. You can compile them to generate client interfaces in your preferred programming language. For more details, refer to Supported languages in the gRPC documentation.
Here is an example of how to compile the protos for Python and Node.js on Linux and Windows.
Python#
The grpcio version needed for compilation is listed in requirements.txt.
To compile protos on Linux, run:
# Go to the audio2face-2d/protos/linux/python folder
cd audio2face-2d/protos/linux/python
chmod +x compile_protos.sh
./compile_protos.sh
To compile protos on Windows, run:
REM Go to the audio2face-2d/protos/windows/python folder
cd audio2face-2d/protos/windows/python
./compile_protos.bat
Node.js#
Before running the Node.js client, you can optionally compile the protos.
To compile protos on Linux, run:
# Go to the audio2face-2d/protos/linux/nodejs folder
cd audio2face-2d/protos/linux/nodejs
chmod +x compile_protos.sh
./compile_protos.sh
To compile protos on Windows, run:
REM Go to the audio2face-2d/protos/windows/nodejs folder
cd audio2face-2d/protos/windows/nodejs
./compile_protos.bat
Running Inference via Python Script#
You can use the sample client script in the Audio2Face-2D GitHub repo to send a gRPC request to the hosted NIM server:
Go to the Python scripts directory.
cd python/scripts
Run the command to send a gRPC request (all command-line parameters are optional).
python audio2face-2d.py --target <server_ip:port> --audio-input <input audio file path> --portrait-input <input portrait image file path> --output <output file path and the file name> --head-rotation-animation-filepath <rotation animation filepath> --head-translation-animation-filepath <translation animation filepath> --ssl-mode <ssl mode value> --ssl-key <ssl key file path> --ssl-cert <ssl cert filepath> --ssl-root-cert <ssl root cert filepath>
Note
The first inference is not indicative of the model’s actual performance, as it includes the time taken by the Triton Inference Server to load the models for the first time. The subsequent inference requests will reflect the actual processing performance.
For Blackwell GPUs, the initial inference might time out because of the time needed to load the models. If timeout occurs, send another request; subsequent inferences will reflect the actual processing performance.
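Because the first request can time out while the server loads its models, a simple retry loop is a practical workaround. The following is a generic sketch; `send_request` is a hypothetical stand-in for whatever callable issues the gRPC request, and a real client would catch `grpc.RpcError` rather than the broad `Exception` used here:

```python
import time

def call_with_retry(send_request, attempts=3, backoff_s=5.0):
    """Call send_request(); on failure, wait backoff_s seconds and retry.

    The first attempt may fail with a deadline-exceeded error while the
    server loads its models; later attempts reflect steady-state latency.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return send_request()
        except Exception as err:  # in a real client, catch grpc.RpcError
            last_error = err
            if attempt < attempts:
                time.sleep(backoff_s)
    raise last_error
```

This keeps the retry policy out of the request code itself, so the same wrapper can be reused for any call that may hit the first-load timeout.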
To view details of command-line arguments, run this command:
python audio2face-2d.py -h
You will get a response similar to the following.
options:
-h, --help show this help message and exit
--ssl-mode {DISABLED,MTLS,TLS} Flag to set SSL mode, default is None
--ssl-key SSL_KEY The path to ssl private key.
--ssl-cert SSL_CERT The path to ssl certificate chain.
--ssl-root-cert SSL_ROOT_CERT The path to ssl root certificate.
--target TARGET IP:port of gRPC service, when hosted locally.
--audio-input AUDIO_INPUT The path to the input audio file.
--portrait-input PORTRAIT_INPUT The path to the input portrait file.
--output OUTPUT The path for the output video file.
--head-rotation-animation-filepath HEAD_ROTATION_ANIMATION_FILEPATH The path for the head_rotation_animation.csv file. Only required for HEAD_POSE_MODE_USER_DEFINED_ANIMATION
--head-translation-animation-filepath HEAD_TRANSLATION_ANIMATION_FILEPATH The path for the head_translation_animation.csv file. Only required for HEAD_POSE_MODE_USER_DEFINED_ANIMATION
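The two animation CSV files are consumed only in HEAD_POSE_MODE_USER_DEFINED_ANIMATION; their exact column layout is defined by the sample assets in the repository. As a generic sketch (not the actual schema), per-frame values from such a file can be loaded with the standard library:

```python
import csv

def load_animation_csv(path):
    """Read an animation CSV into a list of per-frame float tuples."""
    frames = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            frames.append(tuple(float(v) for v in row))
    return frames
```

Inspect the sample head_rotation_animation.csv and head_translation_animation.csv in the repository's assets folder for the real column meanings.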
If the command-line arguments are not passed, the script uses built-in default values.
Running Inference via Node.js Script#
A Node.js client, like the Python client, can be used to exercise the Audio2Face-2D feature by sending gRPC requests to a hosted NIM server.
The audio2face-2d.js script takes a portrait image and a .wav or .pcm audio file (default is .wav) and generates an MP4 video output.
Go to the Node.js scripts directory.
cd nodejs/scripts
Run the following command to send a gRPC request (all command-line parameters are optional):
node audio2face-2d.js --target <server_ip:port> --audio-input <input audio file path> --portrait-input <input portrait image file path> --output <output file path and the file name> --format <wav/pcm> --head-rotation-animation-filepath <rotation animation file path> --head-translation-animation-filepath <translation animation filepath> --ssl-mode <ssl mode value> --ssl-key <ssl key file path> --ssl-cert <ssl cert filepath> --ssl-root-cert <ssl root cert filepath>
The default configuration expected for PCM audio format in the Node.js client is as follows:
Sample rate: 48kHz
Channels: Mono-channel
Bit Depth: 16
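For .wav input, the header declares these same parameters, so a mismatched file can be caught before sending a request. The following is a sketch using Python's standard wave module, assuming the same expectations (48 kHz, mono, 16-bit) apply to the .wav path:

```python
import wave

EXPECTED_RATE = 48000   # 48 kHz sample rate
EXPECTED_CHANNELS = 1   # mono
EXPECTED_SAMPWIDTH = 2  # 16-bit = 2 bytes per sample

def matches_expected_format(path: str) -> bool:
    """Return True if the WAV header matches the client's default audio config."""
    with wave.open(path, "rb") as w:
        return (w.getframerate() == EXPECTED_RATE
                and w.getnchannels() == EXPECTED_CHANNELS
                and w.getsampwidth() == EXPECTED_SAMPWIDTH)
```

Raw .pcm input carries no header, so for that format the configuration must simply match by convention.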
If a different configuration is needed, change it in the sendInputAudioChunks() function of the Node.js client (audio2face-2d/nodejs/scripts/audio2face-2d.js).
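The client streams the audio payload to the server in chunks. The chunking itself can be sketched as a generator; the chunk size below is an illustrative value, not necessarily the one sendInputAudioChunks() uses:

```python
def iter_audio_chunks(data: bytes, chunk_size: int = 64 * 1024):
    """Yield successive chunk_size slices of the raw audio payload."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]
```

Streaming fixed-size chunks keeps each gRPC message well under the server's message-size limit regardless of the total audio length.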
To view details of the command-line arguments, run this command:
node audio2face-2d.js -h
You will get a response similar to the following.
Options:
--ssl-mode <type> Flag to set SSL mode (default: "DISABLED").
--ssl-key <type> Path to SSL private key. (default: "../ssl_key/ssl_key_client.pem").
--ssl-cert <type> Path to SSL certificate chain. (default: "../ssl_key/ssl_cert_client.pem").
--ssl-root-cert <type> Path to SSL root certificate. (default: "../ssl_key/ssl_ca_cert.pem").
--target <type> IP:port of gRPC service, when hosted locally. (default: "127.0.0.1:8001").
--audio-input <type> Path to the input audio file. (default: "../../assets/sample_audio.wav").
--head-rotation-animation-filepath <type> Path to head rotation animation csv file. (default: "../../assets/head_rotation_animation.csv").
--head-translation-animation-filepath <type> Path to head translation animation csv file. (default: "../../assets/head_translation_animation.csv").
--portrait-input <type> Path to the input portrait file. (default: "../../assets/sample_portrait_image.png").
--format <type> Audio format - wav or pcm (default: "wav").
--output <type> Path for the output video file. (default: "output.mp4").
-h, --help Display help for command.
Performance Data#
The following table shows the performance data for the Audio2Face-2D model on various GPUs and modes with the provided sample input files and the Python client.
| GPU | Model Selection | Mode | Elapsed Time (s) |
|---|---|---|---|
| L40 | Performance | ANIMATION_CROPPING_MODE_FACEBOX | 20.48 |
| L40 | Performance | ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING | 60.07 |
| L40 | Quality | ANIMATION_CROPPING_MODE_FACEBOX | 23.73 |
| L40 | Quality | ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING | 66.56 |
| A10 | Performance | ANIMATION_CROPPING_MODE_FACEBOX | 33.83 |
| A10 | Performance | ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING | 74.40 |
| A10 | Quality | ANIMATION_CROPPING_MODE_FACEBOX | 39.04 |
| A10 | Quality | ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING | 83.14 |
| T4 | Performance | ANIMATION_CROPPING_MODE_FACEBOX | 50.12 |
| T4 | Performance | ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING | 88.76 |
| T4 | Quality | ANIMATION_CROPPING_MODE_FACEBOX | 78.42 |
| T4 | Quality | ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING | 115.33 |
| V100 | Performance | ANIMATION_CROPPING_MODE_FACEBOX | 31.08 |
| V100 | Performance | ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING | 71.17 |
| V100 | Quality | ANIMATION_CROPPING_MODE_FACEBOX | 39.85 |
| V100 | Quality | ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING | 78.19 |
| RTX 5090 | Performance | ANIMATION_CROPPING_MODE_FACEBOX | 25.36 |
| RTX 5090 | Performance | ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING | 51.45 |
| RTX 5090 | Quality | ANIMATION_CROPPING_MODE_FACEBOX | 27.83 |
| RTX 5090 | Quality | ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING | 60.27 |
| RTX 4090 | Performance | ANIMATION_CROPPING_MODE_FACEBOX | 22.80 |
| RTX 4090 | Performance | ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING | 51.09 |
| RTX 4090 | Quality | ANIMATION_CROPPING_MODE_FACEBOX | 24.79 |
| RTX 4090 | Quality | ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING | 50.61 |
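As a rough reading of these numbers, Quality mode adds a fairly consistent overhead on top of Performance mode. A sketch computing the Quality-over-Performance ratio for the facebox cropping mode, with the elapsed times copied from the table:

```python
# Elapsed time (s) in ANIMATION_CROPPING_MODE_FACEBOX, from the table above.
facebox_times = {
    "L40":      {"Performance": 20.48, "Quality": 23.73},
    "A10":      {"Performance": 33.83, "Quality": 39.04},
    "T4":       {"Performance": 50.12, "Quality": 78.42},
    "V100":     {"Performance": 31.08, "Quality": 39.85},
    "RTX 5090": {"Performance": 25.36, "Quality": 27.83},
    "RTX 4090": {"Performance": 22.80, "Quality": 24.79},
}

for gpu, t in facebox_times.items():
    ratio = t["Quality"] / t["Performance"]
    print(f"{gpu}: Quality mode takes {ratio:.2f}x the Performance-mode time")
```

On most of these GPUs the facebox-mode Quality overhead stays under roughly 30 percent; the T4 is the outlier at about 56 percent.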