Basic Inference#

Perform a health check on the gRPC endpoint.
- Install grpcurl from github.com/fullstorydev/grpcurl/releases.
  
  Example commands to run on Ubuntu:
```
wget https://github.com/fullstorydev/grpcurl/releases/download/v1.9.1/grpcurl_1.9.1_linux_amd64.deb
sudo dpkg -i grpcurl_1.9.1_linux_amd64.deb
```
- Download the health checking proto:
```
wget https://raw.githubusercontent.com/grpc/grpc/master/src/proto/grpc/health/v1/health.proto
```
- Run the health check:
```
grpcurl --plaintext --proto health.proto localhost:8001 grpc.health.v1.Health/Check
```
- If the service is ready, you get a response similar to the following:
```
{ "status": "SERVING" }
```
Note

To use grpcurl with an SSL-enabled server, omit the --plaintext argument and instead pass --cacert with a CA certificate, --key with a private key, or --cert with a certificate file, as your server configuration requires. For more details, refer to grpcurl --help.
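
If you want to script the readiness check, for example before sending the first inference request, the grpcurl call above can be wrapped in a few lines of Python. The following is a minimal sketch and not part of the client repository: the function names are hypothetical, it assumes grpcurl is on your PATH and health.proto is in the working directory, and it passes only the TLS files you supply.

```python
import json
import subprocess


def build_health_check_cmd(target="localhost:8001", proto="health.proto",
                           cacert=None, cert=None, key=None):
    """Build the grpcurl argument list for the health check shown above."""
    cmd = ["grpcurl"]
    if cacert or cert or key:
        # SSL-enabled server: leave out --plaintext and pass the supplied files.
        for flag, path in (("--cacert", cacert), ("--cert", cert), ("--key", key)):
            if path:
                cmd += [flag, path]
    else:
        cmd.append("--plaintext")
    cmd += ["--proto", proto, target, "grpc.health.v1.Health/Check"]
    return cmd


def is_serving(grpcurl_output):
    """Return True if grpcurl's JSON reply reports the status SERVING."""
    try:
        reply = json.loads(grpcurl_output)
    except json.JSONDecodeError:
        return False
    return isinstance(reply, dict) and reply.get("status") == "SERVING"


def check_health(**kwargs):
    """Run grpcurl and report readiness (needs grpcurl installed)."""
    result = subprocess.run(build_health_check_cmd(**kwargs),
                            capture_output=True, text=True, timeout=30)
    return result.returncode == 0 and is_serving(result.stdout)
```

Calling `check_health()` with no arguments reproduces the plaintext command above; passing `cacert="..."` switches to the SSL form described in the note.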

Download the Audio2Face-2D client code by cloning the gRPC Client repository:

```
git clone https://github.com/NVIDIA-Maxine/nim-clients.git

# Go to the 'audio2face-2d' folder
cd nim-clients/audio2face-2d/
```

Install the required dependencies:

For the Python client:

```
# Install pip and the Python dependencies on Linux
sudo apt-get install python3-pip
pip install -r python/requirements.txt
```

For the Node.js client:

```
# Add the NodeSource repo and install the current Node.js release on Linux
curl -fsSL https://deb.nodesource.com/setup_current.x | sudo -E bash -
sudo apt install nodejs

# Install the required packages listed in package.json in the nodejs directory
npm install --prefix nodejs/
```

Compile the Protos (Optional)#

If you want to use the client code provided in the GitHub client repository, you can skip this step.

The proto files are available in the audio2face-2d/protos folder. You can compile them to generate client interfaces in your preferred programming language. For more details, refer to Supported languages in the gRPC documentation.

The following examples show how to compile the protos for Python and Node.js on Linux and Windows.

Python#

The grpcio version required for compilation is listed in requirements.txt.

To compile protos on Linux, run:

```
# From the nim-clients repository root, go to the audio2face-2d/protos/linux/python folder
cd audio2face-2d/protos/linux/python

chmod +x compile_protos.sh
./compile_protos.sh
```

To compile protos on Windows, go to the audio2face-2d/protos/windows/python folder and run the batch script:

```
cd audio2face-2d/protos/windows/python

.\compile_protos.bat
```
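
For reference, Python gRPC stubs are commonly generated with the `grpc_tools.protoc` module from the grpcio-tools package. The sketch below only builds such a command line. It is an assumption about a typical workflow, not a description of what compile_protos.sh or compile_protos.bat actually do, and the helper name and directory arguments are hypothetical.

```python
import sys
from pathlib import Path


def build_protoc_cmd(proto_dir, out_dir):
    """Return a grpc_tools.protoc command for every .proto file in proto_dir."""
    proto_dir = Path(proto_dir)
    proto_files = sorted(str(p) for p in proto_dir.glob("*.proto"))
    if not proto_files:
        raise FileNotFoundError(f"no .proto files found in {proto_dir}")
    return [
        sys.executable, "-m", "grpc_tools.protoc",
        f"-I{proto_dir}",                # import root for the proto files
        f"--python_out={out_dir}",       # message classes (*_pb2.py)
        f"--grpc_python_out={out_dir}",  # client and server stubs (*_pb2_grpc.py)
        *proto_files,
    ]
```

With grpcio-tools installed, the result can be passed to `subprocess.run(cmd, check=True)`. The provided scripts remain the supported path.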

Node.js#

Before running the Node.js client, you can choose to compile the protos.

To compile protos on Linux, run:

```
# From the nim-clients repository root, go to the audio2face-2d/protos/linux/nodejs folder
cd audio2face-2d/protos/linux/nodejs

chmod +x compile_protos.sh
./compile_protos.sh
```

To compile protos on Windows, go to the audio2face-2d/protos/windows/nodejs folder and run the batch script:

```
cd audio2face-2d/protos/windows/nodejs

.\compile_protos.bat
```

Running Inference via Python Script#

You can use the sample client script in the Audio2Face-2D GitHub repo to send a gRPC request to the hosted NIM server:

Go to the Python scripts directory.
```
cd python/scripts
```

Run the command to send a gRPC request (all command-line parameters are optional).

```
python audio2face-2d.py \
  --target <server_ip:port> \
  --audio-input <input audio file path> \
  --portrait-input <input portrait image file path> \
  --output <output file path and the file name> \
  --head-rotation-animation-filepath <rotation animation filepath> \
  --head-translation-animation-filepath <translation animation filepath> \
  --ssl-mode <ssl mode value> \
  --ssl-key <ssl key file path> \
  --ssl-cert <ssl cert filepath> \
  --ssl-root-cert <ssl root cert filepath>
```

Note

The first inference is not indicative of the model’s actual performance, as it includes the time taken by the Triton Inference Server to load the models for the first time. The subsequent inference requests will reflect the actual processing performance.

For Blackwell GPUs, the initial inference might time out because of the time needed to load the models. If a timeout occurs, send another request; subsequent inferences will reflect the actual processing performance.
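
When you measure performance yourself, this means the first successful request should be treated as a warm-up and an initial timeout should be retried. The following sketch shows one way to do that. `send_request` is a hypothetical zero-argument callable that performs one inference and raises `TimeoutError` on a timeout; it is not a function from the client repository.

```python
import time


def timed_inference(send_request, warmup_attempts=3):
    """Warm up the server, then time a single request.

    send_request is a hypothetical callable that performs one inference
    and raises TimeoutError if the request times out.
    """
    for attempt in range(1, warmup_attempts + 1):
        try:
            send_request()      # first success includes model loading time
            break
        except TimeoutError:
            if attempt == warmup_attempts:
                raise           # still timing out, so give up
    start = time.perf_counter()
    result = send_request()     # this request reflects steady-state speed
    return result, time.perf_counter() - start
```

The warm-up result is discarded on purpose, so only the second successful request contributes to the reported time.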

To view details of the command-line arguments, run this command:

```
python audio2face-2d.py -h
```

The output is similar to the following:

```
options:
   -h, --help                                                                 show this help message and exit
   --ssl-mode {DISABLED,MTLS,TLS}                                             Flag to set SSL mode, default is None
   --ssl-key SSL_KEY                                                          The path to ssl private key.
   --ssl-cert SSL_CERT                                                        The path to ssl certificate chain.
   --ssl-root-cert SSL_ROOT_CERT                                              The path to ssl root certificate.
   --target TARGET                                                            IP:port of gRPC service, when hosted locally.
   --audio-input AUDIO_INPUT                                                  The path to the input audio file.
   --portrait-input PORTRAIT_INPUT                                            The path to the input portrait file.
   --output OUTPUT                                                            The path for the output video file.
   --head-rotation-animation-filepath HEAD_ROTATION_ANIMATION_FILEPATH        The path for the head_rotation_animation.csv file. Only required for HEAD_POSE_MODE_USER_DEFINED_ANIMATION
   --head-translation-animation-filepath HEAD_TRANSLATION_ANIMATION_FILEPATH  The path for the head_translation_animation.csv file. Only required for HEAD_POSE_MODE_USER_DEFINED_ANIMATION
```

If the command-line arguments are not passed, the script uses the following default values:

Default Command-Line Arguments#

| Argument | Default Value |
| --- | --- |
| `target` | `127.0.0.1:8001` |
| `portrait-input` | `../../assets/sample_portrait_image.png` |
| `audio-input` | `../../assets/sample_audio.wav` |
| `output` | `output.mp4` in the current directory |
| `head-rotation-animation-filepath` | `../../assets/head_rotation_animation.csv` |
| `head-translation-animation-filepath` | `../../assets/head_translation_animation.csv` |
| `ssl-mode` | `DISABLED` |
| `ssl-key` | `../ssl_key/ssl_key_client.pem` |
| `ssl-cert` | `../ssl_key/ssl_cert_client.pem` |
| `ssl-root-cert` | `../ssl_key/ssl_ca_cert.pem` |
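
Because every argument is optional, a wrapper only needs to pass the values it wants to override and can leave the rest to the script defaults. The sketch below builds such a command line in Python. `build_client_cmd` is a hypothetical helper, and the flag names are taken from the help output above.

```python
import sys

# Flags accepted by audio2face-2d.py, as listed in its help output.
KNOWN_FLAGS = (
    "target", "audio-input", "portrait-input", "output",
    "head-rotation-animation-filepath", "head-translation-animation-filepath",
    "ssl-mode", "ssl-key", "ssl-cert", "ssl-root-cert",
)


def build_client_cmd(script="audio2face-2d.py", **options):
    """Build the client command; options left as None use script defaults."""
    cmd = [sys.executable, script]
    for name, value in options.items():
        flag = name.replace("_", "-")   # audio_input -> audio-input
        if flag not in KNOWN_FLAGS:
            raise ValueError(f"unknown option: --{flag}")
        if value is not None:
            cmd += [f"--{flag}", str(value)]
    return cmd
```

For example, `build_client_cmd(target="127.0.0.1:8001", audio_input="speech.wav")` (with a hypothetical input file) yields a list that can be passed to `subprocess.run` from the python/scripts directory.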

Running Inference via Node.js Script#

Like the Python client, the Node.js client exercises the A2F2D feature by sending gRPC requests to a hosted NIM server. The audio2face-2d.js script takes a portrait image and a .wav or .pcm audio file (.wav by default) and generates an MP4 video.

Go to the Node.js scripts folder:

```
cd nodejs/scripts
```

Run the following command to send a gRPC request (all command-line parameters are optional):

```
node audio2face-2d.js \
  --target <server_ip:port> \
  --audio-input <input audio file path> \
  --portrait-input <input portrait image file path> \
  --output <output file path and the file name> \
  --format <wav/pcm> \
  --head-rotation-animation-filepath <rotation animation file path> \
  --head-translation-animation-filepath <translation animation filepath> \
  --ssl-mode <ssl mode value> \
  --ssl-key <ssl key file path> \
  --ssl-cert <ssl cert filepath> \
  --ssl-root-cert <ssl root cert filepath>
```

The Node.js client expects PCM audio in the following default configuration:

- Sample rate: 48 kHz
- Channels: 1 (mono)
- Bit depth: 16

To use a different configuration, change it in the sendInputAudioChunks() function of the Node.js client, audio2face-2d/nodejs/scripts/audio2face-2d.js.
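
If your audio is a WAV file and you want to produce a .pcm input, it helps to confirm that it already matches this configuration. The sketch below uses Python's standard `wave` module to check the three parameters and write the samples without the WAV header. `wav_to_pcm` is a hypothetical helper, and it assumes the client accepts the little-endian signed 16-bit samples that WAV files store.

```python
import wave

# Defaults the Node.js client expects for raw PCM input (from the list above).
EXPECTED_RATE_HZ = 48_000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH_BYTES = 2   # 16-bit samples


def wav_to_pcm(wav_path, pcm_path):
    """Copy the samples of a matching WAV file into a headerless .pcm file.

    Returns the audio duration in seconds. Raises ValueError if the WAV file
    does not use the expected sample rate, channel count, or bit depth.
    """
    expected = (EXPECTED_RATE_HZ, EXPECTED_CHANNELS, EXPECTED_SAMPLE_WIDTH_BYTES)
    with wave.open(wav_path, "rb") as wav:
        actual = (wav.getframerate(), wav.getnchannels(), wav.getsampwidth())
        if actual != expected:
            raise ValueError(
                f"expected (rate, channels, width) {expected}, got {actual}")
        frames = wav.readframes(wav.getnframes())
    with open(pcm_path, "wb") as pcm:
        pcm.write(frames)
    bytes_per_second = (EXPECTED_RATE_HZ * EXPECTED_CHANNELS
                        * EXPECTED_SAMPLE_WIDTH_BYTES)
    return len(frames) / bytes_per_second
```

At 48 kHz, one channel, and 2 bytes per sample, one second of audio is 96,000 bytes, which is a quick sanity check on the size of the resulting file.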

To view details of the command-line arguments, run this command:

```
node audio2face-2d.js -h
```

The output is similar to the following:

```
Options:
  --ssl-mode <type>                             Flag to set SSL mode (default: "DISABLED").
  --ssl-key <type>                              Path to SSL private key. (default: "../ssl_key/ssl_key_client.pem").
  --ssl-cert <type>                             Path to SSL certificate chain. (default: "../ssl_key/ssl_cert_client.pem").
  --ssl-root-cert <type>                        Path to SSL root certificate. (default: "../ssl_key/ssl_ca_cert.pem").
  --target <type>                               IP:port of gRPC service, when hosted locally. (default: "127.0.0.1:8001").
  --audio-input <type>                          Path to the input audio file. (default: "../../assets/sample_audio.wav").
  --head-rotation-animation-filepath <type>     Path to head rotation animation csv file. (default: "../../assets/head_rotation_animation.csv").
  --head-translation-animation-filepath <type>  Path to head translation animation csv file. (default: "../../assets/head_translation_animation.csv").
  --portrait-input <type>                       Path to the input portrait file. (default: "../../assets/sample_portrait_image.png").
  --format <type>                               Audio format - wav or pcm (default: "wav").
  --output <type>                               Path for the output video file. (default: "output.mp4").
  -h, --help                                    Display help for command.
```

Performance data

The following table shows the performance data for the Audio2Face-2D model on various GPUs and modes with the provided sample input files and the Python client.

| GPU | Model Selection | Mode | Elapsed Time (s) |
| --- | --- | --- | --- |
| L40 | Performance | ANIMATION_CROPPING_MODE_FACEBOX | 20.48 |
| L40 | Performance | ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING | 60.07 |
| L40 | Quality | ANIMATION_CROPPING_MODE_FACEBOX | 23.73 |
| L40 | Quality | ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING | 66.56 |
| A10 | Performance | ANIMATION_CROPPING_MODE_FACEBOX | 33.83 |
| A10 | Performance | ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING | 74.40 |
| A10 | Quality | ANIMATION_CROPPING_MODE_FACEBOX | 39.04 |
| A10 | Quality | ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING | 83.14 |
| T4 | Performance | ANIMATION_CROPPING_MODE_FACEBOX | 50.12 |
| T4 | Performance | ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING | 88.76 |
| T4 | Quality | ANIMATION_CROPPING_MODE_FACEBOX | 78.42 |
| T4 | Quality | ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING | 115.33 |
| V100 | Performance | ANIMATION_CROPPING_MODE_FACEBOX | 31.08 |
| V100 | Performance | ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING | 71.17 |
| V100 | Quality | ANIMATION_CROPPING_MODE_FACEBOX | 39.85 |
| V100 | Quality | ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING | 78.19 |
| RTX 5090 | Performance | ANIMATION_CROPPING_MODE_FACEBOX | 25.36 |
| RTX 5090 | Performance | ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING | 51.45 |
| RTX 5090 | Quality | ANIMATION_CROPPING_MODE_FACEBOX | 27.83 |
| RTX 5090 | Quality | ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING | 60.27 |
| RTX 4090 | Performance | ANIMATION_CROPPING_MODE_FACEBOX | 22.80 |
| RTX 4090 | Performance | ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING | 51.09 |
| RTX 4090 | Quality | ANIMATION_CROPPING_MODE_FACEBOX | 24.79 |
| RTX 4090 | Quality | ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING | 50.61 |
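
To compare the two cropping modes, it can help to express the table as a ratio. The sketch below divides the REGISTRATION_BLENDING time by the FACEBOX time for the Performance model on each GPU. The numbers are copied from the table above; the variable and function names are illustrative only.

```python
# Elapsed times in seconds for the Performance model, copied from the table:
# (FACEBOX, REGISTRATION_BLENDING) per GPU.
PERFORMANCE_MODEL_SECONDS = {
    "L40": (20.48, 60.07),
    "A10": (33.83, 74.40),
    "T4": (50.12, 88.76),
    "V100": (31.08, 71.17),
    "RTX 5090": (25.36, 51.45),
    "RTX 4090": (22.80, 51.09),
}


def blending_slowdown(times):
    """Map each GPU to REGISTRATION_BLENDING time divided by FACEBOX time."""
    return {gpu: blending / facebox
            for gpu, (facebox, blending) in times.items()}


for gpu, ratio in blending_slowdown(PERFORMANCE_MODEL_SECONDS).items():
    print(f"{gpu:>8}: {ratio:.2f}x")
```

With the sample inputs, REGISTRATION_BLENDING takes roughly 1.8 to 2.9 times as long as FACEBOX for the Performance model, depending on the GPU.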