Basic Inference

  1. Perform a health check on the gRPC endpoint.

    • Install grpcurl from github.com/fullstorydev/grpcurl/releases.

      Example commands to run on Ubuntu:

      wget https://github.com/fullstorydev/grpcurl/releases/download/v1.9.1/grpcurl_1.9.1_linux_amd64.deb
      sudo dpkg -i grpcurl_1.9.1_linux_amd64.deb
      
    • Download the health checking proto:

      wget https://raw.githubusercontent.com/grpc/grpc/master/src/proto/grpc/health/v1/health.proto
      
    • Run the health check:

      grpcurl --plaintext --proto health.proto localhost:8001 grpc.health.v1.Health/Check
      
    • If the service is ready, you get a response similar to the following:

      { "status": "SERVING" }
      

    Note

    To use grpcurl with an SSL-enabled server, omit the --plaintext argument and pass the TLS credentials your server requires instead: --cacert with a CA certificate, --cert with a certificate file, and --key with a private key. For more details, run grpcurl --help.

  2. Download the Audio2Face-2D client code by cloning the gRPC Client repository:

    git clone https://github.com/NVIDIA-Maxine/nim-clients.git
    
    # Go to the 'audio2face-2d' folder
    cd nim-clients/audio2face-2d/
    
  3. Install the required dependencies:

    • For the Python client:

    # Install pip for Python 3 on Linux
    sudo apt-get install python3-pip
    pip install -r python/requirements.txt
    
    • For the Node.js client:

    # Add the repo and install the latest stable Node.js on Linux
    curl -fsSL https://deb.nodesource.com/setup_current.x | sudo -E bash -
    sudo apt install nodejs
    
    # Install the required packages listed in package.json in the nodejs directory
    npm install --prefix nodejs/
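
The health check in step 1 can also be scripted. The following is a minimal Python sketch, not part of the client repository: the helper names build_health_check_cmd and is_serving are illustrative, and the sketch only assembles the grpcurl arguments already shown above (--plaintext, --proto, --cacert, --cert, --key) and interprets the JSON response.

```python
import json
import shlex


def build_health_check_cmd(target="localhost:8001", proto="health.proto",
                           cacert=None, cert=None, key=None):
    """Assemble the grpcurl health-check command as an argument list.

    With no TLS files, the command matches the plaintext example above.
    If any TLS file is given, --plaintext is left out and the files are
    passed with the flags named in the note.
    """
    cmd = ["grpcurl"]
    tls_args = [("--cacert", cacert), ("--cert", cert), ("--key", key)]
    if any(path for _, path in tls_args):
        for flag, path in tls_args:
            if path:
                cmd += [flag, path]
    else:
        cmd.append("--plaintext")
    cmd += ["--proto", proto, target, "grpc.health.v1.Health/Check"]
    return cmd


def is_serving(response_text):
    """Return True only if the response is a JSON object with status SERVING."""
    try:
        return json.loads(response_text).get("status") == "SERVING"
    except (json.JSONDecodeError, AttributeError):
        return False


print(shlex.join(build_health_check_cmd()))
print(is_serving('{ "status": "SERVING" }'))
```

If grpcurl is installed, the returned list can be passed to subprocess.run(cmd, capture_output=True, text=True) and the captured stdout handed to is_serving.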
    

Compile the Protos (Optional)

If you want to use the client code provided in the GitHub client repository as is, you can skip this step.

The proto files are available in the audio2face-2d/protos folder. You can compile them to generate client interfaces in your preferred programming language. For more details, refer to Supported languages in the gRPC documentation.

Here is an example of how to compile the protos for Python and Node.js on Linux and Windows.

Python

The grpcio version required for compilation is listed in requirements.txt.

To compile protos on Linux, run:

# Go to the audio2face-2d/protos/linux/python folder
cd audio2face-2d/protos/linux/python

chmod +x compile_protos.sh
./compile_protos.sh

To compile protos on Windows, run:

# Go to the audio2face-2d/protos/windows/python folder
cd audio2face-2d/protos/windows/python

./compile_protos.bat
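
The contents of the compile_protos scripts are not shown here. As a rough illustration of what compiling protos for Python usually involves, the sketch below builds a grpc_tools.protoc invocation. It assumes the grpcio-tools package is installed; the helper name build_protoc_cmd, the file name example.proto, and the directory names are hypothetical placeholders, not names taken from the repository.

```python
import sys
from pathlib import Path


def build_protoc_cmd(proto_dir, proto_name, out_dir):
    """Build a grpc_tools.protoc command line as an argument list.

    -I sets the proto search path, --python_out receives the message
    classes, and --grpc_python_out receives the client and server stubs.
    """
    proto_dir = Path(proto_dir)
    return [
        sys.executable, "-m", "grpc_tools.protoc",
        f"-I{proto_dir}",
        f"--python_out={out_dir}",
        f"--grpc_python_out={out_dir}",
        str(proto_dir / proto_name),
    ]


# Hypothetical names, for illustration only.
cmd = build_protoc_cmd("protos", "example.proto", "generated")
print(" ".join(cmd))
```

Running the resulting command with subprocess.run(cmd, check=True) generates the Python modules in the output directory, which must already exist.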

Node.js

Compiling the protos before running the Node.js client is optional.

To compile protos on Linux, run:

# Go to the audio2face-2d/protos/linux/nodejs folder
cd audio2face-2d/protos/linux/nodejs

chmod +x compile_protos.sh
./compile_protos.sh

To compile protos on Windows, run:

# Go to the audio2face-2d/protos/windows/nodejs folder
cd audio2face-2d/protos/windows/nodejs

./compile_protos.bat

Running Inference via Python Script

You can use the sample client script in the Audio2Face-2D GitHub repo to send a gRPC request to the hosted NIM server:

  1. Go to the Python scripts directory.

    cd python/scripts
    
  2. Run the command to send a gRPC request (all command-line parameters are optional).

    python audio2face-2d.py \
        --target <server_ip:port> \
        --audio-input <input audio file path> \
        --portrait-input <input portrait image file path> \
        --output <output file path and the file name> \
        --head-rotation-animation-filepath <rotation animation filepath> \
        --head-translation-animation-filepath <translation animation filepath> \
        --ssl-mode <ssl mode value> \
        --ssl-key <ssl key file path> \
        --ssl-cert <ssl cert filepath> \
        --ssl-root-cert <ssl root cert filepath>
    

Note

The first inference is not indicative of the model’s actual performance, as it includes the time taken by the Triton Inference Server to load the models for the first time. The subsequent inference requests will reflect the actual processing performance.

For Blackwell GPUs, the initial inference might time out because of the time needed to load the models. If a timeout occurs, send another request; subsequent inferences will reflect the actual processing performance.
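
The "send another request" advice above can be automated with a small wrapper. This is a sketch under assumptions: it treats a timeout as either the client process exiting with a non-zero status or exceeding a wall-clock limit, and the helper name run_with_retry and its parameters are illustrative rather than part of the client.

```python
import subprocess


def run_with_retry(cmd, timeout_s, attempts=2, runner=subprocess.run):
    """Run cmd, retrying when it fails or exceeds timeout_s seconds.

    The first request may fail while the models are still loading, so a
    second attempt is made by default before giving up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return runner(cmd, timeout=timeout_s, check=True)
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError) as err:
            last_error = err
            print(f"attempt {attempt} of {attempts} failed: {err}")
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

For example, run_with_retry(["python", "audio2face-2d.py"], timeout_s=600) runs the client with its default arguments; the 600-second limit is an arbitrary placeholder to tune for your hardware.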

To view details of command-line arguments, run this command:

python audio2face-2d.py -h

You will get a response similar to the following.

options:
   -h, --help                                                                 show this help message and exit
   --ssl-mode {DISABLED,MTLS,TLS}                                             Flag to set SSL mode, default is None
   --ssl-key SSL_KEY                                                          The path to ssl private key.
   --ssl-cert SSL_CERT                                                        The path to ssl certificate chain.
   --ssl-root-cert SSL_ROOT_CERT                                              The path to ssl root certificate.
   --target TARGET                                                            IP:port of gRPC service, when hosted locally.
   --audio-input AUDIO_INPUT                                                  The path to the input audio file.
   --portrait-input PORTRAIT_INPUT                                            The path to the input portrait file.
   --output OUTPUT                                                            The path for the output video file.
   --head-rotation-animation-filepath HEAD_ROTATION_ANIMATION_FILEPATH        The path for the head_rotation_animation.csv file. Only required for HEAD_POSE_MODE_USER_DEFINED_ANIMATION
   --head-translation-animation-filepath HEAD_TRANSLATION_ANIMATION_FILEPATH  The path for the head_translation_animation.csv file. Only required for HEAD_POSE_MODE_USER_DEFINED_ANIMATION

If the command-line arguments are not passed, the script uses the following default values:

Default Command-Line Arguments

Argument                               Default Value
target                                 127.0.0.1:8001
portrait-input                         ../../assets/sample_portrait_image.png
audio-input                            ../../assets/sample_audio.wav
output                                 output.mp4 in the current directory
head-rotation-animation-filepath       ../../assets/head_rotation_animation.csv
head-translation-animation-filepath    ../../assets/head_translation_animation.csv
ssl-mode                               DISABLED
ssl-key                                ../ssl_key/ssl_key_client.pem
ssl-cert                               ../ssl_key/ssl_cert_client.pem
ssl-root-cert                          ../ssl_key/ssl_ca_cert.pem
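
Because every argument has a default, a wrapper only needs to pass the values that differ. The sketch below illustrates this with the defaults listed above; DEFAULTS and build_client_cmd are hypothetical names for this example and are not part of the client script.

```python
import shlex

# Default values copied from the list of defaults above.
DEFAULTS = {
    "target": "127.0.0.1:8001",
    "portrait-input": "../../assets/sample_portrait_image.png",
    "audio-input": "../../assets/sample_audio.wav",
    "output": "output.mp4",
    "head-rotation-animation-filepath": "../../assets/head_rotation_animation.csv",
    "head-translation-animation-filepath": "../../assets/head_translation_animation.csv",
    "ssl-mode": "DISABLED",
    "ssl-key": "../ssl_key/ssl_key_client.pem",
    "ssl-cert": "../ssl_key/ssl_cert_client.pem",
    "ssl-root-cert": "../ssl_key/ssl_ca_cert.pem",
}


def build_client_cmd(**overrides):
    """Build the audio2face-2d.py command, passing only non-default values.

    Keyword names use underscores (ssl_mode) and are mapped to the
    hyphenated flag names (--ssl-mode).
    """
    cmd = ["python", "audio2face-2d.py"]
    for name, value in overrides.items():
        flag = name.replace("_", "-")
        if flag not in DEFAULTS:
            raise ValueError(f"unknown option: {flag}")
        if str(value) != DEFAULTS[flag]:
            cmd += [f"--{flag}", str(value)]
    return cmd


print(shlex.join(build_client_cmd(target="10.0.0.5:8001", ssl_mode="DISABLED")))
```

Here ssl_mode is dropped from the command because it matches the default, so only --target is passed; the address 10.0.0.5:8001 is a made-up example.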

Running Inference via Node.js Script

Like the Python client, the Node.js client exercises the Audio2Face-2D (A2F2D) feature by sending gRPC requests to a hosted NIM server. The audio2face-2d.js script takes a portrait image and a .wav or .pcm audio file (.wav by default) and generates an MP4 video as output.

Go to the Node.js scripts directory:

cd nodejs/scripts

Run the following command to send a gRPC request (all command-line parameters are optional):

node audio2face-2d.js \
    --target <server_ip:port> \
    --audio-input <input audio file path> \
    --portrait-input <input portrait image file path> \
    --output <output file path and the file name> \
    --format <wav/pcm> \
    --head-rotation-animation-filepath <rotation animation file path> \
    --head-translation-animation-filepath <translation animation filepath> \
    --ssl-mode <ssl mode value> \
    --ssl-key <ssl key file path> \
    --ssl-cert <ssl cert filepath> \
    --ssl-root-cert <ssl root cert filepath>

For PCM input, the Node.js client expects the following audio configuration by default:

  • Sample rate: 48 kHz

  • Channels: 1 (mono)

  • Bit depth: 16-bit

To use a different configuration, edit the sendInputAudioChunks() function in the Node.js client script, audio2face-2d/nodejs/scripts/audio2face-2d.js.
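
Before sending PCM data, it can help to confirm that the source audio matches this configuration. The following Python sketch uses the standard wave module to check a WAV file against the 48 kHz, mono, 16-bit settings and to extract its raw samples. The helper names are illustrative, and whether the extracted bytes (little-endian, as stored in WAV files) are exactly what the client expects in a .pcm file is an assumption to verify against your setup.

```python
import io
import wave

# Expected PCM configuration from the list above.
EXPECTED = {"sample_rate": 48000, "channels": 1, "sample_width_bytes": 2}


def wav_to_pcm(wav_file):
    """Return the raw PCM frames of a WAV file that matches EXPECTED.

    wav_file may be a path or a binary file-like object.
    """
    with wave.open(wav_file, "rb") as wav:
        actual = {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
        }
        if actual != EXPECTED:
            raise ValueError(f"unexpected audio format: {actual}")
        return wav.readframes(wav.getnframes())


def pcm_duration_seconds(num_bytes):
    """Duration of a raw PCM buffer in the expected configuration."""
    bytes_per_second = (EXPECTED["sample_rate"] * EXPECTED["channels"]
                        * EXPECTED["sample_width_bytes"])
    return num_bytes / bytes_per_second


# Demo: build 10 ms of silence in memory, then read it back as raw PCM.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(48000)
    out.writeframes(b"\x00\x00" * 480)
buffer.seek(0)
pcm = wav_to_pcm(buffer)
print(len(pcm), pcm_duration_seconds(len(pcm)))
```

At this configuration one second of audio occupies 48000 × 1 × 2 = 96000 bytes, which is what pcm_duration_seconds divides by.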

To view details of the command-line arguments, run this command:

node audio2face-2d.js -h

You will get a response similar to the following.

Options:
  --ssl-mode <type>                             Flag to set SSL mode (default: "DISABLED").
  --ssl-key <type>                              Path to SSL private key. (default: "../ssl_key/ssl_key_client.pem").
  --ssl-cert <type>                             Path to SSL certificate chain. (default: "../ssl_key/ssl_cert_client.pem").
  --ssl-root-cert <type>                        Path to SSL root certificate. (default: "../ssl_key/ssl_ca_cert.pem").
  --target <type>                               IP:port of gRPC service, when hosted locally. (default: "127.0.0.1:8001").
  --audio-input <type>                          Path to the input audio file. (default: "../../assets/sample_audio.wav").
  --head-rotation-animation-filepath <type>     Path to head rotation animation csv file. (default: "../../assets/head_rotation_animation.csv").
  --head-translation-animation-filepath <type>  Path to head translation animation csv file. (default: "../../assets/head_translation_animation.csv").
  --portrait-input <type>                       Path to the input portrait file. (default: "../../assets/sample_portrait_image.png").
  --format <type>                               Audio format - wav or pcm (default: "wav").
  --output <type>                               Path for the output video file. (default: "output.mp4").
  -h, --help                                    Display help for command.

Performance Data

The following table shows the performance data for the Audio2Face-2D model on various GPUs and modes with the provided sample input files and the Python client.

GPU        Model Selection   Mode                                            Elapsed Time (s)
L40        Performance       ANIMATION_CROPPING_MODE_FACEBOX                  20.48
L40        Performance       ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING    60.07
L40        Quality           ANIMATION_CROPPING_MODE_FACEBOX                  23.73
L40        Quality           ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING    66.56
A10        Performance       ANIMATION_CROPPING_MODE_FACEBOX                  33.83
A10        Performance       ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING    74.40
A10        Quality           ANIMATION_CROPPING_MODE_FACEBOX                  39.04
A10        Quality           ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING    83.14
T4         Performance       ANIMATION_CROPPING_MODE_FACEBOX                  50.12
T4         Performance       ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING    88.76
T4         Quality           ANIMATION_CROPPING_MODE_FACEBOX                  78.42
T4         Quality           ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING   115.33
V100       Performance       ANIMATION_CROPPING_MODE_FACEBOX                  31.08
V100       Performance       ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING    71.17
V100       Quality           ANIMATION_CROPPING_MODE_FACEBOX                  39.85
V100       Quality           ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING    78.19
RTX 5090   Performance       ANIMATION_CROPPING_MODE_FACEBOX                  25.36
RTX 5090   Performance       ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING    51.45
RTX 5090   Quality           ANIMATION_CROPPING_MODE_FACEBOX                  27.83
RTX 5090   Quality           ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING    60.27
RTX 4090   Performance       ANIMATION_CROPPING_MODE_FACEBOX                  22.80
RTX 4090   Performance       ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING    51.09
RTX 4090   Quality           ANIMATION_CROPPING_MODE_FACEBOX                  24.79
RTX 4090   Quality           ANIMATION_CROPPING_MODE_REGISTRATION_BLENDING    50.61
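
To compare the two cropping modes, the elapsed times above can be turned into ratios. The short Python sketch below does this arithmetic on the published numbers only; the names ELAPSED_S, blending_ratio, and fastest_gpu are illustrative.

```python
# Elapsed times in seconds, copied from the performance data above:
# (GPU, model selection) -> (FACEBOX, REGISTRATION_BLENDING)
ELAPSED_S = {
    ("L40", "Performance"): (20.48, 60.07),
    ("L40", "Quality"): (23.73, 66.56),
    ("A10", "Performance"): (33.83, 74.40),
    ("A10", "Quality"): (39.04, 83.14),
    ("T4", "Performance"): (50.12, 88.76),
    ("T4", "Quality"): (78.42, 115.33),
    ("V100", "Performance"): (31.08, 71.17),
    ("V100", "Quality"): (39.85, 78.19),
    ("RTX 5090", "Performance"): (25.36, 51.45),
    ("RTX 5090", "Quality"): (27.83, 60.27),
    ("RTX 4090", "Performance"): (22.80, 51.09),
    ("RTX 4090", "Quality"): (24.79, 50.61),
}


def blending_ratio(gpu, model):
    """How many times longer REGISTRATION_BLENDING takes than FACEBOX."""
    facebox_s, blending_s = ELAPSED_S[(gpu, model)]
    return blending_s / facebox_s


def fastest_gpu(model, mode_index):
    """GPU with the lowest elapsed time (mode_index 0: FACEBOX, 1: blending)."""
    times = {gpu: values[mode_index]
             for (gpu, m), values in ELAPSED_S.items() if m == model}
    return min(times, key=times.get)


for gpu, model in ELAPSED_S:
    print(f"{gpu:<9} {model:<12} {blending_ratio(gpu, model):.2f}x")
```

In every row the registration-blending mode takes longer than facebox mode, so the ratio is a quick way to estimate the cost of switching modes on a given GPU.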