Basic Inference#

  1. Perform a health check on the gRPC endpoint.

    • Install grpcurl from github.com/fullstorydev/grpcurl/releases.

      Example commands to run on Ubuntu:

      wget https://github.com/fullstorydev/grpcurl/releases/download/v1.9.1/grpcurl_1.9.1_linux_amd64.deb
      sudo dpkg -i grpcurl_1.9.1_linux_amd64.deb
      
    • Download the health checking proto:

      wget https://raw.githubusercontent.com/grpc/grpc/master/src/proto/grpc/health/v1/health.proto
      
    • Run the health check:

      grpcurl --plaintext --proto health.proto localhost:8001 grpc.health.v1.Health/Check
      

      If the service is ready, you get a response similar to the following:

      { "status": "SERVING" }
      

    Note

    To use grpcurl with an SSL-enabled server, omit the --plaintext argument and instead pass --cacert with a CA certificate, --key with a private key, and --cert with a certificate file, as needed. For more details, refer to grpcurl --help. A Python alternative to this health check is sketched after these setup steps.

  2. Download the Eye Contact Python client code by cloning the Clients repository (NVIDIA-Maxine/nim-clients):

    git clone https://github.com/NVIDIA-Maxine/nim-clients.git
    
    # Go to the 'eye-contact' folder
    
    cd nim-clients/eye-contact/ 
    
  3. Install the required dependencies:

    sudo apt-get install python3-pip
    pip install -r requirements.txt
    
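If you prefer to run the health check from Python instead of grpcurl, the gRPC health-checking bindings can issue the same request. The following is a minimal sketch, assuming the grpcio and grpcio-health-checking packages are installed and the NIM is listening on localhost:8001:

# Minimal Python equivalent of the grpcurl health check.
# Assumes: pip install grpcio grpcio-health-checking
import grpc
from grpc_health.v1 import health_pb2, health_pb2_grpc

with grpc.insecure_channel("localhost:8001") as channel:
    stub = health_pb2_grpc.HealthStub(channel)
    response = stub.Check(health_pb2.HealthCheckRequest(service=""))
    print(response)  # prints "status: SERVING" when the service is ready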

Compile the Protos (Optional)#

If you want to use the client code provided in the Clients repository (NVIDIA-Maxine/nim-clients), you can skip this step.

The proto files are available in the eye-contact/protos folder. You can compile them to generate client interfaces in your preferred programming language. For more details, refer to Supported languages in the gRPC documentation.

The following example shows how to compile the protos for Python on Linux and Windows. The grpcio version required for compilation is listed in requirements.txt.

To compile protos on Linux:

# Go to eye-contact/protos/linux folder
cd eye-contact/protos/linux

chmod +x compile_protos.sh
./compile_protos.sh

To compile protos on Windows:

# Go to eye-contact/protos/windows folder
cd eye-contact/protos/windows

./compile_protos.bat
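
Alternatively, the protoc compiler bundled with grpcio-tools can be invoked directly from Python, which works the same on Linux and Windows. This is an illustrative sketch, not the repository's script; run it from the folder containing the .proto files and adjust paths to your layout:

# Compile every .proto in the current folder using the protoc bundled
# with grpcio-tools (pip install grpcio-tools). Paths are illustrative.
import glob
from grpc_tools import protoc

for proto_file in glob.glob("*.proto"):
    exit_code = protoc.main([
        "grpc_tools.protoc",
        "-I.",                   # proto include path
        "--python_out=.",        # generated message classes (*_pb2.py)
        "--grpc_python_out=.",   # generated service stubs (*_pb2_grpc.py)
        proto_file,
    ])
    if exit_code != 0:
        raise RuntimeError(f"protoc failed for {proto_file}")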

Input and Output#

The input and output of the NVIDIA Eye Contact NIM are MP4 files. The input file must use the H.264 video codec and can include audio. The output file also uses the H.264 video codec and carries the same audio as the input file.

Note

Videos with Variable Frame Rate (VFR) are not supported.
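
To check an input file against these constraints before sending it, you can inspect the video stream with ffprobe. The following sketch assumes the FFmpeg tools are installed; comparing r_frame_rate with avg_frame_rate is only a heuristic for spotting VFR content:

# Inspect the first video stream of an input file (requires ffprobe).
import json
import subprocess

def probe(path):
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=codec_name,r_frame_rate,avg_frame_rate",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    stream = json.loads(result.stdout)["streams"][0]
    print("codec:", stream["codec_name"])  # expect h264
    # Differing rates often indicate variable frame rate (heuristic only).
    print("VFR suspected:", stream["r_frame_rate"] != stream["avg_frame_rate"])

probe("sample_video.mp4")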

Input Modes#

The NVIDIA Eye Contact NIM supports two distinct input processing modes, depending on the capabilities of the input file:

Transactional Mode (Default)#

In transactional mode, the NIM must receive the entire input video file and process it as a complete unit before returning results.

This mode is suitable for the following use cases:

  • Processing of small video files, because each file is copied to the NIM in its entirety before inference can begin.

  • Applications that can wait for complete processing before receiving output.

  • Videos that are not optimized for streaming.

To run the NVIDIA Eye Contact NIM in transactional mode, simply run the NVIDIA Eye Contact sample client without any additional flags:

Go to the scripts directory:

cd scripts

Send a gRPC request:

python eye-contact.py --target <server_ip:port> \
   --input <input file path> \
   --output <output file path along with file name> 

Streaming Mode#

In streaming mode, the NIM can process incoming video frames incrementally as soon as frame information is available. The output frames are streamed back to the client immediately after processing.

This is the preferred mode for the following use cases:

  • Streamable video inputs.

  • Applications that do not want to wait until the entire file is uploaded.

  • Large video files that benefit from incremental processing and are therefore not constrained by disk space on the server.

The following table compares the two modes:

| Aspect | Transactional Mode | Streaming Mode |
|--------|--------------------|----------------|
| Data Storage | Entire video and audio files are temporarily copied to disk | Only the frames being processed are temporarily held in memory |
| Processing Start | The NIM waits to receive the entire file before starting | The NIM starts processing as soon as the data chunk for the first frame arrives |
| Processing Timing | Processing begins after all data is received | Processing is continuous, without waiting for the complete input |
| Output Delivery | The complete output video is returned to the client after inference finishes for the whole video | Output frames are generated and returned immediately |

The --streaming option works with streamable videos where metadata is positioned at the beginning of the file. Videos that are not streamable can be easily converted to a streamable format.

To make any video streamable, use FFmpeg with the following command:

   ffmpeg -i sample_video.mp4 -movflags +faststart sample_streamable.mp4

You can then specify the streamable video as input to the NIM by using the --input parameter and enabling the --streaming option.

   python eye-contact.py --target <server_ip:port> \
      --input <input file path> \
      --output <output file path along with file name> \
      --streaming 
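
If you are unsure whether a file is already streamable, one heuristic is to check whether the moov box (the metadata) appears before the mdat box at the top level of the MP4. The sketch below is a simplified check for illustration, not a full MP4 parser:

# Heuristic streamability check: does the top-level moov box precede mdat?
# A "+faststart" MP4 places moov first. Simplified; not a full parser.
import struct

def moov_before_mdat(path):
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                return False            # reached EOF without finding moov
            size, box = struct.unpack(">I4s", header)
            if box == b"moov":
                return True
            if box == b"mdat":
                return False
            if size == 1:               # 64-bit extended size follows
                size = struct.unpack(">Q", f.read(8))[0]
                f.seek(size - 16, 1)
            elif size == 0:             # box extends to end of file
                return False
            else:
                f.seek(size - 8, 1)

print(moov_before_mdat("sample_streamable.mp4"))  # True after +faststart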

Note

The first inference is not indicative of the model’s actual performance because it includes the time taken by the Triton Inference Server to load the models in addition to the time required to process the inference request.
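
A practical consequence is to issue a warm-up request before measuring latency. The snippet below is an illustrative pattern that shells out to the sample client; the target and file paths are placeholders:

# Time a warm-up request (includes Triton model loading) separately from
# a steady-state request. Target and file paths are placeholders.
import subprocess
import time

cmd = ["python", "eye-contact.py",
       "--target", "127.0.0.1:8001",
       "--input", "../assets/sample_transactional.mp4",
       "--output", "output.mp4"]

for label in ("warm-up (includes model load)", "steady-state"):
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    print(f"{label}: {time.perf_counter() - start:.1f} s")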

Usage for Preview API Request#

   python eye-contact.py --preview-mode \
      --target grpc.nvcf.nvidia.com:443 \
      --function-id <function_id> \
      --api-key $API_KEY_REQUIRED_IF_EXECUTING_OUTSIDE_NGC \
      --input <input_file_path> \
      --output <output_file_path>

To view details of command-line arguments, run this command:

python eye-contact.py -h

You get a response similar to the following. All parameters are optional.

options:
  -h, --help                                             show this help message and exit
  --preview-mode                                         Flag to send the request to the preview NVCF NIM server at https://build.nvidia.com/nvidia/eyecontact/api. (default: False)
  --ssl-mode {DISABLED,MTLS,TLS}                         Flag to set the SSL mode. (default: DISABLED)
  --ssl-key SSL_KEY                                      The path to the SSL private key. (default: ../ssl_key/ssl_key_client.pem)
  --ssl-cert SSL_CERT                                    The path to the SSL certificate chain. (default: ../ssl_key/ssl_cert_client.pem)
  --ssl-root-cert SSL_ROOT_CERT                          The path to the SSL root certificate. (default: ../ssl_key/ssl_ca_cert.pem)
  --target TARGET                                        IP:port of the gRPC service when hosted locally. Use grpc.nvcf.nvidia.com:443 when hosted on NVCF. (default: 127.0.0.1:8001)
  --api-key API_KEY                                      NGC API key required for authentication; used with the Try API, ignored otherwise. (default: None)
  --function-id FUNCTION_ID                              NVCF function ID for the service; used with the Try API, ignored otherwise. (default: None)
  --input INPUT                                          The path to the input video file. (default: ../assets/sample_transactional.mp4)
  --output OUTPUT                                        The path for the output video file. (default: output.mp4)
  --streaming                                            Flag to enable gRPC streaming mode. Required for streamable video input. (default: False)
  --bitrate BITRATE                                      Output video bitrate in bps. Only applicable when lossless mode is disabled. (default: 3000000)
  --idr-interval IDR_INTERVAL                            The interval for IDR frames in the output video. Only applicable when lossless mode is disabled. (default: 8)
  --lossless                                             Flag to enable lossless mode for video encoding. (default: False)
  --custom-encoding-params CUSTOM_ENCODING_PARAMS        Custom encoding parameters in JSON format. (default: None)
  --temporal TEMPORAL                                    Flag to control temporal filtering. (default: 4294967295)
  --detect-closure DETECT_CLOSURE                        Flag to toggle detection of eye closure and occlusion on/off. (default: 0)
  --eye-size-sensitivity EYE_SIZE_SENSITIVITY            Eye size sensitivity parameter. (range: [2, 6]) (default: 3)
  --enable-lookaway {0,1}                                Flag to toggle look away on/off. (default: 0)
  --lookaway-max-offset LOOKAWAY_MAX_OFFSET              Maximum gaze offset angle (degrees) during a random look away. (range: [1, 10]) (default: 5)
  --lookaway-interval-min LOOKAWAY_INTERVAL_MIN          Minimum limit for the number of frames at which a random look away occurs. (range: [1, 600]) (default: 100)
  --lookaway-interval-range LOOKAWAY_INTERVAL_RANGE      Range for picking the number of frames at which a random look away occurs. (range: [1, 600]) (default: 250)
  --gaze-pitch-threshold-low GAZE_PITCH_THRESHOLD_LOW    Gaze pitch threshold (degrees) at which the redirection starts transitioning. (range: [10, 35]) (default: 20.0)
  --gaze-pitch-threshold-high GAZE_PITCH_THRESHOLD_HIGH  Gaze pitch threshold (degrees) at which the redirection equals the estimated gaze. (range: [10, 35]) (default: 30.0)
  --gaze-yaw-threshold-low GAZE_YAW_THRESHOLD_LOW        Gaze yaw threshold (degrees) at which the redirection starts transitioning. (range: [10, 35]) (default: 20.0)
  --gaze-yaw-threshold-high GAZE_YAW_THRESHOLD_HIGH      Gaze yaw threshold (degrees) at which the redirection equals the estimated gaze. (range: [10, 35]) (default: 30.0)
  --head-pitch-threshold-low HEAD_PITCH_THRESHOLD_LOW    Head pose pitch threshold (degrees) at which the redirection starts transitioning away from the camera toward the estimated gaze. (range: [10, 35]) (default: 15.0)
  --head-pitch-threshold-high HEAD_PITCH_THRESHOLD_HIGH  Head pose pitch threshold (degrees) at which the redirection equals the estimated gaze. (range: [10, 35]) (default: 15.0)
  --head-yaw-threshold-low HEAD_YAW_THRESHOLD_LOW        Head pose yaw threshold (degrees) at which the redirection starts transitioning. (range: [10, 35]) (default: 15.0)
  --head-yaw-threshold-high HEAD_YAW_THRESHOLD_HIGH      Head pose yaw threshold (degrees) at which the redirection equals the estimated gaze. (range: [10, 35]) (default: 15.0)

Example Commands#

Basic inference with default settings:

python eye-contact.py --target 127.0.0.1:8001 --input ../assets/sample_transactional.mp4 --output output.mp4

Using streaming mode for streamable videos:

python eye-contact.py --target 127.0.0.1:8001 --input ../assets/sample_streamable.mp4 --output output.mp4 --streaming

Advanced configuration example:

python eye-contact.py --target 127.0.0.1:8001 \
   --input ../assets/sample_transactional.mp4 \
   --output output.mp4 \
   --custom-encoding-params '{"control-rate": "1", "bitrate": 3000000, "tuning-info-id": "3", "temporalaq": true}'
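
When driving the client from another Python script, it can be safer to build the --custom-encoding-params JSON programmatically rather than hand-quoting it. A sketch reusing the keys from the example above:

# Serialize the encoding parameters with json.dumps to avoid quoting
# mistakes. Keys mirror the advanced configuration example above.
import json
import subprocess

encoding_params = {
    "control-rate": "1",
    "bitrate": 3000000,
    "tuning-info-id": "3",
    "temporalaq": True,
}

subprocess.run([
    "python", "eye-contact.py",
    "--target", "127.0.0.1:8001",
    "--input", "../assets/sample_transactional.mp4",
    "--output", "output.mp4",
    "--custom-encoding-params", json.dumps(encoding_params),
], check=True)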

Python Notebook#

For an interactive experience and to explore all feature parameters, we provide a comprehensive Python notebook that demonstrates the Eye Contact service capabilities.

The Python notebook is located at examples/maxine-eye-contact.ipynb in the eye-contact client folder.