Basic Inference#

  1. Perform a health check on the gRPC endpoint.

    • Install grpcurl from github.com/fullstorydev/grpcurl/releases.

      Example commands to run on Ubuntu:

      wget https://github.com/fullstorydev/grpcurl/releases/download/v1.9.1/grpcurl_1.9.1_linux_amd64.deb
      sudo dpkg -i grpcurl_1.9.1_linux_amd64.deb
      
    • Download the health checking proto:

      wget https://raw.githubusercontent.com/grpc/grpc/master/src/proto/grpc/health/v1/health.proto
      
    • Run the health check:

      grpcurl --plaintext --proto health.proto localhost:8001 grpc.health.v1.Health/Check
      

      If the service is ready, you get a response similar to the following:

      { "status": "SERVING" }
      

    Note

    To use grpcurl with an SSL-enabled server, omit the --plaintext argument and pass --cacert with a CA certificate, --key with a private key, or --cert with a certificate file. For more details, run grpcurl --help.

  2. Download the Eye Contact Python client code by cloning the Clients repository (NVIDIA-Maxine/nim-clients):

    git clone https://github.com/NVIDIA-Maxine/nim-clients.git
    
    # Go to the 'eye-contact' folder
    
    cd nim-clients/eye-contact/ 
    
  3. Install the required dependencies:

    sudo apt-get install python3-pip
    pip install -r requirements.txt
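
The health check from step 1 can also be scripted, for example to wait for the service before sending inference requests. The following is a minimal sketch that wraps the same grpcurl invocation. It assumes grpcurl is on the PATH and health.proto is in the working directory. The function names are hypothetical and are not part of the client repository.

```python
import json
import subprocess


def build_health_check_command(target="localhost:8001", proto="health.proto", cacert=None):
    """Return the grpcurl argument list for a gRPC health check."""
    cmd = ["grpcurl"]
    if cacert is None:
        cmd.append("--plaintext")        # unencrypted endpoint
    else:
        cmd += ["--cacert", cacert]      # TLS endpoint verified with a CA certificate
    cmd += ["--proto", proto, target, "grpc.health.v1.Health/Check"]
    return cmd


def is_serving(response_text):
    """Return True when the JSON health response reports SERVING."""
    return json.loads(response_text).get("status") == "SERVING"


def check_health(**kwargs):
    """Run grpcurl and interpret its output. Requires grpcurl to be installed."""
    result = subprocess.run(
        build_health_check_command(**kwargs),
        capture_output=True, text=True, timeout=30,
    )
    return result.returncode == 0 and is_serving(result.stdout)
```

Calling check_health() uses the defaults shown in step 1. Pass cacert="<path to CA certificate>" for an SSL-enabled server.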
    

Compile the Protos (Optional)#

If you want to use the client code provided in the Clients repository (NVIDIA-Maxine/nim-clients), you can skip this step.

The proto files are available in the eye-contact/protos folder. You can compile them to generate client interfaces in your preferred programming language. For more details, refer to Supported languages in the gRPC documentation.

The following is an example of how to compile the protos for Python on Linux and Windows. The grpcio version required for compilation is listed in requirements.txt.

To compile protos on Linux:

# Go to eye-contact/protos/linux folder
cd eye-contact/protos/linux

chmod +x compile_protos.sh
./compile_protos.sh

To compile protos on Windows:

# Go to eye-contact/protos/windows folder
cd eye-contact/protos/windows

./compile_protos.bat
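
The contents of the compile scripts are not reproduced on this page. As a rough illustration of what compiling a proto for Python generally involves, the following sketch invokes the protoc compiler bundled with the grpcio-tools package. It assumes grpcio-tools is installed. The file name example.proto and the helper names are hypothetical, and the actual scripts might use different paths and options.

```python
import subprocess
import sys


def protoc_command(proto_file, proto_dir=".", out_dir="."):
    """Build the argument list that invokes the protoc bundled with grpcio-tools."""
    return [
        sys.executable, "-m", "grpc_tools.protoc",
        f"-I{proto_dir}",                # directory used to resolve the proto file
        f"--python_out={out_dir}",       # generated message classes (*_pb2.py)
        f"--grpc_python_out={out_dir}",  # generated client and server stubs (*_pb2_grpc.py)
        proto_file,
    ]


def compile_proto(proto_file, proto_dir=".", out_dir="."):
    """Run the compiler. Raises CalledProcessError if compilation fails."""
    subprocess.run(protoc_command(proto_file, proto_dir, out_dir), check=True)
```

For example, compile_proto("example.proto") run from the folder that contains the proto file writes the generated modules next to it.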

Input and Output#

The input and output of the NVIDIA Eye Contact NIM are MP4 files. The input file must use the H.264 codec for video and can include audio. The output video file also uses the H.264 codec for video and contains the same audio as the input video file.

Note

Videos with Variable Frame Rate (VFR) are not supported.
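
Before sending a file, you can check these input requirements locally. The following is a minimal sketch that uses ffprobe (shipped with FFmpeg) to read the video codec and frame rates. Comparing the nominal and average frame rates is only a heuristic for spotting VFR, not the check the NIM performs, and the function names are hypothetical.

```python
import json
import subprocess
from fractions import Fraction


def probe_video_stream(path):
    """Return ffprobe's description of the first video stream as a dict."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=codec_name,r_frame_rate,avg_frame_rate",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)["streams"][0]


def _rate(text):
    """Parse an ffprobe rate such as '30000/1001'. Returns None for '0/0'."""
    num, _, den = text.partition("/")
    den = int(den or "1")
    return Fraction(int(num), den) if den else None


def input_problems(stream, tolerance=Fraction(1, 100)):
    """List likely problems for a stream dict returned by probe_video_stream."""
    problems = []
    if stream.get("codec_name") != "h264":
        problems.append("video codec is not H.264")
    nominal = _rate(stream["r_frame_rate"])
    average = _rate(stream["avg_frame_rate"])
    if nominal is None or average is None:
        problems.append("frame rate could not be determined")
    elif abs(nominal - average) > tolerance * nominal:
        # Heuristic: a large gap between the two rates often indicates VFR.
        problems.append("frame rate may be variable")
    return problems
```

An empty list from input_problems(probe_video_stream("<input file>")) means that neither check found an issue.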

Input Modes#

The NVIDIA Eye Contact NIM provides two modes for processing input files: streaming and transactional.

| Aspect | Streaming Mode | Transactional Mode |
|--------|----------------|--------------------|
| Data Storage | Only the frames being processed are temporarily copied in memory. | Entire video and audio files are temporarily copied on disk. |
| Processing Start | The NIM starts processing as soon as the data chunk for the first frame arrives. | The NIM waits to receive the entire files before starting. |
| Processing Timing | Continuous processing without waiting for complete input. | Processing begins after all data is received. |
| Output Delivery | Output frames are generated and returned immediately. | The complete output video is returned to the client after inference finishes for the whole video. |

Streaming Mode#

Streaming mode is the recommended way to use the NVIDIA Eye Contact NIM. The NIM processes video frames incrementally: inference begins as soon as the data for the first frame is available, without waiting for the whole video to arrive from the client, and output frames are streamed back to the client immediately after inference. This mode delivers the lowest latency and best resource efficiency, and it scales well to large files.

The Eye Contact NIM server automatically detects streamable videos and uses the streaming inference path internally. You do not need to enable any server-side setting or pass any flag to the NIM to get streaming behavior. Simply provide a streamable video as input.

Use streaming mode for these use cases:

  • Best overall performance. The NIM is optimized for this path.

  • Streamable video inputs.

  • Applications that benefit from receiving output as it is generated, without waiting for the entire file to be uploaded.

  • Large video files that benefit from incremental processing and reduced disk I/O.

Streaming mode works with streamable videos in which metadata is positioned at the beginning of the file. Videos that are not streamable can be easily converted to a streamable format.

To make any video streamable, use FFmpeg with the following command:

ffmpeg -i sample_video.mp4 -movflags +faststart sample_streamable.mp4
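
To tell whether a file needs this conversion, you can inspect its layout. In the MP4 container, the metadata lives in the moov box and the media data in the mdat box, so a streamable file has moov first. The following is a minimal sketch of that check. The helper name is hypothetical, and how the server itself detects streamable inputs is not described on this page.

```python
import io
import struct


def is_streamable_mp4(stream):
    """Return True if the top-level 'moov' box appears before 'mdat'."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return False                  # end of data, no moov box found
        size, box_type = struct.unpack(">I4s", header)
        if box_type == b"moov":
            return True
        if box_type == b"mdat":
            return False
        header_len = 8
        if size == 1:                     # 64-bit size is stored after the type
            large = stream.read(8)
            if len(large) < 8:
                return False
            size = struct.unpack(">Q", large)[0]
            header_len = 16
        if size < header_len:             # size 0 (box runs to end of file) or malformed
            return False
        stream.seek(size - header_len, io.SEEK_CUR)   # skip this box's payload
```

Open the file in binary mode and pass the file object, for example open("sample_video.mp4", "rb"). Only the box headers are read, so the check is fast even for large files.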

To run the NVIDIA Eye Contact NIM in streaming mode, go to the scripts directory and pass the streamable video as input:

python eye-contact.py --target <server_ip:port> \
   --input ../assets/sample_streamable.mp4 \
   --output <output file path along with file name>

Note

The --streaming flag in the sample client is optional and does not change the inference behavior on the server. It only enables streaming-mode logging on the client side. The server automatically detects streamable inputs and uses the streaming inference path internally regardless of whether the flag is passed.

Transactional Mode#

In transactional mode, the NIM receives the entire video file before processing can begin. The server automatically falls back to this mode when the input video is not streamable.

Transactional mode applies to the following use cases:

  • Videos that are not optimized for streaming (such as non-streamable MP4 files in which metadata is located at the end of the file, requiring the entire file to be downloaded before playback can begin).

  • Processing of small video files for which streaming overhead is unnecessary.

  • Applications that can wait for complete processing before receiving output.

To run the NVIDIA Eye Contact NIM with a non-streamable input, pass the file as input. The server detects that the video is not streamable and processes it in transactional mode automatically:

python eye-contact.py --target <server_ip:port> \
   --input <non-streamable input file path> \
   --output <output file path along with file name>

Tip

For best performance, convert your videos to a streamable format and use streaming mode. See the FFmpeg command earlier on this page.

Note

The first inference is not indicative of the model’s actual performance because it includes the time taken by the Triton Inference Server to load the models in addition to the time required to process the inference request.
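
When you measure performance, one way to account for this is to treat the first request as a warm-up and report only the later runs. The following is a minimal sketch of that approach. It times whole client invocations, so the numbers include client startup and file transfer. The helper names are hypothetical.

```python
import statistics
import subprocess
import time


def time_runs(command, runs=4):
    """Run the command `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True)
        durations.append(time.perf_counter() - start)
    return durations


def steady_state_mean(durations):
    """Mean duration excluding the first (warm-up) run."""
    if len(durations) < 2:
        raise ValueError("need at least one run after the warm-up")
    return statistics.mean(durations[1:])
```

For example, pass the client command as a list such as ["python", "eye-contact.py", "--target", "127.0.0.1:8001", "--input", "../assets/sample_streamable.mp4", "--output", "output.mp4"] to time_runs, then pass the result to steady_state_mean.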

Usage for Preview API Request#

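To send a request to the preview NVCF NIM server, run the client with --preview-mode and provide the NVCF function ID and your NGC API key:

python eye-contact.py --preview-mode \
   --target grpc.nvcf.nvidia.com:443 \
   --function-id <function_id> \
   --api-key $API_KEY_REQUIRED_IF_EXECUTING_OUTSIDE_NGC \
   --input <input_file_path> \
   --output <output_file_path>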

To view details of command-line arguments, run this command:

python eye-contact.py -h

You get a response similar to the following. All parameters are optional.

options:
  -h, --help                                             show this help message and exit
  --preview-mode                                         Flag to send request to preview NVCF NIM server on https://build.nvidia.com/nvidia/eyecontact/api. (default: False)
  --ssl-mode {DISABLED,MTLS,TLS}                         Flag to set SSL mode, default is DISABLED (default: DISABLED)
  --ssl-key SSL_KEY                                      The path to ssl private key. (default: ../ssl_key/ssl_key_client.pem)
  --ssl-cert SSL_CERT                                    The path to ssl certificate chain. (default: ../ssl_key/ssl_cert_client.pem)
  --ssl-root-cert SSL_ROOT_CERT                          The path to ssl root certificate. (default: ../ssl_key/ssl_ca_cert.pem)
  --target TARGET                                        IP:port of gRPC service, when hosted locally. Use grpc.nvcf.nvidia.com:443 when hosted on NVCF. (default: 127.0.0.1:8001)
  --api-key API_KEY                                      NGC API key required for authentication, utilized when using TRY API ignored otherwise (default: None)
  --function-id FUNCTION_ID                              NVCF function ID for the service, utilized when using TRY API ignored otherwise (default: None)
  --input INPUT                                          The path to the input video file. (default: ../assets/sample_transactional.mp4)
  --output OUTPUT                                        The path for the output video file. (default: output.mp4)
  --streaming                                            Optional flag to enable streaming-mode logging on the client side. The server automatically uses the streaming inference path for streamable inputs regardless of this flag. (default: False)
  --bitrate BITRATE                                      Output video bitrate in bps (default: 20000000). This is only applicable when lossless mode is disabled. (default: 20000000)
  --idr-interval IDR_INTERVAL                            The interval for IDR frames in the output video. This is only applicable when lossless mode is disabled. (default: 8) (default: 8)
  --lossless                                             Flag to enable lossless mode for video encoding. (default: False)
  --custom-encoding-params CUSTOM_ENCODING_PARAMS        Custom encoding parameters in JSON format. (default: None)
  --temporal TEMPORAL                                    Flag to control temporal filtering (default: 4294967295) (default: 4294967295)
  --detect-closure DETECT_CLOSURE                        Flag to toggle detection of eye closure and occlusion on/off (default: 0) (default: 0)
  --eye-size-sensitivity EYE_SIZE_SENSITIVITY            Eye size sensitivity parameter (default: 3, range: [2, 6]) (default: 3)
  --enable-lookaway {0,1}                                Flag to toggle look away on/off (default: 0) (default: 0)
  --lookaway-max-offset LOOKAWAY_MAX_OFFSET              Maximum value of gaze offset angle (degrees) during a random look away (default: 5, range: [1, 10]) (default: 5)
  --lookaway-interval-min LOOKAWAY_INTERVAL_MIN          Minimum limit for the number of frames at which random look away occurs (default: 3, range: [1, 600]) (default: 3)
  --lookaway-interval-range LOOKAWAY_INTERVAL_RANGE      Range for picking the number of frames at which random look away occurs (default: 8, range: [1, 600]) (default: 8)
  --gaze-pitch-threshold-low GAZE_PITCH_THRESHOLD_LOW    Gaze pitch threshold (degrees) at which the redirection starts transitioning (default: 20.0, range: [10, 35]) (default: 20.0)
  --gaze-pitch-threshold-high GAZE_PITCH_THRESHOLD_HIGH  Gaze pitch threshold (degrees) at which the redirection is equal to estimated gaze (default: 30.0, range: [10, 35]) (default: 30.0)
  --gaze-yaw-threshold-low GAZE_YAW_THRESHOLD_LOW        Gaze yaw threshold (degrees) at which the redirection starts transitioning (default: 20.0, range: [10, 35]) (default: 20.0)
  --gaze-yaw-threshold-high GAZE_YAW_THRESHOLD_HIGH      Gaze yaw threshold (degrees) at which the redirection is equal to estimated gaze (default: 30.0, range: [10, 35]) (default: 30.0)
  --head-pitch-threshold-low HEAD_PITCH_THRESHOLD_LOW    Head pose pitch threshold (degrees) at which the redirection starts transitioning away from camera towards estimated gaze (default: 15.0, range: [10, 35]) (default: 15.0)
  --head-pitch-threshold-high HEAD_PITCH_THRESHOLD_HIGH  Head pose pitch threshold (degrees) at which the redirection is equal to estimated gaze (default: 25.0, range: [10, 35]) (default: 25.0)
  --head-yaw-threshold-low HEAD_YAW_THRESHOLD_LOW        Head pose yaw threshold (degrees) at which the redirection starts transitioning (default: 25.0, range: [10, 35]) (default: 25.0)
  --head-yaw-threshold-high HEAD_YAW_THRESHOLD_HIGH      Head pose yaw threshold (degrees) at which the redirection is equal to estimated gaze (default: 30.0, range: [10, 35]) (default: 30.0)
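
Several of these parameters have documented ranges. If you drive the client from your own script, you can check values before sending a request. The following is a minimal sketch that uses the ranges from the help text above. It assumes the ranges are inclusive, and the helper is hypothetical. The client or server might apply additional checks.

```python
# Ranges copied from the help text, assumed to be inclusive.
RANGES = {
    "eye_size_sensitivity": (2, 6),
    "lookaway_max_offset": (1, 10),
    "lookaway_interval_min": (1, 600),
    "lookaway_interval_range": (1, 600),
}
# All gaze and head pose thresholds share the range [10, 35] degrees.
for signal in ("gaze", "head"):
    for axis in ("pitch", "yaw"):
        for bound in ("low", "high"):
            RANGES[f"{signal}_{axis}_threshold_{bound}"] = (10, 35)


def out_of_range(params):
    """Return {name: (value, low, high)} for each parameter outside its documented range."""
    errors = {}
    for name, value in params.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                errors[name] = (value, low, high)
    return errors
```

Parameters without a documented range, such as bitrate, are ignored by this check.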

Example Commands#

Recommended: Provide a streamable video as input. The server automatically uses the streaming inference path.

python eye-contact.py --target 127.0.0.1:8001 --input ../assets/sample_streamable.mp4 --output output.mp4

Optional: Specify --streaming to enable streaming-mode logging on the client side. This does not change server behavior.

python eye-contact.py --target 127.0.0.1:8001 --input ../assets/sample_streamable.mp4 --output output.mp4 --streaming

Non-streamable input. The server automatically falls back to transactional mode.

python eye-contact.py --target 127.0.0.1:8001 --input ../assets/sample_transactional.mp4 --output output.mp4

Advanced configuration example:

python eye-contact.py --target 127.0.0.1:8001 --input ../assets/sample_streamable.mp4 --output output.mp4 --custom-encoding-params '{"control-rate": "1", "bitrate": 3000000, "tuning-info-id": "3", "temporalaq": true}'
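
Writing the JSON by hand on the command line is error-prone because of shell quoting. The following is a minimal sketch that builds the same argument from a Python dictionary and prints a safely quoted command. The keys are those from the example above, the helper name is hypothetical, and shlex.join requires Python 3.8 or later.

```python
import json
import shlex


def custom_encoding_argument(params):
    """Serialize encoder settings to a compact JSON string for --custom-encoding-params."""
    return json.dumps(params, separators=(",", ":"))


# Same settings as the command-line example; Python True becomes JSON true.
params = {"control-rate": "1", "bitrate": 3000000, "tuning-info-id": "3", "temporalaq": True}

argv = [
    "python", "eye-contact.py",
    "--target", "127.0.0.1:8001",
    "--input", "../assets/sample_streamable.mp4",
    "--output", "output.mp4",
    "--custom-encoding-params", custom_encoding_argument(params),
]

# Print a command line that can be pasted into a POSIX shell.
print(shlex.join(argv))
```

If you launch the client with subprocess.run(argv), no shell quoting is needed because each list element is passed as one argument.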

Python Notebook#

To explore all feature parameters interactively, use the Python notebook that demonstrates the capabilities of the Eye Contact service.

The Python notebook is located at examples/maxine-eye-contact.ipynb in the eye-contact client folder.