Advanced Usage#

Model Caching#

When the container starts for the first time, it downloads the required models from NGC. To avoid downloading the models on subsequent runs, you can cache them locally by using a cache directory:

# Create the cache directory on the host machine
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
chmod 777 "$LOCAL_NIM_CACHE"

# Choose manifest profile id based on target architecture.
export MANIFEST_PROFILE_ID=<enter_valid_manifest_profile_id>

# Run the container with the cache directory mounted in the appropriate location
docker run -it --rm --name=maxine-eye-contact-nim \
  --runtime=nvidia \
  --gpus all \
  --shm-size=8GB \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e NIM_MANIFEST_PROFILE=$MANIFEST_PROFILE_ID \
  -e NIM_HTTP_API_PORT=8000 \
  -p 8000:8000 \
  -p 8001:8001 \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  nvcr.io/nim/nvidia/maxine-eye-contact:latest

For more information about MANIFEST_PROFILE_ID, refer to the NIM Model Profile Table.

SSL Enablement#

Eye-Contact NIM provides an SSL mode to ensure secure communication between clients and the server by encrypting data in transit. To enable SSL, you need to provide the path to the SSL certificate and key files in the container. The following example shows how to do this:

export NGC_API_KEY=<add-your-api-key>
# Host directory containing the SSL certificate and key files
SSL_CERT=path/to/ssl_certs

docker run -it --rm --name=maxine-eye-contact-nim \
  --runtime=nvidia \
  --gpus all \
  --shm-size=8GB \
  -v $SSL_CERT:/opt/nim/crt/:ro \
  -e NGC_API_KEY=$NGC_API_KEY \
  -p 8000:8000 \
  -p 8001:8001 \
  -e NIM_SSL_MODE="mtls" \
  -e NIM_SSL_CERT_PATH="/opt/nim/crt/ssl_cert_server.pem" \
  -e NIM_SSL_KEY_PATH="/opt/nim/crt/ssl_key_server.pem" \
  nvcr.io/nim/nvidia/maxine-eye-contact:latest

NIM_SSL_MODE can be set to “mtls”, “tls”, or “disabled”. With “mtls”, the container uses mutual TLS authentication, in which both server and client present certificates; with “tls”, only the server presents a certificate. For more information, refer to NIM SSL Configuration.

Be sure to verify the permissions of the SSL certificate and key files on the host machine. The container will not be able to access the files if they are not readable by the user running the container.
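One way to check those permissions before starting the container is to inspect the mode bits from Python. The following is an illustrative stdlib sketch; the file names are placeholders for whatever certificate and key you mount into the container:

```python
import os
import stat

def is_world_readable(path: str) -> bool:
    """Return True if the file at `path` grants read permission to others."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IROTH)

# Placeholder file names; substitute the certificate and key files
# that you bind-mount into /opt/nim/crt/.
for f in ("ssl_cert_server.pem", "ssl_key_server.pem"):
    if os.path.exists(f):
        print(f, "readable by others:", is_world_readable(f))
```

If the check returns False for either file, adjust the permissions (for example, with chmod) before launching the container.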

Multiple Concurrent Inputs#

To run the server in multi-input concurrent mode, set the environment variable MAXINE_MAX_CONCURRENCY_PER_GPU to an integer greater than 1 in the server container. The server will then accept as many concurrent inputs per GPU as specified by the MAXINE_MAX_CONCURRENCY_PER_GPU variable.

Since Triton distributes the workload equally across all GPUs, if there are NUM_GPUS GPUs, the total number of concurrent inputs supported by the server will be NUM_GPUS * MAXINE_MAX_CONCURRENCY_PER_GPU.
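As a concrete illustration of that arithmetic (the deployment numbers below are hypothetical):

```python
# Hypothetical deployment: 2 GPUs, with the server container started
# with MAXINE_MAX_CONCURRENCY_PER_GPU=4.
NUM_GPUS = 2
MAXINE_MAX_CONCURRENCY_PER_GPU = 4

# Triton distributes the workload equally across GPUs,
# so the total capacity is the product of the two values.
total_concurrent_inputs = NUM_GPUS * MAXINE_MAX_CONCURRENCY_PER_GPU
print(total_concurrent_inputs)  # → 8
```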

The Eye Contact NIM uses NVENC/NVDEC hardware acceleration for video encoding and decoding.

  • GPUs that lack NVENC/NVDEC hardware support, including A100, H100, and B100 products, are not supported.

  • Some GPUs support only a limited number of concurrent NVENC sessions, which means NIM can only process that same number of concurrent inputs on those GPUs.

  • Some GPUs support only certain YUV formats for H264 reading and writing.

For details, refer to the Video Encode and Decode GPU Support Matrix.

Note

If incoming requests to the NIM exceed the GPU’s maximum concurrent encode/decode limit, the processing fails.

Feature Parameters#

The following parameters are available to configure the functionality of the NIM.

Encoding Options#

  • lossless - Enables lossless video encoding. This setting overrides any bitrate configuration to ensure maximum quality output, though it results in larger file sizes.

python eye-contact.py --lossless

  • bitrate - Sets the target bitrate for video encoding in bits per second (bps). Higher bitrates result in better video quality but larger file sizes. The default is 3,000,000 bps (3 Mbps). For example, setting --bitrate 5000000 would target 5 Mbps encoding.

python eye-contact.py --bitrate 5000000

  • idr-interval - Sets the interval between IDR (Instantaneous Decoding Refresh) frames in the encoded video. IDR frames are special I-frames that clear all reference buffers, allowing the video to be decoded from that point without needing previous frames. A lower interval (for example, 8 frames) provides better seeking/editing capabilities but increases file size, while a higher interval (for example, 30 frames) reduces file size but may make seeking less precise. The default is 8 frames.

python eye-contact.py --idr-interval 10

  • Custom encoding - Allows passing custom encoding parameters as a JSON string. This provides fine-grained control over the encoding process.

python eye-contact.py --custom-encoding-params '{"idrinterval": "20", "maxbitrate": "3000000"}'
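Building the JSON string programmatically avoids shell-quoting mistakes. A minimal sketch, using only the two properties shown in the example above:

```python
import json

# Keys mirror the custom encoding properties from the example above;
# values are passed as strings, matching the documented invocation.
params = {"idrinterval": "20", "maxbitrate": "3000000"}
encoded = json.dumps(params)
print(encoded)  # → {"idrinterval": "20", "maxbitrate": "3000000"}
```

The resulting string can then be passed as the value of --custom-encoding-params.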

Note

Custom encoding parameters are for expert users who need fine-grained control over video encoding. Incorrect values may cause encoding failures or poor-quality output. To configure the NVENC encoder, refer to the properties of the DeepStream H264 encoder plugin.

Arguments to Control Feature Behavior#

The following arguments affect the overall behavior of the feature, such as enabling or disabling temporal filtering or gaze redirection.

  • temporal - (UINT32) Flag to control temporal filtering (default 0xffffffff, enabled). When enabled, the landmark computation for eye contact is temporally optimized.

  • detect_closure - (UINT32) Flag to toggle detection of eye closure and occlusion. Turning this off disables blink and occlusion detection, which might be desirable in estimation-only mode if you still want gaze estimates when the eyes are occluded. Not recommended for gaze redirection. Value is either 0 or 1 (default 0).

  • eye_size_sensitivity - (UINT32) Eye size sensitivity parameter that modifies the blending parameters to use a larger region around the eyes for blending. Integer value from 2 to 6 (default 3).

Randomized Look Away Parameters#

A continuous redirection of gaze to look at the camera might give a perception of staring. Some users might find this effect unnatural or undesired. To occasionally break eye contact, you can enable randomized look away in gaze redirection. Although the gaze is always expected to redirect toward the camera within the range of operation, enabling look away makes the user occasionally break gaze lock to the camera with a micro-movement of the eyes at randomly chosen time intervals. The enable_lookaway parameter must be set to 1 to enable this feature. Additionally, you can use the optional parameters lookaway_max_offset, lookaway_interval_min, and lookaway_interval_range to tune the extent and frequency of look away.

  • enable_lookaway - (UINT32) Flag to toggle look away. If set to on, the eyes are redirected to look away for a random period occasionally to avoid staring. Value is either 0 or 1 (default 0).

  • lookaway_max_offset - (UINT32) Maximum value of gaze offset angle (degrees) during a random look away when look away is enabled. Requires enable_lookaway to be set to 1. Integer value from 1 to 10 (default 5).

  • lookaway_interval_min - (UINT32) Minimum limit for the number of frames at which random look away occurs when look away is enabled. Requires enable_lookaway to be set to 1. Integer value from 1 to 600 (default 3).

  • lookaway_interval_range - (UINT32) Range for picking the number of frames at which random look away occurs when look away is enabled. Requires enable_lookaway to be set to 1. Integer value from 1 to 600 (default 8).
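One way to read the two interval parameters together is that the next look away occurs after a frame count drawn between the minimum and the minimum plus the range. The following is a hypothetical sketch of that interpretation, not the NIM's actual implementation:

```python
import random

def next_lookaway_frame(interval_min: int = 3, interval_range: int = 8) -> int:
    """Pick the frame count until the next look away, assuming a uniform
    draw over [interval_min, interval_min + interval_range].
    Defaults mirror lookaway_interval_min and lookaway_interval_range."""
    return random.randint(interval_min, interval_min + interval_range)

# With the defaults, every draw falls between 3 and 11 frames.
sample = next_lookaway_frame()
```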

Range Control#

The gaze redirection feature redirects the eyes to look at the camera within a certain range of head and eye motion in which eye contact is desired and looks natural. Beyond this range, the feature gradually transitions away from looking at the camera toward the estimated gaze and eventually turns off in a seamless manner. To accommodate various use cases and user preferences, optional range parameters let you control the range of gaze angles and head poses in which gaze redirection occurs, as well as the range in which the transition occurs before redirection turns off.

gaze_pitch_threshold_low and gaze_yaw_threshold_low define the parameters for the pitch and yaw angles of the estimated gaze within which gaze is redirected toward the camera. Beyond these angles, redirected gaze transitions away from the camera and toward the estimated gaze, turning off redirection beyond gaze_pitch_threshold_high and gaze_yaw_threshold_high respectively.

Similarly, head_pitch_threshold_low and head_yaw_threshold_low define the parameters for pitch and yaw angles of the head pose within which gaze is redirected toward the camera. Beyond these angles, redirected gaze transitions away from the camera and toward the estimated gaze, turning off redirection beyond head_pitch_threshold_high and head_yaw_threshold_high.

  • gaze_pitch_threshold_low - (FP32) Gaze pitch threshold (degrees) at which the redirection starts transitioning away from camera toward estimated gaze. Float value from 10 to 35 (default 25).

  • gaze_pitch_threshold_high - (FP32) Gaze pitch threshold (degrees) at which the redirection is equal to estimated gaze and the gaze redirection is turned off beyond this angle. Float value from 10 to 35 (default 30).

  • gaze_yaw_threshold_low - (FP32) Gaze yaw threshold (degrees) at which the redirection starts transitioning away from camera toward estimated gaze. Float value from 10 to 35 (default 20).

  • gaze_yaw_threshold_high - (FP32) Gaze yaw threshold (degrees) at which the redirection is equal to estimated gaze and the gaze redirection is turned off beyond this angle. Float value from 10 to 35 (default 30).

  • head_pitch_threshold_low - (FP32) Head pose pitch threshold (degrees) of the estimated head pose at which redirection starts transitioning away from camera and toward the estimated gaze. Float value from 10 to 35 (default 20).

  • head_pitch_threshold_high - (FP32) Head pose pitch threshold (degrees) of the estimated head pose at which redirection equals the estimated gaze and redirection is turned off beyond this angle. Float value from 10 to 35 (default 25).

  • head_yaw_threshold_low - (FP32) Head pose yaw threshold (degrees) at which the redirection starts transitioning away from camera toward estimated gaze. Float value from 10 to 35 (default 25).

  • head_yaw_threshold_high - (FP32) Head pose yaw threshold (degrees) of the estimated head pose at which redirection equals the estimated gaze and redirection is turned off beyond this angle. Float value from 10 to 35 (default 30).
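The transition between a low and a high threshold can be pictured as a blend between fully camera-directed gaze and the estimated gaze. The following is a hypothetical illustration using a linear falloff; the actual blending curve used by the NIM is not documented here:

```python
def redirection_weight(angle: float, low: float, high: float) -> float:
    """Weight of camera-directed gaze for a given angle (degrees):
    1.0 at or below `low`, 0.0 at or beyond `high`, and a linear
    blend in between. Assumes low < high."""
    a = abs(angle)
    if a <= low:
        return 1.0
    if a >= high:
        return 0.0
    return (high - a) / (high - low)

# With the default gaze pitch thresholds (low=25, high=30):
print(redirection_weight(25.0, 25.0, 30.0))  # → 1.0 (fully redirected to camera)
print(redirection_weight(27.5, 25.0, 30.0))  # → 0.5 (halfway through the transition)
print(redirection_weight(30.0, 25.0, 30.0))  # → 0.0 (redirection turned off)
```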

For more details on the technical aspects of this algorithm, see the technical blog post Improve Human Connection in Video Conferences with NVIDIA Maxine Eye Contact.