Properties for the AR SDK Features

This section provides the properties and their values for the features in the AR SDK.

Face Tracking Property Values

The following tables list the values for the configuration, input, and output properties for face tracking.

Table 3‑2: Configuration Properties for Face Tracking

Property Name

Value

FeatureDescription

Free-form string that describes the feature.

The string is set by the SDK and cannot be modified by the user.

CUDAStream

The CUDA stream, which is set by the user.

ModelDir

String that contains the path to the folder that contains the TensorRT package files.

Set by the user.

Temporal

Unsigned integer to enable (1) or disable (0) the temporal optimization of face detection. If enabled, only one face is returned. For more information, refer to Face Detection and Tracking.

Set by the user.

Table 3‑3: Input Properties for Face Tracking

Property Name

Value

Image

Interleaved (or chunky) 8-bit BGR input image in a CUDA buffer of type NvCVImage.

To be allocated and set by the user.

Table 3‑4: Output Properties for Face Tracking

Property Name

Value

BoundingBoxes

NvAR_BBoxes structure that holds the detected face boxes.

To be allocated by the user.

BoundingBoxesConfidence

Optional: An array of single-precision (32-bit) floating-point numbers that contain the confidence values for each detected face box.

To be allocated by the user.
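
As a concrete illustration of how these properties fit together, the following minimal C++ sketch configures, loads, and runs face tracking in the order the tables describe. It assumes the NvAR C API from nvAR.h and the NvAR_Feature_FaceDetection selector used by the SDK samples; the model path and image size are placeholders.

```cpp
// A minimal end-to-end sketch, assuming the NvAR C API and the
// NvAR_Feature_FaceDetection selector from the SDK samples.
#include <cstdio>
#include <cstdlib>
#include "nvAR.h"
#include "nvAR_defs.h"
#include "nvCVImage.h"

#define CHECK(call) do { NvCV_Status s = (call); if (s != NVCV_SUCCESS) { \
  std::printf("%s failed (%d)\n", #call, (int)s); std::exit(1); } } while (0)

int main() {
  NvAR_FeatureHandle face = nullptr;
  CUstream stream = nullptr;
  CHECK(NvAR_Create(NvAR_Feature_FaceDetection, &face));
  CHECK(NvAR_CudaStreamCreate(&stream));

  // Configuration properties (Table 3-2): set before NvAR_Load.
  CHECK(NvAR_SetString(face, NvAR_Parameter_Config(ModelDir), "/path/to/models")); // placeholder
  CHECK(NvAR_SetCudaStream(face, NvAR_Parameter_Config(CUDAStream), stream));
  CHECK(NvAR_SetU32(face, NvAR_Parameter_Config(Temporal), 1)); // track one face across frames
  CHECK(NvAR_Load(face));

  // Input property (Table 3-3): chunky 8-bit BGR image in GPU memory.
  NvCVImage src;
  CHECK(NvCVImage_Alloc(&src, 1280, 720, NVCV_BGR, NVCV_U8, NVCV_CHUNKY, NVCV_GPU, 1));
  CHECK(NvAR_SetObject(face, NvAR_Parameter_Input(Image), &src, sizeof(NvCVImage)));

  // Output properties (Table 3-4): user-allocated boxes and confidences.
  NvAR_Rect rects[25];
  float confidence[25];
  NvAR_BBoxes boxes{rects, 0, 25};
  CHECK(NvAR_SetObject(face, NvAR_Parameter_Output(BoundingBoxes), &boxes, sizeof(NvAR_BBoxes)));
  CHECK(NvAR_SetF32Array(face, NvAR_Parameter_Output(BoundingBoxesConfidence), confidence, 25));

  CHECK(NvAR_Run(face)); // in a real app, upload a camera frame into `src` first
  std::printf("detected %u face box(es)\n", (unsigned)boxes.num_boxes);

  NvCVImage_Dealloc(&src);
  NvAR_Destroy(face);
  NvAR_CudaStreamDestroy(stream);
  return 0;
}
```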

Landmark Tracking Property Values

The following tables list the values for the configuration, input, and output properties for landmark tracking.

Table 3‑5: Configuration Properties for Landmark Tracking

Property Name

Value

FeatureDescription

String that describes the feature.

CUDAStream

The CUDA stream.

Set by the user.

ModelDir

String that contains the path to the folder that contains the TensorRT package files.

Set by the user.

BatchSize

The number of inferences to be run at one time on the GPU. The maximum value is 8.

Temporal optimization of landmark detection is supported only for BatchSize=1.

Landmarks_Size

Unsigned integer, 68 or 126.

Specifies the number of landmark points (x and y values) to be returned.

Set by the user.

LandmarksConfidence_Size

Unsigned integer, 68 or 126.

Specifies the number of landmark confidence values for the detected keypoints to be returned.

Set by the user.

Temporal

Unsigned integer to enable (1) or disable (0) the temporal optimization of landmark detection. If enabled, only one input bounding box is supported as the input. For more information, refer to Face Detection and Tracking.

Set by the user.

Mode

Optional: Unsigned integer. Set 0 to enable Performance mode (default) or 1 to enable Quality mode for landmark detection.

Set by the user.

Table 3‑6: Input Properties for Landmark Tracking

Property Name

Value

Image

Interleaved (or chunky) 8-bit BGR input image in a CUDA buffer of type NvCVImage.

To be allocated and set by the user.

BoundingBoxes

Optional: NvAR_BBoxes structure that contains BatchSize bounding boxes on which to run landmark detection.

If not specified as an input property, face detection is automatically run on the input image. For more information, refer to Face Detection and Tracking.

To be allocated by the user.

Table 3‑7: Output Properties for Landmark Tracking

Property Name

Value

Landmarks

NvAR_Point2f array, which must be large enough to hold the number of points given by the product of NvAR_Parameter_Config(BatchSize) and NvAR_Parameter_Config(Landmarks_Size).

To be allocated by the user.

Pose

Optional: NvAR_Quaternion array, which must be large enough to hold the number of quaternions equal to NvAR_Parameter_Config(BatchSize).

The OpenGL coordinate convention is used: looking out from the camera, x is camera right, y is camera up, and z is toward the camera.

To be allocated by the user.

LandmarksConfidence

Optional: An array of single-precision (32-bit) floating-point numbers, which must be large enough to hold the number of confidence values given by the product of the following:

  • NvAR_Parameter_Config(BatchSize)

  • NvAR_Parameter_Config(LandmarksConfidence_Size)

To be allocated by the user.

BoundingBoxes

Optional: NvAR_BBoxes structure that contains the face detected by the face detection pass that the landmark detection feature runs internally. For more information, refer to Face Detection and Tracking.

To be allocated by the user.
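
The sizing rules in the tables above reduce to a few lines of code. The sketch below assumes a handle created with the NvAR_Feature_LandmarkDetection selector from the SDK samples; the buffers are static here only so they outlive the call, and a real application would manage their lifetime explicitly.

```cpp
// A buffer-sizing sketch for landmark tracking: each output buffer is sized
// from the product of the configuration values, as Table 3-7 specifies.
#include <vector>
#include "nvAR.h"
#include "nvAR_defs.h"

void ConfigureLandmarkTracking(NvAR_FeatureHandle lm, NvCVImage *src) {
  const unsigned batch = 1, lmCount = 126; // Landmarks_Size: 68 or 126
  NvAR_SetU32(lm, NvAR_Parameter_Config(BatchSize), batch);
  NvAR_SetU32(lm, NvAR_Parameter_Config(Landmarks_Size), lmCount);
  NvAR_SetU32(lm, NvAR_Parameter_Config(LandmarksConfidence_Size), lmCount);
  NvAR_SetU32(lm, NvAR_Parameter_Config(Temporal), 1); // supported only for BatchSize == 1
  NvAR_Load(lm);

  // BatchSize x Landmarks_Size points, BatchSize pose quaternions.
  static std::vector<NvAR_Point2f> landmarks(batch * lmCount);
  static std::vector<float> confidence(batch * lmCount);
  static std::vector<NvAR_Quaternion> pose(batch);

  NvAR_SetObject(lm, NvAR_Parameter_Input(Image), src, sizeof(NvCVImage));
  NvAR_SetObject(lm, NvAR_Parameter_Output(Landmarks), landmarks.data(), sizeof(NvAR_Point2f));
  NvAR_SetF32Array(lm, NvAR_Parameter_Output(LandmarksConfidence), confidence.data(),
                   (int)(batch * lmCount));
  NvAR_SetObject(lm, NvAR_Parameter_Output(Pose), pose.data(), sizeof(NvAR_Quaternion));
}
```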

Face 3D Mesh Tracking Property Values

The following tables list the values for the configuration, input, and output properties for Face 3D Mesh tracking.

Table 3‑8: Configuration Properties for Face 3D Mesh Tracking

Property Name

Value

FeatureDescription

String that describes the feature.

This property is read-only.

ModelDir

String that contains the path to the face model and the TensorRT package files. For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

Set by the user.

CUDAStream

Optional: The CUDA stream.

For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

Set by the user.

Temporal

Optional: Unsigned integer to enable (1) or disable (0) the temporal optimization of face and landmark detection. For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

Set by the user.

Mode

Optional: Unsigned integer. Set 0 to enable Performance mode (default) or 1 to enable Quality mode for landmark detection.

Set by the user.

Landmarks_Size

Unsigned integer, 68 or 126.

If landmark detection is run internally, the confidence values for the detected key points are returned. For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

ShapeEigenValueCount

The number of eigenvalues that describe the identity shape. Query this to determine how big the eigenvalue array should be, if that is a desired output.

This property is read-only.

ExpressionCount

The number of expressions available in the chosen model. Query this to determine how big the expression coefficient array should be, if that is the desired output.

This property is read-only.

VertexCount

The number of vertices in the chosen model.

Query this property to determine how big the vertex array should be.

This property is read-only.

TriangleCount

The number of triangles in the chosen model.

Query this property to determine how big the triangle array should be.

This property is read-only.

GazeMode

Flag to toggle gaze mode.

The default value is 0. If the value is 1, gaze estimation is explicit.

Table 3‑9: Input Properties for Face 3D Mesh Tracking

Property Name

Value

Width

The width of the input image buffer that contains the face to which the face model will be fitted.

Set by the user.

Height

The height of the input image buffer that contains the face to which the face model will be fitted.

Set by the user.

Landmarks

Optional: An NvAR_Point2f array that contains the landmark points of size NvAR_Parameter_Config(Landmarks_Size) that is returned by the landmark detection feature.

If landmarks are not provided to this feature, an input image must be provided. For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

To be allocated by the user.

Image

Optional: An interleaved (or chunky) 8-bit BGR input image in a CUDA buffer of type NvCVImage.

If an input image is not provided as input, the landmark points must be provided to this feature as input. For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

To be allocated by the user.

Table 3‑10: Output Properties for Face 3D Mesh Tracking

Property Name

Value

FaceMesh

NvAR_FaceMesh structure that contains the output face mesh.

To be allocated by the user.

Query VertexCount and TriangleCount to determine how much memory to allocate.

RenderingParams

NvAR_RenderingParams structure that contains the rendering parameters for drawing the face mesh that is returned by this feature.

To be allocated by the user.

Landmarks

Optional: An NvAR_Point2f array, which must be large enough to hold the number of points of size NvAR_Parameter_Config(Landmarks_Size).

For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

To be allocated by the user.

Pose

Optional: NvAR_Quaternion array, which must be large enough to hold one quaternion. For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

The OpenGL coordinate convention is used: looking out from the camera, x is camera right, y is camera up, and z is toward the camera.

To be allocated by the user.

LandmarksConfidence

Optional: An array of single-precision (32-bit) floating-point numbers, which must be large enough to hold the number of confidence values of size NvAR_Parameter_Config(LandmarksConfidence_Size).

For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

To be allocated by the user.

BoundingBoxes

Optional: NvAR_BBoxes structure that contains the detected face that is determined internally. For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

To be allocated by the user.

BoundingBoxesConfidence

Optional: An array of single-precision (32-bit) floating-point numbers that contain the confidence values for each detected face box. For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

To be allocated by the user.

ShapeEigenValues

Optional: The array into which the shape eigenvalues will be placed, if desired. Query ShapeEigenValueCount to determine how big this array should be.

To be allocated by the user.

ExpressionCoefficients

Optional: The array into which the expression coefficients will be placed, if desired. Query ExpressionCount to determine how big this array should be.

To be allocated by the user.

The corresponding expression shapes for face_model2.nvf are in the following order:

BrowDown_L, BrowDown_R, BrowInnerUp_L, BrowInnerUp_R, BrowOuterUp_L, BrowOuterUp_R, CheekPuff_L, CheekPuff_R, CheekSquint_L, CheekSquint_R, EyeBlink_L, EyeBlink_R, EyeLookDown_L, EyeLookDown_R, EyeLookIn_L, EyeLookIn_R, EyeLookOut_L, EyeLookOut_R, EyeLookUp_L, EyeLookUp_R, EyeSquint_L, EyeSquint_R, EyeWide_L, EyeWide_R, JawForward, JawLeft, JawOpen, JawRight, MouthClose, MouthDimple_L, MouthDimple_R, MouthFrown_L, MouthFrown_R, MouthFunnel, MouthLeft, MouthLowerDown_L, MouthLowerDown_R, MouthPress_L, MouthPress_R, MouthPucker, MouthRight, MouthRollLower, MouthRollUpper, MouthShrugLower, MouthShrugUpper, MouthSmile_L, MouthSmile_R, MouthStretch_L, MouthStretch_R, MouthUpperUp_L, MouthUpperUp_R, NoseSneer_L, NoseSneer_R
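
The read-only counts above exist precisely so that output buffers can be sized before allocation. The query-then-allocate sketch below operates on an already loaded handle; the NvAR_FaceMesh field names match the SDK headers as commonly shipped, but verify them against nvAR_defs.h in your SDK version.

```cpp
// A query-then-allocate sketch for a loaded Face 3D Mesh handle.
#include <vector>
#include "nvAR.h"
#include "nvAR_defs.h"

void AllocateMeshOutputs(NvAR_FeatureHandle mesh) { // call after NvAR_Load
  unsigned nVerts = 0, nTris = 0, nExpr = 0, nEigen = 0;
  NvAR_GetU32(mesh, NvAR_Parameter_Config(VertexCount), &nVerts);
  NvAR_GetU32(mesh, NvAR_Parameter_Config(TriangleCount), &nTris);
  NvAR_GetU32(mesh, NvAR_Parameter_Config(ExpressionCount), &nExpr);
  NvAR_GetU32(mesh, NvAR_Parameter_Config(ShapeEigenValueCount), &nEigen);

  // Static so the buffers outlive this call; a real app manages lifetime.
  static std::vector<NvAR_Vector3f> vertices(nVerts);
  static std::vector<NvAR_Vector3u16> triangles(nTris);
  static NvAR_FaceMesh faceMesh{vertices.data(), nVerts, triangles.data(), nTris};
  static NvAR_RenderingParams renderParams;
  static std::vector<float> exprCoeffs(nExpr), eigenValues(nEigen);

  NvAR_SetObject(mesh, NvAR_Parameter_Output(FaceMesh), &faceMesh, sizeof(NvAR_FaceMesh));
  NvAR_SetObject(mesh, NvAR_Parameter_Output(RenderingParams), &renderParams,
                 sizeof(NvAR_RenderingParams));
  NvAR_SetF32Array(mesh, NvAR_Parameter_Output(ExpressionCoefficients), exprCoeffs.data(),
                   (int)nExpr);
  NvAR_SetF32Array(mesh, NvAR_Parameter_Output(ShapeEigenValues), eigenValues.data(),
                   (int)nEigen);
}
```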

Eye Contact Property Values

The following tables list the values for the configuration, input, and output properties for gaze redirection.

Table 3‑11: Configuration Properties for Eye Contact

Property Name

Value

FeatureDescription

String that describes the feature.

ModelDir

String that contains the path to the folder that contains the TensorRT package files.

Set by the user.

BatchSize

The number of inferences to be run at one time on the GPU. The maximum value is 1.

Landmarks_Size

Unsigned integer, either 68 or 126.

Specifies the number of landmark points (x and y values) to be returned.

Set by the user.

LandmarksConfidence_Size

Unsigned integer, either 68 or 126.

Specifies the number of landmark confidence values for the detected keypoints to be returned.

Set by the user.

GazeRedirect

Flag to enable or disable gaze redirection.

When enabled, the gaze is estimated, and the redirected image is set as the output. When disabled, the gaze is estimated but redirection does not occur.

Temporal

Unsigned integer to enable (1) or disable (0) the temporal optimization of landmark detection.

Set by the user.

DetectClosure

Flag to toggle the detection of eye closure and occlusion. The default value is On.

EyeSizeSensitivity

An unsigned integer in the range 2–5, inclusive, that is used to increase the sensitivity of the algorithm to the redirected eye size. A value of 2 uses a smaller eye region, and a value of 5 uses a larger eye region.

UseCudaGraph

Bool. Default is False.

Flag to use CUDA Graphs for optimization.

Set by the user.

EnableLookAway

Bool. Default is false.

Flag that, when set to true, redirects the eyes to look away at a random time for a random period. The eyes follow the relative changes in estimated gaze during the lookaway period.

Set by the user.

LookAwayOffsetMax

Unsigned int value in the range 0–10. Default is 5.

If the value is set to x degrees, a randomly chosen offset angle in the range −x to x degrees will be added to the lookaway angle during the random lookaway period. The lookaway angle is based on the relative motion of the eyes in the input image during the lookaway period. It is not used outside the lookaway period.

Set by the user.

LookAwayIntervalMin

Unsigned int value in the range 1–600. Default is 100.

Minimum limit for the number of frames at which random look away occurs. This value is applicable only when EnableLookAway is set to true.

The value can be optionally set by the user.

LookAwayIntervalRange

Unsigned int value in the range 1–600. Default is 250.

Interval range for picking the number of frames at which random lookaway occurs. Adding this range to LookAwayIntervalMin provides the maximum limit for the number of frames at which random lookaway occurs. This value is applicable only when EnableLookAway is set to true.

The value can be optionally set by the user.

GazePitchThresholdLow

Float value in the range of 10.0–35.0 (degrees). Default is 20.0.

This is a range control parameter. It defines the threshold for estimated gaze angle in the pitch direction within which gaze is always redirected towards the camera. Beyond this angle, the redirected gaze transitions away from the camera and towards the estimated gaze angle.

This value is optionally set by the user.

GazeYawThresholdLow

Float value in the range of 10.0–35.0 (degrees). Default is 20.0.

This is a range control parameter. It defines the threshold for estimated gaze angle in the yaw direction within which gaze is always redirected towards the camera. Beyond this angle, the redirected gaze transitions away from the camera and towards the estimated gaze angle. This value is optionally set by the user.

HeadPitchThresholdLow

Float value in the range of 10.0–35.0 (degrees). Default is 15.0.

This is a range control parameter. It defines the threshold for estimated head pose angle in the pitch direction within which gaze is always redirected towards the camera. Beyond this angle, the redirected gaze transitions away from the camera and towards the estimated gaze angle.

This value is optionally set by the user.

HeadYawThresholdLow

Float value in the range of 10.0–35.0 (degrees). Default is 25.0.

This is a range control parameter. It defines the threshold for estimated head pose angle in the yaw direction within which gaze is always redirected towards the camera. Beyond this angle, the redirected gaze transitions away from the camera and towards the estimated gaze angle.

This value is optionally set by the user.

GazePitchThresholdHigh

Float value in the range of 10.0–35.0 (degrees). Default is 30.0.

This is a range control parameter. It defines the threshold for estimated gaze angle in the pitch direction beyond which no redirection occurs and the angle of redirected gaze is equal to the estimated gaze. The redirected gaze in the pitch direction increasingly moves away from the camera and towards the estimated gaze beyond GazePitchThresholdLow and reaches the estimated gaze value at GazePitchThresholdHigh. The value of this parameter is expected to be greater than GazePitchThresholdLow.

This value is optionally set by the user.

GazeYawThresholdHigh

Float value in the range of 10.0–35.0 (degrees). Default is 30.0.

This is a range control parameter. It defines the threshold for estimated gaze angle in the yaw direction beyond which no redirection occurs and the angle of redirected gaze is equal to the estimated gaze. The redirected gaze in the yaw direction increasingly moves away from the camera and towards the estimated gaze beyond GazeYawThresholdLow and reaches the estimated gaze value at GazeYawThresholdHigh. The value of this parameter is expected to be greater than GazeYawThresholdLow.

This value is optionally set by the user.

HeadPitchThresholdHigh

Float value in the range of 10.0–35.0 (degrees). Default is 25.0.

This is a range control parameter. It defines the threshold for estimated head pose angle in the pitch direction beyond which no redirection occurs and the angle of redirected gaze is equal to the estimated gaze. The redirected gaze in the pitch direction increasingly moves away from the camera and towards the estimated gaze beyond HeadPitchThresholdLow and reaches the estimated gaze value at HeadPitchThresholdHigh. The value of this parameter is expected to be greater than HeadPitchThresholdLow.

This value is optionally set by the user.

HeadYawThresholdHigh

Float value in the range of 10.0–35.0 (degrees). Default is 30.0.

This is a range control parameter. It defines the threshold for estimated head pose angle in the yaw direction beyond which no redirection occurs and the angle of redirected gaze is equal to the estimated gaze. The redirected gaze in the yaw direction increasingly moves away from the camera and towards the estimated gaze beyond HeadYawThresholdLow and reaches the estimated gaze value at HeadYawThresholdHigh. The value of this parameter is expected to be greater than HeadYawThresholdLow.

This value is optionally set by the user.

Table 3‑12: Input Properties for Eye Contact

Property Name

Value

Image

Interleaved (or chunky) 8-bit BGR input image in a CUDA buffer of type NvCVImage.

To be allocated and set by the user.

Width

The width of the input image buffer that contains the face to which the face model will be fitted.

Set by the user.

Height

The height of the input image buffer that contains the face to which the face model will be fitted.

Set by the user.

Landmarks

Optional: An NvAR_Point2f array that contains the landmark points of size NvAR_Parameter_Config(Landmarks_Size) that is returned by the landmark detection feature.

If landmarks are not provided to this feature, an input image must be provided.

For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

To be allocated by the user.

Table 3‑13: Output Properties for Eye Contact

Property Name

Value

Landmarks

NvAR_Point2f array, which must be large enough to hold the number of points given by the product of the following:

  • NvAR_Parameter_Config(BatchSize)

  • NvAR_Parameter_Config(Landmarks_Size)

To be allocated by the user.

HeadPose

Optional: NvAR_Quaternion array, which must be large enough to hold the number of quaternions equal to NvAR_Parameter_Config(BatchSize).

The OpenGL coordinate convention is used: looking out from the camera, x is camera right, y is camera up, and z is toward the camera.

To be allocated by the user.

LandmarksConfidence

Optional: An array of single-precision (32-bit) floating-point numbers, which must be large enough to hold the number of confidence values given by the product of the following:

  • NvAR_Parameter_Config(BatchSize)

  • NvAR_Parameter_Config(LandmarksConfidence_Size)

To be allocated by the user.

BoundingBoxes

Optional: NvAR_BBoxes structure that contains the face detected by the face detection pass that the landmark detection feature runs internally. For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

To be allocated by the user.

OutputGazeVector

Float array, which must be large enough to hold the two values (pitch and yaw) for the gaze angle in radians per image. For batch sizes larger than 1, it should hold NvAR_Parameter_Config(BatchSize) × 2 float values.

To be allocated by the user.

OutputHeadTranslation

Optional: Float array, which must be large enough to hold the head translations (x,y,z) per image. For batch sizes larger than 1, it should hold NvAR_Parameter_Config(BatchSize) × 3 float values.

To be allocated by the user.

GazeDirection

Optional: NvAR_Point3f array that is large enough to hold as many elements as NvAR_Parameter_Config(BatchSize).

Each element contains two NvAR_Point3f points. One point represents the center point (cx,cy,cz) between the eyes, and the other point represents a unit vector (ux, uy, uz) in the gaze direction for visualization. For batch sizes larger than 1, it should hold NvAR_Parameter_Config(BatchSize) × 2 NvAR_Point3f points.

To be allocated by the user.
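
With BatchSize fixed at 1, the output buffers above have small, fixed sizes: OutputGazeVector holds exactly two floats and GazeDirection two NvAR_Point3f elements. The sketch below assumes the NvAR_Feature_GazeRedirection selector from the SDK samples.

```cpp
// A sketch of running gaze redirection and reading the gaze outputs.
#include <cstdio>
#include "nvAR.h"
#include "nvAR_defs.h"

void RunEyeContact(NvAR_FeatureHandle gaze, NvCVImage *src) {
  NvAR_SetU32(gaze, NvAR_Parameter_Config(GazeRedirect), 1); // estimate and redirect
  NvAR_SetU32(gaze, NvAR_Parameter_Config(Temporal), 1);
  NvAR_Load(gaze);

  NvAR_SetObject(gaze, NvAR_Parameter_Input(Image), src, sizeof(NvCVImage));

  static float gazeAngles[2];     // pitch and yaw, in radians (BatchSize == 1)
  static NvAR_Point3f gazeDir[2]; // eye-center (cx,cy,cz) and unit vector (ux,uy,uz)
  NvAR_SetF32Array(gaze, NvAR_Parameter_Output(OutputGazeVector), gazeAngles, 2);
  NvAR_SetObject(gaze, NvAR_Parameter_Output(GazeDirection), gazeDir, sizeof(NvAR_Point3f));

  NvAR_Run(gaze);
  std::printf("gaze: pitch=%.3f yaw=%.3f rad\n", gazeAngles[0], gazeAngles[1]);
}
```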

Body Detection Property Values

The following tables list the values for the configuration, input, and output properties for Body Detection.

Table 3‑14: Configuration Properties for Body Detection

Property Name

Value

FeatureDescription

Free-form string that describes the feature.

The string is set by the SDK and cannot be modified by the user.

CUDAStream

The CUDA stream, which is set by the user.

ModelDir

String that contains the path to the folder that contains the TensorRT package files.

Set by the user.

Temporal

Unsigned integer to enable (1) or disable (0) the temporal optimization of body detection.

Set by the user.

FullBodyOnly

Unsigned integer to select the estimation mode:

  • 1: Full Body only (default).

  • 0: Full and upper body.

Set by the user.

Table 3‑15: Input Properties for Body Detection

Property Name

Value

Image

Interleaved (or chunky) 8-bit BGR input image in a CUDA buffer of type NvCVImage.

To be allocated and set by the user.

Table 3‑16: Output Properties for Body Detection

Property Name

Value

BoundingBoxes

NvAR_BBoxes structure that holds the detected body boxes.

To be allocated by the user.

BoundingBoxesConfidence

Optional: An array of single-precision (32-bit) floating-point numbers that contain the confidence values for each detected body box.

To be allocated by the user.
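
Configuration mirrors face tracking, with FullBodyOnly as the only new selector. A short sketch follows, assuming the NvAR_Feature_BodyDetection selector from the SDK samples; as before, static buffers stand in for application-managed lifetime.

```cpp
// A body detection sketch mirroring the face tracking example.
#include "nvAR.h"
#include "nvAR_defs.h"

void ConfigureBodyDetection(NvAR_FeatureHandle body, NvCVImage *src) {
  NvAR_SetU32(body, NvAR_Parameter_Config(FullBodyOnly), 1); // 1 = full body only (default)
  NvAR_SetU32(body, NvAR_Parameter_Config(Temporal), 1);
  NvAR_Load(body);

  NvAR_SetObject(body, NvAR_Parameter_Input(Image), src, sizeof(NvCVImage));

  // Table 3-16: user-allocated body boxes plus optional confidences.
  static NvAR_Rect rects[25];
  static float confidence[25];
  static NvAR_BBoxes boxes{rects, 0, 25};
  NvAR_SetObject(body, NvAR_Parameter_Output(BoundingBoxes), &boxes, sizeof(NvAR_BBoxes));
  NvAR_SetF32Array(body, NvAR_Parameter_Output(BoundingBoxesConfidence), confidence, 25);
}
```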

3D Body Pose Keypoint Tracking Property Values

The following tables list the values for the configuration, input, and output properties for 3D Body Pose Keypoint Tracking.

Table 3‑17: Configuration Properties for 3D Body Pose Keypoint Tracking

Property Name

Value

FeatureDescription

String that describes the feature.

CUDAStream

The CUDA stream.

Set by the user.

ModelDir

String that contains the path to the folder that contains the TensorRT package files.

Set by the user.

BatchSize

The number of inferences to be run at one time on the GPU. The maximum value is 1.

Mode

Unsigned integer that specifies the mode: High Performance (1) or High Quality (0). Default is 1.

Set by the user.

UseCudaGraph

Boolean to enable (true) or disable (false) the use of CUDA Graphs for optimization.

Set by the user.

Temporal

Unsigned integer to enable (1) or disable (0) the temporal optimization of Body Pose tracking.

Set by the user.

NumKeyPoints

Unsigned integer that specifies the number of keypoints available, which is currently 34.

ReferencePose

NvAR_Point3f array that contains the reference pose for each of the keypoints.

Set by the user.

FullBodyOnly

Unsigned integer to select the pose estimation mode:

  • Full Body only (1). Supports both high quality and high performance modes.

  • Full and upper body (0). Supports only high quality mode.

The default is 1.

Set by the user.

PostprocessJointAngle

Boolean to enable (true) or disable (false) the postprocessing steps for joint angles corresponding to the joints predicted with low confidence. To be used only when FullBodyOnly is set to 0.

We recommend that you set this to true when input is upper-body image or video.

The default is true.

Set by the user.

TargetSeatedPoseForInterpolation

NvAR_Quaternion array that contains the target seated pose (for each of the 34 keypoints) to be used for post processing joint rotations.

For the joints that are predicted with low confidence, the output pose will be interpolated to the corresponding pose specified in this target pose.

This array is used when the SDK detects that the person in the input frame is in a seated pose.

Used only when FullBodyOnly is set to 0.

TargetStandPoseForInterpolation

NvAR_Quaternion array that contains the target standing pose (for each of the 34 keypoints) to be used for post processing joint rotations.

For the joints that are predicted with low confidence, the output pose will be interpolated to the corresponding pose specified in this target pose.

This array is used when the SDK detects that the person in the input frame is in a standing pose.

Used only when FullBodyOnly is set to 0.

TrackPeople

Unsigned integer to enable (1) or disable (0) multi-person tracking in Body Pose.

Set by the user.

ShadowTrackingAge

Unsigned integer that specifies the age (in number of frames) after which the multi-person tracker stops tracking the object in shadow mode.

The default is 90.

Set by the user.

ProbationAge

Unsigned integer that specifies the age (in number of frames) after which the multi-person tracker marks the object valid and assigns an ID for tracking.

The default is 10.

Set by the user.

MaxTargetsTracked

Unsigned integer that specifies the maximum number of targets to be tracked by the multi-person tracker. After this limit is reached, new targets are discarded.

The default is 30.

Set by the user.

Table 3‑18: Input Properties for 3D Body Pose Keypoint Tracking

Property Name

Value

Image

Interleaved (or chunky) 8-bit BGR input image in a CUDA buffer of type NvCVImage.

To be allocated and set by the user.

FocalLength

Float value that specifies the focal length of the camera to be used for 3D Body Pose.

The default value is 800.79041.

To be allocated and set by the user.

BoundingBoxes

Optional: NvAR_BBoxes structure that contains BatchSize bounding boxes on which to run 3D Body Pose detection.

If not specified as an input property, body detection is automatically run on the input image.

To be allocated by the user.

Table 3‑19: Output Properties for 3D Body Pose Keypoint Tracking

Property Name

Value

Keypoints

NvAR_Point2f array, which must be large enough to hold the points given by the product of NvAR_Parameter_Config(BatchSize) and 34.

To be allocated by the user.

Keypoints3D

NvAR_Point3f array, which must be large enough to hold the points given by the product of NvAR_Parameter_Config(BatchSize) and 34.

To be allocated by the user.

JointAngles

NvAR_Quaternion array, which must be large enough to hold the joints given by the product of NvAR_Parameter_Config(BatchSize) and 34.

They represent the local rotation (in Quaternion) of each joint with reference to the ReferencePose.

To be allocated by the user.

KeyPointsConfidence

An array of single-precision (32-bit) floating-point numbers, which must be large enough to hold the number of confidence values given by the product of NvAR_Parameter_Config(BatchSize) and 34.

To be allocated by the user.

BoundingBoxes

NvAR_BBoxes structure that contains the bodies detected by the body detection pass that the 3D Body Pose feature runs internally.

To be allocated by the user.

TrackingBoundingBoxes

NvAR_TrackingBBoxes structure that contains the bodies detected by the 3D Body Pose feature's internal body detection, together with the tracking IDs assigned by multi-person tracking.

To be allocated by the user.
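
Every output array in the table above holds BatchSize × 34 elements, so allocation can be driven entirely by the NumKeyPoints query. The sketch below assumes the NvAR_Feature_BodyPoseEstimation selector from the SDK samples and uses the property tokens exactly as listed above.

```cpp
// A sizing sketch for the 3D Body Pose keypoint outputs.
#include <vector>
#include "nvAR.h"
#include "nvAR_defs.h"

void ConfigureBodyPose(NvAR_FeatureHandle pose, NvCVImage *src) {
  const unsigned batch = 1; // BatchSize maximum is 1
  NvAR_SetU32(pose, NvAR_Parameter_Config(BatchSize), batch);
  NvAR_SetU32(pose, NvAR_Parameter_Config(Mode), 1); // High Performance (default)
  NvAR_Load(pose);

  unsigned nkp = 0;
  NvAR_GetU32(pose, NvAR_Parameter_Config(NumKeyPoints), &nkp); // currently 34

  NvAR_SetObject(pose, NvAR_Parameter_Input(Image), src, sizeof(NvCVImage));
  NvAR_SetF32(pose, NvAR_Parameter_Input(FocalLength), 800.79041f); // default focal length

  // Each output array holds BatchSize x NumKeyPoints elements.
  static std::vector<NvAR_Point2f> keypoints(batch * nkp);
  static std::vector<NvAR_Point3f> keypoints3D(batch * nkp);
  static std::vector<NvAR_Quaternion> jointAngles(batch * nkp); // relative to ReferencePose
  static std::vector<float> kpConfidence(batch * nkp);

  NvAR_SetObject(pose, NvAR_Parameter_Output(Keypoints), keypoints.data(), sizeof(NvAR_Point2f));
  NvAR_SetObject(pose, NvAR_Parameter_Output(Keypoints3D), keypoints3D.data(), sizeof(NvAR_Point3f));
  NvAR_SetObject(pose, NvAR_Parameter_Output(JointAngles), jointAngles.data(), sizeof(NvAR_Quaternion));
  NvAR_SetF32Array(pose, NvAR_Parameter_Output(KeyPointsConfidence), kpConfidence.data(),
                   (int)(batch * nkp));
}
```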

Facial Expression Estimation Property Values

The following tables list the values for the configuration, input, and output properties for Facial Expression Estimation.

Table 3‑20: Configuration Properties for Facial Expression Estimation

Property Name

Value

FeatureDescription

String that describes the feature.

This property is read-only.

ModelDir

String that contains the path to the face model and the TensorRT package files.

Set by the user.

CUDAStream

Optional: The CUDA stream.

Set by the user.

Temporal

Optional: Bitfield to control temporal filtering.

  • 0x001: Filter face detection.

  • 0x002: Filter facial landmarks.

  • 0x004: Filter rotational pose.

  • 0x010: Filter facial expressions.

  • 0x020: Filter gaze expressions.

  • 0x100: Enhance eye and mouth closure.

Default is 0x037 (all on except 0x100).

Set by the user.

Landmarks_Size

Unsigned integer, 68 or 126.

Specifies the required size of the array of detected facial landmark points. The array length must be 126 to accommodate the {x,y} location of each detected point.

ExpressionCount

Unsigned integer.

The number of expressions in the face model.

PoseMode

Specifies how to compute pose. 0 = 3DOF (default), 1 = 6DOF explicit.

6DOF is required for 3D translation output.

Mode

Flag to toggle landmark mode. Set 0 to enable Performance model for landmark detection. Set 1 to enable Quality model for landmark detection for higher accuracy. Default is 1.

EnableCheekPuff

(Experimental) Enables cheek puff blendshapes.

Table 3‑21: Input Properties for Facial Expression Estimation

Property Name

Value

Landmarks

Optional: An NvAR_Point2f array that contains the landmark points of size NvAR_Parameter_Config(Landmarks_Size) that is returned by the landmark detection feature.

If landmarks are not provided to this feature, an input image must be provided.

To be allocated by the user.

Image

Optional: An interleaved (or chunky) 8-bit BGR input image in a CUDA buffer of type NvCVImage.

If an input image is not provided as input, the landmark points must be provided to this feature as input.

To be allocated by the user.

CameraIntrinsicParams

Optional: Camera intrinsic parameters. A three-element float array with elements corresponding to focal length, cx, and cy, respectively, of an ideal perspective camera. Any barrel or fisheye distortion should be removed or considered negligible. Used only if PoseMode is set to 1.

Table 3‑22: Output Properties for Facial Expression Estimation

Property Name

Value

Landmarks

Optional: An NvAR_Point2f array, which must be large enough to hold the number of points of size NvAR_Parameter_Config(Landmarks_Size).

Pose

Optional: NvAR_Quaternion pose rotation quaternion. The coordinate frame is NvAR Camera 3D Space.

To be allocated by the user.

PoseTranslation

Optional: NvAR_Point3f Pose 3D Translation. Computed only if PoseMode = 1. Translation coordinates are in NvAR Camera 3D Space coordinates, in which the units are centimeters.

To be allocated by the user.

LandmarksConfidence

Optional: An array of single-precision (32-bit) floating-point numbers, which must be large enough to hold the number of confidence values of size NvAR_Parameter_Config(LandmarksConfidence_Size).

To be allocated by the user.

BoundingBoxes

Optional: NvAR_BBoxes structure that contains the detected face that is determined internally.

To be allocated by the user.

BoundingBoxesConfidence

Optional: An array of single-precision (32-bit) floating-point numbers, which must be large enough to hold the number of confidence values of size NvAR_Parameter_Config(BoundingBoxesConfidence_Size).

To be allocated by the user.

ExpressionCoefficients

The array into which the expression coefficients will be placed, if desired.

Query ExpressionCount to determine the size for this array.

To be allocated by the user.

The corresponding expression shapes are in the following order:

BrowDown_L, BrowDown_R, BrowInnerUp_L, BrowInnerUp_R, BrowOuterUp_L, BrowOuterUp_R, CheekPuff_L, CheekPuff_R, CheekSquint_L, CheekSquint_R, EyeBlink_L, EyeBlink_R, EyeLookDown_L, EyeLookDown_R, EyeLookIn_L, EyeLookIn_R, EyeLookOut_L, EyeLookOut_R, EyeLookUp_L, EyeLookUp_R, EyeSquint_L, EyeSquint_R, EyeWide_L, EyeWide_R, JawForward, JawLeft, JawOpen, JawRight, MouthClose, MouthDimple_L, MouthDimple_R, MouthFrown_L, MouthFrown_R, MouthFunnel, MouthLeft, MouthLowerDown_L, MouthLowerDown_R, MouthPress_L, MouthPress_R, MouthPucker, MouthRight, MouthRollLower, MouthRollUpper, MouthShrugLower, MouthShrugUpper, MouthSmile_L, MouthSmile_R, MouthStretch_L, MouthStretch_R, MouthUpperUp_L, MouthUpperUp_R, NoseSneer_L, NoseSneer_R
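
The sketch below ties together the Temporal bitfield, the 6DOF pose mode, and the query-then-allocate pattern for the coefficient array. It assumes the NvAR_Feature_FaceExpressions selector from the SDK samples; the intrinsics values are illustrative for a 1280x720 input, not SDK defaults.

```cpp
// A Facial Expression Estimation configuration sketch.
#include <vector>
#include "nvAR.h"
#include "nvAR_defs.h"

void ConfigureExpressionEstimation(NvAR_FeatureHandle expr, NvCVImage *src) {
  NvAR_SetU32(expr, NvAR_Parameter_Config(Temporal), 0x037); // default filter set
  NvAR_SetU32(expr, NvAR_Parameter_Config(PoseMode), 1);     // 6DOF: enables PoseTranslation
  NvAR_Load(expr);

  unsigned nExpr = 0;
  NvAR_GetU32(expr, NvAR_Parameter_Config(ExpressionCount), &nExpr);

  NvAR_SetObject(expr, NvAR_Parameter_Input(Image), src, sizeof(NvCVImage));
  // Ideal-perspective intrinsics: focal length, cx, cy (used only when PoseMode = 1).
  static float intrinsics[3] = {800.0f, 640.0f, 360.0f}; // illustrative values
  NvAR_SetF32Array(expr, NvAR_Parameter_Input(CameraIntrinsicParams), intrinsics, 3);

  static std::vector<float> coeffs(nExpr); // one per blendshape, in the order listed above
  static NvAR_Quaternion poseRotation;
  static NvAR_Point3f poseTranslation;     // centimeters, NvAR Camera 3D Space
  NvAR_SetF32Array(expr, NvAR_Parameter_Output(ExpressionCoefficients), coeffs.data(),
                   (int)nExpr);
  NvAR_SetObject(expr, NvAR_Parameter_Output(Pose), &poseRotation, sizeof(NvAR_Quaternion));
  NvAR_SetObject(expr, NvAR_Parameter_Output(PoseTranslation), &poseTranslation,
                 sizeof(NvAR_Point3f));
}
```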

Video Live Portrait Property Values

The following tables list the values for the configuration, input, and output properties for Live Portrait.

Table 3‑23: Configuration Properties for Video Live Portrait

Property Name

Value

FeatureDescription

String that describes the feature.

This property is read-only.

ModelDir

String that contains the path to the face model and the TensorRT package files.

Set by the user.

CUDAStream

Optional: The CUDA stream.

Set by the user.

ModelSel

Model optimized for performance or for quality.

  • 0: Performance model.

  • 1: Quality model (default).

Set by the user.

Mode

Video Live Portrait mode.

  • 1: Native face cropping mode (default).

  • 2: Registration blending mode.

  • 3: Inset blending mode.

Set by the user.

CheckFaceBox

Flag for checking face bounding box status.

  • 0: Disabled (default).

  • 1: FaceBoxStatus indicates the face box status.

Set by the user.

NetworkOutputImgWidth

Width of the output image generated from the network (512 or 1024).

NetworkOutputImgHeight

Height of the output image generated from the network (512 or 1024).

Table 3‑24: Input Properties for Video Live Portrait

Property Name

Value

SourceImage

Chunky/packed 8-bit BGR or BGRA CUDA buffer.

Requirements:

  • The resolution should be between 540p and 4K. We recommend a resolution of 720p or greater.

  • The full face of the subject should be visible.

  • Neutral expression (no smiling or any other expression).

  • Mouth is closed.

  • Front-facing pose and gaze.

  • Good lighting conditions.

  • Clear face features; no occlusion.

DriveImage

Chunky/packed 8-bit BGR CUDA buffer.

NeutralDriveImage

Chunky/packed 8-bit BGR CUDA buffer.

Table 3‑25: Output Properties for Video Live Portrait

Property Name

Value

GeneratedImage

Chunky/packed 8-bit BGR or BGRA CPU/CUDA buffer.

BoundingBoxes

Optional: NvAR_BBoxes structure that contains the detected face that is determined internally.

To be allocated by the user.

FaceBoxStatus

Output for detecting the current status of the face position within the face box.

  • 0: Face is inside the tracked bounding box.

  • 1: Face is close to the border of the tracked bounding box.

  • 2: Face is outside the tracked bounding box.
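
A hypothetical wiring sketch follows. This section does not name the feature selector or the accessor for scalar outputs, so the handle is assumed to be created elsewhere, and reading FaceBoxStatus through NvAR_GetU32 is an assumption.

```cpp
// Hypothetical Video Live Portrait wiring; only the property tokens come
// from the tables above.
#include "nvAR.h"
#include "nvAR_defs.h"

void RunLivePortrait(NvAR_FeatureHandle lp, NvCVImage *source,
                     NvCVImage *drive, NvCVImage *generated) {
  NvAR_SetU32(lp, NvAR_Parameter_Config(ModelSel), 1);     // quality model (default)
  NvAR_SetU32(lp, NvAR_Parameter_Config(Mode), 3);         // inset blending mode
  NvAR_SetU32(lp, NvAR_Parameter_Config(CheckFaceBox), 1); // report FaceBoxStatus
  NvAR_Load(lp);

  NvAR_SetObject(lp, NvAR_Parameter_Input(SourceImage), source, sizeof(NvCVImage));
  NvAR_SetObject(lp, NvAR_Parameter_Input(DriveImage), drive, sizeof(NvCVImage));
  NvAR_SetObject(lp, NvAR_Parameter_Output(GeneratedImage), generated, sizeof(NvCVImage));
  NvAR_Run(lp);

  unsigned faceBoxStatus = 0; // 0 inside, 1 near border, 2 outside
  NvAR_GetU32(lp, NvAR_Parameter_Output(FaceBoxStatus), &faceBoxStatus); // assumed accessor
}
```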

Frame Selection Property Values

The following tables list the values for the configuration, input, and output properties for Frame Selection.

Table 3‑26: Configuration Properties for Frame Selection

Property Name

Value

FeatureDescription

String that describes the feature.

This property is read-only.

ModelDir

String that contains the path to the face model and the TensorRT package files.

Set by the user.

CUDAStream

Optional: The CUDA stream.

Set by the user.

Temporal

Optional: Bitfield to control temporal filtering.

  • 0x001: Filter face detection.

  • 0x002: Filter facial landmarks.

  • 0x004: Filter rotational pose.

  • 0x010: Filter facial expressions.

  • 0x020: Filter gaze expressions.

  • 0x100: Enhance eye and mouth closure.

Default: 0x037 (all on except 0x100).

Set by the user.

Mode

Optional: Frame Selection mode.

  • 0: Select based on image neutrality (neutral head pose and neutral expression).

  • Other values: Not implemented.

Default: 0.

Set by the user.

ActiveDuration

Optional: Specifies how long (in frames), beginning with the first frame, frame selection can report frame status (good or bad) before reporting expired status (NVAR_FRAME_SELECTOR_ACTIVE_DURATION_EXPIRED).

If no good frame is detected in the first n frames specified by ActiveDuration, the SDK does not report NVAR_FRAME_SELECTOR_ACTIVE_DURATION_EXPIRED status until at least one good frame is detected or EOF is reached.

Default: 0 (runs forever).

Set by the user.

GoodFrameMinInterval

Optional: If two good frames are too close together, the latter is not reported unless at least the specified number of frames lie between the two good frames.

Default: 0 (no gap frame needed between good frames).

Set by the user.

Strategy

Optional: Flag to control frame selection strategy.

  • 0: Static threshold.

  • 1: Improving threshold.

Default: 1.

Set by the user.

Table 3‑27: Input Properties for Frame Selection

Property Name

Value

Image

Chunky/packed 8-bit BGR CUDA buffer.

Table 3‑28: Output Properties for Frame Selection

Property Name

Value

FrameSelectorStatus

Bitfield to indicate the current input image status.

  • 0: Indicates that the current frame meets the internal threshold bar and can be considered a good frame.

  • Other values: Each bit indicates a failed reason. Can be XORed to indicate multiple reasons.

To learn more about the status code, refer to nvAR_defs.h.
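
A hypothetical polling sketch follows. The handle is assumed to be created and loaded elsewhere; reading the status bitfield through NvAR_GetU32 is an assumption, since this section does not specify the accessor.

```cpp
// Hypothetical per-frame polling of FrameSelectorStatus.
#include <cstdio>
#include "nvAR.h"
#include "nvAR_defs.h"

bool IsGoodFrame(NvAR_FeatureHandle fs, NvCVImage *frame) {
  NvAR_SetObject(fs, NvAR_Parameter_Input(Image), frame, sizeof(NvCVImage));
  NvAR_Run(fs);

  unsigned status = ~0u;
  NvAR_GetU32(fs, NvAR_Parameter_Output(FrameSelectorStatus), &status); // assumed accessor
  if (status != 0) // each set bit is a failure reason; codes are in nvAR_defs.h
    std::printf("frame rejected, status bits 0x%x\n", status);
  return status == 0; // 0 means the frame passed the internal threshold
}
```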

Speech Live Portrait Property Values

The following tables list the values for the configuration, input, and output properties for Speech Live Portrait.

Table 3‑29: Configuration Properties for Speech Live Portrait

Property Name

Value

FeatureDescription

String that describes the feature.

This property is read-only.

ModelDir

String that contains the path to the face model and the TensorRT package files.

Set by the user.

CUDAStream

Optional: The CUDA stream.

Set by the user.

ModelSel

Model optimized for performance or for quality.

  • 0: Performance model.

  • 1: Quality model (default).

Set by the user.

Mode

Speech Live Portrait mode.

  • 1: Native face cropping mode (default).

  • 2: Registration blending mode.

  • 3: Inset blending mode.

If you see the head popping in and out while using Mode=2 (registration blending), we recommend specifying Mode=3 (inset blending) instead.

Set by the user.

HeadPoseMode

Select head animation.

  • 1: No head animation.

  • 2: Predefined head animation (default).

  • 3: User-provided head animation.

Set by the user.

SampleRate

The sample rate for the audio input. The SDK currently supports 16-kHz audio only.

NumChannels

The number of channels for the audio input. The SDK currently supports mono channel audio only.

SamplesPerFrame

The number of samples per audio frame. This is the number of audio samples passed to AudioFrameBuffer input.

NumInitialFrames

The number of initial audio frames before the first image can be generated. You must provide NumInitialFrames audio frames before the first output is available at the GeneratedImage output.

EnableLookAway

Flag to enable gaze lookaway.

  • 0: Disable (default).

  • 1: Enable.

Set by user.

LookAwayOffsetMax

The maximum integer value of gaze offset when lookaway is enabled.

Default: 20
Unit: Degrees

Set by user.

LookAwayIntervalRange

Range for picking the number of frames at which random look away occurs.

Default: 90
Range: [1, 600]
Unit: Frames

Set by user.

LookAwayIntervalMin

Minimum limit for the number of frames at which random lookaway occurs.

Default: 240
Range: [1, 600]
Unit: Frames

Set by user.

BlinkFrequency

The frequency of eye blinks per minute.

Default: 15
Range: [0, 120]
Unit: Blinks per minute

0 = disable eye blink

Set by user.

BlinkDuration

The duration of an eye blink.

Default: 6
Range: [2, 150]
Unit: Frames

Set by user.

MouthExpressionMultiplier

Specifies the degree of exaggeration for mouth movements. Higher values result in more exaggerated mouth motions.

Default: 1.4f
Range: [1.0f, 1.6f]

Set by user.

MouthExpressionBase

Defines the base openness of the mouth when idle (that is, zero audio input). Higher values lead to a more open mouth appearance during the idle state.

Default: 0.3f
Range: [0.0f, 1.0f]

Set by user.

HeadPoseMultiplier

A multiplier to dampen the range of the head pose animation.

This is applicable only for HeadPoseMode=2 (predefined head animation).

Default: 1.0f
Range: [0.0f, 1.0f]

Set by user.

Table 3‑30: Input Properties for Speech Live Portrait

Property Name

Value

SourceImage

Chunky/packed 8-bit BGR or BGRA CUDA buffer.

Requirements:

  • The resolution should be between 540p and 4K. We recommend a resolution of 720p or greater.

  • The full face of the subject should be visible.

  • Neutral expression (no smiling or any other expression).

  • Mouth is closed.

  • Front-facing pose and gaze.

  • Good lighting conditions.

  • Clear face features; no occlusion.

If the face is close to the border of the image, in Mode=1 (native face cropping mode) you might see letter-boxing (black pixels). This is done to ensure the face is centered for the best lip-sync quality. To avoid this, ensure the portrait image’s face is not close to the border.

AudioFrameBuffer

Raw audio frame buffer in CPU memory, with sample values ranging from −1.0 to 1.0, inclusive.

Audio requirements:

  • Sample rate: 16 kHz.

  • Channel: 1 (mono).

  • Only one speaker in the audio.

  • No background noise in the audio.

HeadPoseRotation

NvAR_Quaternion buffer that provides the head pose rotation to be applied. This is valid only for HeadPoseMode=3.

The input format is [qx, qy, qz, qw]. If the input quaternion value is out-of-range, the value is clamped to ±20 degrees in Euler angle.

HeadPoseTranslation

NvAR_Vector3f buffer that provides the head pose translation to be applied. This is valid only for HeadPoseMode=3.

The input format is [tx, ty, sz]. The range is [±0.03, ±0.02, 0.97–1.03]. An out-of-range value will be clamped and logged as a warning.

Table 3‑31: Output Properties for Speech Live Portrait

Property Name

Value

GeneratedImage

Chunky/packed 8-bit BGR or BGRA CPU/CUDA buffer.

If the face is close to the border of the image, in Mode=1 (native face cropping mode) you might see letter-boxing (black pixels). This is done to ensure the face is centered for the best lip-sync quality. To avoid this, ensure the portrait image’s face is not close to the border.

BoundingBoxes

Optional: NvAR_BBoxes structure that contains the detected face that is determined internally.

To be allocated by the user.
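
The frame-priming behavior of NumInitialFrames is easiest to see in a feeding loop. In the hypothetical sketch below, the handle is assumed to be created and loaded elsewhere, and passing AudioFrameBuffer through NvAR_SetF32Array is an assumption; only the property tokens come from the tables above.

```cpp
// Hypothetical Speech Live Portrait audio-feeding loop.
#include <vector>
#include "nvAR.h"
#include "nvAR_defs.h"

void FeedAudio(NvAR_FeatureHandle slp, NvCVImage *source, NvCVImage *generated,
               std::vector<float> &audio) { // mono, 16 kHz, samples in [-1, 1]
  unsigned samplesPerFrame = 0, numInitialFrames = 0;
  NvAR_GetU32(slp, NvAR_Parameter_Config(SamplesPerFrame), &samplesPerFrame);
  NvAR_GetU32(slp, NvAR_Parameter_Config(NumInitialFrames), &numInitialFrames);

  NvAR_SetObject(slp, NvAR_Parameter_Input(SourceImage), source, sizeof(NvCVImage));
  NvAR_SetObject(slp, NvAR_Parameter_Output(GeneratedImage), generated, sizeof(NvCVImage));

  // The first NumInitialFrames audio frames only prime the feature; frames in
  // GeneratedImage become valid after that.
  unsigned fed = 0;
  for (size_t off = 0; off + samplesPerFrame <= audio.size(); off += samplesPerFrame, ++fed) {
    NvAR_SetF32Array(slp, NvAR_Parameter_Input(AudioFrameBuffer), // assumed accessor
                     audio.data() + off, (int)samplesPerFrame);
    NvAR_Run(slp);
    bool outputValid = fed >= numInitialFrames;
    (void)outputValid; // consume `generated` here once valid
  }
}
```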

LipSync Property Values

The following tables list the values for the configuration, input, and output properties for LipSync.

Table 3‑32: Configuration Properties for LipSync

Property Name

Value

FeatureDescription

String that describes the feature.

This property is read-only.

ModelDir

String that contains the path to the face model and the TensorRT package files.

Set by the user.

CUDAStream

Optional: The CUDA stream.

Set by the user.

SampleRate

The sample rate for the audio input. The SDK currently supports 16-kHz audio only.

NumChannels

The number of channels for the audio input. The SDK currently supports mono channel audio only.

NumInitialFrames

The number of initial audio frames before the first image can be generated. You must provide NumInitialFrames video frames before the first output is available at the Image output.

The default value is 14.

This property is read-only.

Table 3‑33: Input Properties for LipSync

Property Name

Value

Image

Chunky/packed 8-bit BGR or BGRA CUDA buffer.

Requirements:

  • Resolution between 360p and 4K. Face region in the input frame up to 512×512 resolution.

  • Only one person visible in the frame.

  • The full face of the subject is visible.

  • Maximum of ±30 degrees head movement in each axis.

  • Moderate to good lighting conditions.

  • Clear face features; no occlusion.

AudioFrameBuffer

Raw audio frame buffer in CPU memory, with sample values ranging from −1.0 to 1.0, inclusive.

When the LipSync feature is run, it assumes that the contents of the audio frame buffer are synchronized with the current input video frame. The length of the audio frame should approximately match the duration of the video frame. The caller can vary the length of each audio frame to maintain synchronization.

Audio requirements:

  • Sample rate: 16 kHz.

  • Channel: 1 (mono).

  • Only one speaker in the audio.

  • No background noise in the audio.

SpeakerData

An NvAR_SpeakerData object that contains the audio data and the region of interest to animate a speaker’s face in a video frame.

This input is an alternative to using AudioFrameBuffer, with additional properties to control the animation. The audio data is subject to the same constraints as AudioFrameBuffer.

CameraIntrinsicParams

Optional: Camera intrinsic parameters. A three-element float array with elements corresponding to focal length, cx, and cy, respectively, of an ideal perspective camera. Any barrel or fisheye distortion should be removed or considered negligible. Default: {f=input_height, cx=input_width/2, cy=input_height/2}.

Table 3‑34: Output Properties for LipSync

Property Name

Value

Image

Chunky/packed 8-bit BGR or BGRA CPU/CUDA buffer.

Ready

Flag that is set to a non-zero value when the first output video frame is generated.

Activation

Floating-point value in the range 0–1 that indicates the activation level of LipSync in the output. When the activation is 0, it means the original face was copied directly to the output frame without modification. When the activation is 1, the original face was completely replaced by the animated face in the output frame.
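
A hypothetical per-frame loop shows how Ready gates the output and how Activation reports the blend level. The handle is assumed to be created and loaded elsewhere; reading Ready and Activation through NvAR_GetU32 and NvAR_GetF32 is an assumption, since this section does not specify the accessors.

```cpp
// Hypothetical LipSync per-frame driving loop.
#include <cstdio>
#include "nvAR.h"
#include "nvAR_defs.h"

void LipSyncFrame(NvAR_FeatureHandle ls, NvCVImage *inFrame, NvCVImage *outFrame,
                  float *audio, int numSamples) { // audio spans about one video frame
  NvAR_SetObject(ls, NvAR_Parameter_Input(Image), inFrame, sizeof(NvCVImage));
  NvAR_SetF32Array(ls, NvAR_Parameter_Input(AudioFrameBuffer), audio, numSamples); // assumed
  NvAR_SetObject(ls, NvAR_Parameter_Output(Image), outFrame, sizeof(NvCVImage));
  NvAR_Run(ls);

  unsigned ready = 0;
  NvAR_GetU32(ls, NvAR_Parameter_Output(Ready), &ready); // assumed accessor
  if (ready) { // non-zero once the first output video frame has been generated
    float activation = 0.0f; // 0 = face copied unmodified, 1 = fully animated
    NvAR_GetF32(ls, NvAR_Parameter_Output(Activation), &activation); // assumed accessor
    std::printf("lip-sync activation: %.2f\n", activation);
  }
}
```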