Properties for the AR SDK Features

This section provides the properties and their values for the features in the AR SDK.

Face Tracking Property Values

The following tables list the values for the configuration, input, and output properties for face tracking.

Table 3‑2: Configuration Properties for Face Tracking

Property Name

Value

FeatureDescription

Free-form string that describes the feature.

The string is set by the SDK and cannot be modified by the user.

CUDAStream

The CUDA stream, which is set by the user.

ModelDir

String that contains the path to the folder that contains the TensorRT package files.

Set by the user.

Temporal

Unsigned integer to enable (1) or disable (0) the temporal optimization of face detection. If enabled, only one face is returned. For more information, refer to Face Detection and Tracking.

Set by the user.

Table 3‑3: Input Properties for Face Tracking

Property Name

Value

Image

Interleaved (or chunky) 8-bit BGR input image in a CUDA buffer of type NvCVImage.

To be allocated and set by the user.

Table 3‑4: Output Properties for Face Tracking

Property Name

Value

BoundingBoxes

NvAR_BBoxes structure that holds the detected face boxes.

To be allocated by the user.

BoundingBoxesConfidence

Optional: An array of single-precision (32-bit) floating-point numbers that contain the confidence values for each detected face box.

To be allocated by the user.
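
As a concrete illustration of how these properties fit together, the following minimal C++ sketch configures, loads, and runs face tracking in the order the tables describe. It assumes the NvAR C API from nvAR.h and the NvAR_Feature_FaceDetection selector used by the SDK samples; the model path and image size are placeholders.

```cpp
// A minimal end-to-end sketch, assuming the NvAR C API and the
// NvAR_Feature_FaceDetection selector from the SDK samples.
#include <cstdio>
#include <cstdlib>
#include "nvAR.h"
#include "nvAR_defs.h"
#include "nvCVImage.h"

#define CHECK(call) do { NvCV_Status s = (call); if (s != NVCV_SUCCESS) { \
  std::printf("%s failed (%d)\n", #call, (int)s); std::exit(1); } } while (0)

int main() {
  NvAR_FeatureHandle face = nullptr;
  CUstream stream = nullptr;
  CHECK(NvAR_Create(NvAR_Feature_FaceDetection, &face));
  CHECK(NvAR_CudaStreamCreate(&stream));

  // Configuration properties (Table 3-2): set before NvAR_Load.
  CHECK(NvAR_SetString(face, NvAR_Parameter_Config(ModelDir), "/path/to/models")); // placeholder
  CHECK(NvAR_SetCudaStream(face, NvAR_Parameter_Config(CUDAStream), stream));
  CHECK(NvAR_SetU32(face, NvAR_Parameter_Config(Temporal), 1)); // track one face across frames
  CHECK(NvAR_Load(face));

  // Input property (Table 3-3): chunky 8-bit BGR image in GPU memory.
  NvCVImage src;
  CHECK(NvCVImage_Alloc(&src, 1280, 720, NVCV_BGR, NVCV_U8, NVCV_CHUNKY, NVCV_GPU, 1));
  CHECK(NvAR_SetObject(face, NvAR_Parameter_Input(Image), &src, sizeof(NvCVImage)));

  // Output properties (Table 3-4): user-allocated boxes and confidences.
  NvAR_Rect rects[25];
  float confidence[25];
  NvAR_BBoxes boxes{rects, 0, 25};
  CHECK(NvAR_SetObject(face, NvAR_Parameter_Output(BoundingBoxes), &boxes, sizeof(NvAR_BBoxes)));
  CHECK(NvAR_SetF32Array(face, NvAR_Parameter_Output(BoundingBoxesConfidence), confidence, 25));

  CHECK(NvAR_Run(face)); // in a real app, upload a camera frame into `src` first
  std::printf("detected %u face box(es)\n", (unsigned)boxes.num_boxes);

  NvCVImage_Dealloc(&src);
  NvAR_Destroy(face);
  NvAR_CudaStreamDestroy(stream);
  return 0;
}
```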

Landmark Tracking Property Values

The following tables list the values for the configuration, input, and output properties for landmark tracking.

Table 3‑5: Configuration Properties for Landmark Tracking

Property Name

Value

FeatureDescription

String that describes the feature.

CUDAStream

The CUDA stream.

Set by the user.

ModelDir

String that contains the path to the folder that contains the TensorRT package files.

Set by the user.

BatchSize

The number of inferences to be run at one time on the GPU. The maximum value is 8.

Temporal optimization of landmark detection is supported only for BatchSize=1.

Landmarks_Size

Unsigned integer, 68 or 126.

Specifies the number of landmark points (x and y values) to be returned.

Set by the user.

LandmarksConfidence_Size

Unsigned integer, 68 or 126.

Specifies the number of landmark confidence values for the detected keypoints to be returned.

Set by the user.

Temporal

Unsigned integer to enable (1) or disable (0) the temporal optimization of landmark detection. If enabled, only one input bounding box is supported as the input. For more information, refer to Face Detection and Tracking.

Set by the user.

Mode

Optional: Unsigned integer. Set 0 to enable Performance mode (default) or 1 to enable Quality mode for landmark detection.

Set by the user.

Table 3‑6: Input Properties for Landmark Tracking

Property Name

Value

Image

Interleaved (or chunky) 8-bit BGR input image in a CUDA buffer of type NvCVImage.

To be allocated and set by the user.

BoundingBoxes

Optional: NvAR_BBoxes structure that contains BatchSize bounding boxes on which to run landmark detection.

If not specified as an input property, face detection is automatically run on the input image. For more information, refer to Face Detection and Tracking.

To be allocated by the user.

Table 3‑7: Output Properties for Landmark Tracking

Property Name

Value

Landmarks

NvAR_Point2f array, which must be large enough to hold the number of points given by the product of NvAR_Parameter_Config(BatchSize) and NvAR_Parameter_Config(Landmarks_Size).

To be allocated by the user.

Pose

Optional: NvAR_Quaternion array, which must be large enough to hold the number of quaternions equal to NvAR_Parameter_Config(BatchSize).

The OpenGL coordinate convention is used: looking out from the camera, x is camera right, y is camera up, and z is toward the camera.

To be allocated by the user.

LandmarksConfidence

Optional: An array of single-precision (32-bit) floating-point numbers, which must be large enough to hold the number of confidence values given by the product of the following:

  • NvAR_Parameter_Config(BatchSize)

  • NvAR_Parameter_Config(LandmarksConfidence_Size)

To be allocated by the user.

BoundingBoxes

Optional: NvAR_BBoxes structure that contains the face detected by the face detection pass that the landmark detection feature runs internally. For more information, refer to Face Detection and Tracking.

To be allocated by the user.
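
The sizing rules in the tables above reduce to a few lines of code. The sketch below assumes a handle created with the NvAR_Feature_LandmarkDetection selector from the SDK samples; the buffers are static here only so they outlive the call, and a real application would manage their lifetime explicitly.

```cpp
// A buffer-sizing sketch for landmark tracking: each output buffer is sized
// from the product of the configuration values, as Table 3-7 specifies.
#include <vector>
#include "nvAR.h"
#include "nvAR_defs.h"

void ConfigureLandmarkTracking(NvAR_FeatureHandle lm, NvCVImage *src) {
  const unsigned batch = 1, lmCount = 126; // Landmarks_Size: 68 or 126
  NvAR_SetU32(lm, NvAR_Parameter_Config(BatchSize), batch);
  NvAR_SetU32(lm, NvAR_Parameter_Config(Landmarks_Size), lmCount);
  NvAR_SetU32(lm, NvAR_Parameter_Config(LandmarksConfidence_Size), lmCount);
  NvAR_SetU32(lm, NvAR_Parameter_Config(Temporal), 1); // supported only for BatchSize == 1
  NvAR_Load(lm);

  // BatchSize x Landmarks_Size points, BatchSize pose quaternions.
  static std::vector<NvAR_Point2f> landmarks(batch * lmCount);
  static std::vector<float> confidence(batch * lmCount);
  static std::vector<NvAR_Quaternion> pose(batch);

  NvAR_SetObject(lm, NvAR_Parameter_Input(Image), src, sizeof(NvCVImage));
  NvAR_SetObject(lm, NvAR_Parameter_Output(Landmarks), landmarks.data(), sizeof(NvAR_Point2f));
  NvAR_SetF32Array(lm, NvAR_Parameter_Output(LandmarksConfidence), confidence.data(),
                   (int)(batch * lmCount));
  NvAR_SetObject(lm, NvAR_Parameter_Output(Pose), pose.data(), sizeof(NvAR_Quaternion));
}
```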

Face 3D Mesh Tracking Property Values

The following tables list the values for the configuration, input, and output properties for Face 3D Mesh tracking.

Table 3‑8: Configuration Properties for Face 3D Mesh Tracking

Property Name

Value

FeatureDescription

String that describes the feature.

This property is read-only.

ModelDir

String that contains the path to the face model and the TensorRT package files. For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

Set by the user.

CUDAStream

Optional: The CUDA stream.

For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

Set by the user.

Temporal

Optional: Unsigned integer to enable (1) or disable (0) the temporal optimization of face and landmark detection. For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

Set by the user.

Mode

Optional: Unsigned integer. Set 0 to enable Performance mode (default) or 1 to enable Quality mode for landmark detection.

Set by the user.

Landmarks_Size

Unsigned integer, 68 or 126.

If landmark detection is run internally, the confidence values for the detected key points are returned. For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

ShapeEigenValueCount

The number of eigenvalues that describe the identity shape. Query this to determine how big the eigenvalue array should be, if that is a desired output.

This property is read-only.

ExpressionCount

The number of expressions available in the chosen model. Query this to determine how big the expression coefficient array should be, if that is the desired output.

This property is read-only.

VertexCount

The number of vertices in the chosen model.

Query this property to determine how big the vertex array should be.

This property is read-only.

TriangleCount

The number of triangles in the chosen model.

Query this property to determine how big the triangle array should be.

This property is read-only.

GazeMode

Flag to toggle gaze mode.

The default value is 0. If the value is 1, gaze estimation is explicit.

Table 3‑9: Input Properties for Face 3D Mesh Tracking

Property Name

Value

Width

The width of the input image buffer that contains the face to which the face model will be fitted.

Set by the user.

Height

The height of the input image buffer that contains the face to which the face model will be fitted.

Set by the user.

Landmarks

Optional: An NvAR_Point2f array that contains the landmark points of size NvAR_Parameter_Config(Landmarks_Size) that is returned by the landmark detection feature.

If landmarks are not provided to this feature, an input image must be provided. For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

To be allocated by the user.

Image

Optional: An interleaved (or chunky) 8-bit BGR input image in a CUDA buffer of type NvCVImage.

If an input image is not provided as input, the landmark points must be provided to this feature as input. For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

To be allocated by the user.

Table 3‑10: Output Properties for Face 3D Mesh Tracking

Property Name

Value

FaceMesh

NvAR_FaceMesh structure that contains the output face mesh.

To be allocated by the user.

Query VertexCount and TriangleCount to determine how much memory to allocate.

RenderingParams

NvAR_RenderingParams structure that contains the rendering parameters for drawing the face mesh that is returned by this feature.

To be allocated by the user.

Landmarks

Optional: An NvAR_Point2f array, which must be large enough to hold the number of points of size NvAR_Parameter_Config(Landmarks_Size).

For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

To be allocated by the user.

Pose

Optional: NvAR_Quaternion array, which must be large enough to hold one quaternion. For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

The OpenGL coordinate convention is used: looking out from the camera, x is camera right, y is camera up, and z is toward the camera.

To be allocated by the user.

LandmarksConfidence

Optional: An array of single-precision (32-bit) floating-point numbers, which must be large enough to hold the number of confidence values of size NvAR_Parameter_Config(LandmarksConfidence_Size).

For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

To be allocated by the user.

BoundingBoxes

Optional: NvAR_BBoxes structure that contains the detected face that is determined internally. For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

To be allocated by the user.

BoundingBoxesConfidence

Optional: An array of single-precision (32-bit) floating-point numbers that contain the confidence values for each detected face box. For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

To be allocated by the user.

ShapeEigenValues

Optional: The array into which the shape eigenvalues will be placed, if desired. Query ShapeEigenValueCount to determine how big this array should be.

To be allocated by the user.

ExpressionCoefficients

Optional: The array into which the expression coefficients will be placed, if desired. Query ExpressionCount to determine how big this array should be.

To be allocated by the user.

The corresponding expression shapes for face_model2.nvf are in the following order:

BrowDown_L, BrowDown_R, BrowInnerUp_L, BrowInnerUp_R, BrowOuterUp_L, BrowOuterUp_R, CheekPuff_L, CheekPuff_R, CheekSquint_L, CheekSquint_R, EyeBlink_L, EyeBlink_R, EyeLookDown_L, EyeLookDown_R, EyeLookIn_L, EyeLookIn_R, EyeLookOut_L, EyeLookOut_R, EyeLookUp_L, EyeLookUp_R, EyeSquint_L, EyeSquint_R, EyeWide_L, EyeWide_R, JawForward, JawLeft, JawOpen, JawRight, MouthClose, MouthDimple_L, MouthDimple_R, MouthFrown_L, MouthFrown_R, MouthFunnel, MouthLeft, MouthLowerDown_L, MouthLowerDown_R, MouthPress_L, MouthPress_R, MouthPucker, MouthRight, MouthRollLower, MouthRollUpper, MouthShrugLower, MouthShrugUpper, MouthSmile_L, MouthSmile_R, MouthStretch_L, MouthStretch_R, MouthUpperUp_L, MouthUpperUp_R, NoseSneer_L, NoseSneer_R
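
The read-only counts above exist precisely so that output buffers can be sized before allocation. The query-then-allocate sketch below operates on an already loaded handle; the NvAR_FaceMesh field names match the SDK headers as commonly shipped, but verify them against nvAR_defs.h in your SDK version.

```cpp
// A query-then-allocate sketch for a loaded Face 3D Mesh handle.
#include <vector>
#include "nvAR.h"
#include "nvAR_defs.h"

void AllocateMeshOutputs(NvAR_FeatureHandle mesh) { // call after NvAR_Load
  unsigned nVerts = 0, nTris = 0, nExpr = 0, nEigen = 0;
  NvAR_GetU32(mesh, NvAR_Parameter_Config(VertexCount), &nVerts);
  NvAR_GetU32(mesh, NvAR_Parameter_Config(TriangleCount), &nTris);
  NvAR_GetU32(mesh, NvAR_Parameter_Config(ExpressionCount), &nExpr);
  NvAR_GetU32(mesh, NvAR_Parameter_Config(ShapeEigenValueCount), &nEigen);

  // Static so the buffers outlive this call; a real app manages lifetime.
  static std::vector<NvAR_Vector3f> vertices(nVerts);
  static std::vector<NvAR_Vector3u16> triangles(nTris);
  static NvAR_FaceMesh faceMesh{vertices.data(), nVerts, triangles.data(), nTris};
  static NvAR_RenderingParams renderParams;
  static std::vector<float> exprCoeffs(nExpr), eigenValues(nEigen);

  NvAR_SetObject(mesh, NvAR_Parameter_Output(FaceMesh), &faceMesh, sizeof(NvAR_FaceMesh));
  NvAR_SetObject(mesh, NvAR_Parameter_Output(RenderingParams), &renderParams,
                 sizeof(NvAR_RenderingParams));
  NvAR_SetF32Array(mesh, NvAR_Parameter_Output(ExpressionCoefficients), exprCoeffs.data(),
                   (int)nExpr);
  NvAR_SetF32Array(mesh, NvAR_Parameter_Output(ShapeEigenValues), eigenValues.data(),
                   (int)nEigen);
}
```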

Eye Contact Property Values

The following tables list the values for the configuration, input, and output properties for gaze redirection.

Table 3‑11: Configuration Properties for Eye Contact

Property Name

Value

FeatureDescription

String that describes the feature.

ModelDir

String that contains the path to the folder that contains the TensorRT package files.

Set by the user.

BatchSize

The number of inferences to be run at one time on the GPU. The maximum value is 1.

Landmarks_Size

Unsigned integer, either 68 or 126.

Specifies the number of landmark points (x and y values) to be returned.

Set by the user.

LandmarksConfidence_Size

Unsigned integer, either 68 or 126.

Specifies the number of landmark confidence values for the detected keypoints to be returned.

Set by the user.

GazeRedirect

Flag to enable or disable gaze redirection.

When enabled, the gaze is estimated, and the redirected image is set as the output. When disabled, the gaze is estimated but redirection does not occur.

Temporal

Unsigned integer to enable (1) or disable (0) the temporal optimization of landmark detection.

Set by the user.

DetectClosure

Flag to toggle the detection of eye closure and occlusion. The default value is On.

EyeSizeSensitivity

An unsigned integer in the range 2–5, inclusive, that is used to increase the sensitivity of the algorithm to the redirected eye size. A value of 2 uses a smaller eye region, and a value of 5 uses a larger eye region.

UseCudaGraph

Bool. Default is False.

Flag to use CUDA Graphs for optimization.

Set by the user.

EnableLookAway

Bool. Default is false.

Flag that, when set to true, redirects the eyes to look away at a random time for a random period. The eyes follow the relative changes in estimated gaze during the lookaway period.

Set by the user.

LookAwayOffsetMax

Unsigned int value in the range 0–10. Default is 5.

If the value is set to x degrees, a randomly chosen offset angle in the range −x to x degrees will be added to the lookaway angle during the random lookaway period. The lookaway angle is based on the relative motion of the eyes in the input image during the lookaway period. It is not used outside the lookaway period.

Set by the user.

LookAwayIntervalMin

Unsigned int value in the range 1–600. Default is 100.

Minimum limit for the number of frames at which random look away occurs. This value is applicable only when EnableLookAway is set to true.

The value can be optionally set by the user.

LookAwayIntervalRange

Unsigned int value in the range 1–600. Default is 250.

Interval range for picking the number of frames at which random lookaway occurs. Adding this range to LookAwayIntervalMin provides the maximum limit for the number of frames at which random lookaway occurs. This value is applicable only when EnableLookAway is set to true.

The value can be optionally set by the user.

GazePitchThresholdLow

Float value in the range of 10.0–35.0 (degrees). Default is 20.0.

This is a range control parameter. It defines the threshold for estimated gaze angle in the pitch direction within which gaze is always redirected towards the camera. Beyond this angle, the redirected gaze transitions away from the camera and towards the estimated gaze angle.

This value is optionally set by the user.

GazeYawThresholdLow

Float value in the range of 10.0–35.0 (degrees). Default is 20.0.

This is a range control parameter. It defines the threshold for estimated gaze angle in the yaw direction within which gaze is always redirected towards the camera. Beyond this angle, the redirected gaze transitions away from the camera and towards the estimated gaze angle. This value is optionally set by the user.

HeadPitchThresholdLow

Float value in the range of 10.0–35.0 (degrees). Default is 15.0.

This is a range control parameter. It defines the threshold for estimated head pose angle in the pitch direction within which gaze is always redirected towards the camera. Beyond this angle, the redirected gaze transitions away from the camera and towards the estimated gaze angle.

This value is optionally set by the user.

HeadYawThresholdLow

Float value in the range of 10.0–35.0 (degrees). Default is 25.0.

This is a range control parameter. It defines the threshold for estimated head pose angle in the yaw direction within which gaze is always redirected towards the camera. Beyond this angle, the redirected gaze transitions away from the camera and towards the estimated gaze angle.

This value is optionally set by the user.

GazePitchThresholdHigh

Float value in the range of 10.0–35.0 (degrees). Default is 30.0.

This is a range control parameter. It defines the threshold for estimated gaze angle in the pitch direction beyond which no redirection occurs and the angle of redirected gaze is equal to the estimated gaze. The redirected gaze in the pitch direction increasingly moves away from the camera and towards the estimated gaze beyond GazePitchThresholdLow and reaches the estimated gaze value at GazePitchThresholdHigh. The value of this parameter is expected to be greater than GazePitchThresholdLow.

This value is optionally set by the user.

GazeYawThresholdHigh

Float value in the range of 10.0–35.0 (degrees). Default is 30.0.

This is a range control parameter. It defines the threshold for estimated gaze angle in the yaw direction beyond which no redirection occurs and the angle of redirected gaze is equal to the estimated gaze. The redirected gaze in the yaw direction increasingly moves away from the camera and towards the estimated gaze beyond GazeYawThresholdLow and reaches the estimated gaze value at GazeYawThresholdHigh. The value of this parameter is expected to be greater than GazeYawThresholdLow.

This value is optionally set by the user.

HeadPitchThresholdHigh

Float value in the range of 10.0–35.0 (degrees). Default is 25.0.

This is a range control parameter. It defines the threshold for estimated head pose angle in the pitch direction beyond which no redirection occurs and the angle of redirected gaze is equal to the estimated gaze. The redirected gaze in the pitch direction increasingly moves away from the camera and towards the estimated gaze beyond HeadPitchThresholdLow and reaches the estimated gaze value at HeadPitchThresholdHigh. The value of this parameter is expected to be greater than HeadPitchThresholdLow.

This value is optionally set by the user.

HeadYawThresholdHigh

Float value in the range of 10.0–35.0 (degrees). Default is 30.0.

This is a range control parameter. It defines the threshold for estimated head pose angle in the yaw direction beyond which no redirection occurs and the angle of redirected gaze is equal to the estimated gaze. The redirected gaze in the yaw direction increasingly moves away from the camera and towards the estimated gaze beyond HeadYawThresholdLow and reaches the estimated gaze value at HeadYawThresholdHigh. The value of this parameter is expected to be greater than HeadYawThresholdLow.

This value is optionally set by the user.

Table 3‑12: Input Properties for Eye Contact

Property Name

Value

Image

Interleaved (or chunky) 8-bit BGR input image in a CUDA buffer of type NvCVImage.

To be allocated and set by the user.

Width

The width of the input image buffer that contains the face to which the face model will be fitted.

Set by the user.

Height

The height of the input image buffer that contains the face to which the face model will be fitted.

Set by the user.

Landmarks

Optional: An NvAR_Point2f array that contains the landmark points of size NvAR_Parameter_Config(Landmarks_Size) that is returned by the landmark detection feature.

If landmarks are not provided to this feature, an input image must be provided.

For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

To be allocated by the user.

Table 3‑13: Output Properties for Eye Contact

Property Name

Value

Landmarks

NvAR_Point2f array, which must be large enough to hold the number of points given by the product of the following:

  • NvAR_Parameter_Config(BatchSize)

  • NvAR_Parameter_Config(Landmarks_Size)

To be allocated by the user.

HeadPose

Optional: NvAR_Quaternion array, which must be large enough to hold the number of quaternions equal to NvAR_Parameter_Config(BatchSize).

The OpenGL coordinate convention is used: looking out from the camera, x is camera right, y is camera up, and z is toward the camera.

To be allocated by the user.

LandmarksConfidence

Optional: An array of single-precision (32-bit) floating-point numbers, which must be large enough to hold the number of confidence values given by the product of the following:

  • NvAR_Parameter_Config(BatchSize)

  • NvAR_Parameter_Config(LandmarksConfidence_Size)

To be allocated by the user.

BoundingBoxes

Optional: NvAR_BBoxes structure that contains the face detected by the face detection pass that the landmark detection feature runs internally. For more information, refer to Alternative Usage of the Face 3D Mesh Feature.

To be allocated by the user.

OutputGazeVector

Float array, which must be large enough to hold the two values (pitch and yaw) for the gaze angle in radians per image. For batch sizes larger than 1, it should hold NvAR_Parameter_Config(BatchSize) × 2 float values.

To be allocated by the user.

OutputHeadTranslation

Optional: Float array, which must be large enough to hold the head translations (x,y,z) per image. For batch sizes larger than 1, it should hold NvAR_Parameter_Config(BatchSize) × 3 float values.

To be allocated by the user.

GazeDirection

Optional: NvAR_Point3f array that is large enough to hold as many elements as NvAR_Parameter_Config(BatchSize).

Each element contains two NvAR_Point3f points. One point represents the center point (cx,cy,cz) between the eyes, and the other point represents a unit vector (ux, uy, uz) in the gaze direction for visualization. For batch sizes larger than 1, it should hold NvAR_Parameter_Config(BatchSize) × 2 NvAR_Point3f points.

To be allocated by the user.
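
With BatchSize fixed at 1, the output buffers above have small, fixed sizes: OutputGazeVector holds exactly two floats and GazeDirection two NvAR_Point3f elements. The sketch below assumes the NvAR_Feature_GazeRedirection selector from the SDK samples.

```cpp
// A sketch of running gaze redirection and reading the gaze outputs.
#include <cstdio>
#include "nvAR.h"
#include "nvAR_defs.h"

void RunEyeContact(NvAR_FeatureHandle gaze, NvCVImage *src) {
  NvAR_SetU32(gaze, NvAR_Parameter_Config(GazeRedirect), 1); // estimate and redirect
  NvAR_SetU32(gaze, NvAR_Parameter_Config(Temporal), 1);
  NvAR_Load(gaze);

  NvAR_SetObject(gaze, NvAR_Parameter_Input(Image), src, sizeof(NvCVImage));

  static float gazeAngles[2];     // pitch and yaw, in radians (BatchSize == 1)
  static NvAR_Point3f gazeDir[2]; // eye-center (cx,cy,cz) and unit vector (ux,uy,uz)
  NvAR_SetF32Array(gaze, NvAR_Parameter_Output(OutputGazeVector), gazeAngles, 2);
  NvAR_SetObject(gaze, NvAR_Parameter_Output(GazeDirection), gazeDir, sizeof(NvAR_Point3f));

  NvAR_Run(gaze);
  std::printf("gaze: pitch=%.3f yaw=%.3f rad\n", gazeAngles[0], gazeAngles[1]);
}
```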

Body Detection Property Values

The following tables list the values for the configuration, input, and output properties for Body Detection.

Table 3‑14: Configuration Properties for Body Detection

Property Name

Value

FeatureDescription

Free-form string that describes the feature.

The string is set by the SDK and cannot be modified by the user.

CUDAStream

The CUDA stream, which is set by the user.

ModelDir

String that contains the path to the folder that contains the TensorRT package files.

Set by the user.

Temporal

Unsigned integer to enable (1) or disable (0) the temporal optimization of body detection.

Set by the user.

FullBodyOnly

Unsigned integer to select the estimation mode:

  • 1: Full Body only (default).

  • 0: Full and upper body.

Set by the user.

Table 3‑15: Input Properties for Body Detection

Property Name

Value

Image

Interleaved (or chunky) 8-bit BGR input image in a CUDA buffer of type NvCVImage.

To be allocated and set by the user.

Table 3‑16: Output Properties for Body Detection

Property Name

Value

BoundingBoxes

NvAR_BBoxes structure that holds the detected body boxes.

To be allocated by the user.

BoundingBoxesConfidence

Optional: An array of single-precision (32-bit) floating-point numbers that contain the confidence values for each detected body box.

To be allocated by the user.
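
Configuration mirrors face tracking, with FullBodyOnly as the only new selector. A short sketch follows, assuming the NvAR_Feature_BodyDetection selector from the SDK samples; as before, static buffers stand in for application-managed lifetime.

```cpp
// A body detection sketch mirroring the face tracking example.
#include "nvAR.h"
#include "nvAR_defs.h"

void ConfigureBodyDetection(NvAR_FeatureHandle body, NvCVImage *src) {
  NvAR_SetU32(body, NvAR_Parameter_Config(FullBodyOnly), 1); // 1 = full body only (default)
  NvAR_SetU32(body, NvAR_Parameter_Config(Temporal), 1);
  NvAR_Load(body);

  NvAR_SetObject(body, NvAR_Parameter_Input(Image), src, sizeof(NvCVImage));

  // Table 3-16: user-allocated body boxes plus optional confidences.
  static NvAR_Rect rects[25];
  static float confidence[25];
  static NvAR_BBoxes boxes{rects, 0, 25};
  NvAR_SetObject(body, NvAR_Parameter_Output(BoundingBoxes), &boxes, sizeof(NvAR_BBoxes));
  NvAR_SetF32Array(body, NvAR_Parameter_Output(BoundingBoxesConfidence), confidence, 25);
}
```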

3D Body Pose Keypoint Tracking Property Values

The following tables list the values for the configuration, input, and output properties for 3D Body Pose Keypoint Tracking.

Table 3‑17: Configuration Properties for 3D Body Pose Keypoint Tracking

Property Name

Value

FeatureDescription

String that describes the feature.

CUDAStream

The CUDA stream.

Set by the user.

ModelDir

String that contains the path to the folder that contains the TensorRT package files.

Set by the user.

BatchSize

The number of inferences to be run at one time on the GPU. The maximum value is 1.

Mode

Unsigned integer that specifies the mode: High Performance (1) or High Quality (0). Default is 1.

Set by the user.

UseCudaGraph

Boolean to enable (true) or disable (false) the use of CUDA Graphs for optimization.

Set by the user.

Temporal

Unsigned integer to enable (1) or disable (0) the temporal optimization of Body Pose tracking.

Set by the user.

NumKeyPoints

Unsigned integer that specifies the number of keypoints available, which is currently 34.

ReferencePose

NvAR_Point3f array that contains the reference pose for each of the keypoints.

Set by the user.

FullBodyOnly

Unsigned integer to select the pose estimation mode:

  • Full Body only (1). Supports both high quality and high performance modes.

  • Full and upper body (0). Supports only high quality mode.

The default is 1.

Set by the user.

PostprocessJointAngle

Boolean to enable (true) or disable (false) the postprocessing steps for joint angles corresponding to the joints predicted with low confidence. To be used only when FullBodyOnly is set to 0.

We recommend that you set this to true when input is upper-body image or video.

The default is true.

Set by the user.

TargetSeatedPoseForInterpolation

NvAR_Quaternion array that contains the target seated pose (for each of the 34 keypoints) to be used for post processing joint rotations.

For the joints that are predicted with low confidence, the output pose will be interpolated to the corresponding pose specified in this target pose.

This array is used when the SDK detects that the person in the input frame is in a seated pose.

Used only when FullBodyOnly is set to 0.

TargetStandPoseForInterpolation

NvAR_Quaternion array that contains the target standing pose (for each of the 34 keypoints) to be used for post processing joint rotations.

For the joints that are predicted with low confidence, the output pose will be interpolated to the corresponding pose specified in this target pose.

This array is used when the SDK detects that the person in the input frame is in a standing pose.

Used only when FullBodyOnly is set to 0.

TrackPeople

Unsigned integer to enable (1) or disable (0) multi-person tracking in Body Pose.

Set by the user.

ShadowTrackingAge

Unsigned integer that specifies the age (in number of frames) after which the multi-person tracker stops tracking the object in shadow mode.

The default is 90.

Set by the user.

ProbationAge

Unsigned integer that specifies the age (in number of frames) after which the multi-person tracker marks the object valid and assigns an ID for tracking.

The default is 10.

Set by the user.

MaxTargetsTracked

Unsigned integer that specifies the maximum number of targets to be tracked by the multi-person tracker. After this limit is reached, new targets are discarded.

The default is 30.

Set by the user.

Table 3‑18: Input Properties for 3D Body Pose Keypoint Tracking

Property Name

Value

Image

Interleaved (or chunky) 8-bit BGR input image in a CUDA buffer of type NvCVImage.

To be allocated and set by the user.

FocalLength

Float value that specifies the focal length of the camera to be used for 3D Body Pose.

The default value is 800.79041.

To be allocated and set by the user.

BoundingBoxes

Optional: NvAR_BBoxes structure that contains BatchSize bounding boxes on which to run 3D Body Pose detection.

If not specified as an input property, body detection is automatically run on the input image.

To be allocated by the user.

Table 3‑19: Output Properties for 3D Body Pose Keypoint Tracking

Property Name

Value

Keypoints

NvAR_Point2f array, which must be large enough to hold the points given by the product of NvAR_Parameter_Config(BatchSize) and 34.

To be allocated by the user.

Keypoints3D

NvAR_Point3f array, which must be large enough to hold the points given by the product of NvAR_Parameter_Config(BatchSize) and 34.

To be allocated by the user.

JointAngles

NvAR_Quaternion array, which must be large enough to hold the joints given by the product of NvAR_Parameter_Config(BatchSize) and 34.

They represent the local rotation (in Quaternion) of each joint with reference to the ReferencePose.

To be allocated by the user.

KeyPointsConfidence

An array of single-precision (32-bit) floating-point numbers, which must be large enough to hold the number of confidence values given by the product of NvAR_Parameter_Config(BatchSize) and 34.

To be allocated by the user.

BoundingBoxes

NvAR_BBoxes structure that contains the bodies detected by the body detection pass that the 3D Body Pose feature runs internally.

To be allocated by the user.

TrackingBoundingBoxes

NvAR_TrackingBBoxes structure that contains the bodies detected by the 3D Body Pose feature's internal body detection, together with the tracking IDs assigned by multi-person tracking.

To be allocated by the user.
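
Every output array in the table above holds BatchSize × 34 elements, so allocation can be driven entirely by the NumKeyPoints query. The sketch below assumes the NvAR_Feature_BodyPoseEstimation selector from the SDK samples and uses the property tokens exactly as listed above.

```cpp
// A sizing sketch for the 3D Body Pose keypoint outputs.
#include <vector>
#include "nvAR.h"
#include "nvAR_defs.h"

void ConfigureBodyPose(NvAR_FeatureHandle pose, NvCVImage *src) {
  const unsigned batch = 1; // BatchSize maximum is 1
  NvAR_SetU32(pose, NvAR_Parameter_Config(BatchSize), batch);
  NvAR_SetU32(pose, NvAR_Parameter_Config(Mode), 1); // High Performance (default)
  NvAR_Load(pose);

  unsigned nkp = 0;
  NvAR_GetU32(pose, NvAR_Parameter_Config(NumKeyPoints), &nkp); // currently 34

  NvAR_SetObject(pose, NvAR_Parameter_Input(Image), src, sizeof(NvCVImage));
  NvAR_SetF32(pose, NvAR_Parameter_Input(FocalLength), 800.79041f); // default focal length

  // Each output array holds BatchSize x NumKeyPoints elements.
  static std::vector<NvAR_Point2f> keypoints(batch * nkp);
  static std::vector<NvAR_Point3f> keypoints3D(batch * nkp);
  static std::vector<NvAR_Quaternion> jointAngles(batch * nkp); // relative to ReferencePose
  static std::vector<float> kpConfidence(batch * nkp);

  NvAR_SetObject(pose, NvAR_Parameter_Output(Keypoints), keypoints.data(), sizeof(NvAR_Point2f));
  NvAR_SetObject(pose, NvAR_Parameter_Output(Keypoints3D), keypoints3D.data(), sizeof(NvAR_Point3f));
  NvAR_SetObject(pose, NvAR_Parameter_Output(JointAngles), jointAngles.data(), sizeof(NvAR_Quaternion));
  NvAR_SetF32Array(pose, NvAR_Parameter_Output(KeyPointsConfidence), kpConfidence.data(),
                   (int)(batch * nkp));
}
```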

Facial Expression Estimation Property Values

The following tables list the values for the configuration, input, and output properties for Facial Expression Estimation.

Table 3‑20: Configuration Properties for Facial Expression Estimation

Property Name

Value

FeatureDescription

String that describes the feature.

This property is read-only.

ModelDir

String that contains the path to the face model and the TensorRT package files.

Set by the user.

CUDAStream

Optional: The CUDA stream.

Set by the user.

Temporal

Optional: Bitfield to control temporal filtering.

  • 0x001: Filter face detection.

  • 0x002: Filter facial landmarks.

  • 0x004: Filter rotational pose.

  • 0x010: Filter facial expressions.

  • 0x020: Filter gaze expressions.

  • 0x100: Enhance eye and mouth closure.

Default is 0x037 (all on except 0x100).

Set by the user.

Landmarks_Size

Unsigned integer, 68 or 126.

Specifies the required size of the array of detected facial landmark points. The array length must be 126 to accommodate the {x,y} location of each detected point.

ExpressionCount

Unsigned integer.

The number of expressions in the face model.

PoseMode

Specifies how to compute pose. 0 = 3DOF (default), 1 = 6DOF explicit.

6DOF is required for 3D translation output.

Mode

Flag to toggle landmark mode. Set 0 to enable Performance model for landmark detection. Set 1 to enable Quality model for landmark detection for higher accuracy. Default is 1.

EnableCheekPuff

(Experimental) Enables cheek puff blendshapes.

Table 3‑21: Input Properties for Facial Expression Estimation

Property Name

Value

Landmarks

Optional: An NvAR_Point2f array that contains the landmark points of size NvAR_Parameter_Config(Landmarks_Size) that is returned by the landmark detection feature.

If landmarks are not provided to this feature, an input image must be provided.

To be allocated by the user.

Image

Optional: An interleaved (or chunky) 8-bit BGR input image in a CUDA buffer of type NvCVImage.

If an input image is not provided as input, the landmark points must be provided to this feature as input.

To be allocated by the user.

CameraIntrinsicParams

Optional: Camera intrinsic parameters. A three-element float array with elements corresponding to focal length, cx, and cy, respectively, of an ideal perspective camera. Any barrel or fisheye distortion should be removed or considered negligible. Used only if PoseMode is set to 1.

Table 3‑22: Output Properties for Facial Expression Estimation

Property Name

Value

Landmarks

Optional: An NvAR_Point2f array, which must be large enough to hold the number of points of size NvAR_Parameter_Config(Landmarks_Size).

Pose

Optional: NvAR_Quaternion pose rotation quaternion. The coordinate frame is NvAR Camera 3D Space.

To be allocated by the user.

PoseTranslation

Optional: NvAR_Point3f Pose 3D Translation. Computed only if PoseMode = 1. Translation coordinates are in NvAR Camera 3D Space coordinates, in which the units are centimeters.

To be allocated by the user.

LandmarksConfidence

Optional: An array of single-precision (32-bit) floating-point numbers, which must be large enough to hold the number of confidence values of size NvAR_Parameter_Config(LandmarksConfidence_Size).

To be allocated by the user.

BoundingBoxes

Optional: NvAR_BBoxes structure that contains the detected face that is determined internally.

To be allocated by the user.

BoundingBoxesConfidence

Optional: An array of single-precision (32-bit) floating-point numbers, which must be large enough to hold the number of confidence values of size NvAR_Parameter_Config(BoundingBoxesConfidence_Size).

To be allocated by the user.

ExpressionCoefficients

The array into which the expression coefficients will be placed, if desired.

Query ExpressionCount to determine the size for this array.

To be allocated by the user.

The corresponding expression shapes are in the following order:

BrowDown_L, BrowDown_R, BrowInnerUp_L, BrowInnerUp_R, BrowOuterUp_L, BrowOuterUp_R, CheekPuff_L, CheekPuff_R, CheekSquint_L, CheekSquint_R, EyeBlink_L, EyeBlink_R, EyeLookDown_L, EyeLookDown_R, EyeLookIn_L, EyeLookIn_R, EyeLookOut_L, EyeLookOut_R, EyeLookUp_L, EyeLookUp_R, EyeSquint_L, EyeSquint_R, EyeWide_L, EyeWide_R, JawForward, JawLeft, JawOpen, JawRight, MouthClose, MouthDimple_L, MouthDimple_R, MouthFrown_L, MouthFrown_R, MouthFunnel, MouthLeft, MouthLowerDown_L, MouthLowerDown_R, MouthPress_L, MouthPress_R, MouthPucker, MouthRight, MouthRollLower, MouthRollUpper, MouthShrugLower, MouthShrugUpper, MouthSmile_L, MouthSmile_R, MouthStretch_L, MouthStretch_R, MouthUpperUp_L, MouthUpperUp_R, NoseSneer_L, NoseSneer_R
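
The sketch below ties together the Temporal bitfield, the 6DOF pose mode, and the query-then-allocate pattern for the coefficient array. It assumes the NvAR_Feature_FaceExpressions selector from the SDK samples; the intrinsics values are illustrative for a 1280x720 input, not SDK defaults.

```cpp
// A Facial Expression Estimation configuration sketch.
#include <vector>
#include "nvAR.h"
#include "nvAR_defs.h"

void ConfigureExpressionEstimation(NvAR_FeatureHandle expr, NvCVImage *src) {
  NvAR_SetU32(expr, NvAR_Parameter_Config(Temporal), 0x037); // default filter set
  NvAR_SetU32(expr, NvAR_Parameter_Config(PoseMode), 1);     // 6DOF: enables PoseTranslation
  NvAR_Load(expr);

  unsigned nExpr = 0;
  NvAR_GetU32(expr, NvAR_Parameter_Config(ExpressionCount), &nExpr);

  NvAR_SetObject(expr, NvAR_Parameter_Input(Image), src, sizeof(NvCVImage));
  // Ideal-perspective intrinsics: focal length, cx, cy (used only when PoseMode = 1).
  static float intrinsics[3] = {800.0f, 640.0f, 360.0f}; // illustrative values
  NvAR_SetF32Array(expr, NvAR_Parameter_Input(CameraIntrinsicParams), intrinsics, 3);

  static std::vector<float> coeffs(nExpr); // one per blendshape, in the order listed above
  static NvAR_Quaternion poseRotation;
  static NvAR_Point3f poseTranslation;     // centimeters, NvAR Camera 3D Space
  NvAR_SetF32Array(expr, NvAR_Parameter_Output(ExpressionCoefficients), coeffs.data(),
                   (int)nExpr);
  NvAR_SetObject(expr, NvAR_Parameter_Output(Pose), &poseRotation, sizeof(NvAR_Quaternion));
  NvAR_SetObject(expr, NvAR_Parameter_Output(PoseTranslation), &poseTranslation,
                 sizeof(NvAR_Point3f));
}
```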

Video Live Portrait Property Values

The following tables list the values for the configuration, input, and output properties for Live Portrait.

Table 3‑23: Configuration Properties for Video Live Portrait

Property Name

Value

FeatureDescription

String that describes the feature.

This property is read-only.

ModelDir

String that contains the path to the face model and the TensorRT package files.

Set by the user.

CUDAStream

Optional: The CUDA stream.

Set by the user.

ModelSel

Model optimized for performance or for quality.

  • 0: Performance model.

  • 1: Quality model (default).

Set by the user.

Mode

Video Live Portrait mode.

  • 1: Native face cropping mode (default).

  • 2: Registration blending mode.

  • 3: Inset blending mode.

Set by the user.

CheckFaceBox

Flag for checking face bounding box status.

  • 0: Disabled (default).

  • 1: FaceBoxStatus indicates the face box status.

Set by the user.

NetworkOutputImgWidth

Width of the output image generated from the network (512 or 1024).

NetworkOutputImgHeight

Height of the output image generated from the network (512 or 1024).

Table 3‑24: Input Properties for Video Live Portrait

Property Name

Value

SourceImage

Chunky/packed 8-bit BGR or BGRA CUDA buffer.

Requirements:

  • The resolution should be between 540p and 4K. We recommend a resolution of 720p or greater.

  • The full face of the subject should be visible.

  • Neutral expression (no smiling or any other expression).

  • Mouth is closed.

  • Front-facing pose and gaze.

  • Good lighting conditions.

  • Clear face features; no occlusion.

DriveImage

Chunky/packed 8-bit BGR CUDA buffer.

NeutralDriveImage

Chunky/packed 8-bit BGR CUDA buffer.

Table 3‑25: Output Properties for Video Live Portrait

Property Name

Value

GeneratedImage

Chunky/packed 8-bit BGR or BGRA CPU/CUDA buffer.

BoundingBoxes

Optional: NvAR_BBoxes structure that contains the detected face that is determined internally.

To be allocated by the user.

FaceBoxStatus

Output for detecting the current status of the face position within the face box.

  • 0: Face is inside the tracked bounding box.

  • 1: Face is close to the border of the tracked bounding box.

  • 2: Face is outside the tracked bounding box.
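
A hypothetical wiring sketch follows. This section does not name the feature selector or the accessor for scalar outputs, so the handle is assumed to be created elsewhere, and reading FaceBoxStatus through NvAR_GetU32 is an assumption.

```cpp
// Hypothetical Video Live Portrait wiring; only the property tokens come
// from the tables above.
#include "nvAR.h"
#include "nvAR_defs.h"

void RunLivePortrait(NvAR_FeatureHandle lp, NvCVImage *source,
                     NvCVImage *drive, NvCVImage *generated) {
  NvAR_SetU32(lp, NvAR_Parameter_Config(ModelSel), 1);     // quality model (default)
  NvAR_SetU32(lp, NvAR_Parameter_Config(Mode), 3);         // inset blending mode
  NvAR_SetU32(lp, NvAR_Parameter_Config(CheckFaceBox), 1); // report FaceBoxStatus
  NvAR_Load(lp);

  NvAR_SetObject(lp, NvAR_Parameter_Input(SourceImage), source, sizeof(NvCVImage));
  NvAR_SetObject(lp, NvAR_Parameter_Input(DriveImage), drive, sizeof(NvCVImage));
  NvAR_SetObject(lp, NvAR_Parameter_Output(GeneratedImage), generated, sizeof(NvCVImage));
  NvAR_Run(lp);

  unsigned faceBoxStatus = 0; // 0 inside, 1 near border, 2 outside
  NvAR_GetU32(lp, NvAR_Parameter_Output(FaceBoxStatus), &faceBoxStatus); // assumed accessor
}
```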

Frame Selection Property Values

The following tables list the values for the configuration, input, and output properties for Frame Selection.

Table 3‑26: Configuration Properties for Frame Selection

Property Name

Value

FeatureDescription

String that describes the feature.

This property is read-only.

ModelDir

String that contains the path to the face model and the TensorRT package files.

Set by the user.

CUDAStream

Optional: The CUDA stream.

Set by the user.

Temporal

Optional: Bitfield to control temporal filtering.

  • 0x001: Filter face detection.

  • 0x002: Filter facial landmarks.

  • 0x004: Filter rotational pose.

  • 0x010: Filter facial expressions.

  • 0x020: Filter gaze expressions.

  • 0x100: Enhance eye and mouth closure.

Default: 0x037 (all on except 0x100).

Set by the user.

Mode

Optional: Frame Selection mode.

  • 0: Select based on image neutrality (neutral head pose and neutral expression).

  • Other values: Not implemented.

Default: 0.

Set by the user.

ActiveDuration

Optional: Specifies how long (in frames), beginning with the first frame, frame selection can report frame status (good or bad) before reporting expired status (NVAR_FRAME_SELECTOR_ACTIVE_DURATION_EXPIRED).

If no good frame is detected in the first n frames specified by ActiveDuration, the SDK does not report NVAR_FRAME_SELECTOR_ACTIVE_DURATION_EXPIRED status until at least one good frame is detected or EOF is reached.

Default: 0 (runs forever).

Set by the user.

GoodFrameMinInterval

Optional: If two good frames are too close together, the latter is not reported unless at least the specified number of frames lie between the two good frames.

Default: 0 (no gap frame needed between good frames).

Set by the user.

Strategy

Optional: Flag to control frame selection strategy.

  • 0: Static threshold.

  • 1: Improving threshold.

Default: 1.

Set by the user.

Table 3‑27: Input Properties for Frame Selection

Property Name

Value

Image

Chunky/packed 8-bit BGR CUDA buffer.

Table 3‑28: Output Properties for Frame Selection

Property Name

Value

FrameSelectorStatus

Bitfield to indicate the current input image status.

  • 0: Indicates that the current frame meets the internal threshold bar and can be considered a good frame.

  • Other values: Each bit indicates a failed reason. Can be XORed to indicate multiple reasons.

To learn more about the status code, refer to nvAR_defs.h.
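
A hypothetical polling sketch follows. The handle is assumed to be created and loaded elsewhere; reading the status bitfield through NvAR_GetU32 is an assumption, since this section does not specify the accessor.

```cpp
// Hypothetical per-frame polling of FrameSelectorStatus.
#include <cstdio>
#include "nvAR.h"
#include "nvAR_defs.h"

bool IsGoodFrame(NvAR_FeatureHandle fs, NvCVImage *frame) {
  NvAR_SetObject(fs, NvAR_Parameter_Input(Image), frame, sizeof(NvCVImage));
  NvAR_Run(fs);

  unsigned status = ~0u;
  NvAR_GetU32(fs, NvAR_Parameter_Output(FrameSelectorStatus), &status); // assumed accessor
  if (status != 0) // each set bit is a failure reason; codes are in nvAR_defs.h
    std::printf("frame rejected, status bits 0x%x\n", status);
  return status == 0; // 0 means the frame passed the internal threshold
}
```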

Speech Live Portrait Property Values

The following tables list the values for the configuration, input, and output properties for Speech Live Portrait.

Table 3‑29: Configuration Properties for Speech Live Portrait

Property Name

Value

FeatureDescription

String that describes the feature.

This property is read-only.

ModelDir

String that contains the path to the face model and the TensorRT package files.

Set by the user.

CUDAStream

Optional: The CUDA stream.

Set by the user.

ModelSel

Model optimized for performance or for quality.

  • 0: Performance model.

  • 1: Quality model (default).

Set by the user.

Mode

Speech Live Portrait mode.

  • 1: Native face cropping mode (default).

  • 2: Registration blending mode.

  • 3: Inset blending mode.

If you see the head popping in and out while using Mode=2 (registration blending), we recommend specifying Mode=3 (inset blending) instead.

Set by the user.

HeadPoseMode

Select head animation.

  • 1: No head animation.

  • 2: Predefined head animation (default).

  • 3: User-provided head animation.

Set by the user.

SampleRate

The sample rate for the audio input. The SDK currently supports 16-kHz audio only.

NumChannels

The number of channels for the audio input. The SDK currently supports mono channel audio only.

SamplesPerFrame

The number of samples per audio frame. This is the number of audio samples passed to AudioFrameBuffer input.

NumInitialFrames

The number of initial audio frames before the first image can be generated. You must provide NumInitialFrames audio frames before the first output is available at the GeneratedImage output.

EnableLookAway

Flag to enable gaze lookaway.

  • 0: Disable (default).

  • 1: Enable.

Set by user.

LookAwayOffsetMax

The maximum integer value of gaze offset when lookaway is enabled.

Default: 20
Unit: Degrees

Set by user.

LookAwayIntervalRange

Range for picking the number of frames at which random look away occurs.

Default: 90
Range: [1, 600]
Unit: Frames

Set by user.

LookAwayIntervalMin

Minimum limit for the number of frames at which random lookaway occurs.

Default: 240
Range: [1, 600]
Unit: Frames

Set by user.

BlinkFrequency

The frequency of eye blinks per minute.

Default: 15
Range: [0, 120]
Unit: Blinks per minute

0 = disable eye blink

Set by user.

BlinkDuration

The duration of an eye blink.

Default: 6
Range: [2, 150]
Unit: Frames

Set by user.

MouthExpressionMultiplier

Specifies the degree of exaggeration for mouth movements. Higher values result in more exaggerated mouth motions.

Default: 1.4f
Range: [1.0f, 1.6f]

Set by user.

MouthExpressionBase

Defines the base openness of the mouth when idle (that is, zero audio input). Higher values lead to a more open mouth appearance during the idle state.

Default: 0.3f
Range: [0.0f, 1.0f]

Set by user.

HeadPoseMultiplier

A multiplier to dampen the range of the head pose animation.

This is applicable only for HeadPoseMode=2 (predefined head animation).

Default: 1.0f
Range: [0.0f, 1.0f]

Set by user.

Table 3‑30: Input Properties for Speech Live Portrait

Property Name

Value

SourceImage

Chunky/packed 8-bit BGR or BGRA CUDA buffer.

Requirements:

  • The resolution should be between 540p and 4K. We recommend a resolution of 720p or greater.

  • The full face of the subject should be visible.

  • Neutral expression (no smiling or any other expression).

  • Mouth is closed.

  • Front-facing pose and gaze.

  • Good lighting conditions.

  • Clear face features; no occlusion.

If the face is close to the border of the image, in Mode=1 (native face cropping mode) you might see letter-boxing (black pixels). This is done to ensure the face is centered for the best lip-sync quality. To avoid this, ensure the portrait image’s face is not close to the border.

AudioFrameBuffer

Raw audio frame buffer in CPU memory, with sample values ranging from −1.0 to 1.0, inclusive.

Audio requirements:

  • Sample rate: 16 kHz.

  • Channel: 1 (mono).

  • Only one speaker in the audio.

  • No background noise in the audio.

HeadPoseRotation

NvAR_Quaternion buffer that provides the head pose rotation to be applied. This is valid only for HeadPoseMode=3.

The input format is [qx, qy, qz, qw]. If the input quaternion value is out-of-range, the value is clamped to ±20 degrees in Euler angle.

HeadPoseTranslation

NvAR_Vector3f buffer that provides the head pose translation to be applied. This is valid only for HeadPoseMode=3.

The input format is [tx, ty, sz]. The range is [±0.03, ±0.02, 0.97–1.03]. An out-of-range value will be clamped and logged as a warning.

Table 3‑31: Output Properties for Speech Live Portrait

Property Name

Value

GeneratedImage

Chunky/packed 8-bit BGR or BGRA CPU/CUDA buffer.

If the face is close to the border of the image, in Mode=1 (native face cropping mode) you might see letter-boxing (black pixels). This is done to ensure the face is centered for the best lip-sync quality. To avoid this, ensure the portrait image’s face is not close to the border.

BoundingBoxes

Optional: NvAR_BBoxes structure that contains the detected face that is determined internally.

To be allocated by the user.
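
The frame-priming behavior of NumInitialFrames is easiest to see in a feeding loop. In the hypothetical sketch below, the handle is assumed to be created and loaded elsewhere, and passing AudioFrameBuffer through NvAR_SetF32Array is an assumption; only the property tokens come from the tables above.

```cpp
// Hypothetical Speech Live Portrait audio-feeding loop.
#include <vector>
#include "nvAR.h"
#include "nvAR_defs.h"

void FeedAudio(NvAR_FeatureHandle slp, NvCVImage *source, NvCVImage *generated,
               std::vector<float> &audio) { // mono, 16 kHz, samples in [-1, 1]
  unsigned samplesPerFrame = 0, numInitialFrames = 0;
  NvAR_GetU32(slp, NvAR_Parameter_Config(SamplesPerFrame), &samplesPerFrame);
  NvAR_GetU32(slp, NvAR_Parameter_Config(NumInitialFrames), &numInitialFrames);

  NvAR_SetObject(slp, NvAR_Parameter_Input(SourceImage), source, sizeof(NvCVImage));
  NvAR_SetObject(slp, NvAR_Parameter_Output(GeneratedImage), generated, sizeof(NvCVImage));

  // The first NumInitialFrames audio frames only prime the feature; frames in
  // GeneratedImage become valid after that.
  unsigned fed = 0;
  for (size_t off = 0; off + samplesPerFrame <= audio.size(); off += samplesPerFrame, ++fed) {
    NvAR_SetF32Array(slp, NvAR_Parameter_Input(AudioFrameBuffer), // assumed accessor
                     audio.data() + off, (int)samplesPerFrame);
    NvAR_Run(slp);
    bool outputValid = fed >= numInitialFrames;
    (void)outputValid; // consume `generated` here once valid
  }
}
```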

LipSync Property Values

The following tables list the values for the configuration, input, and output properties for LipSync.

Table 3‑32: Configuration Properties for LipSync

Property Name

Value

FeatureDescription

String that describes the feature.

This property is read-only.

ModelDir

String that contains the path to the face model and the TensorRT package files.

Set by the user.

CUDAStream

Optional: The CUDA stream.

Set by the user.

SampleRate

The sample rate for the audio input. The SDK currently supports 16-kHz audio only.

NumChannels

The number of channels for the audio input. The SDK currently supports mono channel audio only.

NumInitialFrames

The number of initial audio frames before the first image can be generated. You must provide NumInitialFrames video frames before the first output is available at the Image output.

The default value is 14.

This property is read-only.

Table 3‑33: Input Properties for LipSync

Property Name

Value

Image

Chunky/packed 8-bit BGR or BGRA CUDA buffer.

Requirements:

  • Resolution between 360p and 4K. Face region in the input frame up to 512×512 resolution.

  • Only one person visible in the frame.

  • The full face of the subject is visible.

  • Maximum of ±30 degrees head movement in each axis.

  • Moderate to good lighting conditions.

  • Clear face features; no occlusion.

AudioFrameBuffer

Raw audio frame buffer in CPU memory, with sample values ranging from −1.0 to 1.0, inclusive.

When the LipSync feature is run, it assumes that the contents of the audio frame buffer are synchronized with the current input video frame. The length of the audio frame should approximately match the duration of the video frame. The caller can vary the length of each audio frame to maintain synchronization.

Audio requirements:

  • Sample rate: 16 kHz.

  • Channel: 1 (mono).

  • Only one speaker in the audio.

  • No background noise in the audio.

SpeakerData

An NvAR_SpeakerData object that contains the audio data and the region of interest to animate a speaker’s face in a video frame.

This input is an alternative to using AudioFrameBuffer, with additional properties to control the animation. The audio data is subject to the same constraints as AudioFrameBuffer.

CameraIntrinsicParams

Optional: Camera intrinsic parameters. A three-element float array with elements corresponding to focal length, cx, and cy, respectively, of an ideal perspective camera. Any barrel or fisheye distortion should be removed or considered negligible. Default: {f=input_height, cx=input_width/2, cy=input_height/2}.

Table 3‑34: Output Properties for LipSync

Property Name

Value

Image

Chunky/packed 8-bit BGR or BGRA CPU/CUDA buffer.

Ready

Flag that is set to a non-zero value when the first output video frame is generated.

Activation

Floating-point value in the range 0–1 that indicates the activation level of LipSync in the output. When the activation is 0, it means the original face was copied directly to the output frame without modification. When the activation is 1, the original face was completely replaced by the animated face in the output frame.
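
A hypothetical per-frame loop shows how Ready gates the output and how Activation reports the blend level. The handle is assumed to be created and loaded elsewhere; reading Ready and Activation through NvAR_GetU32 and NvAR_GetF32 is an assumption, since this section does not specify the accessors.

```cpp
// Hypothetical LipSync per-frame driving loop.
#include <cstdio>
#include "nvAR.h"
#include "nvAR_defs.h"

void LipSyncFrame(NvAR_FeatureHandle ls, NvCVImage *inFrame, NvCVImage *outFrame,
                  float *audio, int numSamples) { // audio spans about one video frame
  NvAR_SetObject(ls, NvAR_Parameter_Input(Image), inFrame, sizeof(NvCVImage));
  NvAR_SetF32Array(ls, NvAR_Parameter_Input(AudioFrameBuffer), audio, numSamples); // assumed
  NvAR_SetObject(ls, NvAR_Parameter_Output(Image), outFrame, sizeof(NvCVImage));
  NvAR_Run(ls);

  unsigned ready = 0;
  NvAR_GetU32(ls, NvAR_Parameter_Output(Ready), &ready); // assumed accessor
  if (ready) { // non-zero once the first output video frame has been generated
    float activation = 0.0f; // 0 = face copied unmodified, 1 = fully animated
    NvAR_GetF32(ls, NvAR_Parameter_Output(Activation), &activation); // assumed accessor
    std::printf("lip-sync activation: %.2f\n", activation);
  }
}
```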