Structures#

This section provides information about the structures in the AR SDK APIs. The structures are defined in the following header files:

  • nvAR.h

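  • nvAR_defs.h (mostly data types)

  • nvARLipSync.h (LipSync structures)

  • nvARActiveSpeakerDetection.h (Active Speaker Detection structures)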

NvAR_BBoxes#

struct NvAR_BBoxes {
    NvAR_Rect *boxes;
    uint8_t num_boxes;
    uint8_t max_boxes;
};

Members#

boxes
Type: NvAR_Rect*
Pointer to an array of bounding boxes that are allocated by the user.
num_boxes
Type: uint8_t
The number of bounding boxes in the array.
max_boxes
Type: uint8_t
The maximum number of bounding boxes that can be stored in the array as defined by the user.

Remarks#

This structure is returned as the output of the face detection feature.

Defined in: nvAR_defs.h
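
Because the caller allocates the array, a typical setup sizes a buffer, points boxes at it, and records the capacity in max_boxes. The following is a minimal sketch under that assumption. The structure definitions are local mirrors of the ones documented here so that the snippet compiles without the SDK headers, and makeBBoxes and totalArea are hypothetical helper names, not SDK functions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Local mirrors of the documented structures (the real ones live in nvAR_defs.h).
struct NvAR_Rect { float x, y, width, height; };
struct NvAR_BBoxes {
    NvAR_Rect *boxes;
    uint8_t num_boxes;
    uint8_t max_boxes;
};

// Hypothetical helper: wraps caller-owned storage in an NvAR_BBoxes.
// The vector must outlive the returned structure.
inline NvAR_BBoxes makeBBoxes(std::vector<NvAR_Rect> &storage) {
    assert(storage.size() <= UINT8_MAX);  // max_boxes is a uint8_t
    NvAR_BBoxes out{};
    out.boxes = storage.data();
    out.num_boxes = 0;  // expected to be filled in by the feature
    out.max_boxes = static_cast<uint8_t>(storage.size());
    return out;
}

// Hypothetical helper: sums the area of the valid boxes, reading only the
// first num_boxes entries and never more than max_boxes.
inline float totalArea(const NvAR_BBoxes &b) {
    float area = 0.0f;
    for (uint8_t i = 0; i < b.num_boxes && i < b.max_boxes; ++i)
        area += b.boxes[i].width * b.boxes[i].height;
    return area;
}
```

Keeping the storage in a std::vector is one possible design choice; any buffer that stays alive while the feature runs would serve the same purpose.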

NvAR_TrackingBBox#

struct NvAR_TrackingBBox {
    NvAR_Rect bbox;
    uint16_t tracking_id;
};

Members#

bbox
Type: NvAR_Rect
Bounding box that is allocated by the user.
tracking_id
Type: uint16_t
The tracking ID assigned to the bounding box by multi-person tracking.

Remarks#

This structure is returned as the output of the body pose feature when multi-person tracking is enabled.

Defined in: nvAR_defs.h

NvAR_TrackingBBoxes#

struct NvAR_TrackingBBoxes {
    NvAR_TrackingBBox *boxes;
    uint8_t num_boxes;
    uint8_t max_boxes;
};

Members#

boxes
Type: NvAR_TrackingBBox *
Pointer to an array of tracking bounding boxes that are allocated by the user.
num_boxes
Type: uint8_t
The number of bounding boxes in the array.
max_boxes
Type: uint8_t
The maximum number of bounding boxes that can be stored in the array as defined by the user.

Remarks#

This structure is returned as the output of the body pose feature when multi-person tracking is enabled.

Defined in: nvAR_defs.h
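
Since each entry carries a tracking_id, a common task is to look up one tracked person in the output. The sketch below shows a linear search over the valid entries. The structures are local mirrors of the documented ones, and findByTrackingId is a hypothetical helper name rather than part of the SDK.

```cpp
#include <cassert>
#include <cstdint>

// Local mirrors of the documented structures (the real ones live in nvAR_defs.h).
struct NvAR_Rect { float x, y, width, height; };
struct NvAR_TrackingBBox {
    NvAR_Rect bbox;
    uint16_t tracking_id;
};
struct NvAR_TrackingBBoxes {
    NvAR_TrackingBBox *boxes;
    uint8_t num_boxes;
    uint8_t max_boxes;
};

// Hypothetical helper: returns the entry with the given tracking ID,
// or nullptr when no valid entry carries that ID.
inline const NvAR_TrackingBBox *findByTrackingId(const NvAR_TrackingBBoxes &t,
                                                 uint16_t id) {
    for (uint8_t i = 0; i < t.num_boxes && i < t.max_boxes; ++i) {
        if (t.boxes[i].tracking_id == id) return &t.boxes[i];
    }
    return nullptr;
}
```

Only the first num_boxes entries are inspected, so stale data beyond that count in the caller's buffer is ignored.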

NvAR_FaceMesh#

struct NvAR_FaceMesh {
    NvAR_Vec3<float> *vertices;
    size_t num_vertices;
    NvAR_Vec3<unsigned short> *tvi;
    size_t num_triangles;
};

Members#

vertices
Type: NvAR_Vec3<float>*
Pointer to an array of vectors that represent the mesh 3D vertex positions.
num_vertices
Type: size_t
The number of vertices in the array pointed to by vertices.
tvi
Type: NvAR_Vec3<unsigned short> *
Pointer to an array of vectors, each of which holds the three vertex indices of one mesh triangle.
num_triangles
Type: size_t
The number of mesh triangles.

Remarks#

This structure is returned as an output of the Mesh Tracking feature.

Defined in: nvAR_defs.h
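
The relationship between tvi and vertices can be checked with a short validation pass: every index stored in a triangle should be smaller than num_vertices. The sketch below assumes that NvAR_Vec3<T> stores its three components in an array member named vec, mirroring the documented NvAR_Vector3f. That layout is an assumption, and indicesInRange is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>

// Assumed layout for NvAR_Vec3<T>: three contiguous components in vec,
// mirroring the documented NvAR_Vector3f. Not taken from the SDK header.
template <typename T>
struct NvAR_Vec3 { T vec[3]; };

// Local mirror of the documented structure.
struct NvAR_FaceMesh {
    NvAR_Vec3<float> *vertices;
    std::size_t num_vertices;
    NvAR_Vec3<unsigned short> *tvi;
    std::size_t num_triangles;
};

// Hypothetical helper: returns true when every triangle refers only to
// vertices that exist in the vertices array.
inline bool indicesInRange(const NvAR_FaceMesh &m) {
    for (std::size_t t = 0; t < m.num_triangles; ++t) {
        for (int k = 0; k < 3; ++k) {
            if (static_cast<std::size_t>(m.tvi[t].vec[k]) >= m.num_vertices)
                return false;
        }
    }
    return true;
}
```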

NvAR_Frustum#

struct NvAR_Frustum {
    float left = -1.0f;
    float right = 1.0f;
    float bottom = -1.0f;
    float top = 1.0f;
};

Members#

left
Type: float
The X coordinate of the top-left corner of the viewing frustum.
right
Type: float
The X coordinate of the bottom-right corner of the viewing frustum.
bottom
Type: float
The Y coordinate of the bottom-right corner of the viewing frustum.
top
Type: float
The Y coordinate of the top-left corner of the viewing frustum.

Remarks#

This structure represents a camera viewing frustum for an orthographic camera. As a result, it contains only the left, the right, the top, and the bottom coordinates in pixels. It does not contain a near or a far clipping plane.

Defined in: nvAR_defs.h
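
To illustrate how the four frustum values can feed an orthographic projection, the sketch below builds a conventional OpenGL-style orthographic matrix in row-major order. Because the structure has no near or far plane, zNear and zFar are caller-supplied assumptions, and orthoMatrix is a hypothetical helper name. The SDK's own rendering code may use a different convention.

```cpp
#include <array>
#include <cassert>

// Local mirror of the documented structure (the real one lives in nvAR_defs.h).
struct NvAR_Frustum {
    float left = -1.0f;
    float right = 1.0f;
    float bottom = -1.0f;
    float top = 1.0f;
};

// Hypothetical helper: builds a row-major 4x4 orthographic projection matrix.
// zNear and zFar are chosen by the caller; the frustum does not carry them.
inline std::array<float, 16> orthoMatrix(const NvAR_Frustum &f,
                                         float zNear, float zFar) {
    assert(f.right != f.left && f.top != f.bottom && zFar != zNear);
    const float rl = f.right - f.left;
    const float tb = f.top - f.bottom;
    const float fn = zFar - zNear;
    return {{
        2.0f / rl, 0.0f,      0.0f,       -(f.right + f.left) / rl,
        0.0f,      2.0f / tb, 0.0f,       -(f.top + f.bottom) / tb,
        0.0f,      0.0f,      -2.0f / fn, -(zFar + zNear) / fn,
        0.0f,      0.0f,      0.0f,       1.0f
    }};
}
```

With this matrix, x = left maps to -1 and x = right maps to +1 in normalized device coordinates, and likewise for bottom and top on the Y axis.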

NvAR_FeatureHandle#

typedef struct nvAR_Feature *NvAR_FeatureHandle;

Remarks#

This type defines the handle of a feature that is defined by the SDK. It is used to reference the feature at runtime when the feature is executed and must be destroyed when it is no longer required.

Defined in: nvAR_defs.h

NvAR_Point2f#

typedef struct NvAR_Point2f {
    float x, y;
} NvAR_Point2f;

Members#

x
Type: float
The X coordinate of the point in pixels.
y
Type: float
The Y coordinate of the point in pixels.

Remarks#

This structure represents the X and Y coordinates of one point in 2D space.

Defined in: nvAR_defs.h

NvAR_Point3f#

typedef struct NvAR_Point3f {
    float x, y, z;
} NvAR_Point3f;

Members#

x
Type: float
The X coordinate of the point in pixels.
y
Type: float
The Y coordinate of the point in pixels.
z
Type: float
The Z coordinate of the point in pixels.

Remarks#

This structure represents the X, Y, Z coordinates of one point in 3D space.

Defined in: nvAR_defs.h

NvAR_Quaternion#

struct NvAR_Quaternion {
    float x, y, z, w;
};

Members#

x
Type: float
The coefficient of i in the imaginary (vector) part of the quaternion.
y
Type: float
The coefficient of j in the imaginary (vector) part of the quaternion.
z
Type: float
The coefficient of k in the imaginary (vector) part of the quaternion.
w
Type: float
The scalar coefficient of the quaternion.

Remarks#

This structure represents the coefficients in the quaternion that are expressed in the following equation: q = w + xi + yj + zk

Defined in: nvAR_defs.h
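
As an illustration of the equation above, the sketch below converts a quaternion to a 3x3 rotation matrix by using the standard formula for a unit quaternion. The input is normalized first, which is a defensive assumption rather than documented SDK behavior, and toRotationMatrix is a hypothetical helper name.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Local mirror of the documented structure (the real one lives in nvAR_defs.h).
struct NvAR_Quaternion { float x, y, z, w; };

// Hypothetical helper: converts q = w + xi + yj + zk to a row-major 3x3
// rotation matrix. The quaternion is normalized first so that non-unit
// input still yields a pure rotation.
inline std::array<float, 9> toRotationMatrix(NvAR_Quaternion q) {
    const float n = std::sqrt(q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w);
    assert(n > 0.0f);
    q.x /= n; q.y /= n; q.z /= n; q.w /= n;
    const float xx = q.x * q.x, yy = q.y * q.y, zz = q.z * q.z;
    const float xy = q.x * q.y, xz = q.x * q.z, yz = q.y * q.z;
    const float wx = q.w * q.x, wy = q.w * q.y, wz = q.w * q.z;
    return {{
        1.0f - 2.0f * (yy + zz), 2.0f * (xy - wz),        2.0f * (xz + wy),
        2.0f * (xy + wz),        1.0f - 2.0f * (xx + zz), 2.0f * (yz - wx),
        2.0f * (xz - wy),        2.0f * (yz + wx),        1.0f - 2.0f * (xx + yy)
    }};
}
```

For example, the identity quaternion (x = y = z = 0, w = 1) produces the identity matrix, and a quaternion with only z set describes a half turn about the Z axis.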

NvAR_Rect#

typedef struct NvAR_Rect {
    float x, y, width, height;
} NvAR_Rect;

Members#

x
Type: float
The X coordinate of the top left corner of the bounding box in pixels.
y
Type: float
The Y coordinate of the top left corner of the bounding box in pixels.
width
Type: float
The width of the bounding box in pixels.
height
Type: float
The height of the bounding box in pixels.

Remarks#

This structure represents the position and size of a rectangular 2D bounding box.

Defined in: nvAR_defs.h
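
Because the rectangle is stored as a top-left corner plus a size, the opposite corner and a point-in-box test have to be derived. The sketch below assumes image coordinates in which Y grows downward, so the bottom edge is at y + height. That convention, and the helper names bottomRight and contains, are assumptions made for illustration.

```cpp
#include <cassert>

// Local mirrors of the documented structures (the real ones live in nvAR_defs.h).
typedef struct NvAR_Point2f { float x, y; } NvAR_Point2f;
typedef struct NvAR_Rect { float x, y, width, height; } NvAR_Rect;

// Hypothetical helper: bottom-right corner, assuming Y grows downward.
inline NvAR_Point2f bottomRight(const NvAR_Rect &r) {
    return {r.x + r.width, r.y + r.height};
}

// Hypothetical helper: true when p lies inside r. The left and top edges are
// inclusive, and the right and bottom edges are exclusive.
inline bool contains(const NvAR_Rect &r, const NvAR_Point2f &p) {
    const NvAR_Point2f br = bottomRight(r);
    return p.x >= r.x && p.x < br.x && p.y >= r.y && p.y < br.y;
}
```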

NvAR_RenderingParams#

struct NvAR_RenderingParams {
    NvAR_Frustum frustum;
    NvAR_Quaternion rotation;
    NvAR_Vec3<float> translation;
};

Members#

frustum
Type: NvAR_Frustum
The camera viewing frustum for an orthographic camera.
rotation
Type: NvAR_Quaternion
The rotation of the camera relative to the mesh.
translation
Type: NvAR_Vec3<float>
The translation of the camera relative to the mesh.

Remarks#

This structure defines the parameters that are used to draw a 3D face mesh in a window so that the mesh is aligned with the corresponding video frame. The projection matrix is constructed from the frustum member, and the model view matrix is constructed from the rotation and translation members.

Defined in: nvAR_defs.h
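
The effect of the model view transform on a single mesh vertex can be sketched without building the full matrix: rotate the vertex by the quaternion, then add the translation. The order of operations (rotation first, then translation), the unit-length quaternion, and the vec array layout of NvAR_Vec3<T> are all assumptions here, and cross and toViewSpace are hypothetical helper names.

```cpp
#include <cassert>

// Local mirrors of the documented structures. The layout of NvAR_Vec3<T>
// (three components in vec) is an assumption.
template <typename T>
struct NvAR_Vec3 { T vec[3]; };
struct NvAR_Frustum { float left = -1.0f, right = 1.0f, bottom = -1.0f, top = 1.0f; };
struct NvAR_Quaternion { float x, y, z, w; };
struct NvAR_RenderingParams {
    NvAR_Frustum frustum;
    NvAR_Quaternion rotation;
    NvAR_Vec3<float> translation;
};

// Cross product of two 3D vectors.
inline NvAR_Vec3<float> cross(const NvAR_Vec3<float> &a, const NvAR_Vec3<float> &b) {
    return {{a.vec[1] * b.vec[2] - a.vec[2] * b.vec[1],
             a.vec[2] * b.vec[0] - a.vec[0] * b.vec[2],
             a.vec[0] * b.vec[1] - a.vec[1] * b.vec[0]}};
}

// Hypothetical helper: rotates v by the unit quaternion in p.rotation and then
// adds p.translation. Uses v' = v + 2w(u x v) + 2(u x (u x v)), u = (x, y, z).
inline NvAR_Vec3<float> toViewSpace(const NvAR_RenderingParams &p,
                                    const NvAR_Vec3<float> &v) {
    const NvAR_Quaternion &q = p.rotation;
    const NvAR_Vec3<float> u{{q.x, q.y, q.z}};
    const NvAR_Vec3<float> c1 = cross(u, v);
    const NvAR_Vec3<float> c2 = cross(u, c1);
    NvAR_Vec3<float> out{};
    for (int i = 0; i < 3; ++i) {
        out.vec[i] = v.vec[i] + 2.0f * (q.w * c1.vec[i] + c2.vec[i])
                     + p.translation.vec[i];
    }
    return out;
}
```

A renderer would normally bake the same rotation and translation into a 4x4 matrix once per frame instead of transforming vertices one at a time.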

NvAR_Vector2f#

typedef struct NvAR_Vector2f {
    float x, y;
} NvAR_Vector2f;

Members#

x
Type: float
The X component of the 2D vector.
y
Type: float
The Y component of the 2D vector.

Remarks#

This structure represents a 2D vector.

Defined in: nvAR_defs.h

NvAR_Vector3f#

typedef struct NvAR_Vector3f {
    float vec[3];
} NvAR_Vector3f;

Members#

vec
Type: float array of size 3
A vector of size 3.

Remarks#

This structure represents a 3D vector.

Defined in: nvAR_defs.h

NvAR_Vector3u16#

typedef struct NvAR_Vector3u16 {
    unsigned short vec[3];
} NvAR_Vector3u16;

Members#

vec
Type: unsigned short array of size 3
A vector of size 3.

Remarks#

This structure represents a 3D vector.

Defined in: nvAR_defs.h

NvAR_SpeakerData (Deprecated)#

Deprecated since version 1.1.0.1: NvAR_SpeakerData is no longer supported. Use NvAR_LipSyncRegionData instead.

typedef struct NvAR_SpeakerData {
    const float* audio_frame_data;
    size_t audio_frame_size;
    NvAR_Rect region;
    uint8_t region_type;
    float bypass;
} NvAR_SpeakerData;

Members#

audio_frame_data
Type: const float*
Pointer to a buffer containing driving audio. The audio is assumed to be mono and in float format.
audio_frame_size
Type: size_t
Number of audio samples in the audio frame.
region
Type: NvAR_Rect
Region that contains the speaker’s face.
region_type
Type: uint8_t
Flag that indicates the type of region. 0 = ROI: the feature performs face detection within the ROI. 1 = face box: the feature skips face detection.
bypass
Type: float
Value in the range 0–1 that can suppress speaker animation by reducing the opacity of the animated face. When the value is 0, the speaker’s face is animated according to the feature’s internal logic. As the value increases, the opacity of the animated face is reduced, blending with the original image. When the value is 1, the face is not animated.

Defined in: nvAR_defs.h

Remarks#

This structure represents the input data that is used to animate a speaker’s face in a video frame.

NvAR_LipSyncRegion#

typedef struct NvAR_LipSyncRegion {
    NvAR_Rect bbox;
    uint16_t tracking_id;
    int16_t audio_id;
    float bypass;
    uint8_t region_type;
    uint8_t is_speaking;
} NvAR_LipSyncRegion;

Members#

bbox
Type: NvAR_Rect
Bounding box that contains the speaker’s face.
tracking_id
Type: uint16_t
Tracking ID that uniquely identifies the speaker across frames.
audio_id
Type: int16_t
Reserved for future use.
bypass
Type: float
Value in the range 0–1 that can reduce the output opacity. When the value is 0, the speaker’s face is animated according to the feature’s internal logic. As the value increases, the opacity of the animated face is reduced, blending with the original image. When the value is 1, the face is not animated.
region_type
Type: uint8_t
Flag that indicates the type of region. 0 = ROI: the feature performs face detection within the ROI. 1 = face box: the feature skips face detection.
is_speaking
Type: uint8_t
Flag that indicates whether the speaker is currently speaking. When set to a non-zero value, the region is considered to be speaking and LipSync animation is applied.

Remarks#

This structure represents a single speaker region used as input to the LipSync feature. It replaces the deprecated NvAR_SpeakerData structure with additional multi-speaker support through tracking IDs and a speaking flag.

Defined in: nvARLipSync.h

NvAR_LipSyncRegionData#

typedef struct NvAR_LipSyncRegionData {
    NvAR_LipSyncRegion* regions;
    uint8_t num_regions;
} NvAR_LipSyncRegionData;

Members#

regions
Type: NvAR_LipSyncRegion*
Pointer to an array of NvAR_LipSyncRegion structures that define the speaker regions in the frame.
num_regions
Type: uint8_t
The number of regions in the array.

Remarks#

This structure specifies the regions and per-region settings for the LipSync feature. It replaces the deprecated NvAR_SpeakerData input. When this input is not set (nullptr), the LipSync feature operates in single full-frame region mode.

Defined in: nvARLipSync.h
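
The sketch below shows one way the region array might be populated before it is handed to the LipSync feature. The structures are local mirrors of the documented ones so that the snippet compiles without nvARLipSync.h. makeFaceBoxRegion and wrapRegions are hypothetical helper names, and setting the reserved audio_id member to 0 is an arbitrary choice made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Local mirrors of the documented structures.
typedef struct NvAR_Rect { float x, y, width, height; } NvAR_Rect;
typedef struct NvAR_LipSyncRegion {
    NvAR_Rect bbox;
    uint16_t tracking_id;
    int16_t audio_id;
    float bypass;
    uint8_t region_type;
    uint8_t is_speaking;
} NvAR_LipSyncRegion;
typedef struct NvAR_LipSyncRegionData {
    NvAR_LipSyncRegion *regions;
    uint8_t num_regions;
} NvAR_LipSyncRegionData;

// Hypothetical helper: describes a known face box for one tracked speaker.
inline NvAR_LipSyncRegion makeFaceBoxRegion(const NvAR_Rect &face,
                                            uint16_t trackingId, bool speaking) {
    NvAR_LipSyncRegion r{};
    r.bbox = face;
    r.tracking_id = trackingId;
    r.audio_id = 0;     // reserved for future use; 0 is an arbitrary choice here
    r.bypass = 0.0f;    // 0 = animate according to the feature's internal logic
    r.region_type = 1;  // 1 = face box, so face detection is skipped
    r.is_speaking = speaking ? 1 : 0;
    return r;
}

// Hypothetical helper: wraps caller-owned storage. The vector must outlive
// the returned structure.
inline NvAR_LipSyncRegionData wrapRegions(std::vector<NvAR_LipSyncRegion> &regions) {
    assert(regions.size() <= UINT8_MAX);  // num_regions is a uint8_t
    NvAR_LipSyncRegionData d{};
    d.regions = regions.data();
    d.num_regions = static_cast<uint8_t>(regions.size());
    return d;
}
```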

NvAR_LipSyncActivation#

typedef struct NvAR_LipSyncActivation {
    float strength;
    float center_x;
    float center_y;
    float size;
} NvAR_LipSyncActivation;

Members#

strength
Type: float
Value in the range 0–1 representing the LipSync activation strength. When the value is 0, the original face was copied directly to the output without modification. When the value is 1, the original face was completely replaced by the animated face in the output.
center_x
Type: float
The X coordinate of the center of the face in pixels.
center_y
Type: float
The Y coordinate of the center of the face in pixels.
size
Type: float
The face size in pixels.

Remarks#

This structure is returned as an output of the LipSync feature and provides detailed information about the activation strength and the location and size of the face that was animated.

Defined in: nvARLipSync.h

NvAR_AudioFrame#

typedef struct NvAR_AudioFrame {
    const float* audio_data;
    uint32_t num_samples;
    int32_t audio_id;
} NvAR_AudioFrame;

Members#

audio_data
Type: const float*
Pointer to audio samples in floating-point format, ranging from -1.0 to 1.0.
num_samples
Type: uint32_t
Number of audio samples in this frame.
audio_id
Type: int32_t
Unique identifier for this audio track.

Defined in: nvARActiveSpeakerDetection.h

Remarks#

This structure represents a single audio frame from one audio track for Active Speaker Detection.

NvAR_AudioFrameData#

typedef struct NvAR_AudioFrameData {
    NvAR_AudioFrame* audio_frames;
    uint32_t num_audio_channels;
} NvAR_AudioFrameData;

Members#

audio_frames
Type: NvAR_AudioFrame*
Array of audio frames, one per audio channel or track.
num_audio_channels
Type: uint32_t
Number of audio channels or tracks in the audio_frames array.

Defined in: nvARActiveSpeakerDetection.h

Remarks#

This structure contains audio frame data for all audio tracks to be processed by Active Speaker Detection.
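
As a sketch of how the per-track frames might be gathered, the example below builds one NvAR_AudioFrame per track from caller-owned sample buffers. The structures are local mirrors of the documented ones. AudioBatch and makeBatch are hypothetical names, and using the track index as audio_id is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Local mirrors of the documented structures.
typedef struct NvAR_AudioFrame {
    const float *audio_data;
    uint32_t num_samples;
    int32_t audio_id;
} NvAR_AudioFrame;
typedef struct NvAR_AudioFrameData {
    NvAR_AudioFrame *audio_frames;
    uint32_t num_audio_channels;
} NvAR_AudioFrameData;

// Hypothetical holder: owns the frame descriptors, but not the samples.
struct AudioBatch {
    std::vector<NvAR_AudioFrame> frames;
    // Returns a view that stays valid while this batch is alive and unmodified.
    NvAR_AudioFrameData view() {
        return {frames.data(), static_cast<uint32_t>(frames.size())};
    }
};

// Hypothetical helper: one frame per track. The track index doubles as
// audio_id, which is an assumption of this sketch. The tracks must outlive
// the returned batch because only pointers to the samples are stored.
inline AudioBatch makeBatch(const std::vector<std::vector<float>> &tracks) {
    AudioBatch batch;
    for (std::size_t i = 0; i < tracks.size(); ++i) {
        batch.frames.push_back({tracks[i].data(),
                                static_cast<uint32_t>(tracks[i].size()),
                                static_cast<int32_t>(i)});
    }
    return batch;
}
```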

NvAR_ActiveAudioIds#

typedef struct NvAR_ActiveAudioIds {
    uint32_t* active_audio_ids;
    uint32_t num_active_audio_ids;
} NvAR_ActiveAudioIds;

Members#

active_audio_ids
Type: uint32_t*
Array of audio IDs that are currently active and should be processed.
num_active_audio_ids
Type: uint32_t
Number of active audio IDs in the active_audio_ids array.

Defined in: nvARActiveSpeakerDetection.h

Remarks#

This structure specifies which audio tracks are active and should be considered for speaker detection.

If a speaker is detected, the is_speaking flag for the corresponding audio track is set to 1.

NvAR_SpeakerTrackingBBox#

typedef struct NvAR_SpeakerTrackingBBox {
    NvAR_Rect bbox;
    uint16_t tracking_id;
    int16_t audio_id;
    float confidence;
    uint8_t is_speaking;
} NvAR_SpeakerTrackingBBox;

Members#

bbox
Type: NvAR_Rect
Bounding box of the detected face (x, y, width, height).
tracking_id
Type: uint16_t
Unique identifier for a person tracked across frames.
audio_id
Type: int16_t
ID of the audio track associated with this speaker, or -1 if no audio is associated.
confidence
Type: float
Detection confidence score for this face.
is_speaking
Type: uint8_t
Flag indicating whether this person is currently speaking (1) or not (0).

Defined in: nvARActiveSpeakerDetection.h

Remarks#

This structure represents a single tracked face with speaker detection information.

NvAR_ActiveSpeakerTrackingData#

typedef struct NvAR_ActiveSpeakerTrackingData {
    NvAR_SpeakerTrackingBBox* boxes;
    uint8_t num_boxes;
    uint8_t max_boxes;
} NvAR_ActiveSpeakerTrackingData;

Members#

boxes
Type: NvAR_SpeakerTrackingBBox*
Array of tracked faces with speaker detection information.
num_boxes
Type: uint8_t
Number of valid entries in the boxes array.
max_boxes
Type: uint8_t
Maximum number of boxes that can be stored in the array.

Defined in: nvARActiveSpeakerDetection.h

Remarks#

The bounding box array must be pre-allocated by the user. The user owns the memory pointed to by boxes. The value of max_boxes must correspond to the size of the array; num_boxes is set by the feature.

This structure contains the output data from Active Speaker Detection, including all tracked faces and their speaking status.
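
Once the feature has filled the pre-allocated array, the speaking status can be read back by walking the first num_boxes entries. The sketch below collects the tracking IDs of everyone flagged as speaking. The structures are local mirrors of the documented ones, and speakingIds is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Local mirrors of the documented structures.
typedef struct NvAR_Rect { float x, y, width, height; } NvAR_Rect;
typedef struct NvAR_SpeakerTrackingBBox {
    NvAR_Rect bbox;
    uint16_t tracking_id;
    int16_t audio_id;
    float confidence;
    uint8_t is_speaking;
} NvAR_SpeakerTrackingBBox;
typedef struct NvAR_ActiveSpeakerTrackingData {
    NvAR_SpeakerTrackingBBox *boxes;
    uint8_t num_boxes;
    uint8_t max_boxes;
} NvAR_ActiveSpeakerTrackingData;

// Hypothetical helper: returns the tracking IDs of the entries whose
// is_speaking flag is set, reading only the valid part of the array.
inline std::vector<uint16_t> speakingIds(const NvAR_ActiveSpeakerTrackingData &d) {
    std::vector<uint16_t> ids;
    for (uint8_t i = 0; i < d.num_boxes && i < d.max_boxes; ++i) {
        if (d.boxes[i].is_speaking) ids.push_back(d.boxes[i].tracking_id);
    }
    return ids;
}
```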