Structures#

This section provides information about the structures in the AR SDK APIs. The structures are defined in the following header files:

nvAR.h
nvAR_defs.h (mostly data types)

NvAR_BBoxes#

struct NvAR_BBoxes {
    NvAR_Rect *boxes;
    uint8_t num_boxes;
    uint8_t max_boxes;
};

Members#

boxes: Type: NvAR_Rect*

Pointer to an array of bounding boxes that are allocated by the user.
num_boxes: Type: uint8_t

The number of bounding boxes in the array.
max_boxes: Type: uint8_t

The maximum number of bounding boxes that can be stored in the array as defined by the user.

Remarks#

This structure is returned as the output of the face detection feature.

Defined in: nvAR_defs.h
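
Because the bounding-box array is allocated by the user, the caller normally owns the storage and points the structure at it. The following is a minimal sketch of that setup. The structures are redeclared locally so that the snippet compiles without the SDK headers, and `makeBBoxes` is a hypothetical helper, not an SDK function.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Local copies of the documented layouts so the sketch compiles without the SDK headers.
struct NvAR_Rect { float x, y, width, height; };
struct NvAR_BBoxes {
    NvAR_Rect *boxes;
    uint8_t num_boxes;
    uint8_t max_boxes;
};

// Hypothetical helper: points an NvAR_BBoxes at caller-owned storage.
// The vector must outlive the structure and must not be resized while it is in use.
inline NvAR_BBoxes makeBBoxes(std::vector<NvAR_Rect> &storage) {
    assert(storage.size() <= UINT8_MAX);  // max_boxes is a uint8_t
    NvAR_BBoxes out{};
    out.boxes = storage.data();
    out.num_boxes = 0;  // assumption: stays zero until face detection reports boxes
    out.max_boxes = static_cast<uint8_t>(storage.size());
    return out;
}
```

Keeping the storage in a `std::vector` is only one option; any contiguous array whose length matches `max_boxes` should work the same way.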

NvAR_TrackingBBox#

struct NvAR_TrackingBBox {
    NvAR_Rect bbox;
    uint16_t tracking_id;
};

Members#

bbox: Type: NvAR_Rect

Bounding box that is allocated by the user.
tracking_id: Type: uint16_t

The tracking ID assigned to the bounding box by multi-person tracking.

Remarks#

This structure is returned as the output of the body pose feature when multi-person tracking is enabled.

Defined in: nvAR_defs.h

NvAR_TrackingBBoxes#

struct NvAR_TrackingBBoxes {
    NvAR_TrackingBBox *boxes;
    uint8_t num_boxes;
    uint8_t max_boxes;
};

Members#

boxes: Type: NvAR_TrackingBBox *

Pointer to an array of tracking bounding boxes that are allocated by the user.
num_boxes: Type: uint8_t

The number of bounding boxes in the array.
max_boxes: Type: uint8_t

The maximum number of bounding boxes that can be stored in the array as defined by the user.

Remarks#

This structure is returned as the output of the body pose feature when multi-person tracking is enabled.

Defined in: nvAR_defs.h

NvAR_FaceMesh#

struct NvAR_FaceMesh {
    NvAR_Vec3<float> *vertices;
    size_t num_vertices;
    NvAR_Vec3<unsigned short> *tvi;
    size_t num_triangles;
};

Members#

vertices: Type: NvAR_Vec3<float>*

Pointer to an array of vectors that represent the mesh 3D vertex positions.
num_vertices: Type: size_t

The number of vertices in the array pointed to by the vertices member.
tvi: Type: NvAR_Vec3<unsigned short> *

Pointer to an array of vectors that hold the vertex indices of the mesh triangles, one vector of three indices per triangle.
num_triangles: Type: size_t

The number of mesh triangles.

Remarks#

This structure is returned as an output of the Mesh Tracking feature.

Defined in: nvAR_defs.h
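
Each entry in tvi indexes into the vertices array, so a consumer may want to check the indices before drawing the mesh. The sketch below does that. It assumes that `NvAR_Vec3<T>` wraps a three-element array named `vec`, matching the layout documented for NvAR_Vector3f; `meshIndicesValid` is a hypothetical helper.

```cpp
#include <cassert>
#include <stddef.h>

// Assumed layout: the same three-element array documented for NvAR_Vector3f.
template <typename T> struct NvAR_Vec3 { T vec[3]; };

// Local copy of the documented layout so the sketch compiles without the SDK headers.
struct NvAR_FaceMesh {
    NvAR_Vec3<float> *vertices;
    size_t num_vertices;
    NvAR_Vec3<unsigned short> *tvi;
    size_t num_triangles;
};

// Hypothetical helper: true when every triangle refers only to existing vertices.
inline bool meshIndicesValid(const NvAR_FaceMesh &mesh) {
    assert(mesh.tvi != nullptr || mesh.num_triangles == 0);
    for (size_t t = 0; t < mesh.num_triangles; ++t) {
        for (int k = 0; k < 3; ++k) {
            if (mesh.tvi[t].vec[k] >= mesh.num_vertices) return false;
        }
    }
    return true;
}
```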

NvAR_Frustum#

struct NvAR_Frustum {
    float left = -1.0f;
    float right = 1.0f;
    float bottom = -1.0f;
    float top = 1.0f;
};

Members#

left: Type: float

The X coordinate of the top-left corner of the viewing frustum.
right: Type: float

The X coordinate of the bottom-right corner of the viewing frustum.
bottom: Type: float

The Y coordinate of the bottom-right corner of the viewing frustum.
top: Type: float

The Y coordinate of the top-left corner of the viewing frustum.

Remarks#

This structure represents a camera viewing frustum for an orthographic camera. As a result, it contains only the left, the right, the top, and the bottom coordinates in pixels. It does not contain a near or a far clipping plane.

Defined in: nvAR_defs.h
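
As an illustration of how these four values can feed a renderer, the sketch below builds a column-major orthographic projection matrix in the OpenGL convention. Because the structure has no near or far plane, `zNear` and `zFar` are caller-chosen assumptions, and `orthoProjection` is a hypothetical helper rather than an SDK function.

```cpp
#include <array>
#include <cassert>

// Local copy of the documented layout so the sketch compiles without the SDK headers.
struct NvAR_Frustum {
    float left = -1.0f;
    float right = 1.0f;
    float bottom = -1.0f;
    float top = 1.0f;
};

// Hypothetical helper: column-major 4x4 orthographic projection (OpenGL convention).
// zNear and zFar are assumptions supplied by the caller; the frustum does not carry them.
inline std::array<float, 16> orthoProjection(const NvAR_Frustum &f, float zNear, float zFar) {
    assert(f.right != f.left && f.top != f.bottom && zFar != zNear);
    std::array<float, 16> m{};  // all elements start at zero
    m[0] = 2.0f / (f.right - f.left);
    m[5] = 2.0f / (f.top - f.bottom);
    m[10] = -2.0f / (zFar - zNear);
    m[12] = -(f.right + f.left) / (f.right - f.left);
    m[13] = -(f.top + f.bottom) / (f.top - f.bottom);
    m[14] = -(zFar + zNear) / (zFar - zNear);
    m[15] = 1.0f;
    return m;
}
```

With the default frustum and `zNear = -1`, `zFar = 1`, the scale terms are 1, 1, and -1 and the translation terms are zero.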

NvAR_FeatureHandle#

typedef struct nvAR_Feature *NvAR_FeatureHandle;

Remarks#

This type defines the handle of a feature that is defined by the SDK. It is used to reference the feature at runtime when the feature is executed and must be destroyed when it is no longer required.

Defined in: nvAR_defs.h

NvAR_Point2f#

typedef struct NvAR_Point2f {
    float x, y;
} NvAR_Point2f;

Members#

x: Type: float

The X coordinate of the point in pixels.
y: Type: float

The Y coordinate of the point in pixels.

Remarks#

This structure represents the X and Y coordinates of one point in 2D space.

Defined in: nvAR_defs.h

NvAR_Point3f#

typedef struct NvAR_Point3f {
    float x, y, z;
} NvAR_Point3f;

Members#

x: Type: float

The X coordinate of the point in pixels.
y: Type: float

The Y coordinate of the point in pixels.
z: Type: float

The Z coordinate of the point in pixels.

Remarks#

This structure represents the X, Y, Z coordinates of one point in 3D space.

Defined in: nvAR_defs.h

NvAR_Quaternion#

struct NvAR_Quaternion {
    float x, y, z, w;
};

Members#

x: Type: float

The first coefficient of the complex part of the quaternion.
y: Type: float

The second coefficient of the complex part of the quaternion.
z: Type: float

The third coefficient of the complex part of the quaternion.
w: Type: float

The scalar coefficient of the quaternion.

Remarks#

This structure represents the coefficients in the quaternion that are expressed in the following equation: q = w + xi + yj + zk

Defined in: nvAR_defs.h
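
To use the rotation in a renderer, the quaternion is usually converted to a matrix. The following sketch applies the standard unit-quaternion formula for q = w + xi + yj + zk and returns a row-major 3x3 matrix. `toRotationMatrix` is a hypothetical helper; normalizing the input first is a defensive choice made here, not documented SDK behavior.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Local copy of the documented layout so the sketch compiles without the SDK headers.
struct NvAR_Quaternion { float x, y, z, w; };

// Hypothetical helper: row-major 3x3 rotation matrix for q = w + xi + yj + zk.
// The quaternion is normalized first; a zero quaternion falls back to the identity.
inline std::array<float, 9> toRotationMatrix(const NvAR_Quaternion &q) {
    const float n = std::sqrt(q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w);
    float x = 0.0f, y = 0.0f, z = 0.0f, w = 1.0f;  // identity fallback
    if (n > 0.0f) { x = q.x / n; y = q.y / n; z = q.z / n; w = q.w / n; }
    std::array<float, 9> r{};
    r[0] = 1.0f - 2.0f * (y * y + z * z);
    r[1] = 2.0f * (x * y - z * w);
    r[2] = 2.0f * (x * z + y * w);
    r[3] = 2.0f * (x * y + z * w);
    r[4] = 1.0f - 2.0f * (x * x + z * z);
    r[5] = 2.0f * (y * z - x * w);
    r[6] = 2.0f * (x * z - y * w);
    r[7] = 2.0f * (y * z + x * w);
    r[8] = 1.0f - 2.0f * (x * x + y * y);
    return r;
}
```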

NvAR_Rect#

typedef struct NvAR_Rect {
    float x, y, width, height;
} NvAR_Rect;

Members#

x: Type: float

The X coordinate of the top left corner of the bounding box in pixels.
y: Type: float

The Y coordinate of the top left corner of the bounding box in pixels.
width: Type: float

The width of the bounding box in pixels.
height: Type: float

The height of the bounding box in pixels.

Remarks#

This structure represents the position and size of a rectangular 2D bounding box.

Defined in: nvAR_defs.h
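
Since the rectangle is stored as a top-left corner plus a size, comparing two boxes requires deriving the right and bottom edges first. The sketch below computes the intersection over union of two rectangles as an example. `rectIoU` is a hypothetical helper and is not part of the SDK.

```cpp
#include <algorithm>
#include <cassert>

// Local copy of the documented layout so the sketch compiles without the SDK headers.
typedef struct NvAR_Rect {
    float x, y, width, height;
} NvAR_Rect;

// Hypothetical helper: intersection over union of two boxes, in the range 0 to 1.
inline float rectIoU(const NvAR_Rect &a, const NvAR_Rect &b) {
    assert(a.width >= 0.0f && a.height >= 0.0f && b.width >= 0.0f && b.height >= 0.0f);
    const float left = std::max(a.x, b.x);
    const float top = std::max(a.y, b.y);
    const float right = std::min(a.x + a.width, b.x + b.width);
    const float bottom = std::min(a.y + a.height, b.y + b.height);
    const float inter = std::max(0.0f, right - left) * std::max(0.0f, bottom - top);
    const float uni = a.width * a.height + b.width * b.height - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}
```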

NvAR_RenderingParams#

struct NvAR_RenderingParams {
    NvAR_Frustum frustum;
    NvAR_Quaternion rotation;
    NvAR_Vec3<float> translation;
};

Members#

frustum: Type: NvAR_Frustum

The camera viewing frustum for an orthographic camera.
rotation: Type: NvAR_Quaternion

The rotation of the camera relative to the mesh.
translation: Type: NvAR_Vec3<float>

The translation of the camera relative to the mesh.

Remarks#

This structure defines the parameters that are used to draw a 3D face mesh in a window on the computer screen so that the face mesh is aligned with the corresponding video frame. The projection matrix is constructed from the frustum parameter, and the model view matrix is constructed from the rotation and translation parameters.

Defined in: nvAR_defs.h

NvAR_Vector2f#

typedef struct NvAR_Vector2f {
    float x, y;
} NvAR_Vector2f;

Members#

x: Type: float

The X component of the 2D vector.
y: Type: float

The Y component of the 2D vector.

Remarks#

This structure represents a 2D vector.

Defined in: nvAR_defs.h

NvAR_Vector3f#

typedef struct NvAR_Vector3f {
    float vec[3];
} NvAR_Vector3f;

Members#

vec: Type: float array of size 3

The three float components of the vector.

Remarks#

This structure represents a 3D vector.

Defined in: nvAR_defs.h

NvAR_Vector3u16#

typedef struct NvAR_Vector3u16 {
    unsigned short vec[3];
} NvAR_Vector3u16;

Members#

vec: Type: unsigned short array of size 3

The three unsigned short components of the vector.

Remarks#

This structure represents a 3D vector.

Defined in: nvAR_defs.h

NvAR_SpeakerData (Deprecated)#

Deprecated since version 1.1.0.1: NvAR_SpeakerData is no longer supported. Use NvAR_LipSyncRegionData instead.

typedef struct NvAR_SpeakerData {
    const float* audio_frame_data;
    size_t audio_frame_size;
    NvAR_Rect region;
    uint8_t region_type;
    float bypass;
} NvAR_SpeakerData;

Members#

audio_frame_data: Type: const float*

Pointer to a buffer containing driving audio. The audio is assumed to be mono and in float format.
audio_frame_size: Type: size_t

Number of audio samples in the audio frame.
region: Type: NvAR_Rect

Region that contains the speaker’s face.
region_type: Type: uint8_t

Flag that indicates the type of region. 0 = ROI, the feature should perform face detection within the ROI. 1 = face box, the feature should skip face detection.
bypass: Type: float

Value in the range 0–1 that can suppress speaker animation by reducing the opacity of the animated face. When the value is 0, the speaker’s face is animated according to the feature’s internal logic. As the value increases, the opacity of the animated face is reduced, blending with the original image. When the value is 1, the face is not animated.

Defined in: nvAR_defs.h

Remarks#

This structure represents the input data that is used to animate a speaker’s face in a video frame.

NvAR_LipSyncRegion#

typedef struct NvAR_LipSyncRegion {
    NvAR_Rect bbox;
    uint16_t tracking_id;
    int16_t audio_id;
    float bypass;
    uint8_t region_type;
    uint8_t is_speaking;
} NvAR_LipSyncRegion;

Members#

bbox: Type: NvAR_Rect

Bounding box that contains the speaker’s face.
tracking_id: Type: uint16_t

Tracking ID that uniquely identifies the speaker across frames.
audio_id: Type: int16_t

Reserved for future use.
bypass: Type: float

Value in the range 0–1 that can reduce the output opacity. When the value is 0, the speaker’s face is animated according to the feature’s internal logic. As the value increases, the opacity of the animated face is reduced, blending with the original image. When the value is 1, the face is not animated.
region_type: Type: uint8_t

Flag that indicates the type of region. 0 = ROI, the feature should perform face detection within the ROI. 1 = face box, the feature should skip face detection.
is_speaking: Type: uint8_t

Flag that indicates whether the speaker is currently speaking. When set to a non-zero value, the region is considered to be speaking and LipSync animation is applied.

Remarks#

This structure represents a single speaker region used as input to the LipSync feature. It replaces the deprecated NvAR_SpeakerData structure with additional multi-speaker support through tracking IDs and a speaking flag.

Defined in: nvARLipSync.h

NvAR_LipSyncRegionData#

typedef struct NvAR_LipSyncRegionData {
    NvAR_LipSyncRegion* regions;
    uint8_t num_regions;
} NvAR_LipSyncRegionData;

Members#

regions: Type: NvAR_LipSyncRegion*

Pointer to an array of NvAR_LipSyncRegion structures that define the speaker regions in the frame.
num_regions: Type: uint8_t

The number of regions in the array.

Remarks#

This structure specifies the regions and per-region settings for the LipSync feature. It replaces the deprecated NvAR_SpeakerData input. When this input is not set (nullptr), the LipSync feature operates in single full-frame region mode.

Defined in: nvARLipSync.h
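
The sketch below shows one way to fill a region array for faces whose boxes are already known and wrap it in NvAR_LipSyncRegionData. The structures are redeclared locally so the snippet compiles without the SDK headers. `makeFaceRegion` and `wrapRegions` are hypothetical helpers, and clamping bypass to the documented 0 to 1 range is a defensive choice made here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Local copies of the documented layouts so the sketch compiles without the SDK headers.
struct NvAR_Rect { float x, y, width, height; };
struct NvAR_LipSyncRegion {
    NvAR_Rect bbox;
    uint16_t tracking_id;
    int16_t audio_id;
    float bypass;
    uint8_t region_type;
    uint8_t is_speaking;
};
struct NvAR_LipSyncRegionData {
    NvAR_LipSyncRegion *regions;
    uint8_t num_regions;
};

// Hypothetical helper: describes one known face box that should be animated.
inline NvAR_LipSyncRegion makeFaceRegion(const NvAR_Rect &face, uint16_t trackingId, float bypass) {
    NvAR_LipSyncRegion r{};
    r.bbox = face;
    r.tracking_id = trackingId;
    r.audio_id = 0;                                     // reserved for future use
    r.bypass = std::min(1.0f, std::max(0.0f, bypass));  // documented range is 0 to 1
    r.region_type = 1;                                  // 1 = face box, detection is skipped
    r.is_speaking = 1;                                  // non-zero: animation is applied
    return r;
}

// Hypothetical helper: wraps caller-owned regions; the vector must outlive the result.
inline NvAR_LipSyncRegionData wrapRegions(std::vector<NvAR_LipSyncRegion> &regions) {
    assert(regions.size() <= UINT8_MAX);  // num_regions is a uint8_t
    NvAR_LipSyncRegionData d{};
    d.regions = regions.data();
    d.num_regions = static_cast<uint8_t>(regions.size());
    return d;
}
```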

NvAR_LipSyncActivation#

typedef struct NvAR_LipSyncActivation {
    float strength;
    float center_x;
    float center_y;
    float size;
} NvAR_LipSyncActivation;

Members#

strength: Type: float

Value in the range 0–1 representing the LipSync activation strength. When the value is 0, the original face was copied directly to the output without modification. When the value is 1, the original face was completely replaced by the animated face in the output.
center_x: Type: float

The X coordinate of the center of the face in pixels.
center_y: Type: float

The Y coordinate of the center of the face in pixels.
size: Type: float

The face size in pixels.

Remarks#

This structure is returned as an output of the LipSync feature and provides detailed information about the activation strength and the location and size of the face that was animated.

Defined in: nvARLipSync.h

NvAR_AudioFrame#

typedef struct NvAR_AudioFrame {
    const float* audio_data;
    uint32_t num_samples;
    int32_t audio_id;
} NvAR_AudioFrame;

Members#

audio_data: Type: const float*

Pointer to audio samples in floating-point format, ranging from -1.0 to 1.0.
num_samples: Type: uint32_t

Number of audio samples in this frame.
audio_id: Type: int32_t

Unique identifier for this audio track.

Defined in: nvARActiveSpeakerDetection.h

Remarks#

This structure represents a single audio frame from one audio track for Active Speaker Detection.

NvAR_AudioFrameData#

typedef struct NvAR_AudioFrameData {
    NvAR_AudioFrame* audio_frames;
    uint32_t num_audio_channels;
} NvAR_AudioFrameData;

Members#

audio_frames: Type: NvAR_AudioFrame*

Array of audio frames, one per audio channel or track.
num_audio_channels: Type: uint32_t

Number of audio channels or tracks in the audio_frames array.

Defined in: nvARActiveSpeakerDetection.h

Remarks#

This structure contains audio frame data for all audio tracks to be processed by Active Speaker Detection.
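
A minimal sketch of assembling this input from per-track sample buffers follows. The structures are redeclared locally so the snippet compiles without the SDK headers. `makeFrames` and `wrapFrames` are hypothetical helpers, and using the track index as audio_id is an assumption made only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Local copies of the documented layouts so the sketch compiles without the SDK headers.
struct NvAR_AudioFrame {
    const float *audio_data;
    uint32_t num_samples;
    int32_t audio_id;
};
struct NvAR_AudioFrameData {
    NvAR_AudioFrame *audio_frames;
    uint32_t num_audio_channels;
};

// Hypothetical helper: one frame per track. The sample buffers are not copied,
// so `tracks` must outlive the returned frames.
inline std::vector<NvAR_AudioFrame> makeFrames(const std::vector<std::vector<float>> &tracks) {
    std::vector<NvAR_AudioFrame> frames;
    frames.reserve(tracks.size());
    for (std::size_t i = 0; i < tracks.size(); ++i) {
        NvAR_AudioFrame f{};
        f.audio_data = tracks[i].data();
        f.num_samples = static_cast<uint32_t>(tracks[i].size());
        f.audio_id = static_cast<int32_t>(i);  // assumption: track index doubles as the ID
        frames.push_back(f);
    }
    return frames;
}

// Hypothetical helper: wraps caller-owned frames; the vector must outlive the result.
inline NvAR_AudioFrameData wrapFrames(std::vector<NvAR_AudioFrame> &frames) {
    assert(frames.size() <= UINT32_MAX);
    NvAR_AudioFrameData d{};
    d.audio_frames = frames.data();
    d.num_audio_channels = static_cast<uint32_t>(frames.size());
    return d;
}
```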

NvAR_ActiveAudioIds#

typedef struct NvAR_ActiveAudioIds {
    uint32_t* active_audio_ids;
    uint32_t num_active_audio_ids;
} NvAR_ActiveAudioIds;

Members#

active_audio_ids: Type: uint32_t*

Array of audio IDs that are currently active and should be processed.
num_active_audio_ids: Type: uint32_t

Number of active audio IDs in the active_audio_ids array.

Defined in: nvARActiveSpeakerDetection.h

Remarks#

This structure specifies which audio tracks are active and should be considered for speaker detection.

If a speaker is detected, the is_speaking flag for the corresponding audio track is set to 1.

NvAR_SpeakerTrackingBBox#

typedef struct NvAR_SpeakerTrackingBBox {
    NvAR_Rect bbox;
    uint16_t tracking_id;
    int16_t audio_id;
    float confidence;
    uint8_t is_speaking;
} NvAR_SpeakerTrackingBBox;

Members#

bbox: Type: NvAR_Rect

Bounding box of the detected face (x, y, width, height).
tracking_id: Type: uint16_t

Unique identifier for a person tracked across frames.
audio_id: Type: int16_t

ID of the audio track associated with this speaker, or -1 if no audio is associated.
confidence: Type: float

Detection confidence score for this face.
is_speaking: Type: uint8_t

Flag indicating whether this person is currently speaking (1) or not (0).

Defined in: nvARActiveSpeakerDetection.h

Remarks#

This structure represents a single tracked face with speaker detection information.

NvAR_ActiveSpeakerTrackingData#

typedef struct NvAR_ActiveSpeakerTrackingData {
    NvAR_SpeakerTrackingBBox* boxes;
    uint8_t num_boxes;
    uint8_t max_boxes;
} NvAR_ActiveSpeakerTrackingData;

Members#

boxes: Type: NvAR_SpeakerTrackingBBox*

Array of tracked faces with speaker detection information.
num_boxes: Type: uint8_t

Number of valid entries in the boxes array.
max_boxes: Type: uint8_t

Maximum number of boxes that can be stored in the array.

Defined in: nvARActiveSpeakerDetection.h

Remarks#

The bounding box array must be pre-allocated by the user. The user owns the memory pointed to by boxes. The value of max_boxes must correspond to the size of the array; num_boxes is set by the feature.

This structure contains the output data from Active Speaker Detection, including all tracked faces and their speaking status.
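
As a sketch of how this output might be consumed, the helper below collects the tracking IDs of the faces that are flagged as speaking. The structures are redeclared locally so the snippet compiles without the SDK headers, and `speakingTrackIds` is a hypothetical helper, not an SDK function.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Local copies of the documented layouts so the sketch compiles without the SDK headers.
struct NvAR_Rect { float x, y, width, height; };
struct NvAR_SpeakerTrackingBBox {
    NvAR_Rect bbox;
    uint16_t tracking_id;
    int16_t audio_id;
    float confidence;
    uint8_t is_speaking;
};
struct NvAR_ActiveSpeakerTrackingData {
    NvAR_SpeakerTrackingBBox *boxes;
    uint8_t num_boxes;
    uint8_t max_boxes;
};

// Hypothetical helper: tracking IDs of all valid entries whose is_speaking flag is set.
inline std::vector<uint16_t> speakingTrackIds(const NvAR_ActiveSpeakerTrackingData &data) {
    assert(data.num_boxes <= data.max_boxes);  // only the first num_boxes entries are valid
    std::vector<uint16_t> ids;
    for (unsigned i = 0; i < data.num_boxes; ++i) {
        if (data.boxes[i].is_speaking) ids.push_back(data.boxes[i].tracking_id);
    }
    return ids;
}
```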