jarvis_cv.proto¶

service JarvisVision¶

The Jarvis Vision service provides methods for obtaining inference results for various vision models.

rpc GazeResponse GetGaze(GazeRequest): Given a GazeRequest for Gaze inference, outputs a GazeResponse.

rpc FaceDetectResponse GetFaceDetect(FaceDetectRequest): Given a FaceDetectRequest for FaceDetect inference, outputs a FaceDetectResponse.

rpc FacialLandmarksResponse GetFacialLandmarks(FacialLandmarksRequest): Given a FacialLandmarksRequest for FacialLandmarks inference, outputs a FacialLandmarksResponse.

rpc BodyPoseResponse GetBodyPose(BodyPoseRequest): Given a BodyPoseRequest for BodyPose inference, outputs a BodyPoseResponse.

rpc EmotionResponse GetEmotion(EmotionRequest): Given an EmotionRequest for Emotion inference, outputs an EmotionResponse.

rpc HeadPoseResponse GetHeadPose(HeadPoseRequest): Given a HeadPoseRequest for HeadPose inference, outputs a HeadPoseResponse.

rpc UserResponse GetUserAttributes(UserRequest): Given a UserRequest for getting user data, outputs a UserResponse.

message BodyPose¶

BodyPose data structure returned in response to a BodyPoseRequest.

BodyPose.Joint joints (repeated)

message BodyPose.Joint

Joint object containing a location descriptor and an (x,y) coordinate.

BodyPose.JointDescriptor descriptor

int32 x¶

int32 y¶

message BodyPoseRequest¶

A request for BodyPose inference requires an image. Optionally, provide an imageID, which is returned in the response.

The image is expected in BGR channel order with HWC (height, width, channel) layout.

Data image¶: Input image frame

uint64 imageID¶: Optionally provide imageID which will be mirrored in response
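
To make the BGR and HWC requirement concrete, here is a minimal sketch of how a pixel is addressed in a flat HWC byte buffer and how an RGB buffer can be reordered to BGR. The helper names `pixel_offset` and `rgb_to_bgr` are hypothetical and not part of the service; the sketch also assumes 8-bit channels.

```python
def pixel_offset(row, col, channel, width, channels=3):
    """Byte offset of one channel value in a flat HWC uint8 buffer."""
    return (row * width + col) * channels + channel


def rgb_to_bgr(buffer, channels=3):
    """Swap the first and third channel of every pixel in a flat HWC buffer."""
    out = bytearray(buffer)
    out[0::channels] = buffer[2::channels]
    out[2::channels] = buffer[0::channels]
    return bytes(out)


# A hypothetical 1x2 RGB image: one red pixel followed by one blue pixel.
rgb = bytes([255, 0, 0, 0, 0, 255])
bgr = rgb_to_bgr(rgb)

# In BGR order, channel 0 of the second pixel (row 0, col 1) is its blue value.
blue_of_second_pixel = bgr[pixel_offset(0, 1, 0, width=2)]
```

The same offset formula applies to any of the request messages that take an image, since they all state the same layout.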

message BodyPoseResponse¶

Response for BodyPose inference outputs a list of body poses.

BodyPose poses(repeated)¶: A list of output poses

uint64 imageID¶: ID from request
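
Because the imageID is mirrored in the response, a client can use it to match responses to the frames it sent. The following is a minimal sketch of that bookkeeping; `pending`, `register`, and `resolve` are hypothetical names, and the frame labels are made-up placeholders.

```python
# Hypothetical bookkeeping: imageID -> whatever the caller wants to remember.
pending = {}


def register(image_id, frame_info):
    """Remember a frame under the imageID sent with its request."""
    pending[image_id] = frame_info


def resolve(response_image_id):
    """Look up and forget the frame that a response refers to."""
    return pending.pop(response_image_id, None)


register(7, "camera0/frame_000007")
register(8, "camera0/frame_000008")

# Responses may be handled in any order; the mirrored ID identifies the frame.
print(resolve(8))  # camera0/frame_000008
print(resolve(7))  # camera0/frame_000007
```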

message BoundingBox¶

Bounding box data structure expressed as an (x,y) coordinate for the top left corner and (w,h) for width and height, with (x+w, y+h) as the bottom right coordinate.

int32 x¶: Top left x-coordinate

int32 y¶: Top left y-coordinate

int32 w¶: Width such that bottom right x-coordinate = x + w

int32 h¶: Height such that bottom right y-coordinate = y + h
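
The corner arithmetic above can be sketched as follows. `Box` is a hypothetical stand-in that mirrors the BoundingBox fields, not the generated protobuf class, and treating the right and bottom edges as exclusive in `contains` is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical stand-in mirroring the BoundingBox fields."""
    x: int
    y: int
    w: int
    h: int

    def bottom_right(self):
        # Bottom right corner is (x + w, y + h), as described above.
        return (self.x + self.w, self.y + self.h)

    def contains(self, px, py):
        # Right and bottom edges are treated as exclusive (assumption).
        return (self.x <= px < self.x + self.w
                and self.y <= py < self.y + self.h)


box = Box(x=10, y=20, w=30, h=40)
print(box.bottom_right())  # (40, 60)
```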

message Data¶

Generic data block that can hold images or tensors.

bytes buffer¶: Buffer of bytes for data.

int32 shape(repeated)¶: Shape of data used for deserialization.

DataType dtype¶: Datatype of buffer for deserialization.
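
As an illustration of how buffer, shape, and dtype work together, here is a minimal sketch that serializes a small matrix into these three fields and reads it back. It uses a plain dict instead of the generated message class, and the little-endian byte order, the `FORMATS` table, and the sample matrix values are all assumptions made for the example.

```python
import struct
from math import prod

# struct format characters per DataType name (little-endian is an assumption).
FORMATS = {"FLOAT32": "f", "INT32": "i", "FLOAT64": "d", "UINT8": "B"}


def pack(values, shape, dtype):
    """Build a dict with the same three fields as the Data message."""
    if len(values) != prod(shape):
        raise ValueError("shape does not match number of values")
    buffer = struct.pack(f"<{len(values)}{FORMATS[dtype]}", *values)
    return {"buffer": buffer, "shape": list(shape), "dtype": dtype}


def unpack(data):
    """Recover the flat list of values using shape and dtype."""
    count = prod(data["shape"])
    fmt = f"<{count}{FORMATS[data['dtype']]}"
    return list(struct.unpack(fmt, data["buffer"]))


# A made-up 3x3 matrix, flattened row by row.
matrix = [800.0, 0.0, 320.0,
          0.0, 800.0, 240.0,
          0.0, 0.0, 1.0]
data = pack(matrix, (3, 3), "FLOAT64")
restored = unpack(data)
```

Without shape and dtype the receiver cannot tell how many values the buffer holds or how wide each one is, which is why all three fields travel together.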

message Emotion¶

Emotion data structure returned in response to an EmotionRequest.

BoundingBox bbox¶

Emotion.EmotionDescriptor emotion

message EmotionRequest¶

A request for Emotion inference requires an image. Optionally, provide an imageID, which is returned in the response.

The image is expected in BGR channel order with HWC (height, width, channel) layout.

Data image¶: Input image frame

uint64 imageID¶: Optionally provide imageID which will be mirrored in response

message EmotionResponse¶

Response for Emotion inference outputs list of emotions for every face detected.

Emotion emotions(repeated)¶: A list of output emotions

uint64 imageID¶: ID from request

message FaceDetectRequest¶

A request for FaceDetect inference requires an image. Optionally, provide an imageID, which is returned in the response.

The image is expected in BGR channel order with HWC (height, width, channel) layout.

Data image¶: Input image frame

uint64 imageID¶: Optionally provide imageID which will be mirrored in response.

message FaceDetectResponse¶

Response for FaceDetect inference outputs bounding boxes of faces.

BoundingBox bbox(repeated)¶: A list of output face bounding boxes

uint64 imageID¶: ID from request.

message FacialLandmarksRequest¶

A request for FacialLandmarks inference requires an image. Optionally, provide an imageID, which is returned in the response. Optionally, provide face bounding boxes to run FacialLandmarks inference in specific regions.

The image is expected in BGR channel order with HWC (height, width, channel) layout.

Data image¶: Input image frame

uint64 imageID¶: Optionally provide imageID which will be mirrored in response.

BoundingBox face_bbox(repeated)¶: Optional face bounding boxes that restrict inference to specific regions

message FacialLandmarksResponse¶

Response for FacialLandmarks inference outputs landmarks as (x,y) coordinates for each face.

Data landmarks(repeated)¶: A list of output facial landmarks points

uint64 imageID¶: ID from request.

message Gaze¶

Gaze data structure returned in response to a GazeRequest.

double x¶: x-coordinate of the gaze point in camera space (millimeter)

double y¶: y-coordinate of the gaze point in camera space (millimeter)

double z¶: z-coordinate of the gaze point in camera space (millimeter)

double theta¶: Horizontal angle of the gaze point in camera space (radians)

double phi¶: Vertical angle of the gaze point in camera space (radians)
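
Gaze angles are given in radians, whereas the Head message reports its angles in degrees, so a client that combines the two will likely want a common unit. The sketch below converts the angles to degrees and computes the distance of the gaze point from the camera-space origin. `gaze_summary` is a hypothetical helper and the input values are made up.

```python
import math


def gaze_summary(x, y, z, theta, phi):
    """Summarize a gaze reading: distance in millimeters, angles in degrees."""
    return {
        # Euclidean distance of the gaze point from the camera-space origin.
        "distance_mm": math.sqrt(x * x + y * y + z * z),
        "theta_deg": math.degrees(theta),
        "phi_deg": math.degrees(phi),
    }


# Made-up sample values for illustration only.
summary = gaze_summary(30.0, 40.0, 120.0, math.pi / 6, -math.pi / 12)
```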

message GazeRequest¶

A request for Gaze inference requires an image. Optionally, provide an imageID, which is returned in the response. Optionally, provide face bounding boxes to run Gaze inference in specific regions. Optionally, provide landmarks as (x,y) coordinates for each face to run Gaze inference in specific regions.

The image is expected in BGR channel order with HWC (height, width, channel) layout.

Data image¶: Input image frame

uint64 imageID¶: Optionally provide imageID which will be mirrored in response.

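BoundingBox face_bbox(repeated)¶: Optional face bounding boxes that restrict inference to specific regions

Data landmarks(repeated)¶: Optional facial landmarks, given as (x,y) coordinates for each face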

message GazeResponse¶

Response for Gaze inference outputs a Gaze for each person.

Gaze gaze(repeated)¶: A list of output gaze values

uint64 imageID¶: ID from request.

message Head¶

Head data structure returned in response to a HeadPoseRequest.

double x¶: x-coordinate of the head center point in camera space (millimeter)

double y¶: y-coordinate of the head center point in camera space (millimeter)

double z¶: z-coordinate of the head center point in camera space (millimeter)

double pitch¶: Pitch angle of the head center point in camera space (degrees)

double yaw¶: Yaw angle of the head center point in camera space (degrees)

double roll¶: Roll angle of the head center point in camera space (degrees)

message HeadPoseRequest¶

A request for HeadPose inference requires an image and camera parameters. Optionally, provide an imageID, which is returned in the response.

The image is expected in BGR channel order with HWC (height, width, channel) layout.

Data image¶: Input image frame

Data cam_matrix¶: camera matrix

Data dist_coeffs¶: camera distortion coefficients

uint64 imageID¶: Optionally provide imageID which will be mirrored in response

message HeadPoseResponse¶

Response for HeadPose inference outputs points in 3D space.

Head headposes(repeated)¶: A list of output tuples for the points in 3D space

uint64 imageID¶: ID from request

message UserRequest¶

A request for User inference requires an image and camera parameters. Optionally, provide an imageID, which is returned in the response.

The image is expected in BGR channel order with HWC (height, width, channel) layout.

Data image¶: Input image frame

Data cam_matrix¶: camera matrix

Data dist_coeffs¶: camera distortion coefficients

uint64 imageID¶: Optionally provide imageID which will be mirrored in response

message UserResponse¶

Response for User objects.

Users users(repeated)¶: A list of detected users with their inference results

uint64 imageID¶: ID from request

message Users¶

User data structure that groups the inference results for one user.

BoundingBox face¶: User's face detection result

Data landmarks¶: User's facial landmarks result

Gaze gaze¶: User's gaze result

Head head¶: User's head pose result

Emotion emotion¶: User's emotion result

enum BodyPose.JointDescriptor

Descriptors for joints. Default is NONE.

enumerator NONE = 0¶

enumerator NOSE = 1¶

enumerator NECK = 2¶

enumerator RIGHT_SHOULDER = 3¶

enumerator RIGHT_ELBOW = 4¶

enumerator RIGHT_WRIST = 5¶

enumerator LEFT_SHOULDER = 6¶

enumerator LEFT_ELBOW = 7¶

enumerator LEFT_WRIST = 8¶

enumerator RIGHT_HIP = 9¶

enumerator RIGHT_KNEE = 10¶

enumerator RIGHT_ANKLE = 11¶

enumerator LEFT_HIP = 12¶

enumerator LEFT_KNEE = 13¶

enumerator LEFT_ANKLE = 14¶

enumerator RIGHT_EYE = 15¶

enumerator LEFT_EYE = 16¶

enumerator RIGHT_EAR = 17¶

enumerator LEFT_EAR = 18¶
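
On the client side, the numeric descriptors can be turned back into readable names. The sketch below mirrors the enumerators above in a Python `IntEnum` and indexes a list of joints by name. It is a hand-written mirror for illustration rather than the generated protobuf enum, and `joints_by_name` and the sample coordinates are hypothetical.

```python
from enum import IntEnum


class JointDescriptor(IntEnum):
    """Hand-written mirror of BodyPose.JointDescriptor."""
    NONE = 0
    NOSE = 1
    NECK = 2
    RIGHT_SHOULDER = 3
    RIGHT_ELBOW = 4
    RIGHT_WRIST = 5
    LEFT_SHOULDER = 6
    LEFT_ELBOW = 7
    LEFT_WRIST = 8
    RIGHT_HIP = 9
    RIGHT_KNEE = 10
    RIGHT_ANKLE = 11
    LEFT_HIP = 12
    LEFT_KNEE = 13
    LEFT_ANKLE = 14
    RIGHT_EYE = 15
    LEFT_EYE = 16
    RIGHT_EAR = 17
    LEFT_EAR = 18


def joints_by_name(joints):
    """Map joint names to (x, y), skipping joints whose descriptor is NONE."""
    return {
        JointDescriptor(descriptor).name: (x, y)
        for descriptor, x, y in joints
        if descriptor != JointDescriptor.NONE
    }


# Made-up (descriptor, x, y) triples standing in for BodyPose.Joint entries.
sample = [(1, 320, 110), (2, 322, 180), (0, 0, 0)]
named = joints_by_name(sample)
```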

enum DataType

Datatype specifications for data block. Default = 0 = FLOAT32.

enumerator FLOAT32 = 0¶: 32-bit float

enumerator INT32 = 1¶: 32-bit signed integer

enumerator FLOAT64 = 2¶: 64-bit float

enumerator UINT8 = 3¶: 8-bit unsigned integer
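
Each DataType implies an element width, so the expected buffer length follows from the shape. The following sketch checks that relationship. The `DataType` class is a hand-written mirror of the enum above, and `ITEM_SIZE` and `expected_bytes` are hypothetical helpers.

```python
from enum import IntEnum
from math import prod


class DataType(IntEnum):
    """Hand-written mirror of the DataType enum."""
    FLOAT32 = 0
    INT32 = 1
    FLOAT64 = 2
    UINT8 = 3


# Element width in bytes for each data type.
ITEM_SIZE = {
    DataType.FLOAT32: 4,
    DataType.INT32: 4,
    DataType.FLOAT64: 8,
    DataType.UINT8: 1,
}


def expected_bytes(shape, dtype):
    """Number of bytes a buffer should hold for the given shape and dtype."""
    return prod(shape) * ITEM_SIZE[DataType(dtype)]


# A 640x480 three-channel 8-bit image in HWC layout.
image_bytes = expected_bytes((480, 640, 3), DataType.UINT8)
print(image_bytes)  # 921600
```

Comparing this number with the actual buffer length before deserializing is a cheap way to catch a shape or dtype that was left at its default by mistake.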

enum Emotion.EmotionDescriptor

enumerator NONE = 0

enumerator NEUTRAL = 1¶

enumerator HAPPY = 2¶

enumerator SURPRISE = 3¶

enumerator DISGUST = 4¶

enumerator SCREAM = 5¶