src/jarvis_proto/jarvis_cv.proto

service JarvisVision

The Jarvis Vision service provides methods for obtaining inference results for various vision models.

rpc GazeResponse GetGaze(GazeRequest)

Given a GazeRequest for Gaze inference, outputs a GazeResponse.

rpc FaceDetectResponse GetFaceDetect(FaceDetectRequest)

Given a FaceDetectRequest for FaceDetect inference, outputs a FaceDetectResponse.

rpc FacialLandmarksResponse GetFacialLandmarks(FacialLandmarksRequest)

Given a FacialLandmarksRequest for FacialLandmarks inference, outputs a FacialLandmarksResponse.

rpc BodyPoseResponse GetBodyPose(BodyPoseRequest)

Given a BodyPoseRequest for BodyPose inference, outputs a BodyPoseResponse.

rpc EmotionResponse GetEmotion(EmotionRequest)

Given an EmotionRequest for Emotion inference, outputs an EmotionResponse.

rpc HeadPoseResponse GetHeadPose(HeadPoseRequest)

Given a HeadPoseRequest for HeadPose inference, outputs a HeadPoseResponse.

rpc UserResponse GetUserAttributes(UserRequest)

Given a UserRequest for getting user data, outputs a UserResponse.

message BodyPose

BodyPose data structure returned in response to a BodyPoseRequest.

BodyPose.Joint joints (repeated)

message BodyPose.Joint

Joint object containing location descriptor and x,y coordinate.

BodyPose.JointDescriptor descriptor
int32 x
int32 y

message BodyPoseRequest

A request for BodyPose inference requires an image. Optionally, provide an imageID, which is returned in the response.

Image is expected in BGR format in HWC.

Data image

Input image frame

uint64 imageID

Optionally provide imageID which will be mirrored in response
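
The "BGR format in HWC" requirement means the buffer is laid out with height as the outermost dimension, then width, then channel, and each pixel stores blue, green, red in that order. A minimal sketch of that layout follows. It assumes 8-bit channels and tightly packed rows, neither of which is stated here, and the helper names are hypothetical rather than part of the service.

```python
def hwc_offset(row, col, channel, width, channels=3):
    """Flat index of (row, col, channel) in a tightly packed HWC buffer."""
    return (row * width + col) * channels + channel


def rgb_to_bgr_bytes(pixels):
    """Flatten a nested [row][col] list of (r, g, b) tuples into BGR HWC bytes.

    Assumes 8-bit channel values (0-255); this is an assumption, not a
    documented requirement of the service.
    """
    out = bytearray()
    for row in pixels:
        for r, g, b in row:
            out.extend((b, g, r))  # reorder to blue, green, red
    return bytes(out)


# A 1 x 2 image: one pure red pixel followed by one pure blue pixel.
image = [[(255, 0, 0), (0, 0, 255)]]
buf = rgb_to_bgr_bytes(image)
shape = [len(image), len(image[0]), 3]  # [H, W, C]
```

Here `buf` and `shape` correspond to the `buffer` and `shape` fields of the Data message carried in the request's `image` field.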

message BodyPoseResponse

Response for BodyPose inference outputs a list of body poses, each made up of joints.

BodyPose poses(repeated)

A list of output poses

uint64 imageID

ID from request
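
Each pose in the response is a list of joints, and each joint pairs a JointDescriptor with an (x,y) coordinate. The sketch below shows one way to turn such a list into a name-to-coordinate lookup. `Joint` and `JointDescriptor` here are local stand-ins written for illustration (the enum values are copied from the BodyPose.JointDescriptor listing); they are not the classes generated from the proto file.

```python
from enum import IntEnum
from typing import NamedTuple


class JointDescriptor(IntEnum):
    """Local mirror of BodyPose.JointDescriptor."""
    NONE = 0
    NOSE = 1
    NECK = 2
    RIGHT_SHOULDER = 3
    RIGHT_ELBOW = 4
    RIGHT_WRIST = 5
    LEFT_SHOULDER = 6
    LEFT_ELBOW = 7
    LEFT_WRIST = 8
    RIGHT_HIP = 9
    RIGHT_KNEE = 10
    RIGHT_ANKLE = 11
    LEFT_HIP = 12
    LEFT_KNEE = 13
    LEFT_ANKLE = 14
    RIGHT_EYE = 15
    LEFT_EYE = 16
    RIGHT_EAR = 17
    LEFT_EAR = 18


class Joint(NamedTuple):
    """Hypothetical stand-in for BodyPose.Joint."""
    descriptor: int
    x: int
    y: int


def joints_by_name(joints):
    """Map joint names to (x, y), skipping joints tagged NONE."""
    return {
        JointDescriptor(j.descriptor).name: (j.x, j.y)
        for j in joints
        if j.descriptor != JointDescriptor.NONE
    }


pose = [Joint(1, 120, 80), Joint(2, 118, 140), Joint(0, 0, 0)]
named = joints_by_name(pose)
```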

message BoundingBox

Bounding box data structure, expressed as the (x,y) coordinate of the top-left corner plus a width and height (w,h), so that (x+w, y+h) is the bottom-right corner.

int32 x

Top left x-coordinate

int32 y

Top left y-coordinate

int32 w

Width such that bottom right x-coordinate = x + w

int32 h

Height such that bottom right y-coordinate = y + h
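
The corner arithmetic described above can be sketched as follows. `Box` is a hypothetical local stand-in for BoundingBox. The clipping helper is an optional convenience added for illustration; nothing here says whether the service returns boxes that extend past the image edge.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical stand-in for the BoundingBox message."""
    x: int
    y: int
    w: int
    h: int


def corners(box):
    """Return ((left, top), (right, bottom)) for a box."""
    return (box.x, box.y), (box.x + box.w, box.y + box.h)


def clip_to_image(box, img_w, img_h):
    """Clamp a box to the image bounds, keeping the (x, y, w, h) form."""
    x0 = max(0, min(box.x, img_w))
    y0 = max(0, min(box.y, img_h))
    x1 = max(0, min(box.x + box.w, img_w))
    y1 = max(0, min(box.y + box.h, img_h))
    return Box(x0, y0, x1 - x0, y1 - y0)


top_left, bottom_right = corners(Box(10, 20, 30, 40))
clipped = clip_to_image(Box(-5, 10, 20, 100), img_w=50, img_h=60)
```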

message Data

Generic data block that can hold images or tensors.

bytes buffer

Buffer of bytes for data.

int32 shape(repeated)

Shape of data used for deserialization.

DataType dtype

Datatype of buffer for deserialization.
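
Because a Data block carries raw bytes, the receiver needs `shape` and `dtype` to rebuild the values. The following sketch round-trips a small FLOAT32 block, such as a set of (x,y) landmark points. It assumes little-endian, row-major packing, which is not specified here, and `DataBlock` is a hypothetical local stand-in for the generated Data class.

```python
import struct
from typing import List, NamedTuple

FLOAT32 = 0  # value taken from the DataType listing


class DataBlock(NamedTuple):
    """Hypothetical stand-in for the Data message."""
    buffer: bytes
    shape: List[int]
    dtype: int


def pack_float32(values, shape):
    """Serialize a flat list of floats into a FLOAT32 block.

    Assumes little-endian, row-major packing.
    """
    count = 1
    for dim in shape:
        count *= dim
    if count != len(values):
        raise ValueError("shape does not match number of values")
    return DataBlock(struct.pack(f"<{count}f", *values), list(shape), FLOAT32)


def unpack_float32(block):
    """Deserialize a FLOAT32 block into rows grouped by the last dimension."""
    if block.dtype != FLOAT32:
        raise ValueError("expected a FLOAT32 block")
    count = len(block.buffer) // 4  # 4 bytes per 32-bit float
    flat = struct.unpack(f"<{count}f", block.buffer)
    inner = block.shape[-1]
    return [flat[i:i + inner] for i in range(0, count, inner)]


# Two (x, y) points stored as a block of shape [2, 2].
block = pack_float32([10.5, 20.0, 30.25, 40.0], [2, 2])
points = unpack_float32(block)
```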

message Emotion

Emotion data structure returned in response to an EmotionRequest.

BoundingBox bbox
Emotion.EmotionDescriptor emotion

message EmotionRequest

A request for Emotion inference requires an image. Optionally, provide an imageID, which is returned in the response.

Image is expected in BGR format in HWC.

Data image

Input image frame

uint64 imageID

Optionally provide imageID which will be mirrored in response

message EmotionResponse

Response for Emotion inference outputs a list of emotions, one for each face detected.

Emotion emotions(repeated)

A list of output emotions

uint64 imageID

ID from request

message FaceDetectRequest

A request for FaceDetect inference requires an image. Optionally, provide an imageID, which is returned in the response.

Image is expected in BGR format in HWC.

Data image

Input image frame

uint64 imageID

Optionally provide imageID which will be mirrored in response.

message FaceDetectResponse

Response for FaceDetect inference outputs bounding boxes of faces.

BoundingBox bbox(repeated)

A list of output face bounding boxes

uint64 imageID

ID from request.

message FacialLandmarksRequest

A request for FacialLandmarks inference requires an image. Optionally, provide an imageID, which is returned in the response. The user can also provide face bounding boxes to restrict FacialLandmarks inference to specific regions.

Image is expected in BGR format in HWC.

Data image

Input image frame

uint64 imageID

Optionally provide imageID which will be mirrored in response.

BoundingBox face_bbox(repeated)

Optional face bounding boxes that restrict inference to specific regions

message FacialLandmarksResponse

Response for FacialLandmarks inference outputs landmarks as (x,y) coordinates for each face.

Data landmarks(repeated)

A list of output facial landmarks points

uint64 imageID

ID from request.

message Gaze

Gaze data structure returned in response to a GazeRequest.

double x

x-coordinate of the gaze point in camera space (millimeter)

double y

y-coordinate of the gaze point in camera space (millimeter)

double z

z-coordinate of the gaze point in camera space (millimeter)

double theta

Horizontal angle of the gaze point in camera space (radians)

double phi

Vertical angle of the gaze point in camera space (radians)

message GazeRequest

A request for Gaze inference requires an image. Optionally, provide an imageID, which is returned in the response. The user can also provide face bounding boxes, or landmarks as (x,y) coordinates for each face, to restrict Gaze inference to specific regions.

Image is expected in BGR format in HWC.

Data image

Input image frame

uint64 imageID

Optionally provide imageID which will be mirrored in response.

BoundingBox face_bbox(repeated)

Optional face bounding boxes that restrict inference to specific regions

Data landmarks(repeated)

Optional facial landmarks, as (x,y) coordinates for each face

message GazeResponse

Response for Gaze inference outputs a Gaze for each person.

Gaze gaze(repeated)

A list of output gaze values

uint64 imageID

ID from request.

message Head

Head data structure returned in response to a HeadPoseRequest.

double x

x-coordinate of the head center point in camera space (millimeter)

double y

y-coordinate of the head center point in camera space (millimeter)

double z

z-coordinate of the head center point in camera space (millimeter)

double pitch

Pitch angle of the head center point in camera space (degrees)

double yaw

Yaw angle of the head center point in camera space (degrees)

double roll

Roll angle of the head center point in camera space (degrees)
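
Note that Head angles are given in degrees, while the Gaze angles theta and phi are given in radians. When the two are combined, it helps to convert to one unit first. A minimal sketch using only the standard library follows; the function names are hypothetical.

```python
import math


def head_angles_to_radians(pitch, yaw, roll):
    """Convert Head pitch, yaw, and roll from degrees to radians."""
    return tuple(math.radians(angle) for angle in (pitch, yaw, roll))


def gaze_angles_to_degrees(theta, phi):
    """Convert Gaze theta and phi from radians to degrees."""
    return math.degrees(theta), math.degrees(phi)


pitch_rad, yaw_rad, roll_rad = head_angles_to_radians(180.0, 90.0, 0.0)
theta_deg, phi_deg = gaze_angles_to_degrees(math.pi / 2, 0.0)
```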

message HeadPoseRequest

A request for HeadPose inference requires an image and camera parameters. Optionally, provide an imageID, which is returned in the response.

Image is expected in BGR format in HWC.

Data image

Input image frame

Data cam_matrix

Camera matrix

Data dist_coeffs

Camera distortion coefficients

uint64 imageID

Optionally provide imageID which will be mirrored in response

message HeadPoseResponse

Response for HeadPose inference outputs points in 3D space.

Head headposes(repeated)

A list of output tuples for the points in 3D space

uint64 imageID

ID from request

message UserRequest

A request for User inference requires an image and camera parameters. Optionally, provide an imageID, which is returned in the response.

Image is expected in BGR format in HWC.

Data image

Input image frame

Data cam_matrix

Camera matrix

Data dist_coeffs

Camera distortion coefficients

uint64 imageID

Optionally provide imageID which will be mirrored in response

message UserResponse

Response for User objects.

Users users(repeated)

A list of output Users objects, one per detected user

uint64 imageID

ID from request

message Users

User data structure

BoundingBox face

User's face detection result

Data landmarks

User's facial landmarks result

Gaze gaze

User's gaze result

Head head

User's head pose result

Emotion emotion

User's emotion result

enum BodyPose.JointDescriptor

Descriptors for joints. Default is NONE.

enumerator NONE = 0
enumerator NOSE = 1
enumerator NECK = 2
enumerator RIGHT_SHOULDER = 3
enumerator RIGHT_ELBOW = 4
enumerator RIGHT_WRIST = 5
enumerator LEFT_SHOULDER = 6
enumerator LEFT_ELBOW = 7
enumerator LEFT_WRIST = 8
enumerator RIGHT_HIP = 9
enumerator RIGHT_KNEE = 10
enumerator RIGHT_ANKLE = 11
enumerator LEFT_HIP = 12
enumerator LEFT_KNEE = 13
enumerator LEFT_ANKLE = 14
enumerator RIGHT_EYE = 15
enumerator LEFT_EYE = 16
enumerator RIGHT_EAR = 17
enumerator LEFT_EAR = 18

enum DataType

Datatype specifications for data block. Default = 0 = FLOAT32.

enumerator FLOAT32 = 0

32 bit float

enumerator INT32 = 1

32 bit integer

enumerator FLOAT64 = 2

64 bit float

enumerator UINT8 = 3

8 bit unsigned integer
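
The element widths listed above let a caller check that a Data block is internally consistent before sending or after receiving it: the buffer length should equal the product of the shape dimensions times the element size. A small sketch of that check follows. The table and function names are hypothetical; only the enum values and bit widths come from this listing.

```python
# Bytes per element, keyed by DataType value:
# FLOAT32 = 0, INT32 = 1, FLOAT64 = 2, UINT8 = 3.
ITEM_SIZE = {0: 4, 1: 4, 2: 8, 3: 1}


def expected_nbytes(shape, dtype):
    """Number of bytes a buffer with this shape and dtype should hold."""
    count = 1
    for dim in shape:
        count *= dim
    return count * ITEM_SIZE[dtype]


def is_consistent(buffer, shape, dtype):
    """True when the buffer length matches shape and dtype."""
    return len(buffer) == expected_nbytes(shape, dtype)


# A 480 x 640 BGR frame of UINT8 values.
frame_bytes = expected_nbytes([480, 640, 3], 3)
```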

enum Emotion.EmotionDescriptor

enumerator NONE = 0
enumerator NEUTRAL = 1
enumerator HAPPY = 2
enumerator SURPRISE = 3
enumerator DISGUST = 4
enumerator SCREAM = 5