Vision and AI Analytics

Tokkio Vision leverages real-time computer vision processing to facilitate highly realistic interactions with digital avatars. By analyzing the video stream as it arrives, Tokkio Vision enhances the avatar's ability to respond accurately to the user, creating a more immersive and engaging experience.

The architecture can be summarized as follows:

[Architecture diagram]
  • Streaming Pipeline
    • The streaming pipeline captures video from the user’s webcam and transmits it to the cloud.

  • The Tokkio Vision pipeline is composed of three microservices:
    • Vision AI microservice: performs video inference to extract body poses and face bounding boxes from the video stream.

    • eMDX microservice: analyzes metadata from the Vision AI microservice, providing alerts on user presence and attention levels.

    • eMDX API microservice: manages the persistence and retrieval of metadata.

  • Audio Pipeline
    • The Chat Controller action server is an audio inference microservice. It performs Voice Activity Detection (VAD), Automatic Speech Recognition (ASR), and Text-to-Speech (TTS), and emits UMIM-compliant events.

  • Interaction Management
    • Tokkio UMIM Action Server ensures compliance with UMIM events, facilitating seamless integration and management of these events within Tokkio.

    • Chat Engine drives the avatar interaction by reacting to vision and speech user events and performing bot actions using UMIM action events.
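The exact schema of the metadata that the Vision AI microservice emits and the eMDX API microservice persists is not specified here. As a minimal sketch, a per-frame vision metadata record might look like the following; all type and field names (`FrameMetadata`, `face_boxes`, `attention_score`, and so on) are assumptions for illustration, not the actual eMDX schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class BoundingBox:
    # Pixel coordinates of a detected face (hypothetical field names)
    x: int
    y: int
    width: int
    height: int

@dataclass
class FrameMetadata:
    # One record per analyzed video frame, roughly what the Vision AI
    # microservice might emit and the eMDX API microservice might
    # persist and serve back (illustrative only)
    stream_id: str
    timestamp_ms: int
    face_boxes: List[BoundingBox] = field(default_factory=list)
    user_present: bool = False
    attention_score: Optional[float] = None  # e.g. 0.0-1.0 attention estimate

record = FrameMetadata(
    stream_id="webcam-0",
    timestamp_ms=1_700_000_000_000,
    face_boxes=[BoundingBox(x=320, y=120, width=128, height=128)],
    user_present=True,
    attention_score=0.85,
)
print(record.user_present, len(record.face_boxes))
```

Keeping detection output (bounding boxes, poses) separate from derived analytics (presence, attention) mirrors the split between the Vision AI and eMDX microservices described above.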
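To make the event-driven flow above concrete, here is a minimal sketch of how a component like the Chat Engine might map incoming vision and speech user events to bot actions. The event type strings, payload fields, and routing rules are all invented for illustration; they are not the actual UMIM event schema.

```python
# Illustrative sketch only: event names, payload fields, and routing
# logic are assumptions, not the real Tokkio/UMIM schema.

def route_event(event):
    """Map a user event to a bot action (hypothetical logic)."""
    etype = event.get("type")
    payload = event.get("payload", {})
    if etype == "vision.user_present" and payload.get("present"):
        # Vision pipeline reported a user in front of the camera
        return {"action": "bot.greet", "utterance": "Hello! How can I help?"}
    if etype == "vision.attention_lost":
        # eMDX-style attention alert: try to re-engage the user
        return {"action": "bot.reengage", "utterance": "Are you still there?"}
    if etype == "speech.transcript_final":
        # Final ASR transcript from the audio pipeline
        return {"action": "bot.respond", "user_text": payload.get("text", "")}
    return None  # unhandled event types are ignored

presence = {"type": "vision.user_present", "payload": {"present": True}}
print(route_event(presence)["action"])
```

The point of the sketch is the shape of the interaction: vision and speech events arrive on a common bus, and the Chat Engine reacts by emitting bot actions, rather than polling the pipelines directly.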