Vision and AI Analytics
Tokkio Vision leverages real-time computer vision processing to facilitate highly realistic interactions with digital avatars. By analyzing the video stream as the interaction unfolds, Tokkio Vision enhances the avatar's ability to respond accurately to the user, creating a more immersive and engaging experience.
The architecture can be summarized as follows:
- Streaming Pipeline
The streaming pipeline captures video from the user’s webcam and transmits it to the cloud.
- Vision Pipeline
The Tokkio Vision pipeline is composed of three microservices:
Vision AI microservice: Performs video inference to extract body poses and face bounding boxes from the video stream.
eMDX microservice: Analyzes metadata from the Vision AI microservice, providing alerts on user presence and attention levels.
eMDX API microservice: Manages the persistence and retrieval of metadata.
- Audio Pipeline
The Chat Controller action server is an audio inference microservice. It performs Voice Activity Detection, ASR, and TTS, and provides UMIM-compliant events.
- Interaction Management
The Tokkio UMIM Action Server ensures compliance with UMIM events, facilitating seamless integration and management of these events within Tokkio.
The Chat Engine drives the avatar interaction by reacting to vision and speech user events and performing bot actions using UMIM action events.
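To make the vision pipeline concrete, the sketch below models the kind of per-frame metadata the Vision AI microservice extracts (face bounding boxes) and a simple presence rule in the spirit of the eMDX microservice's alerts. The data shapes, class names, and windowing rule here are illustrative assumptions, not the actual service schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical shape of per-frame metadata from the Vision AI
# microservice: detected face bounding boxes (body poses omitted
# for brevity).
@dataclass
class FaceBox:
    x: float
    y: float
    w: float
    h: float

@dataclass
class FrameMetadata:
    frame_id: int
    faces: List[FaceBox] = field(default_factory=list)

def presence_alert(frames: List[FrameMetadata], window: int = 3) -> bool:
    """Toy eMDX-style rule: report the user as present only when a
    face was detected in every one of the most recent `window` frames,
    which smooths over single-frame detection dropouts."""
    recent = frames[-window:]
    return len(recent) == window and all(f.faces for f in recent)
```

A real deployment would instead persist this metadata through the eMDX API microservice and evaluate attention rules server-side; the windowed check above only illustrates why alerts are derived from a span of frames rather than a single detection.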
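The interaction loop above can be sketched as an event dispatcher: UMIM-style user events arriving from the vision and audio pipelines are mapped to bot action events. The event and field names below are simplified assumptions modeled on the UMIM pattern, not the exact schema the Tokkio UMIM Action Server enforces, and `reply_to` stands in for the Chat Engine's actual dialog policy.

```python
def reply_to(transcript: str) -> str:
    # Placeholder dialog policy; the real Chat Engine would generate
    # a response from the conversation state here.
    return f"You said: {transcript}"

def handle_event(event: dict) -> list:
    """Map a UMIM-style user event to zero or more bot action events.

    Vision-derived events (user presence) and speech-derived events
    (finished utterances) both flow through the same dispatcher.
    """
    if event["type"] == "UserPresenceDetected":  # assumed vision event name
        return [{"type": "StartUtteranceBotAction",
                 "script": "Hello! How can I help you today?"}]
    if event["type"] == "UtteranceUserActionFinished":  # user finished speaking
        return [{"type": "StartUtteranceBotAction",
                 "script": reply_to(event["transcript"])}]
    return []  # unhandled event types produce no bot action
```

Keeping vision and speech events in one dispatch path is what lets the avatar greet a user who merely walks into frame, before any speech occurs.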