Tokkio Retail#

Introduction#

The Tokkio Retail reference application is an example of one of the many retail applications where Tokkio can be deployed. The application can be customized to meet end-user requirements. It is designed to facilitate live avatar interactions using popular Large Language Models (LLMs) and custom Retrieval-Augmented Generation (RAG) solutions.

  • Key Features

    • Multi-modal Integration: End users can interact with the Tokkio Retail application through a combination of speech and touch.

    • Customization and Integration: Users can customize the application to meet specific requirements. This flexibility is detailed in the Customize the Retail Bot section.

    • Architectural Enhancements: The application showcases improvements such as the use of a custom Catalog RAG pipeline and different LLM models.

  • Benefits

    • Ease of Use: The application supports straightforward integration and customization, enabling users to tailor the system to specific use cases without extensive technical overhead.

    • Advanced Capabilities: With features like gesture generation and response streaming, Tokkio enhances user interaction quality and system responsiveness.

Source#

The source for this resource can be found on NGC.

Minimum GPU Requirements#

The Retail reference workflow’s minimum GPU requirement is 4xT4 or 4xL4 for a 3-stream deployment, and 4xA10 for a 6-stream deployment, with the default animated avatar rendering (OV rendering).

For Unreal Engine or A2F-2D based rendering options, please refer to Tokkio LLM-RAG - Unreal Engine and Tokkio LLM-RAG - A2F-2D, respectively, for minimum GPU requirements.

Deployment#

A sample UCS app spec for the Retail app with OV rendering can be found in NVIDIA/ACE.

Refer to the Integrating Customization changes with rebuild for build and deployment steps.

The Helm chart of the sample LLM RAG workflow can be found at https://helm.ngc.nvidia.com/nvidia/ace/charts/ucs-tokkio-app-base-3-stream-retail-3d-ov-4.1.4.tgz

Refer to the Deployment section for deployment instructions for both the Tokkio build and the Tokkio UI.

Tokkio Retail running on Tokkio UI

Customizing the Retail Bot#

The Tokkio Retail bot can be customized to connect with a custom catalog. If the catalog schema is kept the same as the default, no changes are needed for the plugin server, cart manager, UI, or UI server. Refer to the Catalog Customization section of the Catalog RAG for more information.

If a different schema is used for the Catalog RAG, modifications may be required on an as-needed basis for any of the above-mentioned microservices.

Publishing the customized bot#

Once the ACE bot is customized, it can be pushed to NGC using:

$ ngc registry resource upload-version --source BOT_FOLDER_NAME targeted_ngc_path:version

Using customized Retail bot#

Specify the plugin server resource path in the UCS app yaml as shown below:

- name: chat-engine
  type: ucf.svc.ace-agent.chat-engine
  files:
    config_dir: <path_to_local_folder>
- name: chat-controller
  type: ucf.svc.ace-agent.chat-controller
  files:
    config_dir: <path_to_local_folder>
- name: plugin-server
  type: ucf.svc.ace-agent.plugin-server
  files:
    config_dir: <path_to_local_folder>

Alternatively, you can specify the NGC path for the plugin server resource in the params file as shown below. This can also be done in an override values file and applied as an upgrade to an already deployed Retail bot chart using helm upgrade.

chat-controller:
  configNgcPath: "<ngc path for plugin resource>"
chat-engine:
  botConfigName: tokkio_food_ordering_bot_config.yaml
  configNgcPath: "<ngc path for plugin resource>"
plugin-server:
  configNgcPath: "<ngc path for plugin resource>"

Architecture#

The figure below provides an overview of the microservices used in Tokkio Retail and their interactions. Application-specific microservices used in Tokkio Retail include: the ACE Agent Plugin Server, Catalog RAG, Cart Manager, UI server, and UI front end. These application-dependent microservices are orchestrated through the ACE Agent and Chat Controller microservices.

Architecture_Diagram

All the microservices within Tokkio Retail interact with each other over HTTP through REST APIs. The user’s input query is passed to ACE Agent, which then orchestrates the ACE Agent plugin (and eventually the rest of the microservices) to fulfill the request.

Sample Conversation flow with the Retail bot#

The user enters the field of view (FOV) of the camera and has a microphone set up for speech input. The bot detects the user’s presence and proceeds with the conversation below:

Bot: "Hi, I am Ben. How can I help you?"

User: "What drinks do you have?"

Bot: "We have regular cola, coffee, diet cola and lemonade. What would you like to order?"

User: "Add a large lemonade"

Bot: "A large lemonade has been added to your order"

User: <Silent for some time>

Bot: "Feel free to ask me any questions about the menu"

User: "Checkout please"

Bot: "Your order will be available shortly. Thanks for visiting, good bye!"

Note that users can also use touch inputs for ordering in the Retail workflow.

Customizations can be performed to handle any of the supported operations (and to add more operations). More information is available in the Customization section of this workflow.

ACE Agent Retail Plugin#

The Retail plugin for the ACE Agent plugin server is responsible for formulating the metadata and events required to drive the conversation and the behavior of the bot. It interacts with different microservices to understand the intent of the user’s query, extract the necessary information about the items, and finally formulate the information needed to control the bot’s behavior.

Responsibilities of the Retail Plugin include:

  • Determine what actions need to be taken to fulfill the user’s request based on query analysis (performed by Catalog RAG)

  • Fetch relevant information from the customer specific Catalog RAG.

  • Maintain the state of the user’s cart by interacting with Cart Manager to perform actions like adding, removing and replacing items.

  • Control the state of the UI by showing the options that the user has asked about.

  • Form narratives for items that the user has asked about.

  • Use LLM to generate a suitable response and return it to the Chat engine.

Based on the user query, the ACE Agent plugin microservice takes inputs from the microservices below, when required, to formulate different kinds of metadata. The input sources of the plugin server are:

  1. ACE Agent microservices - The plugin module receives the query from ACE Agent.

  2. Catalog RAG microservice - The plugin module calls the APIs exposed by the Catalog RAG microservice to fetch all the details associated with an item, such as sizes and price. These details are used to validate user requests as well as to formulate the metadata required for clarifications if needed. The plugin module also calls the Catalog RAG’s /analyze_query API for query analysis and parameter extraction.

  3. Cart Manager microservice - The plugin module calls the API exposed by the Cart Manager to get the current state of the user cart. Example: “Add a cheeseburger to my cart.”

  4. User Interface Server - The plugin module calls the API exposed by the UI server to service queries related to the UI. Example: “What entrees do you have?”
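To make the /analyze_query interaction concrete, here is a minimal Python sketch of how the plugin module might call that endpoint. Only the endpoint path comes from this document; the service address, the payload fields (query, chat_history), and the response fields (intent, parameters) are assumptions for illustration.

```python
# Hypothetical sketch of the plugin module calling the Catalog RAG's
# /analyze_query API. Payload and response schemas are assumptions.
import json
from urllib import request

CATALOG_RAG_URL = "http://localhost:8000"  # assumed service address


def build_analyze_query_payload(query, chat_history=None):
    """Build the JSON body for /analyze_query (schema is an assumption)."""
    return {"query": query, "chat_history": chat_history or []}


def parse_analysis(response_body):
    """Extract the intent and extracted parameters from a response body."""
    data = json.loads(response_body)
    return data.get("intent"), data.get("parameters", {})


def analyze_query(query, chat_history=None):
    """POST the query to the Catalog RAG service (requires a running instance)."""
    body = json.dumps(build_analyze_query_payload(query, chat_history)).encode()
    req = request.Request(
        f"{CATALOG_RAG_URL}/analyze_query",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return parse_analysis(resp.read())
```

Passing the optional chat_history mirrors the approach mentioned later under limitations for supporting contextual queries.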

Using inputs from these microservices, the ACE Agent plugin module formulates the metadata below, which is sent out to the ACE Agent, Cart Manager, and UI Server microservices. The metadata includes:

  1. Information about different actions on a user cart (adding an item, removing an item, replacing an item, etc.), sent to the APIs exposed by the Cart Manager.

  2. Information about different actions on the User Interface when a user speaks certain requests (showing all sides on screen, displaying the cart, going back to the previous page, etc.), sent to the APIs exposed by the UI Server.

  3. Information used to formulate the spoken responses of the Avatar, sent to ACE Agent.


At a high level, the flow of a conversation goes like this:

  1. The Colang runtime runs as part of the chat engine to provide guard-railing and conversational interaction management capabilities. It forwards the query to the /analyze_query API of the plugin module, which in turn calls the similarly named API of the Catalog RAG to obtain the query analysis (intent and parameter extraction).

  2. The plugin server implements the logic to fulfill the classified intent (e.g., adding an item to the cart) by performing the actions with the help of the Cart Manager, Catalog RAG, and the UI server, based on the intent and identified items.

  3. The result of the plugin server action is passed to the Colang runtime.

  4. The Colang runtime makes a call to the LLM to paraphrase the response in a human-friendly manner.

  5. The paraphrased response is then used by the Chat engine to respond to the user request.
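The steps above can be condensed into a small Python sketch. All function bodies here are stand-ins: a real deployment would call the Catalog RAG, Cart Manager, UI server, and an LLM over REST rather than these local stubs.

```python
# Minimal stand-in for the five-step conversation flow described above.

def analyze_query(query):
    """Step 1: the query is analyzed for intent and parameters (stubbed)."""
    if "add" in query.lower():
        return {"intent": "add_item",
                "parameters": {"item": "lemonade", "size": "large"}}
    return {"intent": "unknown", "parameters": {}}


def fulfill_intent(analysis):
    """Step 2: the plugin server performs the action (a pretend cart update)."""
    if analysis["intent"] == "add_item":
        p = analysis["parameters"]
        return f"added 1 {p['size']} {p['item']} to the cart"
    return "no action taken"


def paraphrase(result):
    """Step 4: the Colang runtime would ask an LLM to rephrase; we fake it."""
    return f"Sure! I have {result}."


def handle_user_query(query):
    analysis = analyze_query(query)    # step 1: intent + parameter extraction
    result = fulfill_intent(analysis)  # steps 2-3: act, return result to runtime
    response = paraphrase(result)      # step 4: LLM paraphrasing
    return response                    # step 5: chat engine replies to the user
```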

You can read about how it interacts with other Tokkio Retail microservices in Architecture.

See Retail Plugin Server APIs for more details.

Cart Manager#

The cart manager handles all cart management for an active user. It is a lightweight service whose primary function is to update a user cart and communicate the updates to the ACE Agent Plugin and the UI. The main responsibilities of the Cart Manager include:

  • Storing and retrieving user cart related data.

  • Communicating with the Catalog RAG service to get information about the items to be added to the cart.

  • Storing user cart information in MongoDB; the Cart Manager requires a healthy MongoDB instance to run.

Catalog RAG#

The Catalog RAG API provides quick search and retrieval over the catalog (JSON) that is fed to the pipeline. It is designed to decouple the query logic from the backend solution.

  • Catalog RAG at launch uses the FAISS library for vector store creation and HuggingFace models for generating embeddings based on the default catalog (menu.json).

  • The Catalog RAG supports a wide range of searching and filtering capabilities on catalog items, such as REGEX pattern matching, logic chaining, and ordering.

  • It also allows users to batch export and import catalog content on the fly through API calls (modifications to the catalog are not persisted between restarts).

  • Additionally, it supports analysis of queries forwarded by the plugin server and provides a structured output for the query parameters and the intent of the query. This is used by the plugin server to determine the action to be taken on the user query.
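As a rough illustration of the kind of REGEX filtering the Catalog RAG exposes, the sketch below filters an in-memory catalog by pattern. The catalog contents and item fields loosely mirror a menu.json entry but the exact schema is an assumption, and this stands in for the real FAISS/HuggingFace-backed service.

```python
# Hypothetical in-memory stand-in for Catalog RAG regex filtering.
import re

CATALOG = [
    {"name": "regular cola", "category": "drinks", "price": 2.0},
    {"name": "diet cola", "category": "drinks", "price": 2.0},
    {"name": "lemonade", "category": "drinks", "price": 2.5},
    {"name": "cheeseburger", "category": "entrees", "price": 6.0},
]


def filter_items(pattern, field="name"):
    """Return catalog items whose field matches the given regex."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [item for item in CATALOG if rx.search(str(item[field]))]
```

For example, `filter_items(r"cola$")` matches both cola entries, while `filter_items(r"^drinks$", field="category")` returns everything in the drinks category.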

UI server and UI Front End#

The UI Server acts as an abstraction layer that decouples the presentation layer from the backend AI microservices logic; this provides the flexibility to integrate a customized UI front end seamlessly while minimizing necessary code changes. The main features of the UI server include:

  • APIs for managing (adding/removing/viewing) cart items from the Cart API

  • APIs for providing catalog contents from the Catalog RAG

  • APIs for the ACE Agent Plugin to manipulate UI rendering for speech input

  • APIs for UI Front End to communicate touch inputs for catalog navigation and food ordering

  • APIs for the UI Front End to register information about the current view for the ACE Agent Plugin (or any other microservice) on demand

  • gRPC communication with the ACE Agent Chat Controller to receive ASR results and forward them to the UI Front End for rendering

  • APIs for rendering custom views on the UI by providing JSON request payload

  • Redis events monitoring for FOV entry/exit, Camera add/remove, and error reporting

  • Multi-channel logging (file, console, etc.) with rotation

  • HTTP and HTTPS supported

The UI front end presents the catalog items in a user-friendly manner. It enables the user to interact with the UI through touch and speech inputs. It communicates with the UI server through well-defined REST APIs and a WebSocket connection. The front-end UI features include:

  • Showing categorized catalog items

  • Navigating through different categories

  • Adding and removing cart items and displaying the cart

  • Streaming the animated Tokkio avatar using the WebRTC protocol

  • Displaying ASR messages using the WebSocket connection

  • Supporting an audio-only pipeline to use the Tokkio app without a camera

  • Rendering custom view components based on the provided JSON payload

  • Production build configuration and a dockerized image

Tokkio Interaction Flow#

Touch Based Interactions#

When a user interacts with touch, the UI registers the request and acts on the user’s intent of browsing the catalog (display updates) or adding items to the cart (cart updates). For display updates, the UI server simply queries the items it needs to display and communicates with the UI client to update the display accordingly.

For interactions involving cart updates, the UI server communicates with the Cart Manager. The Cart Manager in turn retrieves the session-related cart information from the datastore and updates the cart. The display is then updated accordingly. Note that a session identifier is used for all communications between the Retail components to ensure that the retrieved and updated session information is relevant.
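The touch-based cart-update path can be sketched as follows. The in-memory dict and function names are illustrative stand-ins: the real services communicate over REST and the Cart Manager persists carts in MongoDB.

```python
# Illustrative stand-in for the UI server -> Cart Manager update path,
# keyed by session identifier.

CART_STORE = {}  # stand-in for the MongoDB-backed datastore


def cart_manager_update(session_id, action, item):
    """Retrieve the session's cart, apply the action, and store it back."""
    cart = CART_STORE.setdefault(session_id, [])
    if action == "add":
        cart.append(item)
    elif action == "remove" and item in cart:
        cart.remove(item)
    return cart


def ui_server_handle_touch(session_id, action, item):
    """UI server delegates the cart update, then returns the display payload."""
    cart = cart_manager_update(session_id, action, item)
    return {"session_id": session_id, "cart": cart}
```

Keying every call on the session identifier is what keeps each user's retrieved and updated cart information consistent, as noted above.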

Speech Based Interactions#

When a user interacts with speech, the ACE Agent Plugin passes down the relevant request to the Cart Manager for any cart related updates or the UI server for any display related updates. In this case too, the CM retrieves the session information from the datastore and updates the cart. The display is then updated accordingly.

Session Identification#

A unique token associated with a user session is used for all communication between the Retail components. The sessionId is generated by the Chat Controller. It identifies the dialog session from an FOV entry trigger to dialog completion (FOV exit or browser closure) to maintain user context. Note that the connectionId passed to the UI server from the ACE Agent plugin for starting and stopping the session is an ID associated with the incoming video stream.
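A minimal sketch of this session bookkeeping is shown below: one sessionId per dialog (FOV entry to FOV exit or browser closure), mapped to the connectionId of the incoming video stream. The registry class and the UUID format are assumptions; in the actual application the Chat Controller generates the sessionId.

```python
# Illustrative session registry mapping a stream's connectionId to the
# sessionId used by all Retail components (ids and structure are assumptions).
import uuid


class SessionRegistry:
    def __init__(self):
        self._by_connection = {}

    def start_session(self, connection_id):
        """Create a sessionId when a user enters the FOV."""
        session_id = str(uuid.uuid4())
        self._by_connection[connection_id] = session_id
        return session_id

    def lookup(self, connection_id):
        """Resolve the sessionId shared by all Retail components."""
        return self._by_connection.get(connection_id)

    def stop_session(self, connection_id):
        """Drop the session on FOV exit or browser closure."""
        return self._by_connection.pop(connection_id, None)
```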

Retail Workflow Limitations#

The retail bot workflow for Tokkio is for reference only. It offers the potential to be enhanced in various ways, including support for new operations in the retail domain. The implemented workflow has the following limitations (not intended to be an exhaustive list):

  • Adding multiple items in a single query is not supported. Items (with or without toppings) must be added one at a time.

  • Item recommendations are not supported.

  • Users need to exit the FOV and re-enter to start a new session.

  • Users might need to re-phrase some queries if the intent is not recognized correctly.

  • The default size of an item (small) is used by the bot when adding or removing items unless otherwise stated.

  • Contextual queries like “Add that”, “remove it”, etc., are not supported out of the box, but can be supported by passing the chat history to the /analyze_query API.

  • Item replacement is not supported.

  • The catalog is displayed after the user leaves the FOV, but the user cannot place an order until they enter (or re-enter) the FOV.