Retail Shopping Advisor AI Workflow

Technical Brief

Large language models (LLMs) and generative AI offer retailers a significant opportunity to engage customers across omnichannel platforms in a more natural and personalized manner.

This retail shopping advisor AI workflow shows how to develop an LLM-powered retrieval-augmented generation (RAG) application that ingests product catalog data and uses the latest generative AI features to deliver a differentiated experience: contextually accurate, human-like answers to customers’ questions and recommendation requests, including product insights, cross-sell and upsell recommendations, and more. This more natural mode of interaction, search, and discovery is like putting your best sales associate into every customer engagement.

Online shopping experiences can often be unnecessarily frustrating. A shopping advisor needs to provide personalization and the ability to answer natural-language, long-tail, complex questions; however, current tools typically only perform well for shorter, keyword-based queries. As a result, customers often cannot find everything they are looking for, or they need help ideating on the products they need, which leads to cart abandonment and a poor customer experience in a very competitive retail landscape. Whether it is curating all the necessary items for a home office or planning a soccer-themed birthday party for an 8-year-old, the search process can take multiple attempts and sometimes fails altogether. This not only frustrates the consumer, it is also a lost opportunity for the retailer to capture revenue and drive up-sell and cross-sell sales.

This retail shopping advisor AI workflow gives enterprises a fast, advanced path from pilot to business value. It has everything you need to make a consumer shopping experience conversational, precise, and accurate. The retail shopping advisor reference solution comes with a sample dataset of product data from NVIDIA’s Employee Gear Store that represents a product catalog. You can use this reference example to add your own product catalog and related data to create an interactive shopping advisor for your business. Included within this workflow is a JupyterLab notebook server with a sample notebook that demonstrates the solution’s features, so you can quickly prototype and experiment with your own data.

The RAG-based AI shopping advisor workflow is Docker-based and provides a reference for building an enterprise-ready generative AI solution with minimal effort. It contains the following software components:

  • NVIDIA NIM microservices

    • Large language model: Llama 3 70B

    • NVIDIA NeMo Retriever Embedding Model (NV-Embed-QA-4)

  • LangChain

  • Vector Database: Milvus (GPU-optimized)

You will use a sample Jupyter notebook with the Jupyter Lab service to interact with the code directly.

NVIDIA NIM

NVIDIA accelerates inference on LLMs by implementing optimizations across the NVIDIA stack. An open model, such as Llama 3, is optimized and packaged as an NVIDIA NIM microservice with a standard application programming interface, which allows developers to innovate quickly. The workflow lets you choose between the NVIDIA API catalog and a locally deployed NIM through a configuration setting.
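For illustration, a minimal sketch of that configuration choice in Python might look like the following, assuming the langchain-nvidia-ai-endpoints package, an NVIDIA_API_KEY in the environment for the hosted path, and a local NIM on port 8000 (the NIM_BASE_URL variable name and the port are assumptions, not part of the workflow):

    import os

    from langchain_nvidia_ai_endpoints import ChatNVIDIA

    # Hypothetical toggle: use a locally deployed NIM when NIM_BASE_URL is set,
    # otherwise fall back to the NVIDIA API catalog (hosted endpoints).
    nim_base_url = os.environ.get("NIM_BASE_URL")  # e.g. "http://localhost:8000/v1"

    if nim_base_url:
        # Locally deployed NIM exposing an OpenAI-compatible endpoint.
        llm = ChatNVIDIA(model="meta/llama3-70b-instruct", base_url=nim_base_url)
    else:
        # NVIDIA-hosted API catalog; requires NVIDIA_API_KEY in the environment.
        llm = ChatNVIDIA(model="meta/llama3-70b-instruct")

    print(llm.invoke("Hello!").content)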

When using the NVIDIA API catalog, the workflow leverages NVIDIA AI Foundation models hosted by NVIDIA, eliminating the need for your own GPUs for inference so you can begin developing right away. NVIDIA’s API catalog lets you interact with the latest NVIDIA AI Foundation models through either a browser or model API endpoints.

The following models (API endpoints or locally deployed NIMs) are used within the workflow. As you develop, you can experiment with swapping in other NVIDIA-hosted models or provided NIMs.

  • Llama 3 70B - A pretrained and instruction-tuned generative text model with 70 billion parameters, optimized for dialog use cases. Llama 3 is an auto-regressive language model built on a transformer architecture and optimized for NVIDIA GPUs.

  • NV-Embed-QA - The NVIDIA NeMo Retriever QA embedding model, optimized for text question-answering retrieval. An embedding model is a crucial component of a text retrieval system because it transforms textual information into dense vector representations. Embedding models are typically transformer encoders that process the tokens of input text (for example, a question or a passage) to output an embedding; a brief usage sketch follows this list.
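As a rough illustration, the sketch below embeds a query and a product passage with the NeMo Retriever QA model through the langchain-nvidia-ai-endpoints connector (treat the exact model identifier and the sample strings as assumptions for your environment):

    from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

    # QA embedding models encode queries and passages into the same dense
    # vector space so that questions can be matched against product text.
    embedder = NVIDIAEmbeddings(model="NV-Embed-QA")

    query_vec = embedder.embed_query("waterproof jacket for hiking")
    passage_vecs = embedder.embed_documents(
        ["Lightweight rain shell with taped seams and an adjustable hood."]
    )

    print(len(query_vec))  # dimensionality of the dense vector representation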

Please refer to the official NVIDIA NIM documentation and the AI Chatbot with RAG Technical Brief for more information.

NVIDIA LangChain Endpoints API

This reference AI workflow shows how to use LangChain connectors to interact with and develop a retail shopping advisor using NVIDIA’s API endpoints.

Note

Refer to LangChain’s documentation for additional information on how to use NVIDIA AI Foundation endpoints.
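A minimal sketch of the LangChain connector in action might look like this (the system prompt and question are illustrative, not taken from the workflow):

    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_nvidia_ai_endpoints import ChatNVIDIA

    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful retail shopping advisor."),
        ("human", "{question}"),
    ])

    # Chain the prompt, the NVIDIA endpoint, and a string output parser.
    chain = prompt | ChatNVIDIA(model="meta/llama3-70b-instruct") | StrOutputParser()

    print(chain.invoke({"question": "What should I buy for a home office setup?"}))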

Vector Database: Milvus

Milvus is a highly flexible, reliable, and blazing-fast cloud-native, open-source vector database. It powers embedding similarity search and AI applications and strives to make vector databases accessible to every organization. Milvus can store, index, and manage a billion+ embedding vectors generated by deep neural networks and other machine learning (ML) models.
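As a sketch of how product catalog text might be indexed in Milvus through LangChain (the collection name and sample products are invented, and the host and port are the Milvus defaults for a local deployment):

    from langchain_community.vectorstores import Milvus
    from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

    products = [
        "Insulated stainless-steel water bottle, 750 ml.",
        "Soft-shell hooded jacket with zippered pockets.",
    ]

    # Embed the product texts and index them in a Milvus collection.
    vectorstore = Milvus.from_texts(
        texts=products,
        embedding=NVIDIAEmbeddings(model="NV-Embed-QA"),
        collection_name="product_catalog",  # assumed name
        connection_args={"host": "localhost", "port": "19530"},
    )

    # Embedding similarity search over the indexed catalog.
    hits = vectorstore.similarity_search("something to keep drinks cold", k=2)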

As part of the inference pipeline, we connect the LLM to the sample dataset: in this case, the NVIDIA Employee Gear Store. This external knowledge can come in many forms, including product catalogs, finance spreadsheets, or employee documents. The model’s capabilities are enhanced with this knowledge by using vector RAG.
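Wiring these pieces together, a vector-RAG chain over that store might be sketched as follows (illustrative wiring only; vectorstore is the Milvus store from the previous sketch, and the prompt wording is an assumption):

    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.runnables import RunnablePassthrough
    from langchain_nvidia_ai_endpoints import ChatNVIDIA

    def format_docs(docs):
        # Concatenate retrieved product descriptions into one context string.
        return "\n".join(doc.page_content for doc in docs)

    retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

    prompt = ChatPromptTemplate.from_template(
        "Answer the shopper's question using only this product context:\n"
        "{context}\n\nQuestion: {question}"
    )

    rag_chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | ChatNVIDIA(model="meta/llama3-70b-instruct")
        | StrOutputParser()
    )

    print(rag_chain.invoke("What do you have for staying dry on a hike?"))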

To generate a product recommendation, the embedding model encodes the user’s query, the most relevant products are retrieved from the vector database, and the LLM uses them to produce a response. However, in some situations, retrieval-augmented generation alone is not enough for a user’s query. For example, when a user asks the chatbot to add an item to their cart, the LLM needs to know the user’s intent and then take the right action. To do this, the LLM must extract information such as the intent, quantity, and item name, and then call the correct internal API. This is called function calling. In this workflow, we demonstrate how an LLM can perform function calling on retail APIs using the Llama 3 70B model on the NVIDIA API catalog.
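The sketch below illustrates that pattern with a hypothetical add_to_cart tool, assuming a version of langchain-nvidia-ai-endpoints whose chat models support tool binding (the tool name and its fields are invented for illustration):

    from langchain_core.tools import tool
    from langchain_nvidia_ai_endpoints import ChatNVIDIA

    @tool
    def add_to_cart(item_name: str, quantity: int) -> str:
        """Add the given quantity of a product to the user's cart."""
        # A real deployment would call the retailer's internal cart API here.
        return f"Added {quantity} x {item_name} to the cart."

    llm = ChatNVIDIA(model="meta/llama3-70b-instruct")
    llm_with_tools = llm.bind_tools([add_to_cart])

    # The model extracts intent, item name, and quantity from free text and
    # emits a structured tool call instead of a plain reply.
    response = llm_with_tools.invoke("Please add two water bottles to my cart.")
    print(response.tool_calls)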

The following graphic describes these processes:

[Figure retail-shopping-advisor-01: retrieval-augmented generation and function-calling flow]


For more information regarding document ingestion and retrieval, as well as user query and response generation, please refer to the AI Chatbot with RAG Technical Brief, on which this retail shopping advisor workflow is based.

The following additional components are used as a part of the workflow solution:

Jupyter Lab Service

A sample notebook is provided that allows you to interact with the code built for the shopping advisor AI workflow. This notebook exposes the functionality of the workflow and is a great place to iterate on the solution once you bring your product catalog into the workflow.
