Chatbot Overview

In recent years, chatbots have emerged as one of the most popular applications of Deep Learning. Many companies use chatbots to automate user queries based on a knowledge base they have acquired over the years. It makes sense for them to help a customer get quick answers from this knowledge base rather than having them read pages of articles. This is where Question Answering chatbots come in. By feeding a large amount of text or domain knowledge to a chatbot, it can answer questions about the given text. A Question Answering chatbot on a company’s website improves the experience of customers visiting the site. Let’s now look at how an organization can leverage AI to create its own Question Answering chatbot.

BERT, or Bidirectional Encoder Representations from Transformers, is a neural approach to pre-training language representations that obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks such as Question Answering and natural language inference.

A detailed description of BERT’s architecture is available in Google’s research paper on BERT. To train a BERT model for question answering, we use the Stanford Question Answering Dataset (SQuAD). It is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage. Unlike SQuAD v1.1, SQuAD v2.0 also contains questions that are unanswerable from the passage.
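For illustration, a SQuAD v2.0-style record looks roughly like the following. The field names follow the public SQuAD JSON format; the context, questions, and offsets are made up for this example.

```python
# Illustrative SQuAD v2.0-style record (values invented for this example).
# Each question is paired with a context paragraph; when an answer exists,
# it is a span of the context given by its text and character start offset.
example = {
    "context": "BERT was pre-trained on Wikipedia and BooksCorpus.",
    "qas": [
        {
            "question": "Which corpora was BERT pre-trained on?",
            "answers": [{"text": "Wikipedia and BooksCorpus", "answer_start": 24}],
            "is_impossible": False,
        },
        {
            # SQuAD v2.0 also includes questions the passage cannot answer.
            "question": "How many parameters does BERT have?",
            "answers": [],
            "is_impossible": True,
        },
    ],
}
```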

NVIDIA GPU Cloud (NGC) provides state-of-the-art, optimized machine learning frameworks, as well as models and scripts that can help you accelerate your AI journey and ensure that you’re always using the best tools for the job.

NGC provides BERT models that have been pre-trained for hours on NVIDIA supercomputers, so users can download a model and fine-tune it on their own dataset without having to spend tremendous amounts of compute power themselves.

A deep learning model can’t consume text/sentences directly. The text first needs to be converted into a format that the model can understand. Tokenization is the process of breaking down the text into standard units that a model can understand. Traditional tokenization algorithms would split a sentence by a delimiter and assign each word a numerical value.

Example: “A quick fox jumps over a lazy dog” can be split into [“A”, “quick”, “fox”, “jumps”, “over”, “a”, “lazy”, “dog”] and assigned the numerical values [1, 2, 3, 4, 5, 6, 7, 8]. This vector can then be fed to a model as input. The numeric values can be assigned by keeping a dictionary of all the words in the English language and giving each of them an ID. This dictionary is called a vocabulary in NLP jargon.
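As a minimal sketch, whitespace tokenization with a toy vocabulary could look like this in Python (here the vocabulary is built from the example sentence itself; a real vocabulary would cover the whole language):

```python
# Minimal sketch of traditional whitespace tokenization with a toy vocabulary.
sentence = "A quick fox jumps over a lazy dog"
tokens = sentence.split()  # split on whitespace

# Toy vocabulary: map each word to an integer ID.
vocab = {word: idx + 1 for idx, word in enumerate(tokens)}

input_ids = [vocab[word] for word in tokens]
print(tokens)     # ['A', 'quick', 'fox', 'jumps', 'over', 'a', 'lazy', 'dog']
print(input_ids)  # [1, 2, 3, 4, 5, 6, 7, 8]
```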

Tokenizing words in this way (splitting by spaces) can pose the following issues:

- A large vocabulary is needed, since every word must be stored in the dictionary.
- Combined words like “check-in” are ambiguous; what exactly constitutes a word is often unclear.
- Certain languages don’t segment well by spaces.

A potential solution is subword tokenization, which breaks down unknown words into “subword units” so that models can make intelligent decisions about words they don’t recognize. For example, check-in can be split into “check” and “in”, and cycling into “cycle” and “ing”, thereby reducing the number of words in the vocabulary.
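The toy sketch below illustrates the idea with a greedy longest-match (WordPiece-style) splitter over a made-up vocabulary. BERT’s real tokenizer uses a learned vocabulary of roughly 30,000 subwords, marks continuation pieces with a “##” prefix, and handles punctuation separately, so its actual splits will differ from this illustration.

```python
# Toy greedy longest-match subword tokenizer (WordPiece-style).
# The vocabulary is made up for this example; continuation pieces
# (pieces that do not start a word) are marked with a "##" prefix.
VOCAB = {"jump", "check", "##ing", "##in", "[UNK]"}

def subword_tokenize(word):
    """Greedily split a word into the longest subwords found in VOCAB."""
    tokens, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # not at the start of the word
            if piece in VOCAB:
                match = piece
                break
            end -= 1
        if match is None:
            return ["[UNK]"]  # no subword in the vocabulary covers this span
        tokens.append(match)
        start = end
    return tokens

print(subword_tokenize("jumping"))  # ['jump', '##ing']
print(subword_tokenize("checkin"))  # ['check', '##in']
```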

A BERT model downloaded from NGC can be fine-tuned on SQuAD to create a QA chatbot, which can then be hosted on Triton Inference Server. To feed a QA task into BERT, we pack both the question and the reference text into a single input and tokenize it. The training labels are the start and end positions of the answer span within the reference text, and the model is fine-tuned on these input-label pairs. This is how a Question Answering model is trained.
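As a minimal inference sketch (using the Hugging Face transformers library rather than the NGC training scripts), asking a question against a fine-tuned BERT QA model could look like the following. The checkpoint name is a publicly available BERT model already fine-tuned on SQuAD and is purely illustrative; substitute the model you downloaded or fine-tuned yourself.

```python
# Question answering with a BERT model fine-tuned on SQuAD.
# Assumes the `transformers` and `torch` packages are installed; the
# checkpoint name below is illustrative.
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_name = "bert-large-uncased-whole-word-masking-finetuned-squad"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

question = "What does NGC provide?"
context = ("NVIDIA GPU Cloud (NGC) provides optimized machine learning "
           "frameworks, pre-trained models, and scripts.")

# Pack the question and reference text into one input:
# [CLS] question [SEP] context [SEP]
inputs = tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# The model predicts start and end positions of the answer span.
start = torch.argmax(outputs.start_logits)
end = torch.argmax(outputs.end_logits) + 1
answer = tokenizer.decode(inputs["input_ids"][0][start:end])
print(answer)
```

The same packed input format is used during fine-tuning, with the true start and end token positions of the answer span serving as the labels.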
