Natural Language Processing

Significant advances in the NLP field have been made over the past few years, with most of them sharing one common thread: dramatically larger models trained on more data. BERT-large, for example, has 340 million parameters and GPT-2 has 1.5 billion. Models of this size make inference on a CPU impractical today, necessitating a scalable inference framework for NLP tasks on a GPU.

Riva Natural Language Processing (NLP) is a flexible sequence classification and sequence labeling toolkit. It takes text as input and runs a number of analyses on it, such as named entity recognition, intent classification, punctuation, and translation. Riva NLP is built on common text-processing models that can be adapted to multiple common NLP tasks.

Riva NLP enables new task-specific NLP models to be deployed quickly, without additional development time.

Model Architectures

Bidirectional Encoder Representations from Transformers (BERT)

By pretraining a model like BERT in an unsupervised fashion, NLP practitioners are able to create application-specific models by simply adding a different “head” (or output layer) to the model and fine-tuning the augmented model with in-domain data for the desired task. Riva NLP enables deployment of models trained in this manner.

Riva NLP supports BERT-based models. Google’s BERT (Bidirectional Encoder Representations from Transformers) is, as the name implies, a transformer-based language model. After pre-training, adding a single layer as necessary for the downstream task allows the model to be fine-tuned and achieve state-of-the-art results (at the time) across a wide variety of disparate NLP tasks. While newer models have built on BERT’s success, its relative simplicity, parameter count, and good task-specific performance make it a compelling choice for a latency-sensitive NLP deployment. Most fine-tuning tasks can run in a few hours on a single GPU. For more information about BERT, refer to the BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding paper.

Megatron

Megatron is a transformer model architecture inspired by BERT that is designed to scale up to billions of parameters. When training NLP models for deployment with Riva, you can select between standard BERT and Megatron. For more details on Megatron, refer to the Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism paper.

Services

The Riva Natural Language API exposes task-specific functions for popular NLP tasks, including question answering, entity recognition, intent recognition, and slot filling, as well as more generic NLP pipelines for custom use cases. The intent of this service is to allow you to design models for arbitrary use cases that simply conform to the input and output types specified in the service. For example, the ClassifyText function could be used for sentiment classification, domain recognition, language identification, and so on.

Pipeline Configuration

Riva NLP supports 5 different NLP tasks:

  1. Text Classification

  2. Token Classification (Named Entity Recognition)

  3. Joint Intent and Slots

  4. Question Answering (Extractive)

  5. Punctuation and Capitalization

Riva NLP enables you to deploy your own custom-trained NLP models from TAO Toolkit via riva-build and riva-deploy. The model exported from TAO Toolkit is a .riva file, which is essentially a tar file containing the checkpoints in either ONNX or .pt (PyTorch) format along with the necessary model configuration. This .riva file is passed as input to riva-build, along with the pipeline task name, to produce the deployment-specific RMIR (Riva Model Intermediate Representation).
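
As a sketch of the end-to-end flow, the commands below first build an RMIR from a .riva archive and then deploy it with riva-deploy. The file names are placeholders, and the /data/models model repository path is an assumption taken from typical Riva setups; substitute the paths used in your environment.

# Build the deployment-specific RMIR from the exported .riva archive
riva-build <task name> \
    /servicemaker-dev/<rmir_filename>:<encryption_key> \
    /servicemaker-dev/<riva_filename>:<encryption_key>

# Deploy the RMIR into the Riva model repository
# (/data/models is an example path, not a required location)
riva-deploy /servicemaker-dev/<rmir_filename>:<encryption_key> /data/models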

The three NLP classification tasks (i.e. token_classification, intent_slot, and text_classification) support an optional parameter called --domain_name that enables you to name your custom models. This is useful if you plan to deploy multiple models of the same task.

For the intent_slot task, Riva also supports a parameter called --contextual that enables you to specify whether the model you are using is contextual. If --contextual is set to true, the Riva server prepends the previous intent (or intent_none if there is none) to the input query. Otherwise, the Riva server does not prepend anything to the input query. By default, Riva sets this field to true.
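
For example, a contextual intent and slot model could be built as follows. The value syntax for the --contextual flag shown here is an assumption; confirm the exact form with riva-build intent_slot -h.

# NOTE: --contextual=true syntax is an assumption; check riva-build intent_slot -h
riva-build intent_slot \
    --domain_name="<your custom domain name>" \
    --contextual=true \
    /servicemaker-dev/<rmir_filename>:<encryption_key> \
    /servicemaker-dev/<riva_filename>:<encryption_key>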

Each of the tasks supports a set of command-line arguments that enables you to set your configuration settings via the CLI. Use the format riva-build <task name> -h to see the list of available CLI inputs for each task. If your .riva archives are encrypted, you need to include :<encryption_key> at the end of the RMIR filename and the .riva filename; otherwise, this is unnecessary.
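
For example, the following shows how to list the options for a task and how the invocation changes when the .riva archive is not encrypted (the encryption key suffix is simply omitted):

# List the available CLI options for a task
riva-build qa -h

# Build from an unencrypted archive: omit the :<encryption_key> suffix
riva-build qa \
    /servicemaker-dev/<rmir_filename> \
    /servicemaker-dev/<riva_filename>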

Text Classification

Text Classification is the task of classifying any given input text (or sequence of tokens) into a predefined set of classes. You can use this generic text classification task for various domain-specific use cases such as sentiment classification, topic classification, intent classification, and domain classification. To build the intermediate representation (RMIR) for text classification, run:

riva-build text_classification \
    --domain_name="<your custom domain name>" \
    /servicemaker-dev/<rmir_filename>:<encryption_key> \
    /servicemaker-dev/<riva_filename>:<encryption_key>

For more information, refer to the Text Classification notebook, which is an end-to-end workflow for text classification. This workflow starts with training in TAO Toolkit and ends with deployment using Riva.
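
As a concrete illustration, a sentiment classifier could be built with a custom domain name; the "sentiment" domain name and the file names below are hypothetical placeholders:

# Hypothetical example: domain name and file names are placeholders
riva-build text_classification \
    --domain_name="sentiment" \
    /servicemaker-dev/sentiment.rmir:<encryption_key> \
    /servicemaker-dev/sentiment.riva:<encryption_key>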

Token Classification (Named Entity Recognition)

Given a sequence of tokens, or a sentence, Token Classification is the task of classifying every input token into a corresponding output class. As an example, a token can be classified as a person, location, product, etc. To build the intermediate representation (RMIR) for token classification, run:

riva-build token_classification \
    --domain_name="<your custom domain name>" \
    /servicemaker-dev/<rmir_filename>:<encryption_key> \
    /servicemaker-dev/<riva_filename>:<encryption_key>

For more information, refer to the Token Classification notebook, an end-to-end workflow for NER. This workflow starts with training in TAO Toolkit and ends with deployment using Riva.

Joint Intent and Slots

Joint Intent and Slot Classification is the task of classifying the intent of a query and detecting all relevant slots (entities) for that intent. For example, in the query “What is the weather in Santa Clara tomorrow morning?”, we would like to classify the query as a “weather” intent, and detect “Santa Clara” as a location slot and “tomorrow morning” as a date_time slot. Intents and slot names are usually task specific and defined as labels in the training data. This is a common task when building domain-specific, task-oriented chatbots. To build the intermediate representation (RMIR) for joint intent and slot classification, run:

riva-build intent_slot \
    --domain_name="<your custom domain name>" \
    /servicemaker-dev/<rmir_filename>:<encryption_key> \
    /servicemaker-dev/<riva_filename>:<encryption_key>

For more information, refer to the Intent Slot Classification notebook, an end-to-end workflow for intent and slot classification. This workflow starts with training in TAO Toolkit and ends with deployment using Riva.

Question Answering (Extractive)

Extractive Question Answering, also known as Reading Comprehension, is a task where the inputs are a passage and a question, and the model must find the best answer span in the given passage, if it exists. The most common dataset for Extractive Question Answering is the SQuAD dataset; however, you can fine-tune on your own question answering dataset and deploy the resulting model with Riva. To build the intermediate representation (RMIR) for Question Answering, run:

riva-build qa \
    /servicemaker-dev/<rmir_filename>:<encryption_key> \
    /servicemaker-dev/<riva_filename>:<encryption_key>

For more information, refer to the Question Answering notebook, an end-to-end workflow for question answering. This workflow starts with training in TAO Toolkit and ends with deployment using Riva.

Punctuation and Capitalization

Given text devoid of any punctuation, the Punctuation and Capitalization task fills in the necessary punctuation at the appropriate places and then capitalizes words as needed. For example, the text “what are the best restaurants in san jose” is converted to “What are the best restaurants in San Jose?”. This task is also used at the end of a speech recognition pipeline to augment its output with the necessary punctuation. To build the intermediate representation (RMIR) for punctuation and capitalization, run:

riva-build punctuation \
    /servicemaker-dev/<rmir_filename>:<encryption_key> \
    /servicemaker-dev/<riva_filename>:<encryption_key>

For more information, refer to the Punctuation Capitalization notebook, an end-to-end workflow for punctuation and capitalization. This workflow starts with training in TAO Toolkit and ends with deployment using Riva.

Pretrained Models

Task | Architecture | Language | Dataset | Domain | Accuracy | Compatibility with TAO Toolkit 3.0-21.08 | Compatibility with NeMo 1.0.0b4 | Link
--- | --- | --- | --- | --- | --- | --- | --- | ---
QA | BERT | English | SQuAD 2.0 | | EM: 71.24, F1: 74.32 | Yes | Yes | RIVA / RMIR
QA | Megatron | English | SQuAD 2.0 | | TBM | Yes | Yes | RIVA / RMIR
Entity Recognition | BERT | English | GMB (Groningen Meaning Bank) | LOC, ORG, PER, GPE, TIME, MISC, O | | Yes | Yes | RIVA / RMIR
Punctuation/Capitalization | BERT | English | Tatoeba sentences; Project Gutenberg books used in the LibriSpeech corpus; Fisher English Training Speech transcripts | | | Yes | Yes | RIVA / RMIR
Intent Detection & Slot Tagging | BERT | English | Proprietary | Weather | | Yes | Yes | RIVA / RMIR
Text Classification | BERT | English | Proprietary | | | Yes | Yes | RIVA / RMIR

The aforementioned NLP tasks leverage large-scale pretrained models such as BERT-base, BERT-large, and Megatron-LM BERT. The following are pointers to the respective pretrained models hosted on NGC:

  1. BERT Base.

  2. BERT Large.

  3. Megatron BERT 345M.
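
To download one of these pretrained checkpoints locally, you can use the NGC CLI. The org/team/model path and version below are placeholders, not actual model names; substitute the values shown on the corresponding NGC model page.

# Download a pretrained model version from NGC
# (the target string is a placeholder; copy the real one from the NGC model page)
ngc registry model download-version "<org>/<team>/<model_name>:<version>"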