Natural Language Processing

Significant advances have been made in NLP over the past few years, with most of them sharing one common thread: dramatically larger models trained on more data. BERT-large, for example, has 340 million parameters and GPT-2 has 1.5 billion parameters. Models of this size make inference on a CPU impractical today, necessitating a scalable framework for NLP inference on GPUs.

Jarvis Natural Language Processing (NLP) is a flexible sequence classification and sequence labeling toolkit. It takes text as input and applies a number of analysis algorithms, such as named entity recognition, intent classification, punctuation, and translation. Jarvis NLP is built on common text-processing models that can be adapted to multiple common NLP tasks.

Jarvis NLP enables new task-specific NLP models to be deployed quickly, without additional deployment-specific development time.

Model Architectures

Bidirectional Encoder Representations from Transformers (BERT)

By pretraining a model like BERT in an unsupervised fashion, NLP practitioners are able to create application-specific models by simply adding a different “head” (or output layer) to the model and fine-tuning the augmented model with in-domain data for the desired task. Jarvis NLP enables deployment of models trained in this manner.

Jarvis NLP supports models that are BERT-based. Google’s BERT (Bidirectional Encoder Representations from Transformers) is, as the name implies, a transformer-based language model. After pre-training, the model can be fine-tuned for a downstream task by adding a single task-specific output layer, an approach that achieved state-of-the-art results (at the time) across a wide variety of NLP tasks. While newer models have built on BERT’s success, its relative simplicity, modest parameter count, and strong task-specific performance make it a compelling choice for a latency-sensitive NLP deployment. Most fine-tuning tasks can run in a few hours on a single GPU. For more information about BERT, refer to the BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding paper.
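
The “add a head” pattern can be sketched in a few lines of PyTorch. The example below uses the Hugging Face transformers package (which is not part of Jarvis or TLT) purely as an illustration; the model name, class count, and sample query are placeholders.

# Illustrative sketch only: a pretrained BERT encoder with a single task-specific
# linear output layer ("head"), the pattern described above. Uses Hugging Face
# `transformers`, which is not part of Jarvis or TLT.
import torch
from torch import nn
from transformers import BertModel, BertTokenizer

class BertSequenceClassifier(nn.Module):
    def __init__(self, pretrained_name="bert-base-uncased", num_classes=4):
        super().__init__()
        self.encoder = BertModel.from_pretrained(pretrained_name)
        # The only task-specific part: one linear layer on top of the encoder.
        self.head = nn.Linear(self.encoder.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        outputs = self.encoder(input_ids=input_ids,
                               attention_mask=attention_mask,
                               return_dict=True)
        # Use the hidden state of the [CLS] token as the sequence representation.
        cls_state = outputs.last_hidden_state[:, 0]
        return self.head(cls_state)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertSequenceClassifier(num_classes=4)  # e.g., four intent classes (placeholder)
batch = tokenizer(["what is the weather in santa clara"], return_tensors="pt")
with torch.no_grad():
    logits = model(batch["input_ids"], batch["attention_mask"])
# Fine-tuning updates both the encoder and the head with a standard
# cross-entropy loss on in-domain labeled data.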

Megatron

Megatron is a transformer model architecture inspired by BERT that is designed to scale up to billions of parameters. When training NLP models for deployment with Jarvis, you can select between standard BERT and Megatron. For more details on Megatron, refer to the Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism paper.

Services

Jarvis NLP Services expose two different APIs: a high-level API (JarvisNLP) and a low-level API (JarvisCoreNLP).

High-level API: The high-level API (JarvisNLP) exposes task-specific functions for popular NLP tasks, including question answering, entity recognition, intent recognition, and slot filling.

Low-level API: The low-level API (JarvisCoreNLP) provides generic NLP services for custom use cases. The intent of this service is to let users design models for arbitrary use cases, as long as they conform to the input and output types specified by the service. As an explicit example, the ClassifyText function could be used for sentiment classification, domain recognition, language identification, and so on.
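
As an illustration, the following Python sketch calls ClassifyText on the low-level API over gRPC. It is a sketch only: the jarvis_api module paths, stub class, request message, and deployed model name shown here are assumptions and should be verified against the gRPC client bindings shipped with your Jarvis release.

# Hedged sketch of a low-level (JarvisCoreNLP) ClassifyText call over gRPC.
# NOTE: the jarvis_api module names, stub class, request message, and model name
# are assumptions; verify them against the protobuf/gRPC bindings in your
# Jarvis client package.
import grpc
import jarvis_api.jarvis_nlp_core_pb2 as jcnlp            # assumed module path
import jarvis_api.jarvis_nlp_core_pb2_grpc as jcnlp_srv   # assumed module path

channel = grpc.insecure_channel("localhost:50051")  # Jarvis gRPC endpoint
core_nlp = jcnlp_srv.JarvisCoreNLPStub(channel)

# ClassifyText is generic: the same call serves sentiment classification,
# domain recognition, language identification, etc., depending on which
# deployed model is requested.
request = jcnlp.TextClassRequest()
request.model.model_name = "jarvis_text_classification_weather"  # assumed name
request.text.append("Is it going to rain in Santa Clara tomorrow?")

response = core_nlp.ClassifyText(request)
print(response)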

Pipeline Configuration

Jarvis NLP supports five different NLP tasks:

  1. Text Classification

  2. Token Classification (Named Entity Recognition)

  3. Joint Intent and Slots

  4. Question Answering (Extractive)

  5. Punctuation and Capitalization

Jarvis NLP enables you to deploy your own custom-trained NLP models from the Transfer Learning Toolkit (TLT) via jarvis-build and jarvis-deploy. The model exported from TLT is an .ejrvs file, essentially a tar archive containing the checkpoints in either ONNX or .pt (PyTorch) format along with the necessary model configuration. This .ejrvs file is taken as input by jarvis-build, along with the pipeline task name, to produce the deployment-specific JMIR (Jarvis Model Intermediate Representation).

The three NLP classification tasks (token_classification, intent_slot, and text_classification) support an optional parameter called --domain_name that enables you to name your custom models. This is useful if you plan to deploy multiple models for the same task.

Each of the tasks supports a set of command-line arguments that enables you to set configuration options via the CLI. Run jarvis-build <task name> -h to see the list of available CLI inputs for each task. If your .ejrvs archives are encrypted, include :<encryption_key> at the end of both the JMIR filename and the .ejrvs filename; otherwise, this is unnecessary.

Text Classification

Text Classification is the task of classifying any given input text (or sequence of tokens) into a predefined set of classes. You can use this generic text classification task for various domain-specific use cases, such as sentiment classification, topic classification, intent classification, and domain classification. To build the intermediate representation (JMIR) for text classification, run:

jarvis-build text_classification \
    --domain_name="<your custom domain name>" \
    /servicemaker-dev/<jmir_filename>:<encryption_key> \
    /servicemaker-dev/<ejrvs_filename>:<encryption_key>

For more information, refer to the Text Classification notebook, an end-to-end workflow for text classification that starts with training in TLT and ends with deployment using Jarvis.

Token Classification (Named Entity Recognition)

Given a sequence of tokens (for example, a sentence), Token Classification is the task of classifying every input token into a corresponding output class. As an example, a token can be classified as a person, location, product, etc. To build the intermediate representation (JMIR) for token classification, run:

jarvis-build token_classification \
    --domain_name="<your custom domain name>" \
    /servicemaker-dev/<jmir_filename>:<encryption_key> \
    /servicemaker-dev/<ejrvs_filename>:<encryption_key>

For more information, refer to the Named Entity Recognition (NER) notebook, an end-to-end workflow for NER that starts with training in TLT and ends with deployment using Jarvis.

Joint Intent and Slots

Joint Intent and Slot Classification is the task of classifying an intent and detecting all relevant slots (entities) for that intent in a query. For example, in the query “What is the weather in Santa Clara tomorrow morning?”, we would like to classify the query as a “weather” intent, and detect “Santa Clara” as a location slot and “tomorrow morning” as a date_time slot. Intent and slot names are usually task specific and defined as labels in the training data. This is a common task when building domain-specific, task-oriented chatbots. To build the intermediate representation (JMIR) for joint intent and slot classification, run:

jarvis-build intent_slot \
    --domain_name="<your custom domain name>" \
    /servicemaker-dev/<jmir_filename>:<encryption_key> \
    /servicemaker-dev/<ejrvs_filename>:<encryption_key>

For more information, refer to the Intent Slot Classification notebook, an end-to-end workflow for intent and slot classification that starts with training in TLT and ends with deployment using Jarvis.

Question Answering (Extractive)

Extractive Question Answering, also known as Reading Comprehension, is the task where the inputs are a passage and a question, and the model needs to find the best answer span in the given passage, if one exists. The most common dataset for Extractive Question Answering is SQuAD; however, you can fine-tune on your own QA dataset and deploy it with Jarvis. To build the intermediate representation (JMIR) for Question Answering, run:

jarvis-build qa \
    /servicemaker-dev/<jmir_filename>:<encryption_key> \
    /servicemaker-dev/<ejrvs_filename>:<encryption_key>

For more information, refer to the Question Answering notebook, an end-to-end workflow for question answering that starts with training in TLT and ends with deployment using Jarvis.
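
At inference time, a deployed QA model can be queried through the high-level JarvisNLP API. The Python sketch below is illustrative only; the module paths, stub, NaturalQuery RPC, and request fields are assumptions to be checked against the client bindings shipped with your Jarvis release.

# Hedged sketch of an extractive QA request through the high-level JarvisNLP API.
# NOTE: the module names, the NaturalQuery RPC, and its request fields are
# assumptions; check the gRPC bindings in your Jarvis client package.
import grpc
import jarvis_api.jarvis_nlp_pb2 as jnlp            # assumed module path
import jarvis_api.jarvis_nlp_pb2_grpc as jnlp_srv   # assumed module path

channel = grpc.insecure_channel("localhost:50051")  # Jarvis gRPC endpoint
nlp = jnlp_srv.JarvisNLPStub(channel)

request = jnlp.NaturalQueryRequest(
    query="Which river flows through San Jose?",
    context=("San Jose is a large city in California's Santa Clara Valley. "
             "The Guadalupe River runs through downtown San Jose."),
)
response = nlp.NaturalQuery(request)
print(response)  # contains the extracted answer span(s) and score(s)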

Punctuation and Capitalization

Given text devoid of any punctuation, the Punctuation and Capitalization task fills in the necessary punctuation at the appropriate places and then capitalizes the text as needed. For example, the text “what are the best restaurants in san jose” is converted to “What are the best restaurants in San Jose?”. This task is also used at the end of a Speech Recognition pipeline to augment its output with the necessary punctuation. To build the intermediate representation (JMIR) for punctuation and capitalization, run:

jarvis-build punctuation \
    /servicemaker-dev/<jmir_filename>:<encryption_key> \
    /servicemaker-dev/<ejrvs_filename>:<encryption_key>

For more information, refer to the Punctuation Capitalization notebook, an end-to-end workflow for punctuation and capitalization that starts with training in TLT and ends with deployment using Jarvis.

Pretrained Models

| Task | Architecture | Language | Dataset | Domain | Accuracy | Compatibility with TLT 3.0 | Compatibility with NeMo 1.0.0b4 | Link |
|------|--------------|----------|---------|--------|----------|----------------------------|---------------------------------|------|
| QA | BERT | English | SQuAD 2.0 | | EM: 71.24, F1: 74.32 | Yes | Yes | EJRVS / JMIR |
| QA | Megatron | English | SQuAD 2.0 | | TBM | Yes | Yes | JMIR |
| Entity Recognition | BERT | English | GMB (Groningen Meaning Bank) | LOC, ORG, PER, GPE, TIME, MISC, O | | Yes | Yes | EJRVS / JMIR |
| Punctuation/Capitalization | BERT | English | Tatoeba sentences; books from Project Gutenberg that were used as part of the LibriSpeech corpus; transcripts from Fisher English Training Speech | | | Yes | Yes | EJRVS / JMIR |
| Intent Detection & Slot Tagging | BERT | English | Proprietary | Weather | | Yes | Yes | EJRVS / JMIR |
| Text Classification | BERT | English | Proprietary | | | Yes | Yes | EJRVS / JMIR |

The aforementioned NLP tasks leverage large-scale pretrained models such as BERT-base, BERT-large, and Megatron-LM BERT. Here are pointers to the respective pretrained models hosted on NGC:

  1. BERT Base.

  2. BERT Large.

  3. Megatron BERT 345M.