Natural Language Processing¶
Significant advances in the NLP field have been made over the past few years with most of the advances sharing one common thread: dramatically larger models trained on more data. BERT-large, for example, has 340 million parameters and GPT-2 has 1.5 billion parameters. Models of this size make inference tasks on a CPU impractical today, necessitating a scalable inference framework for NLP tasks on a GPU.
Riva Natural Language Processing (NLP) is a flexible sequence classification and sequence labeling toolkit. It takes text as input and runs a number of analysis algorithms on it, such as named entity recognition, intent classification, punctuation, and translation. Riva NLP is built on common text-processing models that can be adapted for multiple common NLP tasks.
Riva NLP enables the fast deployment of new task-specific NLP models without requiring additional development time for deployment.
Model Architectures¶
Bidirectional Encoder Representations from Transformers (BERT)¶
By pretraining a model like BERT in an unsupervised fashion, NLP practitioners are able to create application-specific models by simply adding a different “head” (or output layer) to the model and fine-tuning the augmented model with in-domain data for the desired task. Riva NLP enables deployment of models trained in this manner.
Riva NLP supports models that are BERT-based. Google’s BERT (Bidirectional Encoder Representations from Transformers) is, as the name implies, a transformer-based language model. After pre-training, adding a single output layer as needed for the downstream task allows the model to be fine-tuned and achieve state-of-the-art results (at the time) across a wide variety of disparate NLP tasks. While newer models have built on BERT’s success, its relative simplicity, modest parameter count, and strong task-specific performance make it a compelling choice for a latency-sensitive NLP deployment. Most fine-tuning tasks can run in a few hours on a single GPU. For more information about BERT, refer to the BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding paper.
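The task-specific “head” described above is often just a single linear layer on top of BERT’s pooled [CLS] embedding. The following toy NumPy sketch shows the shape of that computation only; the dimensions, random weights, and variable names are illustrative, not taken from any real Riva or BERT checkpoint:

```python
import numpy as np

# Toy stand-in for BERT's pooled [CLS] embedding (hidden size 768, as in BERT-base).
rng = np.random.default_rng(0)
hidden_size, num_classes = 768, 3
cls_embedding = rng.normal(size=hidden_size)

# The task-specific "head": one linear layer mapping the embedding to class logits.
W = rng.normal(size=(hidden_size, num_classes))
b = np.zeros(num_classes)
logits = cls_embedding @ W + b

# Softmax turns logits into class probabilities; argmax is the prediction.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
predicted_class = int(np.argmax(probs))
```

During fine-tuning, only this small head starts from scratch; the pretrained encoder weights are merely adjusted, which is why fine-tuning converges in hours rather than the weeks needed for pretraining.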
Megatron¶
Megatron is a transformer model architecture inspired by BERT that is designed to scale up to billions of parameters. When training NLP models for deployment with Riva, you can select between standard BERT and Megatron. For more details on Megatron, refer to the Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism paper.
Services¶
The Riva Natural Language API exposes task-specific functions for popular NLP tasks, including question answering, entity recognition, intent recognition, and slot filling, as well as more generic NLP pipelines for custom use cases. The intent of this service is to allow you to design models for arbitrary use cases that simply conform to the input and output types specified in the service. For example, the ClassifyText function could be used for sentiment classification, domain recognition, or language identification.
Pipeline Configuration¶
Riva NLP supports five NLP tasks:
Text Classification
Token Classification (Named Entity Recognition)
Joint Intent and Slots
Question Answering (Extractive)
Punctuation and Capitalization
Riva NLP enables you to deploy your own custom-trained NLP models from TAO Toolkit via riva-build and riva-deploy. The model exported from TAO Toolkit is a .riva file, which is essentially a tar archive consisting of the checkpoints, in either ONNX or .pt (PyTorch) format, along with their necessary model configuration. This .riva file is passed as input to riva-build, along with the pipeline task name, to produce the deployment-specific RMIR (Riva Model Intermediate Representation).
The three NLP classification tasks (token_classification, intent_slot, and text_classification) support an optional parameter called --domain_name that enables you to name your custom models. This is useful if you plan to deploy multiple models of the same task.
For the intent_slot task, Riva also supports a parameter called --contextual that enables you to specify whether the model you are using is contextual. If --contextual is set to true, the Riva server prepends the previous intent to the input query, or intent_none if there is no previous intent. If set to false, the Riva server does not prepend anything to the input query. By default, Riva sets this field to true.
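The contextual behavior can be illustrated with a small sketch. Note that the exact separator and wire format the server uses internally are not documented here; this only shows the prepending logic:

```python
def prepend_context(query, previous_intent=None):
    """Mimic the contextual intent_slot behavior: prepend the previous
    intent to the query, or intent_none when there was no previous turn."""
    return f"{previous_intent or 'intent_none'} {query}"

# First turn of a dialog: no prior intent exists yet.
first_turn = prepend_context("what is the weather in santa clara")
# Follow-up turn: the previously detected intent provides context.
follow_up = prepend_context("what about tomorrow", previous_intent="weather")
```

Prepending the prior intent lets a single-turn classifier resolve elliptical follow-ups such as “what about tomorrow” without a full dialog-state tracker.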
Each task supports a set of command-line arguments that enables you to set your configuration via the CLI. Use the format riva-build <task name> -h to see the list of available CLI inputs for each task. If your .riva archives are encrypted, you need to include :<encryption_key> at the end of the RMIR filename and the riva filename. Otherwise, this is unnecessary.
Text Classification¶
Text Classification is the task of classifying any given input text (or sequence of tokens) into a predefined set of classes. You can use this generic text classification task for various domain-specific use cases such as sentiment classification, topic classification, intent classification, and domain classification. To build the intermediate representation (RMIR) for text classification, run:
riva-build text_classification \
    --domain_name="<your custom domain name>" \
/servicemaker-dev/<rmir_filename>:<encryption_key> \
/servicemaker-dev/<riva_filename>:<encryption_key>
For more information, refer to the Text Classification notebook, which is an end-to-end workflow for text classification. This workflow starts with training in TAO Toolkit and ends with deployment using Riva.
Token Classification (Named Entity Recognition)¶
Given a sequence of tokens, or a sentence, Token Classification is the task of classifying every input token into a corresponding output class. As an example, a token can be classified as a person, location, product, etc. To build the intermediate representation (RMIR) for token classification, run:
riva-build token_classification \
    --domain_name="<your custom domain name>" \
/servicemaker-dev/<rmir_filename>:<encryption_key> \
/servicemaker-dev/<riva_filename>:<encryption_key>
For more information, refer to the Token Classification notebook, an end-to-end workflow for NER. This workflow starts with training in TAO Toolkit and ends with deployment using Riva.
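Token classification models emit one label per token; named entities are then recovered by merging contiguous labeled tokens into spans. A minimal sketch using IOB-style tags (the tag set and example labels here are illustrative, not Riva’s actual output format):

```python
def merge_entities(tokens, labels):
    """Merge per-token IOB labels (B-X begins an entity, I-X continues it,
    O is outside any entity) into (entity_text, entity_type) spans."""
    entities, current, current_type = [], [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            if current:  # close any entity still open
                entities.append((" ".join(current), current_type))
            current, current_type = [token], label[2:]
        elif label.startswith("I-") and current:
            current.append(token)  # continue the open entity
        else:  # "O" (or a stray I- tag) closes any open entity
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities

tokens = ["Jensen", "Huang", "founded", "NVIDIA", "in", "Santa", "Clara"]
labels = ["B-PER", "I-PER", "O", "B-ORG", "O", "B-LOC", "I-LOC"]
spans = merge_entities(tokens, labels)
```

The B-/I- distinction is what allows two adjacent entities of the same type to remain separate spans rather than being fused into one.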
Joint Intent and Slots¶
Joint Intent and Slot Classification is the task of classifying an intent and detecting all relevant slots (entities) for that intent in a query. For example, in the query “What is the weather in Santa Clara tomorrow morning?”, we would like to classify the query as a “weather” intent, and detect “Santa Clara” as a location slot and “tomorrow morning” as a date_time slot. Intents and slot names are usually task-specific and defined as labels in the training data. This is a common task when building domain-specific, task-oriented chatbots. To build the intermediate representation (RMIR) for intent slot classification, run:
riva-build intent_slot \
    --domain_name="<your custom domain name>" \
/servicemaker-dev/<rmir_filename>:<encryption_key> \
/servicemaker-dev/<riva_filename>:<encryption_key>
For more information, refer to the Intent Slot Classification notebook, an end-to-end workflow for intent slot classification. This workflow starts with training in TAO Toolkit and ends with deployment using Riva.
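For the weather query above, the desired joint output can be pictured as one intent label plus a list of slot spans. The dictionary below is a hypothetical result shape for illustration only, not the actual Riva response message:

```python
# Hypothetical result for "What is the weather in Santa Clara tomorrow morning?"
result = {
    "intent": "weather",
    "slots": [
        {"text": "Santa Clara", "label": "location"},
        {"text": "tomorrow morning", "label": "date_time"},
    ],
}

# Downstream dialog logic typically dispatches on the intent,
# then reads the slots it needs as key-value pairs.
slots_by_label = {s["label"]: s["text"] for s in result["slots"]}
```

Predicting intent and slots jointly, rather than with two separate models, lets the two decisions share one encoder pass and keeps them mutually consistent.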
Question Answering (Extractive)¶
Extractive Question Answering, also known as Reading Comprehension, is a task where the inputs are a passage and a question, and the model needs to find the best answer span in the given passage, if it exists. The most common dataset for Extractive Question Answering is the SQuAD dataset; however, you can fine-tune a model on your own question answering dataset and deploy it on Riva. To build the intermediate representation (RMIR) for Question Answering, run:
riva-build qa \
/servicemaker-dev/<rmir_filename>:<encryption_key> \
/servicemaker-dev/<riva_filename>:<encryption_key>
For more information, refer to the Question Answering notebook, an end-to-end workflow for question answering. This workflow starts with training in TAO Toolkit and ends with deployment using Riva.
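Under the hood, a BERT-style extractive QA head scores every passage token as a potential answer start and as a potential answer end; the answer is the highest-scoring valid span (end not before start). A toy NumPy sketch of that span selection; the passage and logits are hand-picked for illustration, not model output:

```python
import numpy as np

passage = ["Riva", "runs", "inference", "on", "NVIDIA", "GPUs"]
# Hand-picked start/end logits, one per passage token.
start_logits = np.array([0.1, 0.0, 0.2, 0.0, 3.0, 1.0])
end_logits   = np.array([0.0, 0.1, 0.0, 0.2, 0.5, 3.0])

# Exhaustively score all valid spans (end index >= start index).
best_score, best_span = -np.inf, None
for i in range(len(passage)):            # candidate start token
    for j in range(i, len(passage)):     # candidate end token
        score = start_logits[i] + end_logits[j]
        if score > best_score:
            best_score, best_span = score, (i, j)

answer = " ".join(passage[best_span[0]:best_span[1] + 1])
```

Production implementations additionally cap the span length (compare the `--token_classifier.max_answer_length` option in the help output below) and keep an n-best list rather than a single argmax.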
Punctuation and Capitalization¶
Given text devoid of any punctuation, the task of Punctuation and Capitalization fills in the necessary punctuation at the appropriate places and capitalizes words as necessary once the punctuation is filled in. For example, the text “what are the best restaurants in san jose” is converted to “What are the best restaurants in San Jose?”. This task is also used at the end of a speech recognition pipeline to augment its output with the necessary punctuation. To build the intermediate representation (RMIR) for punctuation and capitalization, run:
riva-build punctuation \
/servicemaker-dev/<rmir_filename>:<encryption_key> \
/servicemaker-dev/<riva_filename>:<encryption_key>
For more information, refer to the Punctuation Capitalization notebook, an end-to-end workflow for punctuation and capitalization. This workflow starts with training in TAO Toolkit and ends with deployment using Riva.
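The task’s input/output contract can be pictured as two per-token predictions: a punctuation mark (possibly none) to append after each token, and a capitalization flag. The sketch below applies such predictions to the example query; the predictions are hard-coded for illustration, not produced by a model:

```python
def apply_predictions(tokens, punct_after, capitalize):
    """Rebuild punctuated, capitalized text from per-token predictions:
    append each token's predicted punctuation mark and capitalize
    tokens the model flagged."""
    out = []
    for token, mark, cap in zip(tokens, punct_after, capitalize):
        word = token.capitalize() if cap else token
        out.append(word + mark)
    return " ".join(out)

tokens      = ["what", "are", "the", "best", "restaurants", "in", "san", "jose"]
punct_after = ["", "", "", "", "", "", "", "?"]   # mark appended after each token
caps        = [True, False, False, False, False, False, True, True]
restored = apply_predictions(tokens, punct_after, caps)
```

Framing the problem this way turns punctuation restoration into exactly the kind of token classification task the models above already handle, which is why it fits the same BERT-based pipeline.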
Riva-build Optional Parameters¶
For details about the parameters passed to riva-build to customize the NLP pipelines, issue:
riva-build <nlp_pipeline> -h
where <nlp_pipeline> is one of text_classification, token_classification, intent_slot, qa, or punctuation. The following list includes descriptions of all optional parameters currently recognized by riva-build:
usage: riva-build text_classification [-h] [-f]
[--language_code LANGUAGE_CODE]
[--max_batch_size MAX_BATCH_SIZE]
[--model_api MODEL_API]
[--domain_name DOMAIN_NAME]
[--start_token START_TOKEN]
[--end_token END_TOKEN] [--tokenizer]
[--vocab VOCAB]
[--tokenizer.max_sequence_idle_microseconds TOKENIZER.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--tokenizer.max_batch_size TOKENIZER.MAX_BATCH_SIZE]
[--tokenizer.min_batch_size TOKENIZER.MIN_BATCH_SIZE]
[--tokenizer.opt_batch_size TOKENIZER.OPT_BATCH_SIZE]
[--tokenizer.preferred_batch_size TOKENIZER.PREFERRED_BATCH_SIZE]
[--tokenizer.batching_type TOKENIZER.BATCHING_TYPE]
[--tokenizer.preserve_ordering TOKENIZER.PRESERVE_ORDERING]
[--tokenizer.instance_group_count TOKENIZER.INSTANCE_GROUP_COUNT]
[--tokenizer.max_queue_delay_microseconds TOKENIZER.MAX_QUEUE_DELAY_MICROSECONDS]
[--tokenizer.unk_token TOKENIZER.UNK_TOKEN]
[--tokenizer.to_lower TOKENIZER.TO_LOWER]
[--tokenizer.bos_token TOKENIZER.BOS_TOKEN]
[--tokenizer.eos_token TOKENIZER.EOS_TOKEN]
[--language_model.max_sequence_idle_microseconds LANGUAGE_MODEL.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--language_model.max_batch_size LANGUAGE_MODEL.MAX_BATCH_SIZE]
[--language_model.min_batch_size LANGUAGE_MODEL.MIN_BATCH_SIZE]
[--language_model.opt_batch_size LANGUAGE_MODEL.OPT_BATCH_SIZE]
[--language_model.preferred_batch_size LANGUAGE_MODEL.PREFERRED_BATCH_SIZE]
[--language_model.batching_type LANGUAGE_MODEL.BATCHING_TYPE]
[--language_model.preserve_ordering LANGUAGE_MODEL.PRESERVE_ORDERING]
[--language_model.instance_group_count LANGUAGE_MODEL.INSTANCE_GROUP_COUNT]
[--language_model.max_queue_delay_microseconds LANGUAGE_MODEL.MAX_QUEUE_DELAY_MICROSECONDS]
[--language_model.trt_max_workspace_size LANGUAGE_MODEL.TRT_MAX_WORKSPACE_SIZE]
output_path source_path
[source_path ...]
Generate a Riva Model from a text_classification model trained with NVIDIA
NeMo.
positional arguments:
output_path Location to write compiled Riva pipeline
source_path Source file(s)
optional arguments:
-h, --help show this help message and exit
-f, --force Overwrite existing artifacts if they exist
--language_code LANGUAGE_CODE
Language of the model
--max_batch_size MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--model_api MODEL_API
--domain_name DOMAIN_NAME
Add the domain name for your custom Text Classifcation
model. This will be used to identify the model during
inference
--start_token START_TOKEN
--end_token END_TOKEN
--tokenizer Tokenizer type
--vocab VOCAB Path for the Language Model vocab file. If not
specified, default will be selected which is BERT base
vocab
tokenizer:
--tokenizer.max_sequence_idle_microseconds TOKENIZER.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--tokenizer.max_batch_size TOKENIZER.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--tokenizer.min_batch_size TOKENIZER.MIN_BATCH_SIZE
--tokenizer.opt_batch_size TOKENIZER.OPT_BATCH_SIZE
--tokenizer.preferred_batch_size TOKENIZER.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--tokenizer.batching_type TOKENIZER.BATCHING_TYPE
--tokenizer.preserve_ordering TOKENIZER.PRESERVE_ORDERING
Preserve ordering
--tokenizer.instance_group_count TOKENIZER.INSTANCE_GROUP_COUNT
How many instances in a group
--tokenizer.max_queue_delay_microseconds TOKENIZER.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--tokenizer.unk_token TOKENIZER.UNK_TOKEN
--tokenizer.to_lower TOKENIZER.TO_LOWER
--tokenizer.bos_token TOKENIZER.BOS_TOKEN
--tokenizer.eos_token TOKENIZER.EOS_TOKEN
language_model:
--language_model.max_sequence_idle_microseconds LANGUAGE_MODEL.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--language_model.max_batch_size LANGUAGE_MODEL.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--language_model.min_batch_size LANGUAGE_MODEL.MIN_BATCH_SIZE
--language_model.opt_batch_size LANGUAGE_MODEL.OPT_BATCH_SIZE
--language_model.preferred_batch_size LANGUAGE_MODEL.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--language_model.batching_type LANGUAGE_MODEL.BATCHING_TYPE
--language_model.preserve_ordering LANGUAGE_MODEL.PRESERVE_ORDERING
Preserve ordering
--language_model.instance_group_count LANGUAGE_MODEL.INSTANCE_GROUP_COUNT
How many instances in a group
--language_model.max_queue_delay_microseconds LANGUAGE_MODEL.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--language_model.trt_max_workspace_size LANGUAGE_MODEL.TRT_MAX_WORKSPACE_SIZE
Maximum workspace size (in Mb) to use for model export
to TensorRT
usage: riva-build token_classification [-h] [-f]
[--language_code LANGUAGE_CODE]
[--max_batch_size MAX_BATCH_SIZE]
[--model_api MODEL_API]
[--domain_name DOMAIN_NAME]
[--max_seq_length MAX_SEQ_LENGTH]
[--IOB IOB]
[--ignore_class IGNORE_CLASS]
[--start_token START_TOKEN]
[--end_token END_TOKEN] [--tokenizer]
[--vocab VOCAB] [--to_lower TO_LOWER]
[--tokenizer.max_sequence_idle_microseconds TOKENIZER.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--tokenizer.max_batch_size TOKENIZER.MAX_BATCH_SIZE]
[--tokenizer.min_batch_size TOKENIZER.MIN_BATCH_SIZE]
[--tokenizer.opt_batch_size TOKENIZER.OPT_BATCH_SIZE]
[--tokenizer.preferred_batch_size TOKENIZER.PREFERRED_BATCH_SIZE]
[--tokenizer.batching_type TOKENIZER.BATCHING_TYPE]
[--tokenizer.preserve_ordering TOKENIZER.PRESERVE_ORDERING]
[--tokenizer.instance_group_count TOKENIZER.INSTANCE_GROUP_COUNT]
[--tokenizer.max_queue_delay_microseconds TOKENIZER.MAX_QUEUE_DELAY_MICROSECONDS]
[--tokenizer.unk_token TOKENIZER.UNK_TOKEN]
[--tokenizer.to_lower TOKENIZER.TO_LOWER]
[--tokenizer.bos_token TOKENIZER.BOS_TOKEN]
[--tokenizer.eos_token TOKENIZER.EOS_TOKEN]
[--language_model.max_sequence_idle_microseconds LANGUAGE_MODEL.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--language_model.max_batch_size LANGUAGE_MODEL.MAX_BATCH_SIZE]
[--language_model.min_batch_size LANGUAGE_MODEL.MIN_BATCH_SIZE]
[--language_model.opt_batch_size LANGUAGE_MODEL.OPT_BATCH_SIZE]
[--language_model.preferred_batch_size LANGUAGE_MODEL.PREFERRED_BATCH_SIZE]
[--language_model.batching_type LANGUAGE_MODEL.BATCHING_TYPE]
[--language_model.preserve_ordering LANGUAGE_MODEL.PRESERVE_ORDERING]
[--language_model.instance_group_count LANGUAGE_MODEL.INSTANCE_GROUP_COUNT]
[--language_model.max_queue_delay_microseconds LANGUAGE_MODEL.MAX_QUEUE_DELAY_MICROSECONDS]
[--language_model.trt_max_workspace_size LANGUAGE_MODEL.TRT_MAX_WORKSPACE_SIZE]
[--label_tokens.max_sequence_idle_microseconds LABEL_TOKENS.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--label_tokens.max_batch_size LABEL_TOKENS.MAX_BATCH_SIZE]
[--label_tokens.min_batch_size LABEL_TOKENS.MIN_BATCH_SIZE]
[--label_tokens.opt_batch_size LABEL_TOKENS.OPT_BATCH_SIZE]
[--label_tokens.preferred_batch_size LABEL_TOKENS.PREFERRED_BATCH_SIZE]
[--label_tokens.batching_type LABEL_TOKENS.BATCHING_TYPE]
[--label_tokens.preserve_ordering LABEL_TOKENS.PRESERVE_ORDERING]
[--label_tokens.instance_group_count LABEL_TOKENS.INSTANCE_GROUP_COUNT]
[--label_tokens.max_queue_delay_microseconds LABEL_TOKENS.MAX_QUEUE_DELAY_MICROSECONDS]
[--label_tokens.classes LABEL_TOKENS.CLASSES]
[--detokenizer.max_sequence_idle_microseconds DETOKENIZER.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--detokenizer.max_batch_size DETOKENIZER.MAX_BATCH_SIZE]
[--detokenizer.min_batch_size DETOKENIZER.MIN_BATCH_SIZE]
[--detokenizer.opt_batch_size DETOKENIZER.OPT_BATCH_SIZE]
[--detokenizer.preferred_batch_size DETOKENIZER.PREFERRED_BATCH_SIZE]
[--detokenizer.batching_type DETOKENIZER.BATCHING_TYPE]
[--detokenizer.preserve_ordering DETOKENIZER.PRESERVE_ORDERING]
[--detokenizer.instance_group_count DETOKENIZER.INSTANCE_GROUP_COUNT]
[--detokenizer.max_queue_delay_microseconds DETOKENIZER.MAX_QUEUE_DELAY_MICROSECONDS]
output_path source_path
[source_path ...]
Generate a Riva Model from a token_classification model trained with NVIDIA
NeMo.
positional arguments:
output_path Location to write compiled Riva pipeline
source_path Source file(s)
optional arguments:
-h, --help show this help message and exit
-f, --force Overwrite existing artifacts if they exist
--language_code LANGUAGE_CODE
Language of the model
--max_batch_size MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--model_api MODEL_API
--domain_name DOMAIN_NAME
Add the domain name for your custom Entity
classification model. This will be used to identify
the model during inference
--max_seq_length MAX_SEQ_LENGTH
Max no. of tokens for an input query
--IOB IOB
--ignore_class IGNORE_CLASS
--start_token START_TOKEN
--end_token END_TOKEN
--tokenizer Tokenizer type
--vocab VOCAB Path for the Language Model vocab file. If not
specified, default will be selected which is BERT base
vocab
--to_lower TO_LOWER
tokenizer:
--tokenizer.max_sequence_idle_microseconds TOKENIZER.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--tokenizer.max_batch_size TOKENIZER.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--tokenizer.min_batch_size TOKENIZER.MIN_BATCH_SIZE
--tokenizer.opt_batch_size TOKENIZER.OPT_BATCH_SIZE
--tokenizer.preferred_batch_size TOKENIZER.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--tokenizer.batching_type TOKENIZER.BATCHING_TYPE
--tokenizer.preserve_ordering TOKENIZER.PRESERVE_ORDERING
Preserve ordering
--tokenizer.instance_group_count TOKENIZER.INSTANCE_GROUP_COUNT
How many instances in a group
--tokenizer.max_queue_delay_microseconds TOKENIZER.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--tokenizer.unk_token TOKENIZER.UNK_TOKEN
--tokenizer.to_lower TOKENIZER.TO_LOWER
--tokenizer.bos_token TOKENIZER.BOS_TOKEN
--tokenizer.eos_token TOKENIZER.EOS_TOKEN
language_model:
--language_model.max_sequence_idle_microseconds LANGUAGE_MODEL.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--language_model.max_batch_size LANGUAGE_MODEL.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--language_model.min_batch_size LANGUAGE_MODEL.MIN_BATCH_SIZE
--language_model.opt_batch_size LANGUAGE_MODEL.OPT_BATCH_SIZE
--language_model.preferred_batch_size LANGUAGE_MODEL.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--language_model.batching_type LANGUAGE_MODEL.BATCHING_TYPE
--language_model.preserve_ordering LANGUAGE_MODEL.PRESERVE_ORDERING
Preserve ordering
--language_model.instance_group_count LANGUAGE_MODEL.INSTANCE_GROUP_COUNT
How many instances in a group
--language_model.max_queue_delay_microseconds LANGUAGE_MODEL.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--language_model.trt_max_workspace_size LANGUAGE_MODEL.TRT_MAX_WORKSPACE_SIZE
Maximum workspace size (in Mb) to use for model export
to TensorRT
label_tokens:
--label_tokens.max_sequence_idle_microseconds LABEL_TOKENS.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--label_tokens.max_batch_size LABEL_TOKENS.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--label_tokens.min_batch_size LABEL_TOKENS.MIN_BATCH_SIZE
--label_tokens.opt_batch_size LABEL_TOKENS.OPT_BATCH_SIZE
--label_tokens.preferred_batch_size LABEL_TOKENS.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--label_tokens.batching_type LABEL_TOKENS.BATCHING_TYPE
--label_tokens.preserve_ordering LABEL_TOKENS.PRESERVE_ORDERING
Preserve ordering
--label_tokens.instance_group_count LABEL_TOKENS.INSTANCE_GROUP_COUNT
How many instances in a group
--label_tokens.max_queue_delay_microseconds LABEL_TOKENS.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--label_tokens.classes LABEL_TOKENS.CLASSES
detokenizer:
--detokenizer.max_sequence_idle_microseconds DETOKENIZER.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--detokenizer.max_batch_size DETOKENIZER.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--detokenizer.min_batch_size DETOKENIZER.MIN_BATCH_SIZE
--detokenizer.opt_batch_size DETOKENIZER.OPT_BATCH_SIZE
--detokenizer.preferred_batch_size DETOKENIZER.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--detokenizer.batching_type DETOKENIZER.BATCHING_TYPE
--detokenizer.preserve_ordering DETOKENIZER.PRESERVE_ORDERING
Preserve ordering
--detokenizer.instance_group_count DETOKENIZER.INSTANCE_GROUP_COUNT
How many instances in a group
--detokenizer.max_queue_delay_microseconds DETOKENIZER.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
usage: riva-build intent_slot [-h] [-f] [--language_code LANGUAGE_CODE]
[--max_batch_size MAX_BATCH_SIZE]
[--domain_name DOMAIN_NAME]
[--model_api MODEL_API] [--IOB IOB]
[--ignore_class IGNORE_CLASS]
[--start_token START_TOKEN]
[--end_token END_TOKEN]
[--contextual CONTEXTUAL] [--tokenizer]
[--vocab VOCAB]
[--max_seq_length MAX_SEQ_LENGTH]
[--tokenizer.max_sequence_idle_microseconds TOKENIZER.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--tokenizer.max_batch_size TOKENIZER.MAX_BATCH_SIZE]
[--tokenizer.min_batch_size TOKENIZER.MIN_BATCH_SIZE]
[--tokenizer.opt_batch_size TOKENIZER.OPT_BATCH_SIZE]
[--tokenizer.preferred_batch_size TOKENIZER.PREFERRED_BATCH_SIZE]
[--tokenizer.batching_type TOKENIZER.BATCHING_TYPE]
[--tokenizer.preserve_ordering TOKENIZER.PRESERVE_ORDERING]
[--tokenizer.instance_group_count TOKENIZER.INSTANCE_GROUP_COUNT]
[--tokenizer.max_queue_delay_microseconds TOKENIZER.MAX_QUEUE_DELAY_MICROSECONDS]
[--tokenizer.unk_token TOKENIZER.UNK_TOKEN]
[--tokenizer.to_lower TOKENIZER.TO_LOWER]
[--tokenizer.bos_token TOKENIZER.BOS_TOKEN]
[--tokenizer.eos_token TOKENIZER.EOS_TOKEN]
[--language_model.max_sequence_idle_microseconds LANGUAGE_MODEL.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--language_model.max_batch_size LANGUAGE_MODEL.MAX_BATCH_SIZE]
[--language_model.min_batch_size LANGUAGE_MODEL.MIN_BATCH_SIZE]
[--language_model.opt_batch_size LANGUAGE_MODEL.OPT_BATCH_SIZE]
[--language_model.preferred_batch_size LANGUAGE_MODEL.PREFERRED_BATCH_SIZE]
[--language_model.batching_type LANGUAGE_MODEL.BATCHING_TYPE]
[--language_model.preserve_ordering LANGUAGE_MODEL.PRESERVE_ORDERING]
[--language_model.instance_group_count LANGUAGE_MODEL.INSTANCE_GROUP_COUNT]
[--language_model.max_queue_delay_microseconds LANGUAGE_MODEL.MAX_QUEUE_DELAY_MICROSECONDS]
[--language_model.trt_max_workspace_size LANGUAGE_MODEL.TRT_MAX_WORKSPACE_SIZE]
[--label_tokens.max_sequence_idle_microseconds LABEL_TOKENS.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--label_tokens.max_batch_size LABEL_TOKENS.MAX_BATCH_SIZE]
[--label_tokens.min_batch_size LABEL_TOKENS.MIN_BATCH_SIZE]
[--label_tokens.opt_batch_size LABEL_TOKENS.OPT_BATCH_SIZE]
[--label_tokens.preferred_batch_size LABEL_TOKENS.PREFERRED_BATCH_SIZE]
[--label_tokens.batching_type LABEL_TOKENS.BATCHING_TYPE]
[--label_tokens.preserve_ordering LABEL_TOKENS.PRESERVE_ORDERING]
[--label_tokens.instance_group_count LABEL_TOKENS.INSTANCE_GROUP_COUNT]
[--label_tokens.max_queue_delay_microseconds LABEL_TOKENS.MAX_QUEUE_DELAY_MICROSECONDS]
[--label_tokens.classes LABEL_TOKENS.CLASSES]
[--detokenizer.max_sequence_idle_microseconds DETOKENIZER.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--detokenizer.max_batch_size DETOKENIZER.MAX_BATCH_SIZE]
[--detokenizer.min_batch_size DETOKENIZER.MIN_BATCH_SIZE]
[--detokenizer.opt_batch_size DETOKENIZER.OPT_BATCH_SIZE]
[--detokenizer.preferred_batch_size DETOKENIZER.PREFERRED_BATCH_SIZE]
[--detokenizer.batching_type DETOKENIZER.BATCHING_TYPE]
[--detokenizer.preserve_ordering DETOKENIZER.PRESERVE_ORDERING]
[--detokenizer.instance_group_count DETOKENIZER.INSTANCE_GROUP_COUNT]
[--detokenizer.max_queue_delay_microseconds DETOKENIZER.MAX_QUEUE_DELAY_MICROSECONDS]
output_path source_path [source_path ...]
Generate a Riva Model from a intent_slot model trained with NVIDIA NeMo.
positional arguments:
output_path Location to write compiled Riva pipeline
source_path Source file(s)
optional arguments:
-h, --help show this help message and exit
-f, --force Overwrite existing artifacts if they exist
--language_code LANGUAGE_CODE
Language of the model
--max_batch_size MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--domain_name DOMAIN_NAME
Add the domain name for your custom Intent Slot model.
This will be used to identify the model during
inference
--model_api MODEL_API
--IOB IOB
--ignore_class IGNORE_CLASS
--start_token START_TOKEN
--end_token END_TOKEN
--contextual CONTEXTUAL
--tokenizer Tokenizer type
--vocab VOCAB Path for the Language Model vocab file. If not
specified, default will be selected which is BERT base
vocab
--max_seq_length MAX_SEQ_LENGTH
Default maximum sequence length used during finetuning
BERT model
tokenizer:
--tokenizer.max_sequence_idle_microseconds TOKENIZER.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--tokenizer.max_batch_size TOKENIZER.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--tokenizer.min_batch_size TOKENIZER.MIN_BATCH_SIZE
--tokenizer.opt_batch_size TOKENIZER.OPT_BATCH_SIZE
--tokenizer.preferred_batch_size TOKENIZER.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--tokenizer.batching_type TOKENIZER.BATCHING_TYPE
--tokenizer.preserve_ordering TOKENIZER.PRESERVE_ORDERING
Preserve ordering
--tokenizer.instance_group_count TOKENIZER.INSTANCE_GROUP_COUNT
How many instances in a group
--tokenizer.max_queue_delay_microseconds TOKENIZER.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--tokenizer.unk_token TOKENIZER.UNK_TOKEN
--tokenizer.to_lower TOKENIZER.TO_LOWER
--tokenizer.bos_token TOKENIZER.BOS_TOKEN
--tokenizer.eos_token TOKENIZER.EOS_TOKEN
language_model:
--language_model.max_sequence_idle_microseconds LANGUAGE_MODEL.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--language_model.max_batch_size LANGUAGE_MODEL.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--language_model.min_batch_size LANGUAGE_MODEL.MIN_BATCH_SIZE
--language_model.opt_batch_size LANGUAGE_MODEL.OPT_BATCH_SIZE
--language_model.preferred_batch_size LANGUAGE_MODEL.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--language_model.batching_type LANGUAGE_MODEL.BATCHING_TYPE
--language_model.preserve_ordering LANGUAGE_MODEL.PRESERVE_ORDERING
Preserve ordering
--language_model.instance_group_count LANGUAGE_MODEL.INSTANCE_GROUP_COUNT
How many instances in a group
--language_model.max_queue_delay_microseconds LANGUAGE_MODEL.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--language_model.trt_max_workspace_size LANGUAGE_MODEL.TRT_MAX_WORKSPACE_SIZE
Maximum workspace size (in Mb) to use for model export
to TensorRT
label_tokens:
--label_tokens.max_sequence_idle_microseconds LABEL_TOKENS.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--label_tokens.max_batch_size LABEL_TOKENS.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--label_tokens.min_batch_size LABEL_TOKENS.MIN_BATCH_SIZE
--label_tokens.opt_batch_size LABEL_TOKENS.OPT_BATCH_SIZE
--label_tokens.preferred_batch_size LABEL_TOKENS.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--label_tokens.batching_type LABEL_TOKENS.BATCHING_TYPE
--label_tokens.preserve_ordering LABEL_TOKENS.PRESERVE_ORDERING
Preserve ordering
--label_tokens.instance_group_count LABEL_TOKENS.INSTANCE_GROUP_COUNT
How many instances in a group
--label_tokens.max_queue_delay_microseconds LABEL_TOKENS.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--label_tokens.classes LABEL_TOKENS.CLASSES
detokenizer:
--detokenizer.max_sequence_idle_microseconds DETOKENIZER.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--detokenizer.max_batch_size DETOKENIZER.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--detokenizer.min_batch_size DETOKENIZER.MIN_BATCH_SIZE
--detokenizer.opt_batch_size DETOKENIZER.OPT_BATCH_SIZE
--detokenizer.preferred_batch_size DETOKENIZER.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--detokenizer.batching_type DETOKENIZER.BATCHING_TYPE
--detokenizer.preserve_ordering DETOKENIZER.PRESERVE_ORDERING
Preserve ordering
--detokenizer.instance_group_count DETOKENIZER.INSTANCE_GROUP_COUNT
How many instances in a group
--detokenizer.max_queue_delay_microseconds DETOKENIZER.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
usage: riva-build qa [-h] [-f] [--language_code LANGUAGE_CODE]
[--max_batch_size MAX_BATCH_SIZE] [--model_api MODEL_API]
[--task TASK] [--IOB IOB] [--ignore_class IGNORE_CLASS]
[--start_token START_TOKEN] [--end_token END_TOKEN]
[--max_seq_length MAX_SEQ_LENGTH] [--vocab VOCAB]
[--tokenizer]
[--tokenizer.max_sequence_idle_microseconds TOKENIZER.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--tokenizer.max_batch_size TOKENIZER.MAX_BATCH_SIZE]
[--tokenizer.min_batch_size TOKENIZER.MIN_BATCH_SIZE]
[--tokenizer.opt_batch_size TOKENIZER.OPT_BATCH_SIZE]
[--tokenizer.preferred_batch_size TOKENIZER.PREFERRED_BATCH_SIZE]
[--tokenizer.batching_type TOKENIZER.BATCHING_TYPE]
[--tokenizer.preserve_ordering TOKENIZER.PRESERVE_ORDERING]
[--tokenizer.instance_group_count TOKENIZER.INSTANCE_GROUP_COUNT]
[--tokenizer.max_queue_delay_microseconds TOKENIZER.MAX_QUEUE_DELAY_MICROSECONDS]
[--tokenizer.unk_token TOKENIZER.UNK_TOKEN]
[--tokenizer.to_lower TOKENIZER.TO_LOWER]
[--tokenizer.bos_token TOKENIZER.BOS_TOKEN]
[--tokenizer.eos_token TOKENIZER.EOS_TOKEN]
[--language_model.max_sequence_idle_microseconds LANGUAGE_MODEL.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--language_model.max_batch_size LANGUAGE_MODEL.MAX_BATCH_SIZE]
[--language_model.min_batch_size LANGUAGE_MODEL.MIN_BATCH_SIZE]
[--language_model.opt_batch_size LANGUAGE_MODEL.OPT_BATCH_SIZE]
[--language_model.preferred_batch_size LANGUAGE_MODEL.PREFERRED_BATCH_SIZE]
[--language_model.batching_type LANGUAGE_MODEL.BATCHING_TYPE]
[--language_model.preserve_ordering LANGUAGE_MODEL.PRESERVE_ORDERING]
[--language_model.instance_group_count LANGUAGE_MODEL.INSTANCE_GROUP_COUNT]
[--language_model.max_queue_delay_microseconds LANGUAGE_MODEL.MAX_QUEUE_DELAY_MICROSECONDS]
[--language_model.trt_max_workspace_size LANGUAGE_MODEL.TRT_MAX_WORKSPACE_SIZE]
[--token_classifier.max_sequence_idle_microseconds TOKEN_CLASSIFIER.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--token_classifier.max_batch_size TOKEN_CLASSIFIER.MAX_BATCH_SIZE]
[--token_classifier.min_batch_size TOKEN_CLASSIFIER.MIN_BATCH_SIZE]
[--token_classifier.opt_batch_size TOKEN_CLASSIFIER.OPT_BATCH_SIZE]
[--token_classifier.preferred_batch_size TOKEN_CLASSIFIER.PREFERRED_BATCH_SIZE]
[--token_classifier.batching_type TOKEN_CLASSIFIER.BATCHING_TYPE]
[--token_classifier.preserve_ordering TOKEN_CLASSIFIER.PRESERVE_ORDERING]
[--token_classifier.instance_group_count TOKEN_CLASSIFIER.INSTANCE_GROUP_COUNT]
[--token_classifier.max_queue_delay_microseconds TOKEN_CLASSIFIER.MAX_QUEUE_DELAY_MICROSECONDS]
[--token_classifier.version_2_with_negative TOKEN_CLASSIFIER.VERSION_2_WITH_NEGATIVE]
[--token_classifier.n_best_size TOKEN_CLASSIFIER.N_BEST_SIZE]
[--token_classifier.max_answer_length TOKEN_CLASSIFIER.MAX_ANSWER_LENGTH]
output_path source_path [source_path ...]
Generate a Riva Model from a qa model trained with NVIDIA NeMo.
positional arguments:
output_path Location to write compiled Riva pipeline
source_path Source file(s)
optional arguments:
-h, --help show this help message and exit
-f, --force Overwrite existing artifacts if they exist
--language_code LANGUAGE_CODE
Language of the model
--max_batch_size MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--model_api MODEL_API
--task TASK Single-input or multiple-input (qa) task
--IOB IOB
--ignore_class IGNORE_CLASS
--start_token START_TOKEN
--end_token END_TOKEN
--max_seq_length MAX_SEQ_LENGTH
Max no. of tokens for an input query
--vocab VOCAB Path for the Language Model vocab file. If not
specified, default will be selected which is BERT base
vocab
--tokenizer Tokenizer type
tokenizer:
--tokenizer.max_sequence_idle_microseconds TOKENIZER.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--tokenizer.max_batch_size TOKENIZER.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--tokenizer.min_batch_size TOKENIZER.MIN_BATCH_SIZE
--tokenizer.opt_batch_size TOKENIZER.OPT_BATCH_SIZE
--tokenizer.preferred_batch_size TOKENIZER.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--tokenizer.batching_type TOKENIZER.BATCHING_TYPE
--tokenizer.preserve_ordering TOKENIZER.PRESERVE_ORDERING
Preserve ordering
--tokenizer.instance_group_count TOKENIZER.INSTANCE_GROUP_COUNT
How many instances in a group
--tokenizer.max_queue_delay_microseconds TOKENIZER.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--tokenizer.unk_token TOKENIZER.UNK_TOKEN
--tokenizer.to_lower TOKENIZER.TO_LOWER
--tokenizer.bos_token TOKENIZER.BOS_TOKEN
--tokenizer.eos_token TOKENIZER.EOS_TOKEN
language_model:
--language_model.max_sequence_idle_microseconds LANGUAGE_MODEL.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--language_model.max_batch_size LANGUAGE_MODEL.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--language_model.min_batch_size LANGUAGE_MODEL.MIN_BATCH_SIZE
--language_model.opt_batch_size LANGUAGE_MODEL.OPT_BATCH_SIZE
--language_model.preferred_batch_size LANGUAGE_MODEL.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--language_model.batching_type LANGUAGE_MODEL.BATCHING_TYPE
--language_model.preserve_ordering LANGUAGE_MODEL.PRESERVE_ORDERING
Preserve ordering
--language_model.instance_group_count LANGUAGE_MODEL.INSTANCE_GROUP_COUNT
How many instances in a group
--language_model.max_queue_delay_microseconds LANGUAGE_MODEL.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--language_model.trt_max_workspace_size LANGUAGE_MODEL.TRT_MAX_WORKSPACE_SIZE
Maximum workspace size (in MB) to use for model export
to TensorRT
token_classifier:
--token_classifier.max_sequence_idle_microseconds TOKEN_CLASSIFIER.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--token_classifier.max_batch_size TOKEN_CLASSIFIER.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--token_classifier.min_batch_size TOKEN_CLASSIFIER.MIN_BATCH_SIZE
--token_classifier.opt_batch_size TOKEN_CLASSIFIER.OPT_BATCH_SIZE
--token_classifier.preferred_batch_size TOKEN_CLASSIFIER.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--token_classifier.batching_type TOKEN_CLASSIFIER.BATCHING_TYPE
--token_classifier.preserve_ordering TOKEN_CLASSIFIER.PRESERVE_ORDERING
Preserve ordering
--token_classifier.instance_group_count TOKEN_CLASSIFIER.INSTANCE_GROUP_COUNT
How many instances in a group
--token_classifier.max_queue_delay_microseconds TOKEN_CLASSIFIER.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--token_classifier.version_2_with_negative TOKEN_CLASSIFIER.VERSION_2_WITH_NEGATIVE
--token_classifier.n_best_size TOKEN_CLASSIFIER.N_BEST_SIZE
--token_classifier.max_answer_length TOKEN_CLASSIFIER.MAX_ANSWER_LENGTH
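A typical invocation builds a QA pipeline from a NeMo-exported `.riva` archive into a Riva model intermediate representation (`.rmir`). The file paths and parameter values below are illustrative assumptions, not values prescribed by the tool:

```shell
# Illustrative example -- file names and parameter values are assumptions.
# Builds a Riva QA pipeline (.rmir) from a NeMo-exported BERT QA model (.riva).
riva-build qa \
    --force \
    --language_code=en-US \
    --max_seq_length=384 \
    --token_classifier.max_answer_length=30 \
    /servicemaker-dev/qa-pipeline.rmir \
    /servicemaker-dev/bert-qa.riva
```

The positional arguments come last: the output `.rmir` location first, then one or more source model files, matching the `output_path source_path [source_path ...]` signature above.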
usage: riva-build punctuation [-h] [-f] [--language_code LANGUAGE_CODE]
[--max_batch_size MAX_BATCH_SIZE]
[--model_api MODEL_API] [--tokenizer]
[--vocab VOCAB] [--to_lower TO_LOWER]
[--tokenizer.max_sequence_idle_microseconds TOKENIZER.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--tokenizer.max_batch_size TOKENIZER.MAX_BATCH_SIZE]
[--tokenizer.min_batch_size TOKENIZER.MIN_BATCH_SIZE]
[--tokenizer.opt_batch_size TOKENIZER.OPT_BATCH_SIZE]
[--tokenizer.preferred_batch_size TOKENIZER.PREFERRED_BATCH_SIZE]
[--tokenizer.batching_type TOKENIZER.BATCHING_TYPE]
[--tokenizer.preserve_ordering TOKENIZER.PRESERVE_ORDERING]
[--tokenizer.instance_group_count TOKENIZER.INSTANCE_GROUP_COUNT]
[--tokenizer.max_queue_delay_microseconds TOKENIZER.MAX_QUEUE_DELAY_MICROSECONDS]
[--tokenizer.unk_token TOKENIZER.UNK_TOKEN]
[--tokenizer.to_lower TOKENIZER.TO_LOWER]
[--tokenizer.bos_token TOKENIZER.BOS_TOKEN]
[--tokenizer.eos_token TOKENIZER.EOS_TOKEN]
[--language_model.max_sequence_idle_microseconds LANGUAGE_MODEL.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--language_model.max_batch_size LANGUAGE_MODEL.MAX_BATCH_SIZE]
[--language_model.min_batch_size LANGUAGE_MODEL.MIN_BATCH_SIZE]
[--language_model.opt_batch_size LANGUAGE_MODEL.OPT_BATCH_SIZE]
[--language_model.preferred_batch_size LANGUAGE_MODEL.PREFERRED_BATCH_SIZE]
[--language_model.batching_type LANGUAGE_MODEL.BATCHING_TYPE]
[--language_model.preserve_ordering LANGUAGE_MODEL.PRESERVE_ORDERING]
[--language_model.instance_group_count LANGUAGE_MODEL.INSTANCE_GROUP_COUNT]
[--language_model.max_queue_delay_microseconds LANGUAGE_MODEL.MAX_QUEUE_DELAY_MICROSECONDS]
[--language_model.trt_max_workspace_size LANGUAGE_MODEL.TRT_MAX_WORKSPACE_SIZE]
[--postprocessor.max_sequence_idle_microseconds POSTPROCESSOR.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--postprocessor.max_batch_size POSTPROCESSOR.MAX_BATCH_SIZE]
[--postprocessor.min_batch_size POSTPROCESSOR.MIN_BATCH_SIZE]
[--postprocessor.opt_batch_size POSTPROCESSOR.OPT_BATCH_SIZE]
[--postprocessor.preferred_batch_size POSTPROCESSOR.PREFERRED_BATCH_SIZE]
[--postprocessor.batching_type POSTPROCESSOR.BATCHING_TYPE]
[--postprocessor.preserve_ordering POSTPROCESSOR.PRESERVE_ORDERING]
[--postprocessor.instance_group_count POSTPROCESSOR.INSTANCE_GROUP_COUNT]
[--postprocessor.max_queue_delay_microseconds POSTPROCESSOR.MAX_QUEUE_DELAY_MICROSECONDS]
[--postprocessor.punctuation_mapping_path POSTPROCESSOR.PUNCTUATION_MAPPING_PATH]
[--postprocessor.capitalization_mapping_path POSTPROCESSOR.CAPITALIZATION_MAPPING_PATH]
output_path source_path [source_path ...]
Generate a Riva Model from a punctuation model trained with NVIDIA NeMo.
positional arguments:
output_path Location to write compiled Riva pipeline
source_path Source file(s)
optional arguments:
-h, --help show this help message and exit
-f, --force Overwrite existing artifacts if they exist
--language_code LANGUAGE_CODE
Language of the model
--max_batch_size MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--model_api MODEL_API
--tokenizer Tokenizer type
--vocab VOCAB Path for the Language Model vocab file. If not
specified, default will be selected which is BERT base
vocab
--to_lower TO_LOWER
tokenizer:
--tokenizer.max_sequence_idle_microseconds TOKENIZER.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--tokenizer.max_batch_size TOKENIZER.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--tokenizer.min_batch_size TOKENIZER.MIN_BATCH_SIZE
--tokenizer.opt_batch_size TOKENIZER.OPT_BATCH_SIZE
--tokenizer.preferred_batch_size TOKENIZER.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--tokenizer.batching_type TOKENIZER.BATCHING_TYPE
--tokenizer.preserve_ordering TOKENIZER.PRESERVE_ORDERING
Preserve ordering
--tokenizer.instance_group_count TOKENIZER.INSTANCE_GROUP_COUNT
How many instances in a group
--tokenizer.max_queue_delay_microseconds TOKENIZER.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--tokenizer.unk_token TOKENIZER.UNK_TOKEN
--tokenizer.to_lower TOKENIZER.TO_LOWER
--tokenizer.bos_token TOKENIZER.BOS_TOKEN
--tokenizer.eos_token TOKENIZER.EOS_TOKEN
language_model:
--language_model.max_sequence_idle_microseconds LANGUAGE_MODEL.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--language_model.max_batch_size LANGUAGE_MODEL.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--language_model.min_batch_size LANGUAGE_MODEL.MIN_BATCH_SIZE
--language_model.opt_batch_size LANGUAGE_MODEL.OPT_BATCH_SIZE
--language_model.preferred_batch_size LANGUAGE_MODEL.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--language_model.batching_type LANGUAGE_MODEL.BATCHING_TYPE
--language_model.preserve_ordering LANGUAGE_MODEL.PRESERVE_ORDERING
Preserve ordering
--language_model.instance_group_count LANGUAGE_MODEL.INSTANCE_GROUP_COUNT
How many instances in a group
--language_model.max_queue_delay_microseconds LANGUAGE_MODEL.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--language_model.trt_max_workspace_size LANGUAGE_MODEL.TRT_MAX_WORKSPACE_SIZE
Maximum workspace size (in MB) to use for model export
to TensorRT
postprocessor:
--postprocessor.max_sequence_idle_microseconds POSTPROCESSOR.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--postprocessor.max_batch_size POSTPROCESSOR.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--postprocessor.min_batch_size POSTPROCESSOR.MIN_BATCH_SIZE
--postprocessor.opt_batch_size POSTPROCESSOR.OPT_BATCH_SIZE
--postprocessor.preferred_batch_size POSTPROCESSOR.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--postprocessor.batching_type POSTPROCESSOR.BATCHING_TYPE
--postprocessor.preserve_ordering POSTPROCESSOR.PRESERVE_ORDERING
Preserve ordering
--postprocessor.instance_group_count POSTPROCESSOR.INSTANCE_GROUP_COUNT
How many instances in a group
--postprocessor.max_queue_delay_microseconds POSTPROCESSOR.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--postprocessor.punctuation_mapping_path POSTPROCESSOR.PUNCTUATION_MAPPING_PATH
--postprocessor.capitalization_mapping_path POSTPROCESSOR.CAPITALIZATION_MAPPING_PATH
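As with the QA pipeline, a punctuation build takes an output `.rmir` path followed by the source model file. The paths and flag values in this sketch are illustrative assumptions:

```shell
# Illustrative example -- file names and flag values are assumptions.
# Builds a Riva punctuation/capitalization pipeline from a NeMo-exported model.
riva-build punctuation \
    --force \
    --language_code=en-US \
    --tokenizer.to_lower=true \
    /servicemaker-dev/punctuation.rmir \
    /servicemaker-dev/punctuation.riva
```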
Pretrained Models¶
| Task | Architecture | Language | Dataset | Domain | Accuracy | Compatibility with TAO Toolkit 3.0-21.08 | Compatibility with NeMo 1.0.0b4 | Link |
|---|---|---|---|---|---|---|---|---|
| QA | BERT | English | SQuAD 2.0 | | EM: 71.24, F1: 74.32 | Yes | Yes | |
| QA | Megatron | English | SQuAD 2.0 | | TBM | Yes | Yes | |
| Entity Recognition | BERT | English | GMB (Groningen Meaning Bank) | LOC, ORG, PER, GPE, TIME, MISC, O | | Yes | Yes | |
| Punctuation/Capitalization | BERT | English | Tatoeba sentences; books from Project Gutenberg used in the LibriSpeech corpus; transcripts from Fisher English Training Speech | | | Yes | Yes | |
| Intent Detection & Slot Tagging | BERT | English | Proprietary | Weather | | Yes | Yes | |
| Text Classification | BERT | English | Proprietary | | | Yes | Yes | |
The aforementioned NLP tasks leverage large-scale pretrained models such as BERT-base, BERT-large, and Megatron-LM BERT; the respective pretrained models are hosted on NGC.