[1]:
# Copyright 2021 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

Python API Examples

This notebook walks through the basics of the Jarvis Speech and Language AI Services.

Overview

NVIDIA Jarvis is a platform for building and deploying AI applications that fuse vision, speech and other sensors. It offers a complete workflow to build, train and deploy AI systems that can use visual cues such as gestures and gaze along with speech in context. With the Jarvis platform, you can:

  • Build speech and visual AI applications using pretrained NVIDIA Neural Modules (NeMo) available at NVIDIA GPU Cloud (NGC).

  • Transfer learning: re-train your model on domain-specific data, with NVIDIA NeMo. NeMo is a toolkit and platform that enables researchers to define and build new state-of-the-art speech and natural language processing models.

  • Optimize neural network performance and latency using NVIDIA TensorRT

  • Deploy AI applications with TensorRT Inference Server:

    • Support multiple network formats: ONNX, TensorRT plans, PyTorch TorchScript models.

    • Deployment on multiple platforms: from datacenter to edge servers, via Helm to K8s cluster, on NVIDIA Volta/Turing GPUs or Jetson Xavier platforms.

See the video below for a demo of Jarvis capabilities.

[2]:
from IPython.display import IFrame

# Jarvis Youtube demo video
IFrame("https://www.youtube.com/embed/r264lBi1nMU?rel=0&controls=0&showinfo=0", width="560", height="315", frameborder="0", allowfullscreen=True)

For more detailed information on Jarvis, please refer to the Jarvis developer documentation.

Introduction to the Jarvis Speech and Natural Language services

Jarvis offers a rich set of speech and natural language understanding services such as:

  • Automated speech recognition (ASR)

  • Text-to-Speech synthesis (TTS)

  • A collection of natural language understanding services such as named entity recognition (NER), punctuation and intent classification.

Learning objectives

  • Understand how to interact with the Jarvis Speech and Natural Language APIs, services and use cases

Requirements and setup

To execute this notebook, please follow the setup steps in README.

We first import the required libraries.

[3]:
import io
import librosa
from time import time
import numpy as np
import IPython.display as ipd
import grpc
import requests

# NLP proto
import jarvis_api.jarvis_nlp_core_pb2 as jcnlp
import jarvis_api.jarvis_nlp_core_pb2_grpc as jcnlp_srv
import jarvis_api.jarvis_nlp_pb2 as jnlp
import jarvis_api.jarvis_nlp_pb2_grpc as jnlp_srv

# ASR proto
import jarvis_api.jarvis_asr_pb2 as jasr
import jarvis_api.jarvis_asr_pb2_grpc as jasr_srv

# TTS proto
import jarvis_api.jarvis_tts_pb2 as jtts
import jarvis_api.jarvis_tts_pb2_grpc as jtts_srv
import jarvis_api.audio_pb2 as ja

Create Jarvis clients and connect to Jarvis Speech API server

The URI below assumes a local deployment of the Jarvis Speech API server on the default port. If the server is deployed on a different host, or via a Helm chart on Kubernetes, use the appropriate URI instead.

[4]:
channel = grpc.insecure_channel('localhost:50051')

jarvis_asr = jasr_srv.JarvisASRStub(channel)
jarvis_nlp = jnlp_srv.JarvisNLPStub(channel)
jarvis_cnlp = jcnlp_srv.JarvisCoreNLPStub(channel)
jarvis_tts = jtts_srv.JarvisTTSStub(channel)
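
For deployments that are not local, one option is to read the server address from the environment instead of editing the notebook. The following is a minimal sketch under that assumption; the environment variable name `JARVIS_SPEECH_API_URI` and the helper `jarvis_target` are hypothetical and are not defined by Jarvis.

```python
import os

DEFAULT_TARGET = "localhost:50051"  # the local default used in the cell above


def jarvis_target(env=None):
    """Return the gRPC target, preferring a (hypothetical) environment override."""
    env = os.environ if env is None else env
    # JARVIS_SPEECH_API_URI is an illustrative name, not a setting Jarvis reads.
    target = env.get("JARVIS_SPEECH_API_URI", "").strip()
    return target or DEFAULT_TARGET


print(jarvis_target({}))
print(jarvis_target({"JARVIS_SPEECH_API_URI": "jarvis.example.com:50051"}))
```

The returned string could then be passed to `grpc.insecure_channel(...)` in place of the hard-coded address.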

1. Offline ASR Example

In this release, the Jarvis Speech API supports single-channel audio in the following formats: .wav files in PCM format, .alaw, .mulaw and .flac.
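
Because only single-channel audio is supported, it can help to inspect a .wav file before sending it. The sketch below uses Python's standard `wave` module to read the channel count and sample rate; the helper names and the in-memory silent clip are illustrative stand-ins, not part of the Jarvis API.

```python
import io
import wave


def describe_wav(wav_bytes):
    """Return (channels, sample_rate_hz, sample_width_bytes) of a PCM WAV."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnchannels(), wav.getframerate(), wav.getsampwidth()


def make_silent_wav(seconds=1, sample_rate=16000):
    """Build a mono 16-bit PCM WAV in memory, used here as stand-in input."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate * seconds)
    return buf.getvalue()


channels, rate, width = describe_wav(make_silent_wav())
if channels != 1:
    raise ValueError(f"expected mono audio, got {channels} channels")
print(channels, rate, width)
```

For a file on disk, the bytes read with `fh.read()` can be passed to `describe_wav` directly.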

[5]:
# This example uses a .wav file with LINEAR_PCM encoding.
# read in an audio file from local disk
path = "/work/wav/sample.wav"
audio, sr = librosa.core.load(path, sr=None)
with io.open(path, 'rb') as fh:
    content = fh.read()
ipd.Audio(path)
[6]:
# Set up an offline/batch recognition request
req = jasr.RecognizeRequest()
req.audio = content                                   # raw bytes
req.config.encoding = ja.AudioEncoding.LINEAR_PCM     # Supports LINEAR_PCM, FLAC, MULAW and ALAW audio encodings
req.config.sample_rate_hertz = sr                     # Audio will be resampled if necessary
req.config.language_code = "en-US"                    # Ignored, will route to correct model in future release
req.config.max_alternatives = 1                       # How many top-N hypotheses to return
req.config.enable_automatic_punctuation = True        # Add punctuation when end of VAD detected
req.config.audio_channel_count = 1                    # Mono channel

response = jarvis_asr.Recognize(req)
asr_best_transcript = response.results[0].alternatives[0].transcript
print("ASR Transcript:", asr_best_transcript)

print("\n\nFull Response Message:")
print(response)
ASR Transcript: What is natural language processing?


Full Response Message:
results {
  alternatives {
    transcript: "What is natural language processing? "
    confidence: -8.908161163330078
  }
  channel_tag: 1
  audio_processed: 6.400000095367432
}
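
Indexing `response.results[0].alternatives[0]` raises an `IndexError` when the server returns no results, for example on silent audio. A defensive helper might look like the sketch below. It uses `SimpleNamespace` objects as stand-ins shaped like the response printed above, so it runs without a server; `best_transcripts` is a hypothetical helper name.

```python
from types import SimpleNamespace


def best_transcripts(response):
    """Return the top transcript of every result that has an alternative."""
    return [
        result.alternatives[0].transcript.strip()
        for result in response.results
        if len(result.alternatives) > 0
    ]


# Stand-in objects mirroring the fields shown in the printed response.
fake_response = SimpleNamespace(
    results=[
        SimpleNamespace(
            alternatives=[
                SimpleNamespace(
                    transcript="What is natural language processing? ",
                    confidence=-8.908161163330078,
                )
            ]
        )
    ]
)
empty_response = SimpleNamespace(results=[])

print(best_transcripts(fake_response))
print(best_transcripts(empty_response))
```

The helper only relies on iteration, `len` and attribute access, so it should also work on the protobuf response object itself.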

2. Core NLP Service Examples

All of the Core NLP Services support batched requests. The maximum batch size, if any, of the underlying models is hidden from the end user; requests are batched automatically by the Jarvis and TRTIS servers.

The Core NLP API currently provides three methods:

  1. TransformText - map an input string to an output string

  2. ClassifyText - return a single label for the input string

  3. ClassifyTokens - return a label per input token

[7]:
# Use the TextTransform API to run the punctuation model
req = jcnlp.TextTransformRequest()
req.model.model_name = "jarvis_punctuation"
req.text.append("add punctuation to this sentence")
req.text.append("do you have any red nvidia shirts")
req.text.append("i need one cpu four gpus and lots of memory "
                "for my new computer it's going to be very cool")

nlp_resp = jarvis_cnlp.TransformText(req)
print("TransformText Output:")
print("\n".join([f" {x}" for x in nlp_resp.text]))
TransformText Output:
 Add punctuation to this sentence.
 Do you have any red Nvidia shirts?
 I need one cpu, four gpus and lots of memory for my new computer. It's going to be very cool.
[8]:
# Use the TokenClassification API to run a Named Entity Recognition (NER) model
# Note: the model configuration of the NER model indicates that the labels are
# in IOB format. Jarvis, subsequently, knows to:
#   a) ignore 'O' labels
#   b) Remove B- and I- prefixes from labels
#   c) Collapse sequences of B- I- ... I- tokens into a single token

req = jcnlp.TokenClassRequest()
req.model.model_name = "jarvis_ner"     # If you have deployed a custom model with the domain_name
                                        # parameter in ServiceMaker's `jarvis-build` command then you should use
                                        # "jarvis_ner_<your_input_domain_name>" where <your_input_domain_name>
                                        # is the name you provided to the domain_name parameter.

req.text.append("Jensen Huang is the CEO of NVIDIA Corporation, "
                "located in Santa Clara, California")
resp = jarvis_cnlp.ClassifyTokens(req)

print("Named Entities:")
for result in resp.results[0].results:
    print(f"  {result.token} ({result.label[0].class_name})")
Named Entities:
  jensen huang (PER)
  nvidia corporation (ORG)
  santa clara (LOC)
  california (LOC)
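
The IOB post-processing described in the comments of the cell above (ignore `O`, strip `B-`/`I-` prefixes, merge `B- I- ... I-` runs) can be illustrated in plain Python. This is a sketch of the general technique, not Jarvis's actual implementation, and the token/label lists are hand-written stand-ins for raw model output.

```python
def collapse_iob(tokens, labels):
    """Merge IOB-tagged tokens into (entity_text, entity_type) pairs."""
    entities = []
    current_tokens, current_type = [], None

    def flush():
        # Emit the entity collected so far, if any.
        if current_tokens:
            entities.append((" ".join(current_tokens), current_type))

    for token, label in zip(tokens, labels):
        if label == "O":  # a) tokens outside any entity are ignored
            flush()
            current_tokens, current_type = [], None
            continue
        prefix, _, entity_type = label.partition("-")  # b) drop the B-/I- prefix
        if prefix == "B" or entity_type != current_type:
            flush()  # a new entity starts here
            current_tokens, current_type = [token], entity_type
        else:
            current_tokens.append(token)  # c) extend the current entity
    flush()
    return entities


tokens = ["jensen", "huang", "is", "the", "ceo", "of", "nvidia", "corporation",
          "located", "in", "santa", "clara", "california"]
labels = ["B-PER", "I-PER", "O", "O", "O", "O", "B-ORG", "I-ORG",
          "O", "O", "B-LOC", "I-LOC", "B-LOC"]

for text, entity_type in collapse_iob(tokens, labels):
    print(f"  {text} ({entity_type})")
```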
[9]:
# Submit a TextClassRequest for text classification.
# Jarvis NLP comes with a default text_classification domain called "domain_misty" which consists of
# 4 classes: meteorology, personality, weather and nomatch

request = jcnlp.TextClassRequest()
request.model.model_name = "jarvis_text_classification_domain"       # If you have deployed a custom model
                                        # with the `--domain_name` parameter in ServiceMaker's `jarvis-build` command
                                        # then you should use "jarvis_text_classification_<your_input_domain_name>"
                                        # where <your_input_domain_name> is the name you provided to the
                                        # domain_name parameter. In this case the domain_name is "domain"
request.text.append("Is it going to snow in Burlington, Vermont tomorrow night?")
request.text.append("What causes rain?")
request.text.append("What is your favorite season?")
ct_response = jarvis_cnlp.ClassifyText(request)
print(ct_response)
results {
  labels {
    class_name: "weather"
    score: 0.9975590109825134
  }
}
results {
  labels {
    class_name: "meteorology"
    score: 0.984375
  }
}
results {
  labels {
    class_name: "personality"
    score: 0.984375
  }
}
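
The printed response lists labels without the queries they belong to. Assuming results come back in the same order as the entries in `request.text` (which the output above suggests but this notebook does not state), each query can be paired with its highest-scoring label. The sketch below uses plain dictionaries as stand-ins for the response; with the real protobuf objects, attribute access (`label.class_name`, `label.score`) would replace the dictionary lookups.

```python
def top_labels(texts, results):
    """Pair each input text with its highest-scoring label."""
    paired = []
    for text, result in zip(texts, results):
        best = max(result["labels"], key=lambda label: label["score"])
        paired.append((text, best["class_name"], best["score"]))
    return paired


# Stand-in data copied from the request and the printed response above.
texts = [
    "Is it going to snow in Burlington, Vermont tomorrow night?",
    "What causes rain?",
    "What is your favorite season?",
]
results = [
    {"labels": [{"class_name": "weather", "score": 0.9975590109825134}]},
    {"labels": [{"class_name": "meteorology", "score": 0.984375}]},
    {"labels": [{"class_name": "personality", "score": 0.984375}]},
]

for text, class_name, score in top_labels(texts, results):
    print(f"[{class_name} {score:.3f}] {text}")
```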

3. TTS Service Example

Subsequent releases will add features such as model registration to support multiple languages and voices with the same API, as well as resampling to alternative sampling rates.

[10]:
req = jtts.SynthesizeSpeechRequest()
req.text = "Is it recognize speech or wreck a nice beach?"
req.language_code = "en-US"                    # currently required to be "en-US"
req.encoding = ja.AudioEncoding.LINEAR_PCM     # Supports LINEAR_PCM, FLAC, MULAW and ALAW audio encodings
req.sample_rate_hz = 22050                     # ignored, audio returned will be 22.05 kHz
req.voice_name = "ljspeech"                    # ignored

resp = jarvis_tts.Synthesize(req)
audio_samples = np.frombuffer(resp.audio, dtype=np.float32)
ipd.Audio(audio_samples, rate=22050)
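
To keep the synthesized audio rather than only play it, the float32 samples can be converted to 16-bit PCM and written as a .wav file. The sketch below uses only the standard library and assumes the returned buffer holds little-endian float32 samples in the range [-1, 1] at 22.05 kHz, consistent with the `np.frombuffer(..., dtype=np.float32)` call above on typical hardware. A generated sine tone stands in for `resp.audio`, and `float32_bytes_to_wav` is a hypothetical helper name.

```python
import io
import math
import struct
import wave


def float32_bytes_to_wav(raw, sample_rate=22050):
    """Convert raw little-endian float32 samples to mono 16-bit PCM WAV bytes."""
    n = len(raw) // 4
    samples = struct.unpack(f"<{n}f", raw[: n * 4])
    # Clip to [-1, 1] before scaling so out-of-range values cannot overflow.
    pcm = struct.pack(
        f"<{n}h",
        *(int(max(-1.0, min(1.0, s)) * 32767) for s in samples),
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


# Stand-in for resp.audio: 0.1 s of a 440 Hz tone as float32 bytes.
n_samples = 2205
tone = struct.pack(
    f"<{n_samples}f",
    *(0.5 * math.sin(2 * math.pi * 440 * i / 22050) for i in range(n_samples)),
)
wav_bytes = float32_bytes_to_wav(tone)
print(len(wav_bytes), "bytes of WAV data")
```

The resulting bytes can be written to disk with `open("out.wav", "wb").write(wav_bytes)`.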

4. Jarvis NLP Service Examples

The NLP Service contains higher-level/more application-specific NLP APIs. This guide demonstrates how the AnalyzeIntent API can be used for queries across both known and unknown domains.

[11]:
# The AnalyzeIntent API can be used to query an intent slot classifier. The API can leverage a
# text classification model to classify the domain of the input query and then route to the
# appropriate intent slot model.

# Let's first see an example where the domain is known. This skips execution of the domain classifier
# and proceeds directly to the intent/slot model for the requested domain.

req = jnlp.AnalyzeIntentRequest()
req.query = "How is the humidity in San Francisco?"
req.options.domain = "weather"  # The <domain_name> is appended to "jarvis_intent_" to look for a
                                # model "jarvis_intent_<domain_name>". In this example, the model "jarvis_intent_weather"
                                # needs to be preloaded in the Jarvis server. To deploy your own
                                # custom Joint Intent and Slot model, use the `--domain_name` parameter in
                                # ServiceMaker's `jarvis-build intent_slot` command.

resp = jarvis_nlp.AnalyzeIntent(req)
print(resp)
intent {
  class_name: "weather.humidity"
  score: 0.983601987361908
}
slots {
  token: "san francisco"
  label {
    class_name: "weatherplace"
    score: 0.9822959899902344
  }
}
slots {
  token: "?"
  label {
    class_name: "weatherplace"
    score: 0.6474800109863281
  }
}
domain_str: "weather"
domain {
  class_name: "weather"
  score: 1.0
}
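
In the response above, the trailing "?" is returned as a `weatherplace` slot. If such punctuation-only slots are not wanted downstream, they can be dropped on the client side. The sketch below is one possible filter, using dictionaries as stand-ins for the slot messages; the rule "keep only tokens containing a letter or digit" is an assumption chosen for illustration, not Jarvis behavior.

```python
def meaningful_slots(slots):
    """Keep slots whose token contains at least one letter or digit."""
    return [
        slot for slot in slots
        if any(ch.isalnum() for ch in slot["token"])
    ]


# Stand-in data copied from the printed response above.
slots = [
    {"token": "san francisco", "class_name": "weatherplace", "score": 0.9822959899902344},
    {"token": "?", "class_name": "weatherplace", "score": 0.6474800109863281},
]

for slot in meaningful_slots(slots):
    print(f"{slot['class_name']}: {slot['token']}")
```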

[12]:
# Below is an example where the input domain is not provided.

req = jnlp.AnalyzeIntentRequest()
req.query = "Is it going to rain tomorrow?"

# The input query is first routed to a text classification model called "jarvis_text_classification_domain".
# The output class label of "jarvis_text_classification_domain" is appended to "jarvis_intent_"
# to get the appropriate intent slot model to execute for the input query.
# Note: The model "jarvis_text_classification_domain" needs to be loaded into the Jarvis server and have the appropriate
# class labels that would invoke the corresponding intent slot model.

resp = jarvis_nlp.AnalyzeIntent(req)
print(resp)
intent {
  class_name: "weather.rainfall"
  score: 0.9661880135536194
}
slots {
  token: "tomorrow"
  label {
    class_name: "weatherforecastdaily"
    score: 0.5325539708137512
  }
}
slots {
  token: "?"
  label {
    class_name: "weatherplace"
    score: 0.6895459890365601
  }
}
domain_str: "weather"
domain {
  class_name: "weather"
  score: 0.9975590109825134
}
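
The two examples above differ only in how the intent slot model is chosen: from the supplied domain, or from the domain classifier's output. That selection rule can be sketched as follows. The routing itself happens inside the Jarvis server; the function and the keyword-based classifier here are hypothetical stand-ins that only illustrate the naming rule `jarvis_intent_<domain_name>`.

```python
def intent_model_name(query, domain=None, classify_domain=None):
    """Return the intent slot model name that a query would be routed to."""
    if domain is None:
        if classify_domain is None:
            raise ValueError("either a domain or a domain classifier is required")
        # Unknown domain: ask the classifier first.
        domain = classify_domain(query)
    # Known domain: the classifier is skipped entirely.
    return f"jarvis_intent_{domain}"


def keyword_domain_classifier(query):
    """Toy stand-in for the text classification model; not a real classifier."""
    weather_words = ("rain", "snow", "humidity", "cloudy")
    lowered = query.lower()
    return "weather" if any(word in lowered for word in weather_words) else "nomatch"


print(intent_model_name("How is the humidity in San Francisco?", domain="weather"))
print(intent_model_name("Is it going to rain tomorrow?",
                        classify_domain=keyword_domain_classifier))
```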

[13]:
# Some weather Intent queries
queries = [
    "Is it currently cloudy in Tokyo?",
    "What is the annual rainfall in Pune?",
    "What is the humidity going to be tomorrow?"
]
for q in queries:
    req = jnlp.AnalyzeIntentRequest()
    req.query = q
    resp = jarvis_nlp.AnalyzeIntent(req)

    print(f"[{resp.intent.class_name}]\t{req.query}")
[weather.cloudy]        Is it currently cloudy in Tokyo?
[weather.rainfall]      What is the annual rainfall in Pune?
[weather.humidity]      What is the humidity going to be tomorrow?
[14]:
# Demonstrate latency by calling repeatedly.
# NOTE: this is a synchronous API call, so request #N will not be sent until
# response #N-1 is returned. This means latency and throughput will be negatively
# impacted by long-distance & VPN connections

req = jcnlp.TextTransformRequest()
req.text.append("i need one cpu four gpus and lots of memory for my new computer it's going to be very cool")

iterations = 10
# Demonstrate synchronous performance
start_time = time()
for _ in range(iterations):
    nlp_resp = jarvis_nlp.PunctuateText(req)
end_time = time()
print(f"Time to complete {iterations} synchronous requests: {end_time-start_time}")

# Demonstrate async performance
start_time = time()
futures = []
for _ in range(iterations):
    futures.append(jarvis_nlp.PunctuateText.future(req))
for f in futures:
    f.result()
end_time = time()
print(f"Time to complete {iterations} asynchronous requests: {end_time-start_time}\n")

Time to complete 10 synchronous requests: 0.05957150459289551
Time to complete 10 asynchronous requests: 0.020952463150024414
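
The gap between the two timings comes from overlapping round trips: when every request is issued before any result is awaited, the waiting happens concurrently. The sketch below reproduces that effect without a server, using a sleeping function as a stand-in for an RPC and a thread pool as a stand-in for the stub's `.future()` calls. It is an analogy for the pattern, not a description of how gRPC implements futures.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_rpc(delay=0.02):
    """Stand-in for a network round trip; sleeps instead of calling a server."""
    time.sleep(delay)
    return "ok"


def time_sequential(n):
    """Send request N only after response N-1 has returned."""
    start = time.perf_counter()
    for _ in range(n):
        fake_rpc()
    return time.perf_counter() - start


def time_overlapped(n):
    """Issue all requests first, then wait for the results."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(fake_rpc) for _ in range(n)]
        for future in futures:
            future.result()
    return time.perf_counter() - start


sequential = time_sequential(10)
overlapped = time_overlapped(10)
print(f"sequential: {sequential:.3f}s, overlapped: {overlapped:.3f}s")
```

`time.perf_counter()` is used here because it is a monotonic, high-resolution clock intended for measuring intervals.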

5. Go deeper into Jarvis capabilities

Now that you have a basic introduction to the Jarvis APIs, you may like to try out:

1. Sample apps:

Jarvis comes with various sample apps that demonstrate how to use the APIs to build applications such as a chatbot, a domain-specific speech recognition system, or a keyword (entity) recognition system, as well as how Jarvis scales out to handle a large number of simultaneous requests (SpeechSquad). See the Sample Application section in the Jarvis developer documentation for all the sample apps.

2. Fine-tune your own domain-specific speech or NLP model and deploy it into Jarvis.

Train the latest state-of-the-art speech and natural language processing models on your own data using NeMo or Transfer Learning Toolkit and deploy them on Jarvis using the Jarvis ServiceMaker tool.

3. Further resources:

Explore the details of each of the APIs and their functionalities in the docs.