Overview
==========

Jarvis handles deployments of full pipelines, which can be composed of one or more NVIDIA Transfer Learning Toolkit (TLT) models and other pre-/post-processing components. Additionally, the TLT models have to be exported to an efficient inference engine and optimized for the target platform. Therefore, the Jarvis server cannot use NVIDIA NeMo or TLT models *directly* because they represent *only* a single model.

The process of gathering all the required artifacts (for example, models, files, configurations, and user settings) and generating the inference engines is referred to as the Jarvis model repository generation. The Jarvis ServiceMaker Docker image has all the tools necessary to generate the Jarvis model repository and can be pulled from NGC as follows:

.. code-block:: bash
   :substitutions:

   docker pull nvcr.io/|NgcOrgTeam|/jarvis-service-maker:|VersionNum|

The Jarvis model repository generation is done in two phases:

**Phase 1:** The build phase. During the build phase, all the necessary artifacts (models, files, configurations, and user settings) required to deploy a Jarvis service are gathered together into an intermediate file called JMIR (Jarvis Model Intermediate Representation). For more information, continue to the next section.

**Phase 2:** The deploy phase. During the deploy phase, the JMIR file is converted into the Jarvis model repository and the neural networks in TLT or NeMo format are exported and optimized to run on the target platform. The deploy phase should be executed on the physical cluster on which the Jarvis server will be deployed. For more information, refer to the :ref:`jarvis_deploy` section.

.. _jarvis_build:

Jarvis-Build
===============

The ``jarvis-build`` tool is responsible for deployment preparation. Its only output is an intermediate format (called a JMIR) of an end-to-end pipeline for the supported services within Jarvis. The tool can take multiple different types of models as inputs.
Currently, the following pipelines are supported:

- ``speech_recognition`` (for ASR)
- ``speech_synthesis`` (for TTS)
- ``qa`` (for question answering)
- ``token_classification`` (for Named Entity Recognition)
- ``intent_slot`` (for joint intent and slot classification)
- ``text_classification``
- ``punctuation``

#. To run the ``jarvis-build`` tool, launch an interactive session inside the Jarvis ServiceMaker image.

   .. code-block:: bash
      :substitutions:

      docker run --gpus all -it --rm -v :/servicemaker-dev -v :/data --entrypoint="/bin/bash" nvcr.io/|NgcOrgTeam|/jarvis-service-maker:|VersionNum|

   where:

   - ```` is the folder or Docker volume that contains the Jarvis ``.ejrvs`` file and other artifacts required to prepare the Jarvis model repository.
   - ```` is the folder or Docker volume where the Jarvis model repository will be generated.

#. Run the ``jarvis-build`` command from within the container.

   .. code-block:: bash
      :substitutions:

      jarvis-build /servicemaker-dev/: /servicemaker-dev/:

   where:

   - ```` must be one of the following:

     - ``speech_recognition``
     - ``speech_synthesis``
     - ``qa``
     - ``token_classification``
     - ``intent_slot``
     - ``text_classification``

   - ```` is the name of the JMIR file that will be generated.
   - ```` is the name of the ``ejrvs`` file(s) to use as input.
   - ```` are optional arguments that can be used to configure the Jarvis service. The following sections cover the different ways the ASR, NLP, and TTS services can be configured.
   - ```` is optional. If the ``.ejrvs`` file was generated without an encryption key, the input/output files can be specified with ```` instead of ``:``.

.. include:: model-deployment-asr.rst

.. include:: model-deployment-nlp.rst

.. include:: model-deployment-tts.rst

.. _jarvis_deploy:

Jarvis-Deploy
==============

The ``jarvis-deploy`` tool takes as input one or more Jarvis Model Intermediate Representation (JMIR) files and a target model repository directory.
It is responsible for performing two functions:

**Function 1:** Generates a Triton ensemble and Triton configuration files, and writes them to the target model repository directory as necessary.

**Function 2:** Runs any commands specified by the JMIR to perform the model conversion for TensorRT, and updates the configuration that maps GPU compute capability to artifact.

The Jarvis model repository can be generated from the Jarvis ``.jmir`` file(s) with the following command:

.. code-block:: bash
   :substitutions:

   jarvis-deploy /servicemaker-dev/: /data/models

..
   Direct NeMo to Jarvis ServiceMaker (no TLT):
   --------------------------------------------

   #. Generate ONNX from your ``.nemo`` file using the ``convasr_to_enemo.py`` script.

      .. code-block:: bash
         :substitutions:

         python convasr_to_enemo.py --nemo_file=/NeMo/QuartzNet15x5Base-En.nemo --onnx_file=output/quartz.onnx --enemo_file=/NeMo/quartznet_asr.enemo

   #. Follow the :ref:`jarvis_build` documentation to use ``quartznet_asr.enemo`` instead of ``.ejrvs`` for the build phase.
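
The build and deploy steps described above can be sketched as a small dry-run helper that only prints the two command lines and never runs them. This is an illustration under stated assumptions, not part of the Jarvis tooling: the helper names (``with_key``, ``build_cmd``, ``deploy_cmd``), the file names ``asr.jmir`` and ``asr.ejrvs``, and the key ``example_key`` are hypothetical, and the argument order (pipeline, JMIR output, ``.ejrvs`` input) is assumed to follow the order of the descriptions in the Jarvis-Build section. The optional encryption key is appended as ``:key`` only when one is given.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the build and deploy commands instead of running them.
# Helper names, file names, and the key are hypothetical placeholders.

# Pipelines listed as valid jarvis-build values in the Jarvis-Build section.
PIPELINES="speech_recognition speech_synthesis qa token_classification intent_slot text_classification"

# Print "path:key" when a key is given, or just "path" when it is empty.
with_key() {
  local path="$1" key="${2:-}"
  if [ -n "$key" ]; then
    printf '%s:%s\n' "$path" "$key"
  else
    printf '%s\n' "$path"
  fi
}

# Return 0 if the pipeline name is in PIPELINES, 1 otherwise.
is_supported_pipeline() {
  local candidate="$1" name
  for name in $PIPELINES; do
    if [ "$name" = "$candidate" ]; then
      return 0
    fi
  done
  return 1
}

# Phase 1 (build): print a jarvis-build command line.
# Assumed argument order: pipeline, JMIR output file, .ejrvs input file.
build_cmd() {
  local pipeline="$1" jmir="$2" ejrvs="$3" key="${4:-}"
  if ! is_supported_pipeline "$pipeline"; then
    echo "unsupported pipeline: $pipeline" >&2
    return 1
  fi
  printf 'jarvis-build %s %s %s\n' "$pipeline" \
    "$(with_key "/servicemaker-dev/$jmir" "$key")" \
    "$(with_key "/servicemaker-dev/$ejrvs" "$key")"
}

# Phase 2 (deploy): print a jarvis-deploy command line targeting /data/models.
deploy_cmd() {
  local jmir="$1" key="${2:-}"
  printf 'jarvis-deploy %s /data/models\n' \
    "$(with_key "/servicemaker-dev/$jmir" "$key")"
}

# Example usage with hypothetical file names and key.
build_cmd speech_recognition asr.jmir asr.ejrvs example_key
deploy_cmd asr.jmir example_key
```

Printing the commands first makes it easy to check the paths and the optional ``:key`` suffix before running anything inside the ServiceMaker container.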