Deploying Your Custom Model into Jarvis

This section provides a brief overview of the two main phases of the deployment process:

  1. The build phase using jarvis-build.

  2. The deploy phase using jarvis-deploy.

Build process

For your custom-trained model, refer to the section that matches your model type (ASR, NLP, or TTS) for details on the jarvis-build phase. At the end of this phase, you will have a Jarvis Model Intermediate Representation (JMIR) archive for your custom model.
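
As a rough, non-authoritative illustration, a jarvis-build invocation generally takes a task name, the JMIR archive to produce, and the trained .ejrvs export as input. The file names and encryption key below are placeholders; the exact task name and options are documented in the matching ASR, NLP, or TTS section.

    # Sketch only: produce a JMIR archive from a trained .ejrvs export.
    # <pipeline> is the task (for example, speech_recognition); paths and <key> are placeholders.
    jarvis-build <pipeline> \
        /servicemaker-dev/custom_model.jmir:<key> \
        /servicemaker-dev/custom_model.ejrvs:<key>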

Deploy process

At this point, you already have your Jarvis Model Intermediate Representation (JMIR) archive. Now, you have two options for deploying this JMIR.

Option 1: Use the Quick Start scripts (jarvis_init.sh and jarvis_start.sh) with the appropriate parameters in config.sh (see the sketch after this list).

Option 2: Manually run jarvis-deploy and then start jarvis-server with the target model repo.
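
For Option 1, config.sh controls which services the Quick Start scripts enable and where the generated model repository lives. The excerpt below is a hypothetical sketch; the variable names are assumptions, so check the config.sh shipped with your Quick Start scripts for the actual settings.

    # Hypothetical config.sh excerpt (variable names are assumptions)
    service_enabled_asr=true
    service_enabled_nlp=true
    service_enabled_tts=true
    # Location of the generated Triton model repository used by jarvis_start.sh
    jarvis_model_loc="jarvis-model-repo"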

Option 2: Using jarvis-deploy and the Jarvis Speech Container (Advanced)

  1. Execute jarvis-deploy. Refer to the Deploy section in Services and Models > Overview for a brief overview of jarvis-deploy.
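
    The exact arguments depend on where your JMIR archive lives and on the encryption key used when it was built; as a non-authoritative sketch with placeholder path and key, an invocation typically looks like:

    # Sketch only: deploy the JMIR archive (path and key are placeholders) into /data/models
    jarvis-deploy /servicemaker-dev/custom_model.jmir:<encryption_key> /data/models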

    jarvis-deploy creates the Triton Inference Server model repository at /data/models. Writing to any location other than /data/models requires additional manual changes, because some of the generated Triton Inference Server model configurations embed artifact directories for model-specific files such as class labels. Therefore, stick with /data/models unless you are familiar with Triton Inference Server model repository configuration.

  2. Manually start the jarvis-server Docker container using docker run.

    After the Triton Inference Server model repository for your custom model is generated, start the Jarvis server on that target repo. The following command assumes you generated the model repo at /data/models.

    docker run -d --gpus 1 --init --shm-size=1G --ulimit memlock=-1 --ulimit stack=67108864 \
            -v /data:/data                      \
            -p 50051:50051                      \
            -e "CUDA_VISIBLE_DEVICES=0"         \
            --name jarvis-speech                \
            jarvis-api                          \
            start-jarvis --jarvis-uri=0.0.0.0:50051 --nlp_service=true --asr_service=true --tts_service=true
    

    This launches the Jarvis Speech Services API server in the same way as the Quick Start script jarvis_start.sh.

    Example output:

    Starting Jarvis Speech Services
    > Waiting for Triton server to load all models...retrying in 10 seconds
    > Waiting for Triton server to load all models...retrying in 10 seconds
    > Waiting for Triton server to load all models...retrying in 10 seconds
    > Triton server is ready…
    
  3. Verify that the servers started correctly by checking that the output of docker logs jarvis-speech shows:

    I0428 03:14:50.440943 1 jarvis_server.cc:66] TTS Server connected to Triton Inference Server at 0.0.0.0:8001
    I0428 03:14:50.440943 1 jarvis_server.cc:66] NLP Server connected to Triton Inference Server at 0.0.0.0:8001
    I0428 03:14:50.440951 1 jarvis_server.cc:68] ASR Server connected to Triton Inference Server at 0.0.0.0:8001
    I0428 03:14:50.440955 1 jarvis_server.cc:71] Jarvis Conversational AI Server listening on 0.0.0.0:50051
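
    Optionally, confirm that the gRPC endpoint is reachable from the host. The check below is a generic sketch using grpcurl and assumes the server exposes gRPC reflection; if it does not, verifying connectivity with one of the Jarvis client samples works as well.

    # Sketch only: list the exposed gRPC services if reflection is enabled
    grpcurl -plaintext localhost:50051 list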