Getting Started

Prerequisites

Please follow the instructions in Clara installation to pull the docker image and start the container.

Additionally, you should mount a shared disk/folder for saving all the models, logs, and configurations of the AIAA server. This lets you start/stop the container at any time without losing the AIAA models and configurations.

For example:

export NVIDIA_RUNTIME="--runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0"
export OPTIONS="--shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864"
export LOCAL_WORKSPACE=/your/shared_disk
export REMOTE_WORKSPACE=/workspace
export DOCKER_IMAGE="nvcr.io/clara-train-sdk:[version here]"

docker run $NVIDIA_RUNTIME  $OPTIONS -it --rm \
       --ipc=host \
       --net=host \
       -v $LOCAL_WORKSPACE:$REMOTE_WORKSPACE  \
       $DOCKER_IMAGE \
       /bin/bash

Attention

The system requirements of AIAA depend on how many models you want to load in the server. If your models are big and you do not have enough system RAM/GPU memory, please load one model at a time.
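As a sketch of what loading one model at a time could look like, the commands below assemble load/unload requests against the server's /admin endpoint (the exact route shape and the model name are assumptions for illustration; consult your release's API reference for the real routes):

```shell
# Sketch: manage models one at a time via the /admin API.
# The route shape and the model name are assumptions, not the documented API.
AIAA=http://127.0.0.1:5000
MODEL=clara_ct_seg_spleen   # placeholder model name
LOAD_CMD="curl -X PUT ${AIAA}/admin/model/${MODEL}"
UNLOAD_CMD="curl -X DELETE ${AIAA}/admin/model/${MODEL}"
echo "$LOAD_CMD"
echo "$UNLOAD_CMD"
```

Unloading one model before loading the next keeps peak RAM/GPU memory usage bounded to a single model.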

Running AIAA

The simplest command to start the AI-Assisted Annotation server inside the container is start_aas.sh. The following options are available when starting the server.

BASIC

workspace
  Default: /var/nvidia/aiaa
  Example: --workspace /workspace/aiaa
  Workspace location where the AIAA server saves all configurations, logs, and models. It is therefore always recommended to use a shared workspace when starting AIAA.

port
  Default: 5000
  Example: --port 5678
  HTTP listening port for the AIAA server.

monitor
  Default: true
  Example: --monitor true
  Whether API instrumentation is allowed. If enabled, you can view the monitoring dashboard at http://host:port/dashboard.

debug
  Default: false
  Example: --debug true
  Enable debug-level logging for the AIAA server logs.

auto_reload
  Default: false
  Example: --auto_reload true
  The AIAA server will auto-reload models whenever the configuration is updated externally. Enable this option when multiple AIAA servers are installed and share a common workspace.

admin_hosts
  Default: *
  Example: --admin_hosts 10.20.1.23,122.34.1.34
  Restrict which client hosts can send /admin requests to the AIAA server (e.g. for managing models). * (star) allows any host; otherwise provide a comma-separated list of client IP addresses/prefixes to allow.

SSL Support

ssl
  Default: false
  Example: --ssl true
  Run the AIAA server in SSL mode.

ssl_cert_file
  Default: (none)
  Example: --ssl_cert_file server.cert
  SSL certificate file path.

ssl_pkey_file
  Default: (none)
  Example: --ssl_pkey_file server.key
  SSL private key file path.

TF/TRTIS

engine
  Default: TRTIS
  Example: --engine AIAA
  Use this option to run the AIAA server with TensorFlow (AIAA) or with TRTIS as the background inference engine.

trtis_ip
  Default: localhost
  Example: --trtis_ip 10.18.2.13
  If anything other than 127.0.0.1 or localhost is specified, AIAA will not start a local TRTIS server for inference; otherwise AIAA starts a local instance of the TRTIS server.

trtis_port
  Default: 8001
  Example: --trtis_port 9001
  TRTIS server port (http: 8000, grpc: 8001, monitor: 8002).

trtis_proto
  Default: grpc
  TRTIS server protocol (http or grpc).

trtis_model_path
  Default: /workspace/aiaa/trtis_models
  TRTIS models path.

trtis_start_timeout
  Default: 120
  Wait time in seconds for AIAA to make sure the TRTIS server is up and running.

trtis_model_timeout
  Default: 30
  Wait time in seconds for AIAA to make sure a model is loaded and ready for serving in TRTIS.

FineTune

fine_tune
  Default: false
  Example: --fine_tune true
  If set to true, AIAA will automatically fine-tune the models based on the samples directory (Model Fine-Tune Setup).

fine_tune_hour
  Default: 0
  Example: --fine_tune_hour 1
  The scheduled hour (0-23) each day at which to fine-tune all the models.

OTHERS

client_id
  Default: host-5000
  Example: --client_id aiaa-instance-1
  If you are deploying AIAA servers on multiple nodes/AWS with a shared disk, you may need this to identify each AIAA server.

Note

TRTIS stands for the NVIDIA TensorRT Inference Server, which provides a cloud inferencing solution optimized for NVIDIA GPUs. More information can be found on their GitHub page.

Examples

# Run with workspace (AIAA Server will be running at: http://0.0.0.0:5000/)
start_aas.sh --workspace /workspace/aiaa

# Run at different port (AIAA Server will be running at: http://0.0.0.0:5678/)
start_aas.sh --workspace /workspace/aiaa --port 5678

# Run AIAA with the TF inference engine
start_aas.sh --workspace /workspace/aiaa --engine AIAA

# Run AIAA with the TRTIS inference engine (TRTIS will be started on localhost:8001)
start_aas.sh --workspace /workspace/aiaa --engine TRTIS

# Run AIAA against a remote TRTIS inference engine (debug only; some options may not work)
start_aas.sh --workspace /workspace/aiaa --engine TRTIS --trtis_ip 10.20.11.34 --trtis_model_path /shared/trtis_models
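Two more sketches combining the options from the table above (the certificate/key file names, the shared workspace path, and the client ID are placeholders, not values from this guide):

```shell
# Sketch: start AIAA in SSL mode (server.cert/server.key are placeholder paths)
SSL_CMD="start_aas.sh --workspace /workspace/aiaa --ssl true --ssl_cert_file server.cert --ssl_pkey_file server.key"

# Sketch: multiple AIAA servers sharing one workspace; enable auto_reload
# and give each instance its own client_id
SHARED_CMD="start_aas.sh --workspace /shared/aiaa --auto_reload true --client_id aiaa-instance-1"

echo "$SSL_CMD"
echo "$SHARED_CMD"
```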

Troubleshooting

Workspace

The AIAA server uses a workspace directory (if not specified, the default inside the docker container is /var/nvidia/aiaa) for saving configs, models (frozen), logs, etc. You can shut down and run the docker container multiple times without losing state if you mount an external workspace path when running the container. See the advanced options for running the docker container.

The key files and folders in the workspace directory are described below.

aiaa_server_config.json
  INTERNAL - The AIAA server supports automatic loading of models; the corresponding configs are saved in this file.

aiaa_server_dashboard.db
  INTERNAL - AIAA web/API activities are monitored through the Flask dashboard; the corresponding data is saved here.

downloads/
  INTERNAL - Temporary downloads from NGC happen here; the temporary data is removed after a model is successfully imported into the AIAA server.

logs/
  INTERNAL - AIAA server logs are stored here.

models/
  INTERNAL - All serving models in frozen format are stored here.

trtis_models/
  INTERNAL - All serving TRTIS models are stored here unless you provide a different trtis_model_path when starting the AIAA server.

transforms/
  EXTERNAL - Add your custom transforms (Python) here and use them in aiaa_config.json for a model (Bring your own Transforms to AIAA).

samples/{model}
  EXTERNAL - Add your training samples here to trigger incremental training for a model with new samples (Model Fine-Tune Setup).

mmars/{model}
  INTERNAL/EXTERNAL - Put your MMAR here to trigger incremental training for a model with new samples. The AIAA server also archives any imported MMARs here.

Note

We suggest you create a new folder to serve as the workspace, so that if you want to delete these files you can simply remove that folder.
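As a concrete sketch (using /tmp/aiaa_workspace as a stand-in path; in practice, pick a folder on your shared disk):

```shell
# Create a dedicated, disposable workspace folder for AIAA
WORKSPACE=/tmp/aiaa_workspace   # stand-in path; use your shared disk in practice
mkdir -p "$WORKSPACE"
ls -d "$WORKSPACE"
# Later, removing this one folder removes all AIAA state:
#   rm -rf "$WORKSPACE"
```

You would then pass this folder to the server with start_aas.sh --workspace "$WORKSPACE".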

Logs

Once the server is up and running, you can watch or pull the server logs in a browser.

Tip

You do not have to be inside the container's bash shell to see the log files.

Monitor

To track API usage, profiling, etc., developers can log in and access the monitoring dashboard of the AIAA server.

Tip

Login Username: admin; Default password: admin

Note

URLs that use 0.0.0.0 are only accessible if your browser and server are on the same machine. You can use ip a to find your server's IP address and then check the logs remotely at http://[server_ip]:[port]/logs.
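For example, the remote logs and dashboard URLs can be assembled like this (the server IP below is a placeholder; find your own with ip a):

```shell
# Build the AIAA logs/dashboard URLs for remote access (placeholder values)
SERVER_IP=10.20.11.34   # placeholder; find yours with `ip a` on the server
PORT=5000
LOGS_URL="http://${SERVER_IP}:${PORT}/logs"
DASHBOARD_URL="http://${SERVER_IP}:${PORT}/dashboard"
echo "$LOGS_URL"
echo "$DASHBOARD_URL"
```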