NVIDIA Clara Train 4.1

Getting Started

Please follow the instructions in the Clara installation guide to install the Docker image.

Additionally, you need to create and mount a folder for the AIAA server to save all the models, logs and configurations.


mkdir aiaa_experiments

Attention

The system requirements of AIAA depend on how many models you want to load in the server. If your models are big and you don’t have enough system RAM/GPU memory, please load one model at a time.

AIAA supports the following two modes:

NATIVE (PyTorch)


start_aiaa.sh -w aiaa_experiments --engine AIAA


TRITON

To run AIAA with the Triton backend, we need to run two containers. We recommend using docker-compose to start AIAA.

Note

Please follow the docker-compose installation guide to install docker-compose, and check the Docker GPU support prerequisites.

You can install docker-compose using pip. For example: pip install docker-compose

Attention

The docker-compose version must be >= 1.28.5. Run docker-compose version to check.
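The version requirement above can be checked in a script. The following is a minimal sketch, assuming GNU sort (for the -V version-sort option) is available; the helper name version_ge is hypothetical and not part of Clara or docker-compose.

```shell
#!/bin/sh
# Return success (0) if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort), which GNU coreutils provides.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

REQUIRED="1.28.5"

# Only query docker-compose if it is actually installed.
if command -v docker-compose >/dev/null 2>&1; then
    # `--short` prints only the version number; strip a leading "v" if present.
    installed="$(docker-compose version --short)"
    installed="${installed#v}"
    if version_ge "$installed" "$REQUIRED"; then
        echo "docker-compose $installed is new enough"
    else
        echo "docker-compose $installed is older than $REQUIRED" >&2
    fi
fi
```

Version sort is used instead of a plain string comparison so that, for example, 1.28.10 is correctly treated as newer than 1.28.5.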

Create a file called docker-compose.yml and paste the following contents:


version: "3.8"
services:
  clara-train-sdk:
    image: ${CLARA_IMAGE}
    command: >
      sh -c "mkdir -p /workspace/logs /workspace/triton &&
        start_aiaa.sh \
          --workspace /workspace \
          --port ${AIAA_PORT} \
          --engine TRITON \
          --triton_ip tritonserver \
          --triton_port 8000 \
          --triton_proto http \
          --triton_model_path /workspace/triton \
          --triton_verbose 0 2>&1 | tee /workspace/logs/container.aiaa.log"
    ports:
      - "${AIAA_PORT}:5000"
    volumes:
      - ${AIAA_WORKSPACE}:/workspace
    networks:
      - aiaa
    shm_size: 1gb
    ulimits:
      memlock: -1
      stack: 67108864
    depends_on:
      - tritonserver
    logging:
      driver: json-file
  tritonserver:
    image: ${TRITION_IMAGE}
    command: >
      sh -c "mkdir -p /workspace/logs /workspace/triton &&
        tritonserver \
          --model-store=/workspace/triton \
          --model-control-mode=poll \
          --repository-poll-secs=5 \
          --log-verbose=0 2>&1 | tee /workspace/logs/container.triton.log"
    volumes:
      - ${AIAA_WORKSPACE}:/workspace
    networks:
      - aiaa
    shm_size: 1gb
    ulimits:
      memlock: -1
      stack: 67108864
    restart: unless-stopped
    logging:
      driver: json-file
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: [ '0' ]
              capabilities: [ gpu ]
networks:
  aiaa:

Create a file called docker-compose.env and paste the following contents:


CLARA_IMAGE=nvcr.io/nvidia/clara-train:4.1
TRITION_IMAGE=nvcr.io/nvidia/tritonserver:21.10-pyt-python-py3

# Basic AIAA settings
AIAA_PORT=5000
# Local Path (fix/update this)
AIAA_WORKSPACE=/share/aiaa_workspace

Change the value of AIAA_WORKSPACE to the absolute path on your system that you want to use as the AIAA workspace (the aiaa_experiments folder you created earlier).
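Before starting the containers, it can help to confirm that the env file defines every variable the compose file references and that the workspace path is absolute. The following is a minimal sketch; the function name check_env_file is hypothetical, and the variable names are taken from the files above.

```shell
#!/bin/sh
# Report whether the env file defines each variable used by docker-compose.yml.
# Returns non-zero if a variable is missing or AIAA_WORKSPACE is not absolute.
check_env_file() {
    env_file="$1"
    missing=0
    for name in CLARA_IMAGE TRITION_IMAGE AIAA_PORT AIAA_WORKSPACE; do
        # Match lines of the form NAME=value with a non-empty value.
        if grep -Eq "^${name}=.+" "$env_file"; then
            echo "ok       $name"
        else
            echo "missing  $name"
            missing=1
        fi
    done

    # The workspace is bind-mounted, so it should be an absolute path.
    workspace="$(grep -E '^AIAA_WORKSPACE=' "$env_file" | head -n 1 | cut -d= -f2-)"
    case "$workspace" in
        /*) ;;
        *)  echo "AIAA_WORKSPACE should be an absolute path"
            missing=1 ;;
    esac
    return "$missing"
}

# Example: check_env_file docker-compose.env
```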

Then we can use the following command to run:


docker-compose --env-file docker-compose.env -p aiaa_triton up --remove-orphans -d

And this is the command to stop:


docker-compose --env-file docker-compose.env -p aiaa_triton down --remove-orphans

To check the log you can do:


docker-compose --env-file docker-compose.env -p aiaa_triton logs -f -t

Application logs are redirected to **workspace/logs/container.xxx.log**

To start only one service, run:


docker-compose --env-file docker-compose.env -p aiaa_triton up [the service name]

You can type docker ps to check whether AIAA and Triton are running.
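The docker ps check can also be scripted. The following is a minimal sketch, assuming the project name aiaa_triton from the commands above and that docker-compose embeds the project and service names in each container name (for example aiaa_triton_tritonserver_1). The function name both_services_running is hypothetical.

```shell
#!/bin/sh
# Read container names (one per line) on stdin and succeed only if both
# compose services appear among them.
both_services_running() {
    names="$(cat)"
    for service in clara-train-sdk tritonserver; do
        # Assumption: the container name contains the project name
        # followed by the service name.
        if ! printf '%s\n' "$names" | grep -q "aiaa_triton.*${service}"; then
            echo "not running: $service" >&2
            return 1
        fi
    done
    return 0
}

# Usage (requires Docker):
#   docker ps --format '{{.Names}}' | both_services_running && echo "AIAA and Triton are up"
```

Reading the names from standard input keeps the check separate from the docker call, so the same function works with saved output.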

Note

You can modify the “AIAA_PORT” to the port you want to use on the host machine.

Note

You can modify the “device_ids” section under “deploy” section of “tritonserver” service to change the GPU id that you want to use.


The following options are available when starting the AI-Assisted Annotation Server.

| Option | Default Value | Example | Description |
|---|---|---|---|
| **BASIC** | | | |
| workspace | | | Workspace Location for AIAA Server to save all the configurations, logs and models. It is required while starting AIAA |
| debug | | --debug | Enable Debug level Logging for AIAA Server logs |
| **Native/Triton** | | | |
| engine | AIAA | --engine AIAA | The inference engine of AIAA server. Choose from AIAA or Triton. AIAA means to directly use the framework to do inference. |
| triton_ip | localhost | --triton_ip 10.18.2.13 | Triton server ip |
| triton_port | 8000 | --triton_port 9000 | Triton server service port based on the triton_proto |
| triton_proto | http | | Protocol to communicate with Triton server. (http or grpc) |
| triton_shmem | no | --triton_shmem cuda | Whether to use shared memory communication between AIAA and Triton (no, system, or cuda) |
| triton_model_path | <workspace dir>/triton | | Triton models path |
| triton_verbose | 0 | | Set Triton verbose logging level. Zero (0) disables verbose logging and values >= 1 enables verbose logging. |
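Because triton_port depends on triton_proto, the two options are best chosen together. The following is a minimal sketch that prints (but does not run) a start_aiaa.sh command line for an external Triton server. It assumes Triton's stock port defaults (8000 for HTTP, 8001 for gRPC); the helper names triton_default_port and build_aiaa_command are hypothetical, and the IP address is the placeholder from the table.

```shell
#!/bin/sh
# Default Triton service port per protocol. These are Triton's stock
# defaults; adjust them if your Triton server uses different ports.
triton_default_port() {
    case "$1" in
        http) echo 8000 ;;
        grpc) echo 8001 ;;
        *)    echo "unknown protocol: $1" >&2; return 1 ;;
    esac
}

# Print a start_aiaa.sh command line for the given workspace, Triton IP
# and protocol. Nothing is executed.
build_aiaa_command() {
    workspace="$1"; triton_ip="$2"; proto="$3"
    port="$(triton_default_port "$proto")" || return 1
    echo "start_aiaa.sh --workspace $workspace --engine TRITON" \
         "--triton_ip $triton_ip --triton_port $port --triton_proto $proto"
}

build_aiaa_command /workspace 10.18.2.13 grpc
```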

Note

Native mode means that AIAA uses PyTorch to run the inference directly.

Workspace

AIAA Server uses the workspace directory to create and manage a MONAI Label app.

For more details on MONAI Label apps, please refer to: https://docs.monai.io/projects/label/en/latest/modules.html

The following table explains some of the files and folders in the workspace directory. INTERNAL means that the file or directory is managed by the AIAA server, and we suggest that you do not change it.

| File/Folder | Description |
|---|---|
| logs/ | INTERNAL - AIAA server logs are stored over here |
| model/ | INTERNAL - The models uploaded when running as AIAA backend are stored here |
| triton/ | INTERNAL - All serving Triton models are stored here unless user provides a different triton_model_path while starting AIAA Server |
| lib/ | EXTERNAL - Put your custom transforms/inferences/inference pipelines. They also represent MONAI Label components. |

Note

We suggest you create a new folder to be the workspace so that if you want to delete these files you can just remove that workspace folder.

Note

The workspace is actually a MONAI Label app. That means you can use it to run the MONAI Label server directly. For example: monailabel start_server --app workspace


Logs

Once the server is up and running, you can watch or pull the server logs in the browser.

Note

URLs with 0.0.0.0 or 127.0.0.1 are only accessible if your browser and the server are on the same machine.

The $AIAA_PORT is the port specified in “docker-compose.env”.

You can use ip a to find your server’s IP address and view the logs remotely at http://[server_ip]:$AIAA_PORT/logs.
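The logs URL can be assembled from the host and port as follows. This is a minimal sketch; the function name aiaa_logs_url is hypothetical, the default port of 5000 matches AIAA_PORT in the sample docker-compose.env, and the remote address is a placeholder.

```shell
#!/bin/sh
# Build the AIAA logs URL from a host and an optional port.
aiaa_logs_url() {
    host="$1"
    port="${2:-5000}"   # falls back to the AIAA_PORT value from the sample env file
    echo "http://${host}:${port}/logs"
}

aiaa_logs_url 127.0.0.1          # browser and server on the same machine
aiaa_logs_url 10.18.2.13 5000    # remote access; replace with your server's IP

# To fetch the logs from a terminal instead of a browser (needs a running server):
#   curl "$(aiaa_logs_url 127.0.0.1 5000)"
```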


© Copyright 2021, NVIDIA. Last updated on Feb 2, 2023.