Frequently Asked Questions

Q1: Why does my model not show up in /v1/models?

Follow the steps below to debug your model:

  1. Check your config.json file:

  • Check that it is valid JSON:

    import json
    # json.load raises json.JSONDecodeError if the file is not valid JSON.
    with open('config.json', 'r') as f:
        config = json.load(f)

    This code should run without raising an exception.

  • If you are using custom transforms, make sure you follow the instructions in Bring your own Transforms.

  2. Check your model file:

  3. Upload the model to AIAA again:

Once you have confirmed that all the pieces are correct, upload your model to AIAA again.

  4. Increase triton_model_timeout:

AIAA polls Triton for this amount of time before it reports that the model was not imported correctly. If you are using the Triton engine (the default), try a larger timeout to give the model import enough time to succeed, e.g.: start_aas.sh --triton_model_timeout 120.

  5. Check your logs:

If none of the above steps work, start AIAA with the flag --debug 1 and check the log files in /workspace/logs. You can also ask on the NVIDIA Developer Forums.
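The JSON check in step 1 can be extended to report where parsing fails, which is often quicker than reading a traceback. The following is a minimal sketch using only the Python standard library; the helper name find_json_error is hypothetical and not part of AIAA.

```python
import json


def find_json_error(path):
    """Return None if the file parses as JSON, else a short error description."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            json.load(f)
    except json.JSONDecodeError as err:
        # lineno and colno point at the character where parsing failed.
        return f"{path}: line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# Example usage (assumes config.json is in the current directory):
#   problem = find_json_error("config.json")
#   print(problem or "config.json is valid JSON")
```

This only checks JSON syntax. It does not verify that the fields in the file are the ones AIAA expects.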

Note

Currently, AIAA requires models to have a single input and a single output. Multi-class segmentation can be achieved by using multiple channels in the output.

Q2: Why are the models returning bad results?

Most of the time, this is caused by a data mismatch. Make sure the data you test with in AIAA has the same characteristics as the data you used to train your models.

This includes the following:

  1. Resolution/Spacing

  2. Orientation

  3. Contrast/Phase

For example, the pre-trained segmentation models on NGC are trained on data from the Medical Segmentation Decathlon.

We rescale the images to a spacing of [1.0, 1.0, 1.0] and make sure the affine matrix of the NIfTI file has all positive values.
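As a rough illustration, the spacing and sign convention of an affine matrix can be checked with a few lines of Python. This is a minimal sketch, not part of AIAA or the Clara Train API: the function names are hypothetical, the affine is passed as a plain 4x4 nested list, and "all positive values" is interpreted here as a positive diagonal, which is an assumption.

```python
import math


def voxel_spacing(affine):
    """Spacing along each voxel axis: the length of each column of the 3x3 part."""
    return [
        math.sqrt(sum(affine[row][col] ** 2 for row in range(3)))
        for col in range(3)
    ]


def has_positive_diagonal(affine):
    """True when the three diagonal scaling terms are all positive (assumed convention)."""
    return all(affine[i][i] > 0 for i in range(3))


def matches_training_data(affine, expected=(1.0, 1.0, 1.0), tol=1e-3):
    """Check the spacing against the expected value and the sign convention."""
    spacing_ok = all(
        abs(s - e) <= tol for s, e in zip(voxel_spacing(affine), expected)
    )
    return spacing_ok and has_positive_diagonal(affine)
```

If you load your volumes with a NIfTI library, the 4x4 affine it exposes can be converted to a nested list and passed to matches_training_data before sending the volume to AIAA.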

Hint

The Clara Train API provides transforms that handle the resolution and orientation problems.

Tip

If you trained your model with data augmentation such as RandomAxisFlip and RandomZoom, it will be insensitive to orientation.

Q3: Does AIAA support 2D models?

Yes, 2D models are supported. However, they can currently only be used by interacting directly with the AIAA server API through HTTP POST requests. (Please refer to Tutorial: Brain Segmentation PyTorch for an example.)

We are planning to support 2D models in other clients in the future.

Q4: What if my GPU card does not have enough memory?

If your GPU is short on memory, try one or more of the following:

  1. Load fewer models in the AIAA server

  2. Reduce roi (the size of the scanning window) in config_aiaa.json

  3. Reduce the size of your network
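To see why reducing roi helps, note that the input tensor for one scanning window grows with the product of the roi dimensions. The sketch below shows that arithmetic only. The function names, the roi values in the usage note, and the float32 single-channel defaults are illustrative assumptions, and actual memory use also depends on the network's intermediate activations.

```python
from math import prod


def window_voxels(roi):
    """Number of voxels in one scanning window."""
    return prod(roi)


def window_megabytes(roi, channels=1, bytes_per_value=4):
    """Size of one input window in MiB, assuming float32 values by default."""
    return window_voxels(roi) * channels * bytes_per_value / (1024 ** 2)
```

For example, window_voxels([160, 160, 160]) is 4096000 while window_voxels([96, 96, 96]) is 884736, so shrinking each side from 160 to 96 cuts the window to roughly a fifth of its original size.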

Q5: Why can’t I start AIAA?

If you start the Docker container with --net=host, make sure the AIAA port and the Triton ports are not used by other processes.

If you run Docker with -p [host port]:[docker port], make sure the [host port] is not used by other processes.
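One way to check whether a host port is already taken is to try binding to it. The following is a minimal sketch using the Python standard library; the helper name port_is_free is hypothetical, and the port numbers you pass in depend on your own setup.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Raised when the port is in use. A permission error (for example
            # on ports below 1024 without root) is also reported as not free.
            return False
    return True
```

Run it on the host before starting the container, for instance port_is_free(9000) for a host port you plan to pass to -p. The result is only a snapshot, since another process may claim the port afterwards.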

Q6: How can I start the AIAA server clean?

To start from a clean state, remove the workspace folder and create a new one. Then start the AIAA server with the new workspace.

Q7: What data format does AIAA expect?

You can provide your own data loader to load data in any format you want (png, jpg, NumPy array). Please refer to Bring your own Data Loader.

Note that AIAA currently does not support batching: it runs inference on one image/volume per request. Make sure the ShapeFormat at the end of your pre-transforms chain does not include the batch dimension (“N”). If you are writing custom transforms, make sure they handle ShapeFormat correctly.
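The batch-dimension rule can be expressed as a small sanity check inside a custom transform. This is a sketch under stated assumptions: it treats the shape format as a plain string of axis letters such as "CDHW", and the helper names are hypothetical rather than part of the Clara Train API.

```python
def has_batch_dim(shape_format):
    """True if the format string contains the batch dimension 'N'."""
    return "N" in shape_format.upper()


def check_single_item(shape_format, shape):
    """Raise if the format carries a batch dimension or does not match the shape."""
    if has_batch_dim(shape_format):
        raise ValueError(f"{shape_format!r} includes the batch dimension 'N'")
    if len(shape_format) != len(shape):
        raise ValueError(
            f"{shape_format!r} has {len(shape_format)} axes "
            f"but the data has {len(shape)}"
        )
```

For example, check_single_item("CDHW", (1, 96, 96, 96)) passes silently, while check_single_item("NCDHW", (1, 1, 96, 96, 96)) raises a ValueError.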

Q8: Why is AIAA occupying all the GPU memory when I am not running any inference?

When AIAA runs with the Triton backend, it puts one model instance on every GPU that is visible inside the Docker container.

Use -e NVIDIA_VISIBLE_DEVICES=[ids of the GPU you want to use] to control which GPUs are visible. To change the number of model instances on each GPU, modify gpu_instance_count under triton in your model configs.

A model instance loaded on a GPU occupies some GPU memory even when it is not serving any inference requests. To free that memory, either stop the AIAA server or unload some models (using the DELETE model API).
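The two settings above can be prepared with a short script. The sketch below assumes only what is stated here, namely that gpu_instance_count sits under a triton key in the model config. The helper names are hypothetical, and the rest of the config layout is left untouched.

```python
import copy


def with_gpu_instance_count(model_config, count):
    """Return a copy of the config with triton.gpu_instance_count set to count."""
    updated = copy.deepcopy(model_config)
    updated.setdefault("triton", {})["gpu_instance_count"] = count
    return updated


def visible_devices_env(gpu_ids):
    """Build the value passed to docker with -e, e.g. NVIDIA_VISIBLE_DEVICES=0,1."""
    return "NVIDIA_VISIBLE_DEVICES=" + ",".join(str(i) for i in gpu_ids)
```

For example, visible_devices_env([0, 1]) returns "NVIDIA_VISIBLE_DEVICES=0,1", which restricts the container to the first two GPUs when passed to docker run with -e.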

Q9: Does AIAA use Apache?

Yes. Advanced users can modify the Apache configs for AIAA, which are normally located at /opt/nvidia/medical/nvmidl/apps/aas/www/conf/ in the Docker container.

By default, it runs as www-data user/group for security reasons. Hence the ownership of AIAA workspace will get modified accordingly.

Q10: Can I run multiple containers of AIAA on the same host?

Yes, but each container must use different, non-overlapping ports for both AIAA and Triton. In this case, avoid --net=host and use explicit port mapping instead.

Run Docker with -p [host port]:[docker port] and make sure [host port] is not used by other processes. For example, use -p 9000:80 to map a host port for HTTP access and -p 9001:443 for HTTPS.

Then you can try:

  • curl http://127.0.0.1:9000/v1/models

  • curl --insecure https://127.0.0.1:9001/v1/models (if you are running AIAA in ssl mode)
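The two curl checks above can also be run from Python, which is convenient when scripting several containers. This is a minimal sketch using the standard library; the function names are hypothetical, and decoding the response body as JSON is an assumption about what /v1/models returns.

```python
import json
import ssl
import urllib.request


def models_url(host="127.0.0.1", port=9000, https=False):
    """Build the /v1/models URL for a given host port."""
    scheme = "https" if https else "http"
    return f"{scheme}://{host}:{port}/v1/models"


def fetch_models(url, insecure=False, timeout=10):
    """GET the URL and decode the body as JSON (response format is assumed)."""
    context = None
    if url.startswith("https://"):
        context = ssl.create_default_context()
        if insecure:
            # Same effect as curl --insecure: skip certificate verification.
            context.check_hostname = False
            context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For the port mappings in the example, call fetch_models(models_url(port=9000)) for HTTP, or fetch_models(models_url(port=9001, https=True), insecure=True) when AIAA runs in SSL mode.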

It is also recommended to use a different port for the Triton server when starting AIAA. For example: start_aiaa.sh --triton_port 8500

Note

Apache inside the Docker container always runs on HTTP port 80 and SSL port 443.

Hint

More discussion can be found in the NVIDIA Developer Forums.

Q11: Can I run AIAA as a non-root user?

If you are running AIAA as a non-root user, the HTTP port will be 5000 (instead of 80).

To start the container as a non-root user (make sure the user name and group name are valid inside the container):

docker run -it --rm --gpus=2 -p 5680:5000 \
           -u [user name]:[user group] -v /etc/passwd:/etc/passwd -v /etc/group:/etc/group \
           -v /home/xyz:/workspace/ \
           nvcr.io/nvidia/clara-train-sdk:<version here> \
           /bin/bash

After that, you can run AIAA as a non-root user: start_aas.sh --workspace /workspace

Note

The default workspace (/var/nvidia/aiaa) will not work for a non-root user because that user might not have the required permissions. Always specify your own workspace, with permissions set appropriately for the non-root user.
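Before starting AIAA, you can confirm that the current user can actually use the chosen workspace. The following is a minimal sketch using the Python standard library; the helper name workspace_is_usable is hypothetical.

```python
import os


def workspace_is_usable(path):
    """True if path is an existing directory the current user can read, write, and enter."""
    return os.path.isdir(path) and os.access(path, os.R_OK | os.W_OK | os.X_OK)
```

Run it inside the container as the non-root user, for example workspace_is_usable("/workspace"). A False result means the directory is missing or its ownership or mode needs to be fixed on the host.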

Note

Note the difference between the root and non-root use cases. Root user (existing method):

docker run -it --rm --gpus=2 -p 5678:80 -p 5679:443 \
           -v /home/xyz:/workspace/ \
           nvcr.io/nvidia/clara-train-sdk:<version here> \
           /bin/bash

Q12: How do I run AIAA using Singularity?

First, convert the Clara Train Docker image to a Singularity image:

singularity build clara-train-sdk.simg docker://nvcr.io/nvidia/clara-train-sdk:<version here>

Next, start a shell inside that image with the following command:

singularity exec --nv clara-train-sdk.simg /bin/bash

Then we use the following commands to launch the AIAA server:

export TZ="UTC"
start_aas.sh --workspace [somewhere that belongs to the user]

The AIAA server will then be up and running at http://0.0.0.0:5000.