Frequently Asked Questions

Q1: Why does my model not show up in /v1/models?

Follow the steps below to debug your model:

  1. Check your config.json file:

  • Check if it is valid JSON format:

    import json
    with open('config.json', 'r') as f:
        config = json.load(f)  # raises json.JSONDecodeError on invalid JSON

    This code should run without raising an exception.

  2. Check your model file:

  3. Upload the model to AIAA again:

Once you have verified that all the pieces are correct, upload your model to AIAA again.

  4. Increase triton_model_timeout:

AIAA polls Triton for this amount of time before reporting that the model was not imported correctly. If you are using the Triton engine, try a larger timeout to give the model import enough time to succeed. (Modify TRITON_MODEL_TIMEOUT in “docker-compose.env”.)

  5. Check your logs:

If none of the above steps work, start AIAA with the --debug flag and check the log files in <AIAA workspace>/logs. You can also ask on the NVIDIA Developer Forums.
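The JSON check in step 1 can be extended to report where parsing fails, which helps with larger config files. This is a minimal sketch using only the Python standard library; the helper name check_json_file is hypothetical and not part of AIAA.

```python
import json


def check_json_file(path):
    """Return None if the file parses as JSON, else a message locating the error.

    Hypothetical helper for illustration; it only checks JSON syntax,
    not whether the keys are the ones AIAA expects.
    """
    try:
        with open(path, "r", encoding="utf-8") as f:
            json.load(f)
    except json.JSONDecodeError as err:
        # lineno/colno point at the first character the parser rejected
        return f"{path}: invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
    return None
```

Calling check_json_file("config.json") returns None for a valid file and a one-line description of the first syntax error otherwise.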


Currently, AIAA requires models to have a single input and a single output. Multi-class segmentation can be achieved by using multiple channels in the output.
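To confirm whether a model is listed after an upload, /v1/models can be queried from a script. The sketch below makes two assumptions that are not stated in this FAQ: that the endpoint returns a JSON list of objects, and that each object carries a "name" key. The function names and the base_url parameter are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_models(base_url, timeout=10):
    """GET <base_url>/v1/models and decode the JSON body.

    base_url is whatever address your AIAA server is reachable at.
    """
    with urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return json.load(resp)


def listed_model_names(models):
    """Extract model names, assuming a list of objects with a "name" key."""
    return [m.get("name") for m in models if isinstance(m, dict)]


def is_model_listed(models, model_name):
    """True if model_name appears in the decoded /v1/models response."""
    return model_name in listed_model_names(models)
```

If the response shape on your server differs, only listed_model_names needs to change.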

Q2: Why are the models returning bad results?

Most of the time, this is caused by a data mismatch. Make sure the data you test with in AIAA has the same characteristics as the data you used to train your models.

That would include the following:

  1. Resolution/Spacing

  2. Orientation

  3. Contrast/Phase

For example, the pre-trained segmentation models on NGC use data from the Medical Segmentation Decathlon.

We re-scale the images to a spacing of [1.0, 1.0, 1.0] and make sure the affine matrix of the NIFTI file has all positive values.


MONAI provides some nice transforms to tackle the resolution and orientation problems.
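As a quick pre-check before sending data to AIAA, the spacing and axis directions can be read from an image's 4x4 affine matrix. The sketch below is illustrative and not part of AIAA or MONAI: the function names are hypothetical, the affine is passed in as a nested list (for example, taken from a NIFTI reader), and "all positive values" is read here as positive diagonal terms, which is an assumption.

```python
import math


def spacing_from_affine(affine):
    """Voxel spacing: Euclidean norm of each column of the upper-left 3x3 block."""
    return [
        math.sqrt(sum(affine[row][col] ** 2 for row in range(3)))
        for col in range(3)
    ]


def check_affine(affine, expected_spacing=(1.0, 1.0, 1.0), tol=1e-3):
    """Return (spacing_ok, orientation_ok) for a 4x4 affine.

    spacing_ok: spacing matches expected_spacing within tol.
    orientation_ok: the three diagonal terms are positive
    (an assumed reading of "all positive values").
    """
    spacing = spacing_from_affine(affine)
    spacing_ok = all(abs(s - e) <= tol for s, e in zip(spacing, expected_spacing))
    orientation_ok = all(affine[i][i] > 0 for i in range(3))
    return spacing_ok, orientation_ok
```

An image that fails either check is a candidate for resampling or reorientation before inference.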

Q3: Does AIAA support 2D models?

Yes, the AIAA server supports 2D models. You can use HTTP requests to interact directly with the AIAA server API.

Q4: What if my GPU card does not have enough memory?

If your GPU card is very tight on memory, you can try one or more of the following:

  1. Load fewer models in the AIAA server

  2. Reduce roi (the size of scanning window) in config_aiaa.json

  3. Try to reduce your network size

Q5: Why can’t I start AIAA?

Make sure $AIAA_PORT is not used by other processes.
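One way to check whether a port is already taken is to try binding to it. This is a minimal standard-library sketch; the function name port_is_free is hypothetical, and the default host of 0.0.0.0 is an assumption about how the server binds.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can bind to (host, port), else False."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permissions error
            return False
    return True
```

For example, port_is_free(int(os.environ["AIAA_PORT"])) (with os imported) checks the port AIAA is configured to use.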

Q6: How can I start the AIAA server clean?

To start from a clean state, remove the workspace folder and create a new one. Then start the AIAA server with the new workspace.
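The reset can also be scripted. The sketch below is a hypothetical helper, not an AIAA tool; it permanently deletes everything under the given path, so stop the server and double-check the path before running it.

```python
import shutil
from pathlib import Path


def reset_workspace(workspace):
    """Delete the workspace folder (if present) and recreate it empty.

    WARNING: removes all models, logs, and sessions stored under `workspace`.
    """
    path = Path(workspace)
    if path.exists():
        shutil.rmtree(path)
    path.mkdir(parents=True)
    return path
```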

Q7: Why is AIAA occupying all the GPU memory when I am not running any inference?

When AIAA runs with the Triton backend, it puts one model instance on every GPU that is visible inside the Docker container.

Users can modify the “device_ids” section under the “deploy” section of the “tritonserver” service to control which GPUs are visible. To change the number of model instances on each GPU, users can modify gpu_instance_count under triton_model_config in their model configs.

When a model instance is loaded on a GPU, it occupies some GPU memory even if it is not serving any inference requests at that moment. As a result, users who want to free that GPU memory have to either stop the AIAA server or unload some models (using the DELETE model API).

Q8: Does AIAA use Apache?

Yes. Advanced users can modify the Apache configs for AIAA, which are located at /etc/apache2/ inside the Docker container.

By default, it runs as the nvidia user/group for security reasons.

Q9: Can I run multiple containers of AIAA on the same host?

Yes, but you have to make sure that each container uses different host ports and that they do not overlap.


Apache inside the Docker container always runs on HTTP port 5000 and SSL port 5001.

Q10: How do I create datasets for Clara Train with Clara AIAA?

You might be using the AI-Assisted Annotation APIs along with rich manual annotation tools like 3D-Slicer/OHIF/MITK Workbench to create segmentation masks for new images (unlabeled data). Or you might be using the REST APIs from AIAA directly to get the segmentation masks. At some point, you may want to use these new samples/masks to retrain a model for better performance. In that case, follow the steps below.

  • If you are using the REST APIs, you can save the segmentation mask (preferably in NIFTI format) from the HTTP response. AIAA provides an additional request URI parameter, output=image, to get the image only. Please see /docs/ on your AIAA server (port $AIAA_PORT) for the necessary details on the API.

  • If you are using Annotation tools like MITK/3D-Slicer/OHIF Viewer, you can explore save options provided by the native tool to save the segmentation masks in NIFTI format.

Once you have saved enough new samples to train or fine-tune, you can create the dataset.json file that the Clara training pipeline requires. For example, a typical dataset.json looks like this:

  {
    "training": [
      {"image": "image_1.nii", "label": "label_1.nii"}
    ],
    "validation": [
      {"image": "image_2.nii", "label": "label_2.nii"}
    ]
  }

As you annotate more studies, you can continue to populate your dataset.json file. Once a significant portion has been created, you are ready to start training. Please refer to Working with Segmentation models for more details on data preparation and train/finetune options.
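As the number of annotated studies grows, dataset.json can be generated rather than edited by hand. The sketch below assumes only the "training"/"validation" layout with "image" and "label" keys shown in the example; the function names are hypothetical, and any additional keys your training configuration needs are not covered.

```python
import json


def make_dataset(training_pairs, validation_pairs):
    """Build the dataset dictionary from (image, label) filename pairs."""
    def entries(pairs):
        return [{"image": image, "label": label} for image, label in pairs]

    return {
        "training": entries(training_pairs),
        "validation": entries(validation_pairs),
    }


def write_dataset(path, dataset):
    """Write the dataset dictionary to `path` as indented JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(dataset, f, indent=2)
```

For example, write_dataset("dataset.json", make_dataset([("image_1.nii", "label_1.nii")], [("image_2.nii", "label_2.nii")])) reproduces the file shown in the example.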


More discussions can be found in the NVIDIA Developer Forums.