Model Repository

The TensorRT Inference Server accesses models from one or more locally accessible file paths, from Google Cloud Storage, and from Amazon S3. These paths are specified when the server is started using the --model-repository option.

For a locally accessible file-system the absolute path must be specified, for example, --model-repository=/path/to/model/repository. For a model repository residing in Google Cloud Storage, the path must be prefixed with gs://, for example, --model-repository=gs://bucket/path/to/model/repository. For a model repository residing in Amazon S3, the path must be prefixed with s3://, for example, --model-repository=s3://bucket/path/to/model/repository. When using a local instance of S3, the prefix s3:// must be followed by the host and port (separated by a colon) and then the bucket path. For example, --model-repository=s3://host:port/bucket/path/to/model/repository.

Example Model Repository describes how to create an example repository with a couple of image classification models.

An example of a typical model repository layout is shown below:

<model-repository-path>/
  model_0/
    config.pbtxt
    output0_labels.txt
    1/
      model.plan
    2/
      model.plan
  model_1/
    config.pbtxt
    output0_labels.txt
    output1_labels.txt
    0/
      model.graphdef
    7/
      model.graphdef

See Model Management for discussion of how the inference server manages the models specified in the model repositories. The Status API can be used to determine if any models failed to load successfully. The server’s console log will also show the reason for any failures during startup.

The name of the model directory (model_0 and model_1 in the above example) must match the name of the model specified in the model configuration file, config.pbtxt. The model name is used in the client API and server API to identify the model. Each model directory must have at least one numeric subdirectory. Each of these subdirectories holds a version of the model with the version number corresponding to the directory name.

For more information about how the model versions are handled by the server, see Model Versions. Except for ensemble models, each version subdirectory contains one or more model definition files that specify the actual model. The model definition can be either a framework-specific model file or a shared library implementing a custom backend.

The *_labels.txt files are optional and are used to provide labels for outputs that represent classifications. The label file must be specified in the label_filename property of the output it corresponds to in the model configuration.
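
The label file is a simple text file with one label per line, where line N holds the label for class index N of the output. As a minimal sketch (the class names and file name below are only illustrative), such a file could be produced with:

  # Write a hypothetical labels file for a classification output:
  # one label per line, line N corresponds to class index N.
  labels = ["cat", "dog", "bird"]
  with open("output0_labels.txt", "w") as f:
      f.write("\n".join(labels) + "\n")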

Modifying the Model Repository

The inference server has multiple execution modes that control how the models within the model repository are managed. These modes are described in Model Management.

Model Versions

Each model can have one or more versions available in the model repository. Each version is stored in its own numerically named subdirectory, where the name of the subdirectory corresponds to the version number of the model. Subdirectories that are not numerically named, or whose names start with zero (0), are ignored. Each model specifies a version policy that controls which of the versions in the model repository are made available by the server at any given time.
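
As a rough illustration of this naming rule (a sketch, not server code), the following shows which subdirectory names would be treated as version directories; anything that is not purely numeric, or that starts with zero, is skipped:

  import os

  def candidate_versions(model_dir):
      # Keep only purely numeric subdirectory names that do not start with "0".
      versions = []
      for name in os.listdir(model_dir):
          path = os.path.join(model_dir, name)
          if os.path.isdir(path) and name.isdigit() and not name.startswith("0"):
              versions.append(int(name))
      return sorted(versions)

  # For example, subdirectories named "1" and "7" are versions,
  # while "07", "tmp" and "v2" are ignored.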

Framework Model Definition

Each model version subdirectory must contain at least one model definition. By default, the name of this file or directory must be:

  • model.plan for TensorRT models

  • model.graphdef for TensorFlow GraphDef models

  • model.savedmodel for TensorFlow SavedModel models

  • model.onnx for ONNX Runtime ONNX models

  • model.pt for PyTorch TorchScript models

  • model.netdef and init_model.netdef for Caffe2 Netdef models

This default name can be overridden using the default_model_filename property in the model configuration.

Optionally, a model can provide multiple model definition files, each targeted at a GPU with a different Compute Capability. Most commonly, this feature is needed for TensorRT and TensorFlow/TensorRT integrated models where the model definition is valid for only a single compute capability. See the cc_model_filenames property in the model configuration for description of how to specify different model definitions for different compute capabilities.
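
If you are unsure which Compute Capability a target GPU has, one way to query it (shown only as an illustration and assuming PyTorch is installed; any CUDA tool that reports the compute capability works equally well) is:

  import torch

  # Report the compute capability of GPU 0, for example (7, 5) for a Turing GPU.
  major, minor = torch.cuda.get_device_capability(0)
  print("compute capability: {}.{}".format(major, minor))

The resulting value (for example, 7.5) is the kind of compute capability string used to key the cc_model_filenames map; see the model configuration documentation for the exact format.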

TensorRT Models

A TensorRT model definition is called a Plan. A TensorRT Plan is a single file that by default must be named model.plan. A TensorRT Plan is specific to CUDA Compute Capability and so it is typically necessary to use the model configuration’s cc_model_filenames property as described above.

A minimal model repository for a single TensorRT model would look like:

models/
  <model-name>/
    config.pbtxt
    1/
      model.plan

As described in Generated Model Configuration, the config.pbtxt is optional for some models. In cases where it is not required, the minimal model repository would look like:

models/
  <model-name>/
    1/
      model.plan

TensorFlow Models

TensorFlow saves trained models in one of two ways: GraphDef or SavedModel. The inference server supports both formats. Once you have a trained model in TensorFlow, you can save it as a GraphDef directly, convert it to a GraphDef using a script such as freeze_graph.py, or save it as a SavedModel using a SavedModelBuilder or tf.saved_model.simple_save. If you use the Estimator API you can also use Estimator.export_savedmodel.
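
As a minimal sketch of exporting a SavedModel with the TensorFlow 1.x API (the graph, tensor names, and model name below are placeholders, not part of any real model):

  import tensorflow as tf

  # Build a trivial graph purely for illustration.
  x = tf.placeholder(tf.float32, shape=[None, 4], name="input")
  w = tf.Variable(tf.ones([4, 2]))
  y = tf.identity(tf.matmul(x, w), name="output")

  with tf.Session() as sess:
      sess.run(tf.global_variables_initializer())
      # Export directly into a version subdirectory of the model repository.
      tf.saved_model.simple_save(
          sess,
          "models/mymodel/1/model.savedmodel",
          inputs={"input": x},
          outputs={"output": y})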

A TensorFlow GraphDef is a single file that by default must be named model.graphdef. A minimal model repository for a single TensorFlow GraphDef model would look like:

models/
  <model-name>/
    config.pbtxt
    1/
      model.graphdef

A TensorFlow SavedModel is a directory containing multiple files. By default the directory must be named model.savedmodel. A minimal model repository for a single TensorFlow SavedModel model would look like:

models/
  <model-name>/
    config.pbtxt
    1/
      model.savedmodel/
         <saved-model files>

As described in Generated Model Configuration, the config.pbtxt is optional for some models. In cases where it is not required, the minimal model repository would look like:

models/
  <model-name>/
    1/
      model.savedmodel/
         <saved-model files>

TensorRT/TensorFlow Models

TensorFlow 1.7 and later integrates TensorRT to enable TensorFlow models to benefit from the inference optimizations provided by TensorRT. The inference server supports models that have been optimized with TensorRT and can serve those models just like any other TensorFlow model. The inference server’s TensorRT version (available in the Release Notes) must match the TensorRT version that was used when the model was created.

A TensorRT/TensorFlow integrated model is specific to CUDA Compute Capability and so it is typically necessary to use the model configuration’s cc_model_filenames property as described above.

As an alternative to creating a TensorRT/TensorFlow model offline it is possible to use model configuration settings to have the TensorRT optimization performed dynamically, when the model is first loaded or in response to inference requests. See TensorRT Optimization for more information.
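
As a rough sketch of the offline approach using the TensorFlow 1.x contrib converter (the module path and argument set vary between TensorFlow releases, and the frozen graph path, output name, and model name below are assumptions):

  import tensorflow as tf
  import tensorflow.contrib.tensorrt as trt
  from tensorflow.python.platform import gfile

  # Load a previously frozen GraphDef (path and output name are assumptions).
  graph_def = tf.GraphDef()
  with gfile.GFile("frozen_model.graphdef", "rb") as f:
      graph_def.ParseFromString(f.read())

  # Replace TensorRT-compatible subgraphs with TRT engine ops.
  trt_graph = trt.create_inference_graph(
      input_graph_def=graph_def,
      outputs=["output"],
      max_batch_size=8,
      max_workspace_size_bytes=1 << 30,
      precision_mode="FP16")

  # Write the optimized GraphDef into a version subdirectory.
  with gfile.GFile("models/mymodel/1/model.graphdef", "wb") as f:
      f.write(trt_graph.SerializeToString())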

ONNX Models

An ONNX model is a single file that by default must be named model.onnx. Note that some ONNX models may not be supported by the inference server because they are not supported by the underlying ONNX Runtime (for example, because they use a stale ONNX opset version or contain operators with unsupported types).
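
If you are unsure whether a model falls into this category, a quick way to inspect its opset and run basic validation (a sketch assuming the onnx Python package and a hypothetical model.onnx file) is:

  import onnx

  # Load the model and print the opset version(s) it was exported with.
  model = onnx.load("model.onnx")
  for opset in model.opset_import:
      print("domain: '{}'  opset version: {}".format(opset.domain, opset.version))

  # Basic structural validation; raises an exception if the model is malformed.
  onnx.checker.check_model(model)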

By default the ONNX Runtime uses a default execution provider when running models. For execution of models on CPU this default execution provider does not utilize MKL-DNN. The model configuration Optimization Policy allows you to select the OpenVINO execution provider for CPU execution of a model instead of the default execution provider. For execution of models on GPU the default CUDA execution provider uses cuDNN to accelerate inference. The model configuration Optimization Policy allows you to select the tensorrt execution provider for GPU, which causes the ONNX Runtime to use TensorRT to accelerate all or part of the model. See TensorRT Optimization for more information on the tensorrt execution provider.

A minimal model repository for a single ONNX model would look like:

models/
  <model-name>/
    config.pbtxt
    1/
      model.onnx

As described in Generated Model Configuration, the config.pbtxt is optional for some models. In cases where it is not required, the minimal model repository would look like:

models/
  <model-name>/
    1/
      model.onnx

PyTorch Models

A PyTorch model is a single file that by default must be named model.pt. Note that a PyTorch model must be traced with an example input and saved as a TorchScript Module; a tracing sketch follows the repository layout below. Some models traced with different versions of PyTorch may not be supported by the inference server due to changes in the underlying opset. A minimal model repository for a single PyTorch model would look like:

models/
  <model-name>/
    config.pbtxt
    1/
      model.pt
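
The tracing step itself is a short piece of Python; the sketch below uses a torchvision model, input shape, and model name purely as placeholders (substitute your own module and a representative example input):

  import torch
  import torchvision

  # Any torch.nn.Module can be traced; resnet50 is used only as an example.
  model = torchvision.models.resnet50(pretrained=True)
  model.eval()

  # Trace with a representative example input ...
  example_input = torch.randn(1, 3, 224, 224)
  traced = torch.jit.trace(model, example_input)

  # ... and save the resulting TorchScript module into a version subdirectory.
  traced.save("models/mymodel/1/model.pt")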

Caffe2 Models

A Caffe2 model definition is called a NetDef. A Caffe2 NetDef is a single file that by default must be named model.netdef. A minimal model repository for a single NetDef model would look like:

models/
  <model-name>/
    config.pbtxt
    1/
      model.netdef

Custom Backends

A model using a custom backend is represented in the model repository in the same way as models using a deep-learning framework backend. Each model version subdirectory must contain at least one shared library that implements the custom model backend. By default, the name of this shared library must be libcustom.so but the default name can be overridden using the default_model_filename property in the model configuration.

Optionally, a model can provide multiple shared libraries, each targeted at a GPU with a different Compute Capability. See the cc_model_filenames property in the model configuration for description of how to specify different shared libraries for different compute capabilities.

Currently, only model repositories on the local filesystem support custom backends. A custom backend contained in a model repository in cloud storage (for example, a repository accessed with the gs:// prefix or s3:// prefix as described above) cannot be loaded by the inference server.

Custom Backend API

A custom backend must implement the C interface defined in custom.h. The interface is also documented in the API Reference.

Example Custom Backend

Several example custom backends can be found in the src/custom directory. For more information on building your own custom backends as well as a simple example you can build yourself, see Building A Custom Backend.

Ensemble Backends

A model using an ensemble backend is represented in the model repository in the same way as models using a deep-learning framework backend. Currently, the ensemble backend does not require any version-specific data, so each model version subdirectory must exist but should be empty.

An example of an ensemble backend in a model repository can be found in the docs/examples/ensemble_model_repository/preprocess_resnet50_ensemble directory.