Model Repository
The TensorRT Inference Server accesses models from a locally accessible file path or from Google Cloud Storage. This path is specified when the server is started using the --model-store option.
For a locally accessible file system, the absolute path must be specified, for example, --model-store=/path/to/model/repository. For a model repository residing in Google Cloud Storage, the path must be prefixed with gs://, for example, --model-store=gs://bucket/path/to/model/repository.
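The two accepted forms can be checked before starting the server. The helper below is a hypothetical sketch (classify_model_store is not part of the inference server); it assumes only the rules stated above: a gs:// prefix selects Google Cloud Storage, and anything else must be an absolute local path.

```python
import posixpath
from urllib.parse import urlparse


def classify_model_store(path: str) -> str:
    """Return "gcs" for a gs:// path or "local" for an absolute local path.

    Hypothetical helper: raises ValueError for a gs:// path without a bucket
    or for a relative local path, neither of which the rules above allow.
    """
    if path.startswith("gs://"):
        # urlparse puts the bucket name in netloc for gs://bucket/...
        if not urlparse(path).netloc:
            raise ValueError(f"missing bucket name in {path!r}")
        return "gcs"
    # Assumption: the local file system follows POSIX path rules.
    if not posixpath.isabs(path):
        raise ValueError(f"local model store must be an absolute path: {path!r}")
    return "local"
```

For example, classify_model_store("gs://bucket/path/to/model/repository") returns "gcs", while a relative path such as "models" raises ValueError.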
Example Model Repository describes how to create an example repository with a couple of image classification models.
An example of a typical model repository layout is shown below:
<model-repository-path>/
  model_0/
    config.pbtxt
    output0_labels.txt
    1/
      model.plan
    2/
      model.plan
  model_1/
    config.pbtxt
    output0_labels.txt
    output1_labels.txt
    0/
      model.graphdef
    7/
      model.graphdef
Any number of models may be specified and the inference server will attempt to load all models into the CPU and GPU when the server starts. The Status API can be used to determine if any models failed to load successfully. The server’s console log will also show the reason for any failures during startup.
The name of the model directory (model_0 and model_1 in the above example) must match the name of the model specified in the model configuration file, config.pbtxt. The model name is used in the client API and server API to identify the model. Each model directory must have at least one numeric subdirectory. Each of these subdirectories holds a version of the model with the version number corresponding to the directory name.
For more information about how the model versions are handled by the server see Model Versions. Within each version subdirectory there are one or more model definition files that specify the actual model. The model definition can be either a framework-specific model file or a shared library implementing a custom backend.
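The layout rules above lend themselves to a quick pre-flight check. This is a minimal sketch, not the server's own loader: list_model_versions is a hypothetical helper that maps each model directory to its numeric version subdirectories and rejects any model directory that has none. It does not compare the directory name with the name in config.pbtxt, since that file is optional for some models.

```python
from __future__ import annotations

from pathlib import Path
from typing import Dict, List


def list_model_versions(repo: Path) -> Dict[str, List[int]]:
    """Map each model directory in repo to its sorted numeric version numbers.

    Raises ValueError when a model directory has no numeric subdirectory.
    Files at the top level of repo are ignored.
    """
    models: Dict[str, List[int]] = {}
    for model_dir in sorted(p for p in repo.iterdir() if p.is_dir()):
        versions = sorted(
            int(v.name)
            for v in model_dir.iterdir()
            # Only subdirectories named with ASCII digits count as versions.
            if v.is_dir() and v.name.isascii() and v.name.isdigit()
        )
        if not versions:
            raise ValueError(f"{model_dir.name}: no numeric version subdirectory")
        models[model_dir.name] = versions
    return models
```

Run against the example layout at the top of this section, it would report versions [1, 2] for model_0 and [0, 7] for model_1.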
The *_labels.txt files are optional and are used to provide labels for outputs that represent classifications. The label file must be specified in the label_filename property of the output it corresponds to in the model configuration.
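A client that receives a classification output can map class indices back to labels. The sketch below assumes a labels file holds one label per line, with line i naming class index i of the output; load_labels and top_k_labels are hypothetical helpers, not part of the client API.

```python
from __future__ import annotations

from pathlib import Path
from typing import List, Sequence, Tuple


def load_labels(path: Path) -> List[str]:
    """Read a labels file, assuming one label per line (line i names class i)."""
    return [line.strip() for line in path.read_text().splitlines()]


def top_k_labels(scores: Sequence[float], labels: Sequence[str],
                 k: int = 1) -> List[Tuple[str, float]]:
    """Return the k highest-scoring classes as (label, score) pairs."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(labels[i], scores[i]) for i in ranked[:k]]
```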
Modifying the Model Repository
By default, changes to the model repository will be detected and the server will attempt to add, remove, and reload models as necessary based on those changes. Changes to the model repository may not be detected immediately because the server polls the repository periodically. You can control the polling interval with the --repository-poll-secs option. The console log or the Status API can be used to determine when model repository changes have taken effect. You can prevent the server from responding to repository changes by using the --allow-poll-model-repository=false option.
The TensorRT Inference Server responds to the following changes:
- Versions may be added to and removed from models by adding and removing the corresponding version subdirectory. The inference server will allow in-flight requests to complete even if they are using a removed version of the model. New requests for a removed model version will fail. Depending on the model’s version policy, changes to the available versions may change which model version is served by default.
- Existing models can be removed from the repository by removing the corresponding model directory. The inference server will allow in-flight requests to any version of the removed model to complete. New requests for a removed model will fail. 
- New models can be added to the repository by adding a new model directory. 
- The model configuration (config.pbtxt) can be changed and the server will unload and reload the model to pick up the new model configuration. 
- Label files providing labels for outputs that represent classifications can be added, removed, or modified, and the inference server will unload and reload the model to pick up the new labels. If a label file is added or removed, the corresponding edit to the label_filename property of the output it corresponds to in the model configuration must be performed at the same time.
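The server's change detection is internal to it; the following only illustrates the polling idea under stated assumptions. On each poll, a hypothetical snapshot function records the modification time of every path under each model directory, and diff_snapshots compares two consecutive snapshots to classify models as added, removed, or changed (a changed model might correspond to a version, configuration, or label-file edit from the list above).

```python
from __future__ import annotations

from pathlib import Path
from typing import Dict, Set

# Hypothetical snapshot type: model name -> {path relative to model dir: mtime}
Snapshot = Dict[str, Dict[str, float]]


def snapshot(repo: Path) -> Snapshot:
    """Record the modification time of every file and subdirectory of each model."""
    snap: Snapshot = {}
    for model_dir in repo.iterdir():
        if model_dir.is_dir():
            snap[model_dir.name] = {
                str(p.relative_to(model_dir)): p.stat().st_mtime
                for p in model_dir.rglob("*")
            }
    return snap


def diff_snapshots(old: Snapshot, new: Snapshot) -> Dict[str, Set[str]]:
    """Classify models as added, removed, or changed between two polls."""
    common = set(old) & set(new)
    return {
        "added": set(new) - set(old),
        "removed": set(old) - set(new),
        # Any added, removed, or modified path inside the model directory.
        "changed": {m for m in common if old[m] != new[m]},
    }
```

A poller built this way would take a snapshot once per polling interval and act only on the models named in the diff.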
Model Versions
Each model can have one or more versions available in the model repository. Each version is stored in its own, numerically named, subdirectory where the name of the subdirectory corresponds to the version number of the model. Each model specifies a version policy that controls which of the versions in the model repository are made available by the server at any given time.
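As a hedged sketch of how a version policy narrows the available versions, assume three policy kinds modeled on the model configuration's version policy options: serve the latest N versions, serve all versions, or serve a specific list. select_versions and its parameters are hypothetical; consult the model configuration documentation for the actual policy fields and their defaults.

```python
from __future__ import annotations

from typing import Iterable, List, Optional


def select_versions(available: Iterable[int], policy: str = "latest",
                    num_versions: int = 1,
                    specific: Optional[Iterable[int]] = None) -> List[int]:
    """Return the version numbers to serve, in ascending order."""
    versions = sorted(set(available))
    if policy == "all":
        return versions
    if policy == "latest":
        # Slicing from the end keeps the N highest version numbers.
        return versions[-num_versions:] if num_versions > 0 else []
    if policy == "specific":
        wanted = set(specific or ())
        # Requested versions that are not in the repository are skipped.
        return [v for v in versions if v in wanted]
    raise ValueError(f"unknown version policy: {policy!r}")
```

With the model_1 example from the start of this section (versions 0 and 7), a latest-1 policy would serve only version 7.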
Framework Model Definition
Each model version subdirectory must contain at least one model definition. By default, the name of this file or directory must be:
- model.plan for TensorRT models 
- model.graphdef for TensorFlow GraphDef models 
- model.savedmodel for TensorFlow SavedModel models 
- model.netdef and init_model.netdef for Caffe2 Netdef models 
This default name can be overridden using the default_model_filename property in the model configuration.
Optionally, a model can provide multiple model definition files, each targeted at a GPU with a different Compute Capability. Most commonly, this feature is needed for TensorRT and TensorFlow/TensorRT integrated models where the model definition is valid for only a single compute capability. See the cc_model_filenames property in the model configuration for a description of how to specify different model definitions for different compute capabilities.
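The filename rules above can be summarized as a lookup with overrides. In this sketch, resolve_model_filename is hypothetical, the platform strings are assumptions modeled on the model configuration's platform names, and the precedence (a cc_model_filenames entry for the GPU's compute capability, then default_model_filename, then the built-in default) is an assumed reading of the text above.

```python
from __future__ import annotations

from typing import Dict, Optional

# Built-in default names from the list above, keyed by assumed platform strings.
DEFAULT_FILENAMES: Dict[str, str] = {
    "tensorrt_plan": "model.plan",
    "tensorflow_graphdef": "model.graphdef",
    "tensorflow_savedmodel": "model.savedmodel",
    "caffe2_netdef": "model.netdef",  # init_model.netdef is also expected
    "custom": "libcustom.so",         # custom backend shared library
}


def resolve_model_filename(platform: str, compute_capability: str,
                           default_model_filename: Optional[str] = None,
                           cc_model_filenames: Optional[Dict[str, str]] = None) -> str:
    """Pick the model definition file to load on a GPU of the given capability."""
    if cc_model_filenames and compute_capability in cc_model_filenames:
        return cc_model_filenames[compute_capability]
    if default_model_filename:
        return default_model_filename
    return DEFAULT_FILENAMES[platform]
```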
TensorRT Models
A TensorRT model definition is called a Plan. A TensorRT Plan is a single file that by default must be named model.plan. A TensorRT Plan is specific to CUDA Compute Capability and so it is typically necessary to use the model configuration’s cc_model_filenames property as described above.
A minimal model repository for a single TensorRT model would look like:
models/
  <model-name>/
    config.pbtxt
    1/
      model.plan
As described in Generated Model Configuration, the config.pbtxt is optional for some models. In cases where it is not required, the minimal model repository would look like:
models/
  <model-name>/
    1/
      model.plan
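Assembling a minimal repository like the ones above is a matter of creating the model and version directories and copying the definition in. The sketch below uses a hypothetical helper, add_model_version; it copies a single file (such as a Plan) or a whole directory (such as a SavedModel) and writes config.pbtxt only when configuration text is supplied.

```python
from __future__ import annotations

import shutil
from pathlib import Path
from typing import Optional


def add_model_version(repo: Path, model_name: str, version: int,
                      model_file: Path, filename: str = "model.plan",
                      config_text: Optional[str] = None) -> Path:
    """Copy a model definition to <repo>/<model_name>/<version>/<filename>.

    config_text, when given, is written verbatim to config.pbtxt; omit it
    for models whose configuration can be generated.
    """
    model_dir = repo / model_name
    version_dir = model_dir / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)
    dest = version_dir / filename
    if model_file.is_dir():
        shutil.copytree(model_file, dest)  # e.g. a SavedModel directory
    else:
        shutil.copy2(model_file, dest)     # e.g. a Plan or GraphDef file
    if config_text is not None:
        (model_dir / "config.pbtxt").write_text(config_text)
    return dest
```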
TensorFlow Models
TensorFlow saves trained models in one of two ways: GraphDef or SavedModel. The inference server supports both formats. Once you have a trained model in TensorFlow, you can save it as a GraphDef directly or convert it to a GraphDef by using a script like freeze_graph.py, or save it as a SavedModel using a SavedModelBuilder or tf.saved_model.simple_save.
A TensorFlow GraphDef is a single file that by default must be named model.graphdef. A minimal model repository for a single TensorFlow GraphDef model would look like:
models/
  <model-name>/
    config.pbtxt
    1/
      model.graphdef
A TensorFlow SavedModel is a directory containing multiple files. By default the directory must be named model.savedmodel. A minimal model repository for a single TensorFlow SavedModel model would look like:
models/
  <model-name>/
    config.pbtxt
    1/
      model.savedmodel/
         <saved-model files>
As described in Generated Model Configuration, the config.pbtxt is optional for some models. In cases where it is not required, the minimal model repository would look like:
models/
  <model-name>/
    1/
      model.savedmodel/
         <saved-model files>
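One easy mistake is copying the SavedModel files directly into the version directory rather than into model.savedmodel/. A heuristic check, assuming the standard TensorFlow SavedModel layout in which saved_model.pb (or the text-format saved_model.pbtxt) sits at the top of the directory, could look like this; looks_like_savedmodel is a hypothetical helper.

```python
from pathlib import Path


def looks_like_savedmodel(path: Path) -> bool:
    """Heuristically check that path is a TensorFlow SavedModel directory.

    Looks for saved_model.pb or saved_model.pbtxt at the top level; a
    variables/ subdirectory is usually present too but is not required here.
    """
    return path.is_dir() and (
        (path / "saved_model.pb").is_file()
        or (path / "saved_model.pbtxt").is_file()
    )
```

Calling it on <model-name>/1/model.savedmodel should return True, while calling it on <model-name>/1 itself should return False for a correctly laid out repository.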
Caffe2 Models
A Caffe2 model definition is called a NetDef. A Caffe2 NetDef is a single file that by default must be named model.netdef. A minimal model repository for a single NetDef model would look like:
models/
  <model-name>/
    config.pbtxt
    1/
      model.netdef
TensorRT/TensorFlow Models
TensorFlow 1.7 and later integrates TensorRT to enable TensorFlow models to benefit from the inference optimizations provided by TensorRT. The inference server supports models that have been optimized with TensorRT and can serve those models just like any other TensorFlow model. The inference server’s TensorRT version (available in the Release Notes) must match the TensorRT version that was used when the model was created.
A TensorRT/TensorFlow integrated model is specific to CUDA Compute Capability and so it is typically necessary to use the model configuration’s cc_model_filenames property as described above.
ONNX Models
The TensorRT Inference Server cannot directly perform inference on ONNX models. An ONNX model must be converted to either a TensorRT Plan or a Caffe2 NetDef. To convert your ONNX model to a TensorRT Plan, use either the ONNX Parser included in TensorRT or the open-source TensorRT backend for ONNX. Another option is to convert your ONNX model to Caffe2 NetDef as described here.
Custom Backends
A model using a custom backend is represented in the model repository in the same way as models using a deep-learning framework backend. Each model version subdirectory must contain at least one shared library that implements the custom model backend. By default, the name of this shared library must be libcustom.so but the default name can be overridden using the default_model_filename property in the model configuration.
Optionally, a model can provide multiple shared libraries, each targeted at a GPU with a different Compute Capability. See the cc_model_filenames property in the model configuration for a description of how to specify different shared libraries for different compute capabilities.
Custom Backend API
A custom backend must implement the C interface defined in custom.h. The interface is also documented in the API Reference.
Example Custom Backend
An example of a custom backend can be found in the addsub backend. You can see the custom backend being used as part of CI testing in L0_infer.