model_config.proto

enum DataType
Data types supported for input and output tensors.
enumerator DataType::INVALID = 0
enumerator DataType::BOOL = 1
enumerator DataType::UINT8 = 2
enumerator DataType::UINT16 = 3
enumerator DataType::UINT32 = 4
enumerator DataType::UINT64 = 5
enumerator DataType::INT8 = 6
enumerator DataType::INT16 = 7
enumerator DataType::INT32 = 8
enumerator DataType::INT64 = 9
enumerator DataType::FP16 = 10
enumerator DataType::FP32 = 11
enumerator DataType::FP64 = 12
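
In the model configuration text file (commonly config.pbtxt) these data types are selected per input or output tensor. A minimal, hedged sketch, assuming the TYPE_-prefixed names used by the text format (the tensor name is hypothetical):

  input [
    {
      name: "INPUT0"           # hypothetical input tensor name
      data_type: TYPE_FP32     # corresponds to DataType::FP32 above
      dims: [ 16 ]
    }
  ]
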
message ModelInstanceGroup
A group of one or more instances of a model and resources made available for those instances.
enum Kind
Kind of this instance group.
enumerator Kind::KIND_AUTO = 0

This instance group represents instances that can run on either CPU or GPU. If all GPUs listed in ‘gpus’ are available then instances will be created on GPU(s), otherwise instances will be created on CPU.

enumerator Kind::KIND_GPU = 1

This instance group represents instances that must run on the GPU.

enumerator Kind::KIND_CPU = 2

This instance group represents instances that must run on the CPU.

string name

Optional name of this group of instances. If not specified, the name will be formed as <model name>_<group number>. The name of each individual instance is further formed from a unique instance number and GPU index.

Kind kind

The kind of this instance group. Default is KIND_AUTO. If KIND_AUTO or KIND_GPU then both ‘count’ and ‘gpus’ are valid and may be specified. If KIND_CPU, only ‘count’ is valid and ‘gpus’ cannot be specified.

int32 count

For a group assigned to GPU, the number of instances created for each GPU listed in ‘gpus’. For a group assigned to CPU, the number of instances created. Default is 1.

int32 gpus(repeated)

GPU(s) where instances should be available. For each GPU listed, ‘count’ instances of the model will be available. Setting ‘gpus’ to empty (or not specifying it at all) is equivalent to listing all available GPUs.
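
For illustration, a hedged config.pbtxt fragment (the values are hypothetical) that creates two instances of the model on each of GPU 0 and GPU 1:

  instance_group [
    {
      kind: KIND_GPU
      count: 2            # two instances per GPU listed in 'gpus'
      gpus: [ 0, 1 ]      # restrict instances to GPU 0 and GPU 1
    }
  ]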

message ModelInput
An input required by the model.
enum Format
The format for the input.
enumerator Format::FORMAT_NONE = 0

The input has no specific format. This is the default.

enumerator Format::FORMAT_NHWC = 1

HWC image format. Tensors with this format require 3 dimensions if the model does not support batching (max_batch_size = 0) or 4 dimensions if the model does support batching (max_batch_size >= 1). In either case the ‘dims’ below should only specify the 3 non-batch dimensions (i.e. HWC or CHW).

enumerator Format::FORMAT_NCHW = 2

CHW image format. Tensors with this format require 3 dimensions if the model does not support batching (max_batch_size = 0) or 4 dimensions if the model does support batching (max_batch_size >= 1). In either case the ‘dims’ below should only specify the 3 non-batch dimensions (i.e. HWC or CHW).

string name

The name of the input.

DataType data_type

The data-type of the input.

Format format

The format of the input. Optional.

int64 dims(repeated)

The dimensions/shape of the input tensor.
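
A hedged example of an input specification for a model with batching enabled, so ‘dims’ lists only the three non-batch dimensions (names and sizes are hypothetical, and the TYPE_-prefixed data type name is assumed):

  input [
    {
      name: "image_input"       # hypothetical name
      data_type: TYPE_FP32
      format: FORMAT_NCHW
      dims: [ 3, 224, 224 ]     # C, H, W; the batch dimension is not listed
    }
  ]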

message ModelOutput
An output produced by the model.
string name

The name of the output.

DataType data_type

The data-type of the output.

int64 dims(repeated)

The dimensions/shape of the output tensor.

string label_filename

The label file associated with this output. Should be specified only for outputs that represent classifications. Optional.
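
A hedged example of a classification output that references a label file (names are hypothetical):

  output [
    {
      name: "probabilities"           # hypothetical name
      data_type: TYPE_FP32
      dims: [ 1000 ]
      label_filename: "labels.txt"    # only for outputs that represent classifications
    }
  ]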

message ModelVersionPolicy
Policy indicating which versions of a model should be made available by the inference server.
message Latest
Serve only the latest version(s) of a model. This is the default policy.
uint32 num_versions

Serve only the ‘num_versions’ highest-numbered versions. The default value of ‘num_versions’ is 1, indicating that by default only the single highest-numbered version of a model will be served.

message All

Serve all versions of the model.

message Specific
Serve only specific versions of the model.
int64 versions(repeated)

The specific versions of the model that will be served.

Latest latest

Serve only the latest version(s) of the model.

All all

Serve all versions of the model.

Specific specific

Serve only specific version(s) of the model.
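
Hedged sketches of the three policies in config.pbtxt form; only one of them would appear in a given configuration, and the version numbers are hypothetical:

  # Serve only the two highest-numbered versions
  version_policy: { latest { num_versions: 2 } }

  # Serve all available versions
  version_policy: { all {} }

  # Serve only versions 1 and 3
  version_policy: { specific { versions: [ 1, 3 ] } }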

message ModelOptimizationPolicy
Optimization settings for a model. These settings control if/how a model is optimized and prioritized by the backend framework when it is loaded.
message Graph
Enable generic graph optimization of the model. If not specified the framework’s default level of optimization is used. Currently only supported for TensorFlow graphdef and savedmodel models and causes XLA to be enabled/disabled for the model.
int32 level

The optimization level. Defaults to 0 (zero) if not specified.

  • -1: Disabled
  • 0: Framework default
  • 1+: Enable optimization level (greater values indicate higher optimization levels)
enum ModelPriority
Model priorities. A model will be given scheduling and execution preference over models at lower priorities. Currently, model priorities are supported only for TensorRT models.
enumerator ModelPriority::PRIORITY_DEFAULT = 0

The default model priority.

enumerator ModelPriority::PRIORITY_MAX = 1

The maximum model priority.

enumerator ModelPriority::PRIORITY_MIN = 2

The minimum model priority.

Graph graph

The graph optimization setting for the model. Optional.

ModelPriority priority

The priority setting for the model. Optional.
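
Hedged sketches of the two settings in config.pbtxt form (values are hypothetical); per the notes above, graph optimization applies only to TensorFlow models and priority only to TensorRT models:

  # For a TensorFlow model: enable graph (XLA) optimization
  optimization { graph { level: 1 } }

  # For a TensorRT model: raise scheduling and execution preference
  optimization { priority: PRIORITY_MAX }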

message ModelDynamicBatching
Dynamic batching configuration. These settings control if/how dynamic batching operates for the model.
int32 preferred_batch_size(repeated)

Preferred batch sizes for dynamic batching. If a batch of one of these sizes can be formed it will be executed immediately. If not specified a preferred batch size will be chosen automatically based on model and GPU characteristics.

int32 max_queue_delay_microseconds

The maximum time, in microseconds, a request will be delayed in the scheduling queue to wait for additional requests for batching. Default is 0.
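
A hedged example enabling dynamic batching with preferred batch sizes of 4 and 8 and a small queuing delay (values are hypothetical):

  dynamic_batching {
    preferred_batch_size: [ 4, 8 ]
    max_queue_delay_microseconds: 100
  }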

message ModelConfig
A model configuration.
string name

The name of the model.

string platform

The framework for the model. Possible values are “tensorrt_plan”, “tensorflow_graphdef”, “tensorflow_savedmodel”, and “caffe2_netdef”.

ModelVersionPolicy version_policy

Policy indicating which version(s) of the model will be served.

int32 max_batch_size

Maximum batch size allowed for inference. This can only decrease what is allowed by the model itself. A max_batch_size value of 0 indicates that batching is not allowed for the model and the dimension/shape of the input and output tensors must exactly match what is specified in the input and output configuration. A max_batch_size value > 0 indicates that batching is allowed and so the model expects the input tensors to have an additional initial dimension for the batching that is not specified in the input (for example, if the model supports batched inputs of 2-dimensional tensors then the model configuration will specify the input shape as [ X, Y ] but the model will expect the actual input tensors to have shape [ N, X, Y ]). For max_batch_size > 0 returned outputs will also have an additional initial dimension for the batch.
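
For example (hypothetical names and shapes, TYPE_-prefixed data type name assumed), a configuration with max_batch_size 8 and a 2-dimensional input declares only the non-batch dimensions; requests then supply tensors with a leading batch dimension N <= 8:

  max_batch_size: 8
  input [
    {
      name: "in"              # hypothetical
      data_type: TYPE_FP32
      dims: [ 16, 16 ]        # actual request tensors have shape [ N, 16, 16 ]
    }
  ]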

ModelInput input(repeated)

The inputs required by the model.

ModelOutput output(repeated)

The outputs produced by the model.

ModelOptimizationPolicy optimization

Optimization configuration for the model. If not specified, the default optimization policy is used.

ModelDynamicBatching dynamic_batching

Dynamic batching configuration for the model. If not specified then dynamic batching is disabled for the model.

ModelInstanceGroup instance_group(repeated)

Instances of this model. If not specified, one instance of the model will be instantiated on each available GPU.

string default_model_filename

Optional filename of the model file to use if a compute-capability specific model is not specified in ‘cc_model_filenames’. If not specified, the default name is ‘model.graphdef’, ‘model.savedmodel’, ‘model.plan’ or ‘model.netdef’ depending on the model type.

map<string, string> cc_model_filenames

Optional map from CUDA compute capability to the filename of the model that supports that compute capability. The filename refers to a file within the model version directory.
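
A hedged sketch of a compute-capability mapping (the capabilities and filenames are hypothetical):

  cc_model_filenames { key: "6.1" value: "model_61.plan" }
  cc_model_filenames { key: "7.5" value: "model_75.plan" }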

map<string, string> tags

Optional model tags. User-specific key-value pairs for this model. These tags are applied to the metrics reported on the HTTP metrics port.
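
A hedged sketch of tag entries (keys and values are hypothetical):

  tags { key: "team" value: "vision" }
  tags { key: "stage" value: "prod" }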