Working with Dynamic Shapes#
Dynamic Shapes is the ability to defer specifying some or all tensor dimensions until runtime. Dynamic shapes can be used through both the C++ and Python interfaces.
The following sections provide greater detail; however, here is an overview of the steps for building an engine with dynamic shapes:

1. Specify each runtime dimension of an input tensor by using -1 as a placeholder for the dimension.
2. Specify one or more optimization profiles at build time that specify the permitted range of dimensions for inputs with runtime dimensions and the dimensions for which the auto-tuner will optimize. For more information, refer to the Optimization Profiles section.
3. To use the engine:
   a. Create an execution context from the engine, the same as without dynamic shapes.
   b. Specify one of the optimization profiles from step 2 that covers the input dimensions.
   c. Specify the input dimensions for the execution context. After setting input dimensions, you can get the output dimensions that TensorRT computes for the given input dimensions.
   d. Enqueue work.

To change the runtime dimensions, repeat steps 3b and 3c, which do not have to be repeated until the input dimensions change.
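Below is a minimal Python sketch of this workflow, assuming a single input named foo with the dimensions used later in this section; stream_handle stands in for a CUDA stream you create yourself, and the network body is elided:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

# Step 1: -1 marks the runtime dimensions of the input.
foo = network.add_input("foo", trt.float32, (3, -1, -1))
# ... add layers and mark network outputs here ...

# Step 2: one optimization profile covering the permitted range of shapes.
config = builder.create_builder_config()
profile = builder.create_optimization_profile()
profile.set_shape("foo", (3, 100, 200), (3, 150, 250), (3, 200, 300))  # min, opt, max
config.add_optimization_profile(profile)
serialized = builder.build_serialized_network(network, config)

# Step 3: create a context, pick a profile, set input shapes, enqueue.
engine = trt.Runtime(logger).deserialize_cuda_engine(serialized)
context = engine.create_execution_context()                # 3a
context.set_optimization_profile_async(0, stream_handle)   # 3b
context.set_input_shape("foo", (3, 150, 250))              # 3c
# 3d: set tensor addresses, then context.execute_async_v3(stream_handle)
```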
When the preview feature PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 is enabled, it can potentially, for dynamically shaped networks:

- reduce the engine build time,
- reduce runtime, and
- decrease device memory usage and engine size.

Models most likely to benefit from enabling kFASTER_DYNAMIC_SHAPES_0805 are transformer-based models and models containing dynamic control flows.
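As a minimal sketch, assuming a TensorRT 8.5/8.6-era IBuilderConfig named config (in later releases this behavior became the default), the flag might be enabled like this:

```python
import tensorrt as trt

# Hedged sketch: enable the preview feature on an existing builder config.
# The enum name below matches the TensorRT 8.5/8.6 Python bindings.
config.set_preview_feature(trt.PreviewFeature.FASTER_DYNAMIC_SHAPES_0805, True)
```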
Specifying Runtime Dimensions#
When building a network, use -1 to denote a runtime dimension for an input tensor. For example, to create a 3D input tensor named foo where the last two dimensions are specified at runtime, and the first dimension is fixed at build time, issue the following:

networkDefinition.addInput("foo", DataType::kFLOAT, Dims3(3, -1, -1))
network_definition.add_input("foo", trt.float32, (3, -1, -1))
After choosing an optimization profile, you must set the input dimensions at run time (refer to Optimization Profiles). Let the input have dimensions [3,150,250]. After setting an optimization profile for the previous example, you would call:

context.setInputShape("foo", Dims{3, {3, 150, 250}})
context.set_input_shape("foo", (3, 150, 250))
At runtime, asking the engine for binding dimensions returns the same dimensions used to build the network, meaning you get a -1 for each runtime dimension. For example:

engine.getTensorShape("foo") returns a Dims with dimensions {3, -1, -1}.

engine.get_tensor_shape("foo") returns (3, -1, -1).
To get the actual dimensions, which are specific to each execution context, query the execution context:

context.getTensorShape("foo") returns a Dims with dimensions {3, 150, 250}.

context.get_tensor_shape("foo") returns (3, 150, 250).
For an input, the return value of setInputShape only indicates consistency with the optimization profile set for that input. After all input binding dimensions are specified, you can check whether the entire network is consistent with the dynamic input shapes by querying the dimensions of the output bindings of the network. Here is an example that retrieves the dimensions of an output named bar:
nvinfer1::Dims outDims = context->getTensorShape("bar");
if (outDims.nbDims == -1) {
gLogError << "Invalid network output, this might be caused by inconsistent input shapes." << std::endl;
// abort inference
}
If a dimension k is data-dependent, for example, if it depends on the input of INonZeroLayer, outDims.d[k] will be -1. For more information on such outputs, refer to the Dynamically Shaped Output section.
Named Dimensions#
Both constant and runtime dimensions can be named. Naming dimensions provides two benefits:
- For runtime dimensions, error messages use the dimension’s name. For example, if an input tensor foo has dimensions [n,10,m], it is more helpful to get an error message about m instead of (#2 (SHAPE foo)).
- Dimensions with the same name are implicitly equal, which can help the optimizer generate a more efficient engine and diagnose mismatched dimensions at runtime. For example, suppose two inputs have dimensions [n,10,m] and [n,13]. In that case, the optimizer knows the lead dimensions are always equal, and accidentally using the engine with mismatched values for n will be reported as an error.
You can use the same name for constant and runtime dimensions as long as they are always equal.
The following syntax examples set the name of the third dimension of the tensor to m.

tensor.setDimensionName(2, "m")
tensor.set_dimension_name(2, "m")
There are corresponding methods to get a dimension’s name:

tensor.getDimensionName(2) // returns the name of the third dimension of the tensor, or nullptr if it does not have a name
tensor.get_dimension_name(2)  # returns the name of the third dimension of the tensor, or None if it does not have a name
When the input network is imported from an ONNX file, the ONNX parser automatically sets the dimension names using the names in the ONNX file. Therefore, if two dynamic dimensions are expected to be equal at runtime, specify the same name for these dimensions when exporting the ONNX file.
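For example, a sketch of the [n,10,m] / [n,13] case above, assuming a network under construction (tensor names are illustrative):

```python
# Hedged sketch: the shared name "n" tells TensorRT the leading dimensions are equal.
x = network.add_input("x", trt.float32, (-1, 10, -1))
y = network.add_input("y", trt.float32, (-1, 13))
x.set_dimension_name(0, "n")
y.set_dimension_name(0, "n")   # same name => implicitly equal
x.set_dimension_name(2, "m")
```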
Dimension Constraint using IAssertionLayer#
Sometimes, two dynamic dimensions are not known to be equal statically but are guaranteed equal at runtime. Letting TensorRT know they are equal can help it build a more efficient engine. There are two ways to convey the equality constraint to TensorRT:
- Give the dimensions the same name as described in the Named Dimensions section.
- Use IAssertionLayer to express the constraint. This technique is more general since it can convey trickier equalities.
For example, if the first dimension of tensor A is guaranteed to be one more than the first dimension of tensor B, then the constraint can be established by:
// Assumes A and B are ITensor* and n is a INetworkDefinition&.
auto shapeA = n.addShape(*A)->getOutput(0);
auto firstDimOfA = n.addSlice(*shapeA, Dims{1, {0}}, Dims{1, {1}}, Dims{1, {1}})->getOutput(0);
auto shapeB = n.addShape(*B)->getOutput(0);
auto firstDimOfB = n.addSlice(*shapeB, Dims{1, {0}}, Dims{1, {1}}, Dims{1, {1}})->getOutput(0);
static int32_t const oneStorage{1};
auto one = n.addConstant(Dims{1, {1}}, Weights{DataType::kINT32, &oneStorage, 1})->getOutput(0);
auto firstDimOfBPlus1 = n.addElementWise(*firstDimOfB, *one, ElementWiseOperation::kSUM)->getOutput(0);
auto areEqual = n.addElementWise(*firstDimOfA, *firstDimOfBPlus1, ElementWiseOperation::kEQUAL)->getOutput(0);
n.addAssertion(*areEqual, "oops");
# Assumes `a` and `b` are ITensors and `n` is an INetworkDefinition
shape_a = n.add_shape(a).get_output(0)
first_dim_of_a = n.add_slice(shape_a, (0, ), (1, ), (1, )).get_output(0)
shape_b = n.add_shape(b).get_output(0)
first_dim_of_b = n.add_slice(shape_b, (0, ), (1, ), (1, )).get_output(0)
one = n.add_constant((1, ), np.ones((1, ), dtype=np.int32)).get_output(0)
first_dim_of_b_plus_1 = n.add_elementwise(first_dim_of_b, one, trt.ElementWiseOperation.SUM).get_output(0)
are_equal = n.add_elementwise(first_dim_of_a, first_dim_of_b_plus_1, trt.ElementWiseOperation.EQUAL).get_output(0)
n.add_assertion(are_equal, "oops")
If the dimensions violate the assertion at runtime, TensorRT will throw an error.
Optimization Profiles#
An optimization profile describes a range of dimensions for each network input and the dimensions the auto-tuner will use for optimization. You must create at least one optimization profile at build time when using runtime dimensions. Two profiles can specify disjoint or overlapping ranges.
For example, one profile might specify a minimum size of [3,100,200], a maximum size of [3,200,300], and optimization dimensions of [3,150,250], while another profile might specify min, max, and optimization dimensions of [3,200,100], [3,300,400], and [3,250,250].
Note
The memory usage for different profiles can change dramatically based on the dimensions specified by the min, max, and opt parameters. Some operations have tactics that only work for MIN=OPT=MAX, so when these values differ, the tactic is disabled.
To create an optimization profile, first construct an IOptimizationProfile. Then, set the min, optimization, and max dimensions, and add the profile to the builder configuration. The shapes defined by the optimization profile must define valid input shapes for the network. Here are the calls for the first profile mentioned previously for an input foo:
IOptimizationProfile* profile = builder.createOptimizationProfile();
profile->setDimensions("foo", OptProfileSelector::kMIN, Dims3(3,100,200));
profile->setDimensions("foo", OptProfileSelector::kOPT, Dims3(3,150,250));
profile->setDimensions("foo", OptProfileSelector::kMAX, Dims3(3,200,300));

config->addOptimizationProfile(profile);
profile = builder.create_optimization_profile()
profile.set_shape("foo", (3, 100, 200), (3, 150, 250), (3, 200, 300))
config.add_optimization_profile(profile)
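The second profile described above can be added the same way; a sketch, reusing the same builder and config:

```python
# Hedged sketch: a second profile for "foo" with min [3,200,100], opt [3,250,250], max [3,300,400].
profile2 = builder.create_optimization_profile()
profile2.set_shape("foo", (3, 200, 100), (3, 250, 250), (3, 300, 400))
config.add_optimization_profile(profile2)  # becomes profile index 1
```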
At runtime, you must set an optimization profile before setting input dimensions. Profiles are numbered in the order they were added, starting at 0. Note that each execution context must use a separate optimization profile.
To choose the first optimization profile in the example, use:
context.setOptimizationProfileAsync(0, stream)
context.set_optimization_profile_async(0, stream)
The provided stream argument should be the same CUDA stream that will be used for the subsequent enqueue(), enqueueV2(), or enqueueV3() invocation in this context. This ensures that the context executions happen after the optimization profile setup.
If the associated CUDA engine has dynamic inputs, the optimization profile must be set at least once with a unique profile index that is not used by other execution contexts that have not been destroyed. For the first execution context created for an engine, profile 0 is implicitly chosen.
setOptimizationProfileAsync() can be called to switch between profiles. It must be called after any enqueue(), enqueueV2(), or enqueueV3() operations finish in the current context. When multiple execution contexts run concurrently, it is allowed to switch to a profile formerly used but already released by another execution context with different dynamic input dimensions.
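For example, a sketch of two concurrent execution contexts, each bound to its own profile (stream0 and stream1 are placeholder CUDA stream handles):

```python
# Hedged sketch: one optimization profile per live execution context.
context0 = engine.create_execution_context()
context0.set_optimization_profile_async(0, stream0)  # profile 0
context1 = engine.create_execution_context()
context1.set_optimization_profile_async(1, stream1)  # profile 1
```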
The setOptimizationProfileAsync() function replaces the now-deprecated setOptimizationProfile() API. Using setOptimizationProfile() to switch between optimization profiles can cause GPU memory copy operations in the subsequent enqueue() or enqueueV2() operations. To avoid these calls during enqueue, use the setOptimizationProfileAsync() API instead.
Dynamically Shaped Output#
If the output of a network has a dynamic shape, several strategies are available to allocate the output memory.
If the dimensions of the output are computable from the dimensions of inputs, use IExecutionContext::getTensorShape() to get the dimensions of the output after providing the dimensions of the input tensors and input shape tensors. Use the IExecutionContext::inferShapes() method to check if you forgot to supply the necessary information.
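A Python sketch of this check, assuming the input foo and an output bar from the earlier examples (infer_shapes mirrors IExecutionContext::inferShapes and returns the names of tensors still missing information):

```python
# Hedged sketch: resolve and query output dimensions before enqueueing.
context.set_input_shape("foo", (3, 150, 250))
missing = context.infer_shapes()             # names of inputs still lacking shapes/values
assert not missing, f"missing shape information for: {missing}"
out_shape = context.get_tensor_shape("bar")  # fully resolved unless data-dependent
```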
Otherwise, if the dimensions of the output are not computable in advance, or when calling enqueueV3, associate an IOutputAllocator with the output. More specifically:

- Derive your allocator class from IOutputAllocator.
- Override the reallocateOutput and notifyShape methods. TensorRT calls the first when it needs to allocate the output memory and the second when it knows the output dimensions. For example, the memory for the output of INonZeroLayer is allocated before the layer runs.

Here is an example derived class:
class MyOutputAllocator : nvinfer1::IOutputAllocator
{
public:
    void* reallocateOutput(
        char const* tensorName, void* currentMemory,
        uint64_t size, uint64_t alignment) override
    {
        // Allocate the output. Remember it for later use.
        outputPtr = /* depends on strategy, as discussed later ... */;
        return outputPtr;
    }

    void notifyShape(char const* tensorName, Dims const& dims)
    {
        // Remember output dimensions for later use.
        outputDims = dims;
    }

    // Saved dimensions of the output
    Dims outputDims{};

    // nullptr if memory could not be allocated
    void* outputPtr{nullptr};
};
Here’s an example of how it might be used:
std::unordered_map<std::string, std::unique_ptr<MyOutputAllocator>> allocatorMap;

for (const char* name : /* names of outputs */)
{
    Dims extent = context->getTensorShape(name);
    void* ptr;
    if (engine->getTensorLocation(name) == TensorLocation::kDEVICE)
    {
        if (/* extent.d contains -1 */)
        {
            auto allocator = std::make_unique<MyOutputAllocator>();
            context->setOutputAllocator(name, allocator.get());
            allocatorMap.emplace(name, std::move(allocator));
        }
        else
        {
            ptr = /* allocate device memory per extent and format */;
        }
    }
    else
    {
        ptr = /* allocate cpu memory per extent and format */;
    }
    context->setTensorAddress(name, ptr);
}
Several strategies can be used for implementing reallocateOutput:

A. Defer allocation until the size is known. Do not call IExecutionContext::setTensorAddress, or call it with a nullptr for the tensor address.

B. Preallocate enough memory based on what IExecutionContext::getMaxOutputSize reports as an upper bound. This guarantees that the engine will not fail due to insufficient output memory, but the upper bound may be so high that it is useless.

C. If you have preallocated enough memory based on experience, use IExecutionContext::setTensorAddress to tell TensorRT about it. If the tensor does not fit, make reallocateOutput return nullptr, which will cause the engine to fail gracefully.

D. Preallocate memory as in C, but have reallocateOutput return a pointer to a bigger buffer if there is a fit problem. This increases the output buffer as needed.

E. Defer allocation until the size is known, like A. Then, attempt to recycle that allocation in subsequent calls until a bigger buffer is requested, and then increase it like in D.

Here is an example derived class that implements E:

class FancyOutputAllocator : nvinfer1::IOutputAllocator
{
public:
    void* reallocateOutput(
        char const* tensorName, void* currentMemory,
        uint64_t size, uint64_t alignment) override
    {
        if (size > outputSize)
        {
            // Need to reallocate
            cudaFree(outputPtr);
            outputPtr = nullptr;
            outputSize = 0;
            if (cudaMalloc(&outputPtr, size) == cudaSuccess)
            {
                outputSize = size;
            }
        }
        // If the cudaMalloc fails, outputPtr=nullptr, and the engine
        // gracefully fails.
        return outputPtr;
    }

    void notifyShape(char const* tensorName, Dims const& dims)
    {
        // Remember output dimensions for later use.
        outputDims = dims;
    }

    // Saved dimensions of the output tensor
    Dims outputDims{};

    // nullptr if memory could not be allocated
    void* outputPtr{nullptr};

    // Size of allocation pointed to by outputPtr
    uint64_t outputSize{0};

    ~FancyOutputAllocator() override
    {
        cudaFree(outputPtr);
    }
};
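A rough Python counterpart of strategy E can be sketched by subclassing trt.IOutputAllocator; the method names and the use of the cuda-python runtime bindings below are assumptions to verify against your TensorRT version:

```python
import tensorrt as trt
from cuda import cudart  # assumption: cuda-python is available for device allocation

class FancyOutputAllocator(trt.IOutputAllocator):
    def __init__(self):
        trt.IOutputAllocator.__init__(self)
        self.output_ptr = 0      # device address, 0 if nothing allocated yet
        self.output_size = 0
        self.output_dims = None

    def reallocate_output(self, tensor_name, memory, size, alignment):
        if size > self.output_size:
            # Need to reallocate: free the old buffer and grow.
            if self.output_ptr:
                cudart.cudaFree(self.output_ptr)
            self.output_ptr, self.output_size = 0, 0
            err, ptr = cudart.cudaMalloc(size)
            if err == cudart.cudaError_t.cudaSuccess:
                self.output_ptr, self.output_size = ptr, size
        # Returning 0 on allocation failure lets the enqueue fail gracefully.
        return self.output_ptr

    def notify_shape(self, tensor_name, shape):
        # Remember output dimensions for later use.
        self.output_dims = shape

# Usage ("bar" is a placeholder output name):
# allocator = FancyOutputAllocator()
# context.set_output_allocator("bar", allocator)
```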
TensorRT internally allocates memory asynchronously from the device’s current memory pool for networks with data-dependent shapes. If the current device memory pool does not have a release threshold set, performance degradation between runs may occur as the memory is returned to the operating system upon stream synchronization. In these cases, it is recommended that you either provide the TensorRT runtime with a custom IGpuAllocator backed by your own memory pool or experiment with setting the release threshold. More information about setting the release threshold can be found in Retaining Memory in the Pool and the Code Migration Guide.
Looking up Binding Indices for Multiple Optimization Profiles#
If you use enqueueV3 instead of the deprecated enqueueV2, you can skip this section because name-based methods such as IExecutionContext::setTensorAddress do not expect a profile suffix.
Each profile has separate binding indices in an engine built from multiple profiles. The names of I/O tensors for the Kth profile have [profile K] appended to them, with K written in decimal. For example, if the INetworkDefinition had the name foo, and bindingIndex refers to that tensor in the optimization profile with index 3, engine.getBindingName(bindingIndex) returns foo [profile 3].
Likewise, if using ICudaEngine::getBindingIndex(name) to get the index for a profile K beyond the first profile (K=0), append [profile K] to the name used in the INetworkDefinition. For example, if the tensor was called foo in the INetworkDefinition, engine.getBindingIndex("foo [profile 3]") returns the binding index of tensor foo in optimization profile 3.

Always omit the suffix for K=0.
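As a sketch using the deprecated binding-index API (removed in recent TensorRT releases), the lookups might look like this:

```python
# Hedged sketch: binding indices for the same tensor under different profiles.
idx0 = engine.get_binding_index("foo")               # profile 0: no suffix
idx3 = engine.get_binding_index("foo [profile 3]")   # profile 3: suffix required
```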
Bindings For Multiple Optimization Profiles#
This section explains the deprecated interface enqueueV2 and its binding indices. The newer interface enqueueV3 does away with binding indices.

Consider a network with four inputs, one output, and three optimization profiles in the IBuilderConfig. The engine has 15 bindings, five for each optimization profile, conceptually organized as a table:
Each row is a profile. Numbers in the table denote binding indices. The first profile has binding indices 0..4, the second has 5..9, and the third has 10..14.
The interfaces have an “auto-correct” in the scenario where the binding belongs to the first profile, but another profile was specified. TensorRT warns about the mistake in this case and then chooses the correct binding index from the same column.
Layer Extensions For Dynamic Shapes#
Some layers have optional inputs that allow specifying dynamic shape information; IShapeLayer can access a tensor’s shape at runtime. Furthermore, some layers allow for calculating new shapes. The next section goes into semantic details and restrictions. Here is a summary of what you might find useful in conjunction with dynamic shapes.
IShapeLayer outputs a 1D tensor containing the dimensions of the input tensor. For example, if the input tensor has dimensions [2,3,5,7], the output tensor is a four-element 1D tensor containing {2,3,5,7}. If the input tensor is a scalar, it has dimensions [], and the output tensor is a zero-element 1D tensor containing {}.
IResizeLayer accepts an optional second input containing the desired dimensions of the output.
IShuffleLayer accepts an optional second input containing the reshaped dimensions before the second transpose is applied. For example, the following network reshapes a tensor Y to have the same dimensions as X:
auto* reshape = networkDefinition.addShuffle(Y);
reshape->setInput(1, networkDefinition.addShape(X)->getOutput(0));

reshape = network_definition.add_shuffle(y)
reshape.set_input(1, network_definition.add_shape(x).get_output(0))
ISliceLayer accepts an optional second, third, and fourth input containing the start, size, and stride.
IConcatenationLayer, IElementWiseLayer, IGatherLayer, IIdentityLayer, and IReduceLayer can calculate shapes and create new shape tensors.
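For example, here is a sketch that builds a new shape at runtime and feeds it to IShuffleLayer to flatten all but the first dimension of a tensor x; network and x are placeholders, and the INT32 shape-tensor type matches the earlier assertion example (newer TensorRT versions may use INT64):

```python
import numpy as np

# Hedged sketch: reshape x from [N, C, H, W] to [N, -1] using shape calculations.
shape_x = network.add_shape(x).get_output(0)                            # {N, C, H, W}
batch = network.add_slice(shape_x, (0,), (1,), (1,)).get_output(0)      # {N}
minus_one = network.add_constant((1,), np.array([-1], dtype=np.int32)).get_output(0)
new_shape = network.add_concatenation([batch, minus_one]).get_output(0) # {N, -1}
flatten = network.add_shuffle(x)
flatten.set_input(1, new_shape)   # -1 is inferred from the remaining dimensions
```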
Restrictions For Dynamic Shapes#
The following layer restrictions arise because the layer’s weights have a fixed size:
- IConvolutionLayer and IDeconvolutionLayer require that the channel dimension be a build time constant.
- Int8 requires that the channel dimension be a build time constant.
- Layers accepting additional shape inputs (IResizeLayer, IShuffleLayer, ISliceLayer) require that the additional shape inputs be compatible with the dimensions of the minimum and maximum optimization profiles as well as with the dimensions of the runtime data input; otherwise, it can lead to either a build-time or runtime error.
Not all required build-time constants need to be set manually. TensorRT will infer shapes through the network layers, and only those that cannot be inferred to be build-time constants must be set manually.
For more information regarding layers, refer to the TensorRT Operator documentation.
Execution Tensors vs Shape Tensors#
TensorRT 8.5 largely erased the distinctions between execution tensors and shape tensors. However, when designing a network or analyzing performance, it may help to understand the internals and where internal synchronization is incurred.
Engines using dynamic shapes employ a ping-pong execution strategy:

1. Compute the shapes of tensors on the CPU until a shape requiring GPU results is reached.
2. Stream work to the GPU until you run out of work or reach an unknown shape. If the latter, synchronize and go back to step 1.
An execution tensor is a traditional TensorRT tensor. A shape tensor is a tensor that is related to shape calculations. It must have type Int32, Int64, Float, or Bool, its shape must be determinable at build time, and it must have no more than 64 elements. Refer to Shape Tensor I/O (Advanced) for additional restrictions for shape tensors at network I/O boundaries. For example, there is an IShapeLayer whose output is a 1D tensor containing the dimensions of the input tensor. The output is a shape tensor. IShuffleLayer accepts an optional second input that can specify reshaping dimensions. The second input must be a shape tensor.
When TensorRT needs a shape tensor, but the tensor has been classified as an execution tensor, the runtime copies the tensor from the GPU to the CPU, which incurs synchronization overhead.
Some layers are polymorphic in terms of the kinds of tensors they handle. For example, IElementWiseLayer can sum two INT32 execution tensors or two INT32 shape tensors. The type of tensor depends on its ultimate use. If the sum is used to reshape another tensor, it is a shape tensor.
Formal Inference Rules#
The formal inference rules used by TensorRT for classifying tensors are based on a type-inference algebra. Let E denote an execution tensor, and S denote a shape tensor.
IActivationLayer has the signature:

IActivationLayer: E → E

since it takes an execution tensor as an input and produces an execution tensor as an output. IElementWiseLayer is polymorphic in this respect, with two signatures:
IElementWiseLayer: S × S → S, E × E → E
For brevity, let us adopt the convention that t is a variable denoting either class of tensor, and all t in a signature refer to the same class of tensor. Then, the two previous signatures can be written as a single polymorphic signature:
IElementWiseLayer: t × t → t
The two-input IShuffleLayer has a shape tensor as the second input and is polymorphic with respect to the first input:
IShuffleLayer (two inputs): t × S → t
IConstantLayer has no inputs but can produce a tensor of either kind, so its signature is:
IConstantLayer: → t
The signature for IShapeLayer allows all four possible combinations E→E, E→S, S→E, and S→S, so it can be written with two independent variables:
IShapeLayer: t1 → t2
Here is the complete set of rules, which also serves as a reference for which layers can be used to manipulate shape tensors:
IAssertionLayer: S →
IConcatenationLayer: t × t × ...→ t
ICumulativeLayer: t × t → t
IIfConditionalInputLayer: t → t
IIfConditionalOutputLayer: t → t
IConstantLayer: → t
IActivationLayer: t → t
IElementWiseLayer: t × t → t
IFillLayer: S → t
IFillLayer: S × t × t → t
IGatherLayer: t × t → t
IIdentityLayer: t → t
IReduceLayer: t → t
IResizeLayer (one input): E → E
IResizeLayer (two inputs): E × S → E
ISelectLayer: t × t × t → t
IShapeLayer: t1 → t2
IShuffleLayer (one input): t → t
IShuffleLayer (two inputs): t × S → t
ISliceLayer (one input): t → t
ISliceLayer (two inputs): t × S → t
ISliceLayer (three inputs): t × S × S → t
ISliceLayer (four inputs): t × S × S × S → t
IUnaryLayer: t → t
all other layers: E × ... → E × ...
The inferred types are not exclusive because the output can be the input of more than one subsequent layer. For example, an IConstantLayer might feed into one use that requires an execution tensor and another use that requires a shape tensor. The output of IConstantLayer is classified as both and can be used in both phase 1 and phase 2 of the two-phase execution.
The requirement that the size of a shape tensor be known at build time limits how ISliceLayer can be used to manipulate a shape tensor. Specifically, suppose the third parameter, which specifies the size of the result, is not a build-time constant. In that case, the length of the resulting tensor is unknown at build time, breaking the restriction that shape tensors have constant shapes. The slice will still work but will incur synchronization overhead at runtime because the tensor is considered an execution tensor that has to be copied back to the CPU to do further shape calculations.
The rank of any tensor has to be known at build time. For example, if the output of ISliceLayer is a 1D tensor of unknown length that is used as the reshape dimensions for IShuffleLayer, the output of the shuffle would have an unknown rank at build time, and hence such a composition is prohibited.
TensorRT’s inferences can be inspected using the methods ITensor::isShapeTensor(), which returns true for a shape tensor, and ITensor::isExecutionTensor(), which returns true for an execution tensor. Build the entire network first before calling these methods because their answer can change depending on what uses of the tensor have been added.
For example, if a partially built network sums two tensors, T1 and T2, to create tensor T3, and none are yet needed as shape tensors, isShapeTensor() returns false for all three tensors. Setting the second input of IShuffleLayer to T3 would cause all three tensors to become shape tensors because IShuffleLayer requires its second optional input to be a shape tensor. If the output of IElementWiseLayer is a shape tensor, its inputs are, too.
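A small sketch of this behavior in Python, with t1, t2, and x standing in for tensors in a network under construction:

```python
# Hedged sketch: classification changes as uses are added.
t3 = network.add_elementwise(t1, t2, trt.ElementWiseOperation.SUM).get_output(0)
print(t3.is_shape_tensor)    # False: nothing uses t3 as a shape yet

shuffle = network.add_shuffle(x)
shuffle.set_input(1, t3)     # t3 now supplies reshape dimensions
print(t3.is_shape_tensor)    # True: t3 (and hence t1 and t2) are shape tensors
```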
Shape Tensor I/O (Advanced)#
Sometimes, the need arises to use a shape tensor as a network I/O tensor. For example, consider a network consisting solely of an IShuffleLayer. TensorRT infers that the second input is a shape tensor, and ITensor::isShapeTensor returns true for it. Because it is an input shape tensor, TensorRT requires two things for it:

- At build time: the optimization profile values of the shape tensor.
- At run time: the values of the shape tensor.
The shape of an input shape tensor is always known at build time. The values must be described since they can be used to specify the dimensions of execution tensors.
The optimization profile values can be set using IOptimizationProfile::setShapeValues. Analogous to how min, max, and optimization dimensions must be supplied for execution tensors with runtime dimensions, min, max, and optimization values must be provided for shape tensors at build time.
The corresponding runtime method is IExecutionContext::setTensorAddress, which tells TensorRT where to look for the shape tensor values.
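A sketch of both steps for an input shape tensor named shape_in (a placeholder; exact Python signatures and the host-memory convention should be verified against your TensorRT version):

```python
import numpy as np

# Build time: min, opt, and max values for the two-element shape tensor.
profile.set_shape_input("shape_in", (1, 1), (3, 150), (3, 250))

# Run time: the values must live in host memory when using enqueueV3/execute_async_v3.
values = np.array([3, 150], dtype=np.int32)
context.set_tensor_address("shape_in", values.ctypes.data)
```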
Because the inference of execution tensor versus shape tensor is based on ultimate use, TensorRT cannot infer whether a network output is a shape tensor. You must tell it using the method INetworkDefinition::markOutputForShapes.
Besides letting you output shape information for debugging, this feature is useful for composing engines. For example, consider building three engines, one each for sub-networks A, B, and C, where a connection from A to B or B to C might involve a shape tensor. Build the networks in reverse order: C, B, and A. After constructing network C, you can use ITensor::isShapeTensor to determine if an input is a shape tensor and use INetworkDefinition::markOutputForShapes to mark the corresponding output tensor in network B. Then check which inputs of B are shape tensors and mark the corresponding output tensor in network A.
Shape tensors at network boundaries must have the type Int32 or Int64. They cannot have type Float or Bool. A workaround for Bool is to use Int32 for the I/O tensor, with zeros and ones, and convert to/from Bool using IIdentityLayer.
At runtime, whether a tensor is an I/O shape tensor can be determined via ICudaEngine::isShapeInferenceIO().
INT8 Calibration with Dynamic Shapes#
A calibration optimization profile must be set to run INT8 calibration for a network with dynamic shapes. Calibration is performed using the profile’s kOPT values, and the calibration input data size must match this profile.
To create a calibration optimization profile, first construct an IOptimizationProfile the same way as a general optimization profile. Then, set the profile on the configuration:

config->setCalibrationProfile(profile)
config.set_calibration_profile(profile)
The calibration profile must be valid or be nullptr. kMIN and kMAX values are overwritten by kOPT. To check the current calibration profile, use IBuilderConfig::getCalibrationProfile.

This method returns a pointer to the current calibration profile or nullptr if the calibration profile is unset. The getBatchSize() calibrator method must return 1 when running calibration for a dynamic-shaped network.
Note
If the calibration optimization profile is not set, the first network optimization profile is used as a calibration optimization profile.