Pooling

Computes per-channel pooling of the input tensor by sliding a sampling window over its spatial dimensions, producing an output tensor. The supported sampling window shapes are 2-D and 3-D.

Attributes

pooling_type The pooling operation to perform. Can be one of:

  • MAX For each output element, return the maximum value found in its corresponding sampling window.

  • AVERAGE For each output element, return the average of the values in its corresponding sampling window.

  • MAX_AVERAGE_BLEND For each output element, return a weighted blend of the MAX and AVERAGE pooling results, where blend_factor is the blending factor: \(pooling(\text{MAX_AVERAGE_BLEND})=(1-\text{blend_factor}) \cdot pooling(\text{MAX}) + \text{blend_factor} \cdot pooling(\text{AVERAGE})\).

blend_factor The blending factor used when pooling_type is set to MAX_AVERAGE_BLEND.
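A minimal sketch of how these two attributes are set through the Python API, assuming the same implicit network and in1 objects used in the examples at the end of this section:

layer = network.add_pooling_nd(in1, trt.PoolingType.MAX_AVERAGE_BLEND, trt.tensorrt.DimsHW(3, 3))
layer.blend_factor = 0.25  # each output element = 0.75 * MAX result + 0.25 * AVERAGE result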

padding_mode The padding mode, which can be one of the following (a worked example follows the list):

\[\begin{split}\begin{gather} I = \text{dimensions of input image} \\ B = \text{pre-padding, before the image data; for deconvolution, pre-padding is set before output} \\ A = \text{post-padding, after the image data; for deconvolution, post-padding is set after output} \\ P = \text{delta between input and output} \\ S = \text{stride} \\ F = \text{filter} \\ O = \text{output} \\ D = \text{dilation; for pooling layers, always equals 1} \\ M = I + B + A \text{ (the data plus any padding)} \\ DK = 1 + D \cdot (F - 1) \\ \end{gather}\end{split}\]
  • EXPLICIT_ROUND_DOWN Use explicit padding, rounding the output size down.
    \(O = \lfloor\frac{M - DK}{S}\rfloor + 1\)
  • EXPLICIT_ROUND_UP Use explicit padding, rounding the output size up.
    \(O = \lceil\frac{M - DK}{S}\rceil + 1\)
  • SAME_UPPER Use SAME padding, with \(\text{pre-padding} \leq \text{post-padding}\).
    \(\begin{gather}O = \lceil\frac{I}{S}\rceil \\ P = \lfloor\frac{I-1}{S}\rfloor \cdot S + DK -I \\ B = \lfloor\frac{P}{2}\rfloor \\ A = P - B \end{gather}\)
  • SAME_LOWER Use SAME padding, with \(\text{pre-padding} \geq \text{post-padding}\).
    \(\begin{gather}O = \lceil\frac{I}{S}\rceil \\ P = \lfloor\frac{I-1}{S}\rfloor \cdot S + DK -I \\ A = \lfloor\frac{P}{2}\rfloor \\ B = P - A \end{gather}\)

average_count_excludes_padding When this parameter is set, padded values are excluded from the element count (the denominator) of the average pooling calculation.
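
For example (a small sketch using the top-left window of the padded average-pooling example at the end of this section), with pre- and post-padding of 1 a 3x3 window anchored at the corner covers only four real input values:

import numpy as np

# Only the four in-bounds values contribute data; the other five window
# positions fall on padding.
corner = np.array([[-10.0, -9.0], [-5.0, -4.0]])
print(corner.sum() / 4)  # -7.0      padding excluded from the count (flag set)
print(corner.sum() / 9)  # -3.11...  padding included in the count (flag not set)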

Inputs

input: tensor of type T

Outputs

output: tensor of type T

Data Types

T: int8, float16, float32

Shape Information

The input must be a tensor with rank \(r\geq3\).

The output tensor rank is the same as the input tensor rank. If the input’s shape is \([a_0,...,a_n]\), the stride is \(s\), the symmetric padding is \(p\), and the sampling window shape is \([r_0,..,r_m]\) where \(m=1\) (2-D window) or \(m=2\) (3-D window), then with the default EXPLICIT_ROUND_DOWN rounding:

\[\begin{split}b_i = \begin{cases} a_i &\mbox{if } i \in [0,n-m) \\ \lfloor\frac{a_i + 2 \cdot p_{m+i-n} - r_{m+i-n}}{s_{m+i-n}}\rfloor + 1 &\mbox{else} \\ \end{cases}\end{split}\]
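
As a sketch, applying this formula to the 2-D examples at the end of this section (input shape [1, 1, 5, 5], window [3, 3], stride [1, 1]):

# Shape formula applied with no padding, then with pre/post padding of 1.
a, r, s = [1, 1, 5, 5], [3, 3], [1, 1]
for p in ([0, 0], [1, 1]):
    b = a[:2] + [(a[2 + k] + 2 * p[k] - r[k]) // s[k] + 1 for k in range(2)]
    print(b)  # [1, 1, 3, 3], then [1, 1, 5, 5]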

Volume Limits

input and output can have up to \(2^{31}\) elements.

Supported Formats

As of TensorRT 10.5, only FP32/FP16 NCHW kernels are supported for 3D pooling.
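
A minimal sketch of building a 3-D pooling layer through the Python API; the 5-D input shape, window size, and stride used here are illustrative assumptions:

in3d = network.add_input("input3d", dtype=trt.float32, shape=(1, 1, 4, 4, 4))
layer3d = network.add_pooling_nd(in3d, trt.PoolingType.MAX, trt.Dims3(2, 2, 2))
layer3d.stride_nd = trt.Dims3(1, 1, 1)
network.mark_output(layer3d.get_output(0))  # output shape: (1, 1, 3, 3, 3)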

DLA Support

DLA FP16 and DLA INT8 are supported for 2D max pooling, and for 2D average pooling in the inclusive padding mode (that is, with average_count_excludes_padding disabled).
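
A hedged sketch of steering the network toward DLA with FP16 at build time, assuming a builder object created elsewhere:

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)             # DLA requires FP16 or INT8 precision
config.default_device_type = trt.DeviceType.DLA   # prefer DLA for layers that support it
config.DLA_core = 0                               # select the DLA core
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)     # fall back to GPU for unsupported layers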

Examples

Max Pooling
in1 = network.add_input("input1", dtype=trt.float32, shape=(1, 1, 5, 5))
layer = network.add_pooling_nd(in1, trt.PoolingType.MAX, trt.tensorrt.DimsHW(3, 3))
network.mark_output(layer.get_output(0))

inputs[in1.name] = np.array(
    [
        [
            [
                [-10.0, -9.0, -8.0, -7.0, -6.0],
                [-5.0, -4.0, -3.0, -2.0, -1.0],
                [0.0, 1.0, 2.0, 3.0, 4.0],
                [5.0, 6.0, 7.0, 8.0, 9.0],
                [10.0, 11.0, 12.0, 13.0, 14.0],
            ]
        ]
    ]
)
# Equivalently: inputs[in1.name] = np.reshape(np.arange(-10, 15, dtype=np.float32), newshape=(1, 1, 5, 5))

outputs[layer.get_output(0).name] = layer.get_output(0).shape

expected[layer.get_output(0).name] = np.array([[[[2.0, 3.0, 4.0], [7.0, 8.0, 9.0], [12.0, 13.0, 14.0]]]])

Average Pooling With Padding
in1 = network.add_input("input1", dtype=trt.float32, shape=(1, 1, 5, 5))
layer = network.add_pooling_nd(in1, trt.PoolingType.AVERAGE, trt.tensorrt.DimsHW(3, 3))
layer.post_padding = (1, 1)
layer.pre_padding = (1, 1)
layer.average_count_excludes_padding = True
network.mark_output(layer.get_output(0))

inputs[in1.name] = np.array(
    [
        [
            [
                [-10.0, -9.0, -8.0, -7.0, -6.0],
                [-5.0, -4.0, -3.0, -2.0, -1.0],
                [0.0, 1.0, 2.0, 3.0, 4.0],
                [5.0, 6.0, 7.0, 8.0, 9.0],
                [10.0, 11.0, 12.0, 13.0, 14.0],
            ]
        ]
    ]
)
# Equivalently: inputs[in1.name] = np.reshape(np.arange(-10, 15, dtype=np.float32), newshape=(1, 1, 5, 5))

outputs[layer.get_output(0).name] = layer.get_output(0).shape

expected[layer.get_output(0).name] = np.array(
    [
        [
            [
                [-7.0, -6.5, -5.5, -4.5, -4.0],
                [-4.5, -4.0, -3.0, -2.0, -1.5],
                [0.5, 1.0, 2.0, 3.0, 3.5],
                [5.5, 6.0, 7.0, 8.0, 8.5],
                [8.0, 8.5, 9.5, 10.5, 11.0],
            ]
        ]
    ]
)

C++ API

For more information about the C++ IPoolingLayer operator, refer to the C++ IPoolingLayer documentation.

Python API

For more information about the Python IPoolingLayer operator, refer to the Python IPoolingLayer documentation.