NVIDIA Performance Primitives (NPP)  Version 10.1
Imaging-Processing Specific API Conventions

Function Naming

Image processing related functions use a number of suffixes to indicate various different flavors of a primitive beyond just different data types. The flavor suffix uses the following abbreviations:

The suffixes above always appear in alphabetical order. E.g. a 4 channel primitive not affecting the alpha channel with masked operation, in place and with scaling/saturation and ROI would have the postfix: "AC4IMRSfs".

Image Data

Image data is passed to and from NPPI primitives via a pair of parameters:

  1. A pointer to the image's underlying data type.
  2. A line step in bytes (also sometimes called line stride).

The general idea behind this fairly low-level way of passing image data is ease-of-adoption into existing software projects:

Line Step

The line step (also called "line stride" or "row step") allows lines of oddly sized images to start on well-aligned addresses by adding a number of unused bytes at the ends of the lines. This type of line padding has been common practice in digital image processing for a long time and is not particular to GPU image processing.

The line step is the number of bytes in a line including the padding. An other way to interpret this number is to say that it is the number of bytes between the first pixel of successive rows in the image, or generally the number of bytes between two neighboring pixels in any column of pixels.

The general reason for the existence of the line step it is that uniformly aligned rows of pixel enable optimizations of memory-access patterns.

Even though all functions in NPP will work with arbitrarily aligned images, best performance can only be achieved with well aligned image data. Any image data allocated with the NPP image allocators or the 2D memory allocators in the CUDA runtime, is well aligned.

Particularly on older CUDA capable GPUs it is likely that the performance decrease for misaligned data is substantial (orders of magnitude).

All image data passed to NPPI primitives requires a line step to be provided. It is important to keep in mind that this line step is always specified in terms of bytes, not pixels.

Parameter Names for Image Data

There are three general cases of image-data passing throughout NPP detailed in the following sections.

Passing Source-Image Data

Those are images consumed by the algorithm.

Source-Image Pointer

The source image data is generally passed via a pointer named

pSrc

The source image pointer is generally defined constant, enforcing that the primitive does not change any image data pointed to by that pointer. E.g.

nppiPrimitive_32s_C1R(const Npp32s * pSrc, ...)

In case the primitive consumes multiple images as inputs the source pointers are numbered like this:

pSrc1, pScr2, ...

Source-Planar-Image Pointer Array

The planar source image data is generally passed via an array of pointers named

pSrc[]

The planar source image pointer array is generally defined a constant array of constant pointers, enforcing that the primitive does not change any image data pointed to by those pointers. E.g.

nppiPrimitive_8u_P3R(const Npp8u * const pSrc[3], ...)

Each pointer in the array points to a different image plane.

Source-Planar-Image Pointer

The multiple plane source image data is passed via a set of pointers named

pSrc1, pSrc2, ...

The planar source image pointer is generally defined as one of a set of constant pointers with each pointer pointing to a different input image plane.

Source-Image Line Step

The source image line step is the number of bytes between successive rows in the image. The source image line step parameter is

nSrcStep

or in the case of multiple source images

nSrcStep1, nSrcStep2, ...

Source-Planar-Image Line Step Array

The source planar image line step array is an array where each element of the array contains the number of bytes between successive rows for a particular plane in the input image. The source planar image line step array parameter is

rSrcStep[]

Source-Planar-Image Line Step

The source planar image line step is the number of bytes between successive rows in a particular plane of the multiplane input image. The source planar image line step parameter is

nSrcStep1, nSrcStep2, ...

Passing Destination-Image Data

Those are images produced by the algorithm.

Destination-Image Pointer

The destination image data is generally passed via a pointer named

pDst

In case the primitive generates multiple images as outputs the destination pointers are numbered like this:

pDst1, pDst2, ...

Destination-Planar-Image Pointer Array

The planar destination image data pointers are generally passed via an array of pointers named

pDst[]

Each pointer in the array points to a different image plane.

Destination-Planar-Image Pointer

The destination planar image data is generally passed via a pointer to each plane of a multiplane output image named

pDst1, pDst2, ...

Destination-Image Line Step

The destination image line step parameter is

nDstStep

or in the case of multiple destination images

nDstStep1, nDstStep2, ...

Destination-Planar-Image Line Step Array

The destination planar image line step array is an array where each element of the array contains the number of bytes between successive rows for a particular plane in the output image. The destination planar image line step array parameter is

rDstStep[]

Destination-Planar-Image Line Step

The destination planar image line step is the number of bytes between successive rows for a particular plane in a multiplane output image. The destination planar image line step parameter is

nDstStep1, nDstStep2, ...

Passing In-Place Image Data

In-Place Image Pointer

In the case of in-place processing, source and destination are served by the same pointer and thus pointers to in-place image data are called:

pSrcDst

In-Place-Image Line Step

The in-place line step parameter is

nSrcDstStep

Passing Mask-Image Data

Some image processing primitives have variants supporting Masked Operation.

Mask-Image Pointer

The mask-image data is generally passed via a pointer named

pMask

Mask-Image Line Step

The mask-image line step parameter is

nMaskStep

Passing Channel-of-Interest Data

Some image processing primitives support Channel-of-Interest API.

Channel_of_Interest Number

The channel-of-interest data is generally an integer (either 1, 2, or 3):

nCOI

Image Data Alignment Requirements

NPP requires pixel data to adhere to certain alignment constraints: For 2 and 4 channel images the following alignment requirement holds: data_pointer % (#channels * sizeof(channel type)) == 0. E.g. a 4 channel image with underlying type Npp8u (8-bit unsigned) would require all pixels to fall on addresses that are multiples of 4 (4 channels * 1 byte size).

As a logical consequence of all pixels being aligned to their natural size the image line steps of 2 and 4 channel images also need to be multiples of the pixel size.

1 and 3 channel images only require that pixel pointers are aligned to the underlying data type, i.e. pData % sizof(data type) == 0. And consequentially line steps are also held to this requirement.

Image Data Related Error Codes

All NPPI primitives operating on image data validate the image-data pointer for proper alignment and test that the point is not null. They also validate the line stride for proper alignment and guard against the step being less or equal to 0. Failed validation results in one of the following error codes being returnd and the primitive not being executed:

Region-of-Interest (ROI)

In practice processing a rectangular sub-region of an image is often more common than processing complete images. The vast majority of NPP's image-processing primitives allow for processing of such sub regions also referred to as regions-of-interest or ROIs.

All primitives supporting ROI processing are marked by a "R" in their name suffix. In most cases the ROI is passed as a single NppiSize struct, which provides the with and height of the ROI. This raises the question how the primitive knows where in the image this rectangle of (width, height) is located. The "start pixel" of the ROI is implicitly given by the image-data pointer. I.e. instead of explicitly passing a pixel coordinate for the upper-left corner (lowest memory address), the user simply offsets the image-data pointers to point to the first pixel of the ROI.

In practice this means that for an image (pSrc, nSrcStep) and the start-pixel of the ROI being at location (x, y), one would pass

pSrcOffset = pSrc + y * nSrcStep + x * PixelSize;

as the image-data source to the primitive. PixelSize is typically computed as

PixelSize = NumberOfColorChannels * sizeof(PixelDataType).

E.g. for a pimitive like nppiSet_16s_C4R() we would have

ROI Related Error Codes

All NPPI primitives operating on ROIs of image data validate the ROI size and image's step size. Failed validation results in one of the following error codes being returned and the primitive not being executed:

Masked Operation

Some primitive support masked operation. An "M" in the suffix of those variants indicates masked operation. Primitives supporting masked operation consume an additional input image provided via a Mask-Image Pointer and Mask-Image Line Step. The mask image is interpreted by these primitives as a boolean image. The values of type Npp8u are interpreted as boolean values where a values of 0 indicates false, any non-zero values true.

Unless otherwise indicated the operation is only performed on pixels where its spatially corresponding mask pixel is true (non-zero). E.g. a masked copy operation would only copy those pixels in the ROI that have corresponding non-zero mask pixels.

Channel-of-Interest API

Some primitives allow restricting operations to a single channel of interest within a multi-channel image. These primitives are suffixed with the letter "C" (after the channel information, e.g. nppiCopy_8u_C3CR(...). The channel-of-interest is generally selected by offsetting the image-data pointer to point directly to the channel- of-interest rather than the base of the first pixel in the ROI. Some primitives also explicitly specify the selected channel number and pass it via an integer, e.g. nppiMean_StdDev_8u_C3CR(...).

Select-Channel Source-Image Pointer

This is a pointer to the channel-of-interest within the first pixel of the source image. E.g. if pSrc is the pointer to the first pixel inside the ROI of a three channel image. Using the appropriate select-channel copy primitive one could copy the second channel of this source image into the first channel of a destination image given by pDst by offsetting the pointer by one:

* nppiCopy_8u_C3CR(pSrc + 1, nSrcStep, pDst, nDstStep, oSizeROI);
*

Select-Channel Source-Image

Some primitives allow the user to select the channel-of-interest by specifying the channle number (nCOI). This approach is typically used in the image statistical functions. For example,

* nppiMean_StdDev_8u_C3CR(pSrc, nSrcStep, oSizeROI, nCOI, pDeviceBuffer, pMean, pStdDev );
*

The channel-of-interest number can be either 1, 2, or 3.

Select-Channel Destination-Image Pointer

This is a pointer to the channel-of-interest within the first pixel of the destination image. E.g. if pDst is the pointer to the first pixel inside the ROI of a three channel image. Using the appropriate select-channel copy primitive one could copy data into the second channel of this destination image from the first channel of a source image given by pSrc by offseting the destination pointer by one:

* nppiCopy_8u_C3CR(pSrc, nSrcStep, pDst + 1, nDstStep, oSizeROI);
*

Source-Image Sampling

A large number of NPP image-processing functions consume at least one source image and produce an output image (e.g. nppiAddC_8u_C1RSfs() or nppiFilterBox_8u_C1R()). All NPP functions falling into this category also operate on ROIs (see Region-of-Interest (ROI)) which for these functions should be considered to describe the destination ROI. In other words the ROI describes a rectangular region in the destination image and all pixels inside of this region are being written by the function in question.

In order to use such functions successfully it is important to understand how the user defined destination ROI affects which pixels in the input image(s) are being read by the algorithms. To simplify the discussion of ROI propagation (i.e. given a destination ROI, what are the ROIs in in the source(s)), it makes sense to distinguish two major cases:

  1. Point-Wise Operations: These are primitives like nppiAddC_8u_C1RSfs(). Each output pixel requires exaclty one input pixel to be read.
  2. Neighborhood Operations: These are primitives like nppiFilterBox_8u_C1R(), which require a group of pixels from the source image(s) to be read in order to produce a single output.

Point-Wise Operations

As mentioned above, point-wise operations consume a single pixel from the input image (or a single pixel from each input image, if the operation in question has more than one input image) in order to produce a single output pixel.

Neighborhood Operations

In the case of neightborhood operations a number of input pixels (a "neighborhood" of pixels) is read in the input image (or images) in order to compute a single output pixel. All of the functions for Filtering Functions and Morphological Operations are neigborhood operations.

Most of these functions have parameters that affect the size and relative location of the neighborhood: a mask-size structure and an achor-point structure. Both parameters are described in more detail in the next subsections.

Mask-Size Parameter

Many NPP neighborhood operations allow the user to specify the size of the neightborhood via a parameter usually named oMaskSize of type NppiSize. In those cases the neighborhood of pixels read from the source(s) is exactly the size of the mask. Assuming the mask is anchored at location (0, 0) (see Anchor-Point Parameter below) and has a size of (w, h), i.e.

* assert(oMaskSize.w == w);
* assert(oMaskSize.h == h);
* assert(oAnchor.x == 0);
* assert(oAnchor.y == 0);
*

a neighborhood operation would read the following source pixels in order to compute destiation pixel $ D_{i,j} $:

\[ \begin{array}{lllll} S_{i,j} & S_{i,j+1} & \ldots & S_{i,j+w-1} \\ S_{i+1,j} & S_{i+1,j+1} & \ldots & S_{i+1, j+w-1} \\ \vdots & \vdots & \ddots & \vdots \\ S_{i+h-1, j} & S_{i+h-1, j+1} & \ldots & S_{i+h-1, j+w-1} \end{array} \]

Anchor-Point Parameter

Many NPP primitives perforing neighborhood operations allow the user to specify the relative location of the neighborhood via a parameter usually named oAnchor of type NppiPoint. Using the anchor a developer can chose the position of the mask (see Mask-Size Parameter) relative to current pixel index.

Using the same example as in Mask-Size Parameter, but this time with an anchor position of (a, b):

* assert(oMaskSize.w == w);
* assert(oMaskSize.h == h);
* assert(oAnchor.x == a);
* assert(oAnchor.y == b);
*

the following pixels from the source image would be read:

\[ \begin{array}{lllll} S_{i-a,j-b} & S_{i-a,j-b+1} & \ldots & S_{i-a,j-b+w-1} \\ S_{i-a+1,j-b} & S_{i-a+1,j-b+1} & \ldots & S_{i-a+1, j-b+w-1} \\ \vdots & \vdots & \ddots & \vdots \\ S_{i-a+h-1, j-b} & S_{i-a+h-1, j-b+1} & \ldots & S_{i-a+h-1, j-b+w-1} \end{array} \]

Sampling Beyond Image Boundaries

NPP primitives in general and NPP neighborhood operations in particular require that all pixel locations read and written are valid and within the boundaries of the respective images. Sampling outside of the defined image data regions results in undefined behavior and may lead to system instabilty.

This poses a problem in practice: when processing full-size images one cannot choose the destination ROI to be the same size as the source image. Because neigborhood operations read pixels from an enlarged source ROI, the destination ROI must be shrunk so that the expanded source ROI does not exceed the source image's size.

For cases where this "shrinking" of the destination image size is unacceptable, NPP provides a set of border-expanding Copy primitives. E.g. nppiCopyConstBorder_8u_C1R(), nppiCopyReplicateBorder_8u_C1R() and nppiCopyWrapBorder_8u_C1R(). The user can use these primitives to "expand" the source image's size using one of the three expansion modes. The expanded image can then be safely passed to a neighborhood operation producing a full-size result.


Copyright © 2009-2018 NVIDIA Corporation