nvJPEG :: CUDA Toolkit Documentation

Using the nvJPEG Library

The nvJPEG library provides functions both for decoding a single image and for batched decoding of multiple images.

Single Image Decoding

For single-image decoding you provide the data size and a pointer to the file data, and the decoded image is placed in the output buffer.

To use the nvJPEG library, start by calling the helper functions for initialization.

Create the nvJPEG library handle with the helper function nvjpegCreate().
Create the JPEG state with the helper function nvjpegJpegStateCreate(). See nvJPEG Type Declarations and nvjpegJpegStateCreate().

Below is the list of helper functions available in the nvJPEG library:
- nvjpegStatus_t nvjpegGetProperty(libraryPropertyType type, int *value);
- nvjpegStatus_t nvjpegCreate(nvjpegHandle_t *handle, nvjpeg_dev_allocator allocator);
- nvjpegStatus_t nvjpegDestroy(nvjpegHandle_t handle);
- nvjpegStatus_t nvjpegJpegStateCreate(nvjpegHandle_t handle, nvjpegJpegState_t *jpeg_handle);
- nvjpegStatus_t nvjpegJpegStateDestroy(nvjpegJpegState_t handle);
Retrieve the width and height information from the JPEG-encoded image by using the nvjpegGetImageInfo() function. See also nvjpegGetImageInfo().

Below is the signature of the nvjpegGetImageInfo() function:
```
nvjpegStatus_t nvjpegGetImageInfo(
  nvjpegHandle_t              handle,
  const unsigned char         *data,
  size_t                      length,
  int                         *nComponents,
  nvjpegChromaSubsampling_t   *subsampling,
  int                         *widths,
  int                         *heights); 
```
For each image to be decoded, pass the JPEG data pointer and data length to the above function. The nvjpegGetImageInfo() function is thread safe.
One of the outputs of the nvjpegGetImageInfo() function is an nvjpegChromaSubsampling_t value. This is an enum type whose value identifies the chroma subsampling of the input JPEG image. See nvJPEG Chroma Subsampling.
Use the nvjpegDecode() function in the nvJPEG library to decode this single JPEG image. See the signature of this function below:
```
nvjpegStatus_t nvjpegDecode(
  nvjpegHandle_t          handle,
  nvjpegJpegState_t       jpeg_handle,
  const unsigned char     *data,
  size_t                  length, 
  nvjpegOutputFormat_t    output_format,
  nvjpegImage_t           *destination,
  cudaStream_t            stream);
```
In the nvjpegDecode() function above, the output_format (nvjpegOutputFormat_t), destination (nvjpegImage_t), and stream (cudaStream_t) parameters control the output behavior of the function. Use the stream parameter to indicate the CUDA stream to which the asynchronous decoding work is submitted.

The nvjpegOutputFormat_t parameter:

The nvjpegOutputFormat_t parameter can be set to one of the output_format settings below:

output_format	Meaning
`NVJPEG_OUTPUT_UNCHANGED`	Return the decoded image in planar format, without color conversion.
`NVJPEG_OUTPUT_RGB`	Convert to planar RGB.
`NVJPEG_OUTPUT_BGR`	Convert to planar BGR.
`NVJPEG_OUTPUT_RGBI`	Convert to interleaved RGB.
`NVJPEG_OUTPUT_BGRI`	Convert to interleaved BGR.
`NVJPEG_OUTPUT_Y`	Return the Y component only.
`NVJPEG_OUTPUT_YUV`	Return in the YUV planar format.

For example, if output_format is set to NVJPEG_OUTPUT_Y, NVJPEG_OUTPUT_RGBI, or NVJPEG_OUTPUT_BGRI, then the output is written only to channel[0] of the nvjpegImage_t destination structure, and the other channels are not touched.

In the case of planar output, the data is instead written to the corresponding channels of the nvjpegImage_t destination structure.

Finally, when a grayscale JPEG is decoded to an RGB output format, the luminance values are used to create a grayscale RGB image.
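The channel rules above can be summarized as a small lookup. This is a sketch only: the out_format enum is a hypothetical stand-in that mirrors the nvjpegOutputFormat_t values in the table (the real enum is declared in nvjpeg.h), and the three-plane count for YUV output assumes the c = 0, 1, 2 layout used in the destination-size guidelines for NVJPEG_OUTPUT_YUV.

```c
#include <assert.h>

/* Hypothetical stand-in for the nvjpegOutputFormat_t values listed above. */
typedef enum {
    OUT_UNCHANGED, OUT_YUV, OUT_Y, OUT_RGB, OUT_BGR, OUT_RGBI, OUT_BGRI
} out_format;

/* Number of destination.channel[] entries written for a given format.
   n_components is the value reported by nvjpegGetImageInfo(). */
static int channels_written(out_format fmt, int n_components)
{
    switch (fmt) {
    case OUT_Y:
    case OUT_RGBI:
    case OUT_BGRI:
        return 1;              /* all output goes to channel[0] */
    case OUT_RGB:
    case OUT_BGR:
    case OUT_YUV:
        return 3;              /* one plane per color channel; a grayscale
                                  JPEG still fills all three RGB planes */
    case OUT_UNCHANGED:
    default:
        assert(n_components >= 1);
        return n_components;   /* one plane per encoded component */
    }
}
```

For instance, a grayscale JPEG (n_components = 1) decoded with the RGB stand-in still needs three planes, while the same image decoded unchanged needs only one.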

An important benefit of the nvjpegGetImageInfo() function is that you can use the image information it retrieves from the input JPEG image to allocate appropriately sized GPU memory for the decoding operation.

The nvjpegGetImageInfo() function returns the widths, heights and nComponents parameters.

You can use the retrieved parameters, widths, heights and nComponents, to calculate the required size for the output buffers, either for a single decoded JPEG, or for every decoded JPEG in a batch.

To size the destination parameter of the nvjpegDecode() function, use the following guidelines:

output_format	destination.pitch[c] should be at least	destination.channel[c] should be at least of size
NVJPEG_OUTPUT_Y	width[0] for c = 0	destination.pitch[0]*height[0] for c = 0
NVJPEG_OUTPUT_YUV	width[c] for c = 0, 1, 2	destination.pitch[c]*height[c] for c = 0, 1, 2
NVJPEG_OUTPUT_RGB and NVJPEG_OUTPUT_BGR	width[0] for c = 0, 1, 2	destination.pitch[0]*height[0] for c = 0, 1, 2
NVJPEG_OUTPUT_RGBI and NVJPEG_OUTPUT_BGRI	width[0]*3 for c = 0	destination.pitch[0]*height[0] for c = 0
NVJPEG_OUTPUT_UNCHANGED	width[c] for c = [0, nComponents - 1]	destination.pitch[c]*height[c] for c = [0, nComponents - 1]
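The sizing guidelines in the table can be expressed as two helper functions. This is a minimal sketch, assuming the widths and heights arrays come from nvjpegGetImageInfo() and that NVJPEG_MAX_COMPONENT is 4; the out_format enum and both function names are hypothetical, not part of the nvJPEG API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the nvjpegOutputFormat_t values in the table. */
typedef enum {
    OUT_UNCHANGED, OUT_YUV, OUT_Y, OUT_RGB, OUT_BGR, OUT_RGBI, OUT_BGRI
} out_format;

/* Minimum pitch (bytes per row) of destination.channel[c]; 0 means the
   decoder does not write channel c for this format. */
static size_t min_pitch(out_format fmt, int c, int n_components,
                        const int *widths)
{
    assert(c >= 0 && c < 4);  /* 4 assumed to equal NVJPEG_MAX_COMPONENT */
    switch (fmt) {
    case OUT_Y:    return c == 0 ? (size_t)widths[0] : 0;
    case OUT_YUV:  return c < 3 ? (size_t)widths[c] : 0;
    case OUT_RGB:
    case OUT_BGR:  return c < 3 ? (size_t)widths[0] : 0;
    case OUT_RGBI:
    case OUT_BGRI: return c == 0 ? (size_t)widths[0] * 3 : 0;
    case OUT_UNCHANGED:
    default:       return c < n_components ? (size_t)widths[c] : 0;
    }
}

/* Minimum size in bytes of destination.channel[c]. */
static size_t min_plane_size(out_format fmt, int c, int n_components,
                             const int *widths, const int *heights)
{
    size_t pitch = min_pitch(fmt, c, n_components, widths);
    if (pitch == 0)
        return 0;
    /* YUV and UNCHANGED planes keep per-component (possibly subsampled)
       heights; every other format is full resolution, so use height[0]. */
    int rows = (fmt == OUT_YUV || fmt == OUT_UNCHANGED) ? heights[c] : heights[0];
    return pitch * (size_t)rows;
}
```

For a 640x480 image with 4:2:0 subsampling (component widths 640, 320, 320 and heights 480, 240, 240), interleaved RGB needs a single 1920-byte-pitch plane of 921,600 bytes, while YUV output needs a 640x480 luma plane and two 320x240 chroma planes.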

Ensure that the nvjpegImage_t structure (or structures, in the case of batched decode) is filled with the pointers and pitches of allocated buffers. The nvjpegImage_t structure that holds the output pointers is defined as follows:
```
typedef struct
{
    unsigned char * channel[NVJPEG_MAX_COMPONENT]; 
    unsigned int pitch[NVJPEG_MAX_COMPONENT];
} nvjpegImage_t;
```
NVJPEG_MAX_COMPONENT is the maximum number of color components the nvJPEG library supports in the current release. For generic images, this is the maximum number of encoded channels that the library is able to decompress.
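Finally, when you call nvjpegDecode() with the parameters described above, it fills the output buffers with the decoded data. Because the decoding work is submitted to the given stream, synchronize that stream (for example, with cudaStreamSynchronize()) before reading the output.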
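Filling the nvjpegImage_t structure usually means carving one allocation into per-channel planes and recording each plane's pointer and pitch. The sketch below does this for a hypothetical image_desc struct with the same layout as nvjpegImage_t; MAX_COMPONENT = 4 is an assumed stand-in for NVJPEG_MAX_COMPONENT, and fill_image_desc() is an illustrative helper, not a library function. It uses host malloc() only so that it runs without a GPU; buffers passed to nvjpegDecode() would be device memory, for example from cudaMalloc().

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_COMPONENT 4   /* assumed stand-in for NVJPEG_MAX_COMPONENT */

/* Hypothetical struct with the same layout as nvjpegImage_t. */
typedef struct {
    unsigned char *channel[MAX_COMPONENT];
    unsigned int pitch[MAX_COMPONENT];
} image_desc;

/* Carves one allocation into n_planes consecutive planes, using the
   pitch and row count given for each plane. Unused channels are set to
   NULL/0. Returns the base pointer (caller frees it), or NULL on failure. */
static unsigned char *fill_image_desc(image_desc *img, int n_planes,
                                      const unsigned int *pitches,
                                      const int *heights)
{
    assert(n_planes >= 1 && n_planes <= MAX_COMPONENT);
    size_t total = 0;
    for (int c = 0; c < n_planes; ++c)
        total += (size_t)pitches[c] * (size_t)heights[c];

    unsigned char *base = malloc(total);  /* device code: cudaMalloc() */
    if (base == NULL)
        return NULL;

    size_t offset = 0;
    for (int c = 0; c < MAX_COMPONENT; ++c) {
        if (c < n_planes) {
            img->channel[c] = base + offset;
            img->pitch[c] = pitches[c];
            offset += (size_t)pitches[c] * (size_t)heights[c];
        } else {
            img->channel[c] = NULL;
            img->pitch[c] = 0;
        }
    }
    return base;
}
```

A single allocation keeps setup and teardown to one call each; allocating every channel separately works just as well as long as each pointer and pitch meets the minimums in the table.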

Decode by Phases

Alternatively, you can decode a single image in multiple phases. This gives you more flexibility in controlling the flow and optimizing the decoding process.

To decode an image in multiple phases, follow these steps:

Just as when you are decoding in a single phase, create the JPEG state with the helper function nvjpegJpegStateCreate().
Next, call the functions in the sequence below (see Decode API -- Multiple Phases):
- nvjpegDecodePhaseOne()
- nvjpegDecodePhaseTwo()
- nvjpegDecodePhaseThree()
At the conclusion of the third phase, the nvjpegDecodePhaseThree() function writes the decoded output at the memory location pointed to by its *destination parameter.

Batched Image Decoding

For batched image decoding, you provide pointers to the data of multiple files in memory, along with the size of each file's data. The nvJPEG library decodes these images and places the decoded data in the output buffers that you specify in the parameters.

Single Phase

For batched image decoding in single phase, follow these steps:

Call the nvjpegDecodeBatchedInitialize() function to initialize the batched decoder. Specify the batch size in the batch_size parameter. See nvjpegDecodeBatchedInitialize().
Next, call nvjpegDecodeBatched() for each new batch. Make sure to pass parameters that match that specific batch of images. If the batch size changes, or if the batched decoding fails, call the nvjpegDecodeBatchedInitialize() function again.
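The re-initialization rule above amounts to a little bookkeeping around the decode loop. This is a hypothetical sketch: batch_state, needs_initialize(), mark_initialized(), and record_decode() are illustrative names, not nvJPEG functions. In real code, a true result from needs_initialize() would be followed by a call to nvjpegDecodeBatchedInitialize() before the next nvjpegDecodeBatched().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tracking state for the batched decoder. */
typedef struct {
    int initialized_batch_size;  /* 0 = never initialized */
    bool last_decode_failed;
} batch_state;

/* True if the decoder must be (re)initialized before decoding a batch of
   batch_size images: first use, a new batch size, or a previous failure. */
static bool needs_initialize(const batch_state *s, int batch_size)
{
    assert(batch_size > 0);
    return s->initialized_batch_size != batch_size || s->last_decode_failed;
}

/* Call after (re)initializing the decoder for batch_size images. */
static void mark_initialized(batch_state *s, int batch_size)
{
    s->initialized_batch_size = batch_size;
    s->last_decode_failed = false;
}

/* Call with the outcome of each batched decode. */
static void record_decode(batch_state *s, bool ok)
{
    s->last_decode_failed = !ok;
}
```

Keeping the check in one function makes it hard to forget either trigger: consecutive batches of the same size skip re-initialization, and a failed batch forces it.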

Multiple Phases

To decode a batch of images in multiple phases, follow these steps:

Note:

This is the only case where the JPEG state could be used by multiple threads at the same time.

Create the JPEG state with the helper function nvjpegJpegStateCreate().
Call the nvjpegDecodeBatchedInitialize() function to initialize the batched decoder. Specify the batch size in the batch_size parameter, and use the max_cpu_threads parameter to set the maximum number of CPU threads that work on a single batch.
Batched processing is done by calling the functions for the specific phases in sequence:
- In the first phase, call nvjpegDecodeBatchedPhaseOne() for each image in the batch, passing the index of the image in the batch. This can be done from multiple threads. If multiple threads are used, provide each calling thread's index, in the range [0, max_cpu_threads-1], to the nvjpegDecodeBatchedPhaseOne() function. Before proceeding to the next phase, ensure that the phase-one calls for every image have finished.
- Next, call nvjpegDecodeBatchedPhaseTwo().
- Finally, call nvjpegDecodeBatchedPhaseThree().
If you have another batch of images with the same batch size to process, repeat the three phases above, starting again with the first phase.
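The threading requirements of the first phase can be sketched with POSIX threads. In this sketch, the phase-one call is replaced by a hypothetical placeholder that only records which worker handled each image; in real code each worker would call nvjpegDecodeBatchedPhaseOne() for its images, passing its own thread index. The round-robin split and the MAX_BATCH limit are illustrative choices, not nvJPEG requirements. The point being illustrated is that run_phase_one() joins every worker before returning, so the second phase cannot start while any phase-one call is still running.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_BATCH 64  /* illustrative limit for this sketch */

typedef struct {
    int batch_size;
    int max_cpu_threads;
    int handled_by[MAX_BATCH];  /* thread index per image, -1 = not yet */
} batch_job;

typedef struct {
    batch_job *job;
    int thread_idx;             /* in [0, max_cpu_threads - 1] */
} worker_arg;

static void *phase_one_worker(void *p)
{
    worker_arg *w = p;
    /* Round-robin split: thread t takes images t, t+T, t+2T, ... so each
       image is handled by exactly one thread and no slot is shared. */
    for (int i = w->thread_idx; i < w->job->batch_size;
         i += w->job->max_cpu_threads)
        w->job->handled_by[i] = w->thread_idx;  /* placeholder for phase one */
    return NULL;
}

/* Runs the first phase for every image and returns only after all workers
   have been joined. Returns 0 on success, -1 if a thread could not start. */
static int run_phase_one(batch_job *job)
{
    pthread_t tids[MAX_BATCH];
    worker_arg args[MAX_BATCH];
    assert(job->max_cpu_threads >= 1 && job->max_cpu_threads <= MAX_BATCH);
    assert(job->batch_size >= 0 && job->batch_size <= MAX_BATCH);

    for (int t = 0; t < job->max_cpu_threads; ++t) {
        args[t].job = job;
        args[t].thread_idx = t;
        if (pthread_create(&tids[t], NULL, phase_one_worker, &args[t]) != 0) {
            for (int j = 0; j < t; ++j)
                pthread_join(tids[j], NULL);
            return -1;
        }
    }
    for (int t = 0; t < job->max_cpu_threads; ++t)
        pthread_join(tids[t], NULL);
    return 0;
}
```

Build with -pthread. A work queue would balance uneven image sizes better than the static split, but the static split keeps the mapping from image to thread index easy to follow.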