NVIDIA nvTiff#

nvTIFF is a GPU-accelerated TIFF (Tagged Image File Format) encode/decode library built on the CUDA platform. The library is supported on Volta and newer GPU architectures. The TIFF features supported by the decoder and the encoder are listed in the sections below.

Note

Throughout this document, the terms “CPU” and “Host” are used synonymously. Similarly, the terms “GPU” and “Device” are synonymous.

Decoder#

  • Planar Separate and Contiguous modes

  • Up to 16 Samples per pixel

  • Compression

    • JPEG (via nvJPEG)

    • Deflate (via nvCOMP)

    • LZW

    • JPEG 2000 (via nvJPEG2000)

    • None (uncompressed)

  • Color space can be Grayscale, RGB, YCbCr, or RGB Palette. When the compressed data is in YCbCr or Palette mode, the library converts the decoded output to the RGB color space.

  • TIFF files can use either tiles or strips.

  • Up to 32 bits per sample when the compression type is None, Deflate, LZW, or JPEG 2000. Up to 8 bits per sample when the compression type is JPEG.

  • TIFFs with multiple images having different properties.

  • APIs to retrieve GeoTIFF Metadata
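The limits in the list above can be captured as a small pre-flight check. The following is a minimal sketch that does not call nvTIFF; the function name, the table name, and the lowercase compression labels are hypothetical and chosen only for illustration.

```python
# Limits copied from the decoder feature list above.
# All names here are illustrative; none of them are part of the nvTIFF API.
MAX_SAMPLES_PER_PIXEL = 16
MAX_BITS_PER_SAMPLE = {
    "none": 32,
    "deflate": 32,
    "lzw": 32,
    "jpeg2000": 32,
    "jpeg": 8,
}


def decoder_supports(compression: str, bits_per_sample: int, samples_per_pixel: int) -> bool:
    """Return True if the combination falls inside the documented decoder limits."""
    limit = MAX_BITS_PER_SAMPLE.get(compression.lower())
    if limit is None:
        # Compression scheme is not in the documented list.
        return False
    return (1 <= bits_per_sample <= limit
            and 1 <= samples_per_pixel <= MAX_SAMPLES_PER_PIXEL)
```

A check like this only reflects the documented feature list; the library itself remains the authority on whether a given file can be decoded.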

The diagram below shows how the nvTIFF decoder interacts with other CUDA libraries such as nvJPEG and nvCOMP (for Deflate decompression). The user application calls CUDA APIs to allocate the decode output buffers before calling the nvTIFF decoder.

nvTiff Decoder Overview#
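Because the application allocates the output buffers itself, it needs to know how many bytes each decoded image occupies. The sketch below estimates that size under two assumptions that are not stated in this document: samples are padded to whole bytes, and the output is tightly packed with no row padding. The function names are hypothetical, and `bits_per_sample` refers to the decoded output.

```python
def output_samples_per_pixel(photometric: str, samples_per_pixel: int) -> int:
    """YCbCr and palette images are converted to RGB on decode, so they yield 3 samples."""
    if photometric.lower() in ("ycbcr", "palette"):
        return 3
    return samples_per_pixel


def decode_output_bytes(width: int, height: int, samples_per_pixel: int,
                        bits_per_sample: int, photometric: str = "rgb") -> int:
    """Estimate the size in bytes of one decoded image.

    Assumptions (not confirmed by the library documentation):
    samples are padded to whole bytes and rows carry no extra padding.
    """
    samples = output_samples_per_pixel(photometric, samples_per_pixel)
    bytes_per_sample = (bits_per_sample + 7) // 8  # round up to whole bytes
    return width * height * samples * bytes_per_sample
```

The resulting value is the kind of size an application would pass to a device allocation call such as `cudaMalloc` before invoking the decoder.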

Encoder#

  • Planar Contiguous mode only.

  • Up to 4 samples per pixel.

  • LZW compression.

  • Compressed data is organized in strips.

  • Up to 32 bits per sample.

  • Multiple Images in a TIFF file. All images which are to be compressed must have identical properties.
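The encoder constraints above can be checked before any data is sent to the GPU. This is a minimal sketch, assuming that "identical properties" covers width, height, samples per pixel, and bits per sample; the `ImageProps` type and `encoder_accepts` function are hypothetical and not part of the nvTIFF API.

```python
from typing import NamedTuple, Sequence


class ImageProps(NamedTuple):
    """Illustrative description of one image to be encoded."""
    width: int
    height: int
    samples_per_pixel: int
    bits_per_sample: int


def encoder_accepts(images: Sequence[ImageProps]) -> bool:
    """Return True if the images satisfy the documented encoder limits."""
    if not images:
        return False
    first = images[0]
    # Documented limits: up to 4 samples per pixel, up to 32 bits per sample.
    if not (1 <= first.samples_per_pixel <= 4 and 1 <= first.bits_per_sample <= 32):
        return False
    # All images in the file must share the same properties.
    return all(img == first for img in images)
```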

nvTiff Encoder Overview#

Applying GPU Acceleration to TIFF files#

A TIFF file may contain a single image or multiple images. Each image is subdivided into strips or tiles, and these strips/tiles can be encoded or decoded in parallel, which provides a speedup over CPU implementations.

When decoding a TIFF file that contains multiple images with identical metadata, the strips/tiles across all images can be decoded as part of a single batched CUDA kernel. Encoding works the other way around: each strip/tile is compressed in parallel, and the compressed strips/tiles are then stitched together to create the TIFF file.
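To get a feel for how much parallel work a file offers, the number of strips or tiles can be computed from the image dimensions using the formulas in the TIFF 6.0 specification. The sketch below assumes the Planar Contiguous configuration (for Planar Separate, the TIFF specification multiplies the count by the number of samples per pixel); the function names are illustrative.

```python
def strips_per_image(image_length: int, rows_per_strip: int) -> int:
    """Number of strips in one image (TIFF 6.0: ceil(ImageLength / RowsPerStrip))."""
    return (image_length + rows_per_strip - 1) // rows_per_strip


def tiles_per_image(image_width: int, image_length: int,
                    tile_width: int, tile_length: int) -> int:
    """Number of tiles in one image (TIFF 6.0: TilesAcross * TilesDown)."""
    tiles_across = (image_width + tile_width - 1) // tile_width
    tiles_down = (image_length + tile_length - 1) // tile_length
    return tiles_across * tiles_down


def batch_units(num_images: int, units_per_image: int) -> int:
    """Strips/tiles available for one batch when all images share the same metadata."""
    return num_images * units_per_image
```

For example, a 1000 x 600 image with 256 x 256 tiles has 4 x 3 = 12 tiles, so a file holding 10 such images exposes 120 independent units of work.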

Prerequisites#

  • CUDA Toolkit version 11.0 and above.

  • CUDA Driver version r450 and above.

  • nvCOMP 2.6+ (required when compression is deflate).

  • nvJPEG2000 0.8.1+ (required when compression is JPEG 2000).
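The minimum versions above can be compared against installed versions with a small helper. This is a sketch under the assumption that versions are dotted numeric strings, optionally prefixed with `r` or `v`; how the installed versions are obtained is left out, and all names are hypothetical.

```python
# Minimum versions copied from the prerequisites list above.
MINIMUM_VERSIONS = {
    "cuda_toolkit": "11.0",
    "cuda_driver": "r450",
    "nvcomp": "2.6",        # only needed for Deflate
    "nvjpeg2000": "0.8.1",  # only needed for JPEG 2000
}


def parse_version(text: str) -> tuple:
    """Turn '11.0', 'r450' or '450.51.06' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().lstrip("rv").split("."))


def meets_minimum(found: str, minimum: str) -> bool:
    """Compare two versions after padding the shorter one with zeros."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```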

Platforms Supported#

  • Linux versions:

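| Architecture | Distribution | Version | Kernel | GCC | GLIBC |
|---|---|---|---|---|---|
| x86_64 | RHEL / Rocky Linux | 9.4 | 5.14 | 11.3.1 | 2.34 |
| x86_64 | RHEL / Rocky Linux | 8.10 | 4.18 | 8.5.0 | 2.28 |
| x86_64 | Ubuntu | 24.04 | 6.8.0 | 13.2.0 | 2.39 |
| x86_64 | Ubuntu | 22.04 | 6.5.0 | 11.2.0 | 2.34 |
| x86_64 | Ubuntu | 20.04 | 5.15.0 | 9.3.0 | 2.31 |
| x86_64 | OpenSUSE Leap | 15.6 | 6.4.0 | 7.5.0 | 2.38 |
| x86_64 | SUSE SLES | 15 | 6.4.0 | 7.5.0 | 2.31 |
| x86_64 | Debian | 12.8 | 6.1.0 | 12.2.0 | 2.36 |
| x86_64 | Fedora | 41 | 6.11.4 | 14.2.1 | 2.40 |
| arm64-sbsa | RHEL | 9.4 | 5.14 | 11.3.1 | 2.34 |
| arm64-sbsa | RHEL | 8.10 | 4.18 | 8.5.0 | 2.28 |
| arm64-sbsa | Ubuntu | 24.04 | 6.8.0 | 13.2.0 | 2.35 |
| arm64-sbsa | Ubuntu | 22.04 | 5.15.0 | 11.4.0 | 2.35 |
| arm64-sbsa | Ubuntu | 20.04 | 5.4.0 | 9.4.0 | 2.31 |
| arm64-sbsa | SUSE SLES | 15.6 | 6.4.0 | 7.5.0 | 2.38 |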

  • Windows versions:

    • Windows 10, Windows 11 and Windows Server 2023

    • WSL

  • Tegra

    • Supported on JetPack 5.1.3 and above.