Quick Start Guide
##################

This section explains how to use the decoder and encoder APIs in a few quick
steps. The API details are covered in the next section.

.. note:: Link to nvTIFF Samples: `https://github.com/NVIDIA/CUDALibrarySamples/tree/master/nvTIFF <https://github.com/NVIDIA/CUDALibrarySamples/tree/master/nvTIFF>`__

Note that because decoding and encoding TIFF images are two fundamentally
different problems, the APIs for decoding and encoding are also different and
independent.

nvTIFF Decode 
^^^^^^^^^^^^^^^^
The library reads the file from disk and loads the image data to device memory. 

1. Create instances of the following:

:code:`nvtiffStream_t` - is used to parse the bitstream and store the bitstream metadata

:code:`nvtiffDecoder_t` - is used to store the work buffers required for decode

.. code-block:: cpp
    
    nvtiffStream_t nvtiff_stream;
    nvtiffDecoder_t nvtiff_decoder;

    nvtiffStreamCreate(&nvtiff_stream);
    nvtiffDecoderCreateSimple(&nvtiff_decoder);

2. Use the :code:`nvtiffStreamParseFromFile` API to parse the TIFF file from disk.

.. code-block:: cpp
        
        // fname (char *) is the TIFF file name
        nvtiffStatus_t status = nvtiffStreamParseFromFile(fname, nvtiff_stream);
        // make sure that nvtiffStreamParseFromFile returns NVTIFF_STATUS_SUCCESS
        // before proceeding to the next step
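
One way to enforce that check is a small helper that throws when a call does
not return the success value. The sketch below is not part of nvTIFF:
``check_status`` is a hypothetical helper, and ``demo_status_t`` is only a
stand-in enum so that the snippet compiles without the library. With nvTIFF you
would pass the returned ``nvtiffStatus_t`` value and ``NVTIFF_STATUS_SUCCESS``
instead.

.. code-block:: cpp

    #include <sstream>
    #include <stdexcept>
    #include <string>

    // Stand-in for a library status enum (hypothetical, NOT an nvTIFF type).
    enum demo_status_t { DEMO_STATUS_SUCCESS = 0, DEMO_STATUS_BAD_FILE = 1 };

    // Throws std::runtime_error if `status` differs from `success`.
    // `what` names the call being checked so the message points at the failure.
    template <typename Status>
    void check_status(Status status, Status success, const std::string &what) {
        if (status != success) {
            std::ostringstream msg;
            msg << what << " failed with status " << static_cast<int>(status);
            throw std::runtime_error(msg.str());
        }
    }

Wrapping each call in such a helper keeps the step-by-step code short while
still stopping at the first failing call.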

3. Extract the TIFF file metadata.

.. code-block:: cpp

        nvtiffFileInfo_t file_info;
        nvtiffStatus_t status = nvtiffStreamGetFileInfo(nvtiff_stream, &file_info);
        // nvTIFF requires all the images (subfiles) in the same file to have the same properties.


4. Allocate decode output on device.

.. code-block:: cpp
        
        // allocate device memory for images
        unsigned char **image_out = NULL;

        const size_t image_size = sizeof(**image_out) * file_info.image_width *
                                  file_info.image_height *
                                  (file_info.bits_per_pixel / 8);

        // we are decoding the images in file "fname" from
        // subfile no. "frame_beg" to subfile no. "frame_end"
        frame_beg = fmax(frame_beg, 0);
        frame_end = fmin(frame_end, file_info.num_images - 1);
        const int num_decoded_images = frame_end - frame_beg + 1;

        // one device buffer per decoded image
        image_out = (unsigned char **)Malloc(sizeof(*image_out) * num_decoded_images);
        for (int i = 0; i < num_decoded_images; i++) {
                CHECK_CUDA(cudaMalloc(image_out + i, image_size));
        }
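
The range clamping and size arithmetic in this step can be isolated and checked
on the host. The following is a minimal sketch, not library code:
``ImageDims``, ``FrameRange``, ``clamp_frame_range`` and
``decoded_image_bytes`` are hypothetical names, and ``ImageDims`` only mirrors
the three file-info fields used above. It uses integer ``std::max`` and
``std::min`` rather than the floating-point ``fmax`` and ``fmin``, and it
assumes the pixel size is a whole number of bytes.

.. code-block:: cpp

    #include <algorithm>
    #include <cassert>
    #include <cstddef>

    // Hypothetical mirror of the file-info fields used in this step.
    struct ImageDims {
        unsigned int image_width;
        unsigned int image_height;
        unsigned int bits_per_pixel;
    };

    // First subfile to decode and how many subfiles follow it.
    struct FrameRange {
        int first;
        int count;
    };

    // Clamps [frame_beg, frame_end] (inclusive) to [0, num_images - 1].
    FrameRange clamp_frame_range(int frame_beg, int frame_end, int num_images) {
        assert(num_images > 0);
        const int first = std::max(frame_beg, 0);
        const int last = std::min(frame_end, num_images - 1);
        // An inverted range yields a count of 0 instead of a negative value.
        return {first, std::max(last - first + 1, 0)};
    }

    // Bytes needed for one decoded image, as computed in the step above.
    std::size_t decoded_image_bytes(const ImageDims &d) {
        assert(d.bits_per_pixel % 8 == 0);  // assumption: whole bytes per pixel
        const std::size_t bytes_per_pixel = d.bits_per_pixel / 8;
        return static_cast<std::size_t>(d.image_width) * d.image_height * bytes_per_pixel;
    }

For example, a 640 x 480 image with 24 bits per pixel needs 921600 bytes, and a
requested range of -3 to 100 on a 10-image file becomes 10 images starting at
subfile 0.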

5. Call :code:`nvtiffDecode` to decode all the images in the file, or :code:`nvtiffDecodeRange` to decode a range of images.

.. code-block:: cpp

        // stream is the CUDA stream on which the decode work is enqueued
        nvtiffStatus_t status;
        if (!decodeRange) {
                status = nvtiffDecode(nvtiff_stream, nvtiff_decoder, image_out, stream);
        } else {
                status = nvtiffDecodeRange(nvtiff_stream, nvtiff_decoder, frame_beg, num_decoded_images, image_out, stream);
        }
        // cudaStreamSynchronize is required because the decode APIs are
        // asynchronous with respect to the host
        cudaStreamSynchronize(stream);

6. Go to step 1 to decode another image. Once all images are decoded, release the nvTIFF library resources by calling the corresponding destroy APIs.


nvTIFF Encode 
^^^^^^^^^^^^^^^^


1. Initialize the library handles and encoder parameters listed below:

.. code-block:: cpp

        // unsigned char **images_d is a host array of "nSubFiles" pointers
        // to device buffers containing "nSubFiles" uncompressed images; each
        // image has the same number of rows (nrow), of columns (ncol)
        // and pixel size in bytes (pixelSize)

        // for example let's partition the images in strips of four rows each
        unsigned int encRowsPerStrip = 4;
        unsigned int nStripOut = DIV_UP(nrow, encRowsPerStrip);
        unsigned int totStrips = nSubFiles*nStripOut;

        // initial estimate of the maximum
        // size of compressed strips
        unsigned long long encStripAllocSize = encRowsPerStrip*ncol*(pixelSize);

        // allocate encoding output buffers;
        CHECK_CUDA(cudaMalloc(&stripSize_d, sizeof(*stripSize_d)*totStrips));
        CHECK_CUDA(cudaMalloc(&stripOffs_d, sizeof(*stripOffs_d)*totStrips));
        CHECK_CUDA(cudaMalloc(&stripData_d, sizeof(*stripData_d)*totStrips*encStripAllocSize));

        // create encoding context
        nvTiffEncodeCtx_t *ctx = nvTiffEncodeCtxCreate(devId, nSubFiles, nStripOut);
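
The strip arithmetic above relies on a ``DIV_UP`` macro that the snippet does
not define. The sketch below assumes it means integer ceiling division and
reproduces the same quantities on the host; ``div_up``, ``StripLayout`` and
``plan_strips`` are hypothetical names used only for illustration.

.. code-block:: cpp

    #include <cassert>

    // Integer ceiling division (assumed meaning of DIV_UP in the snippet above).
    constexpr unsigned int div_up(unsigned int a, unsigned int b) {
        return (a + b - 1) / b;
    }

    struct StripLayout {
        unsigned int strips_per_image;           // nStripOut
        unsigned int rows_in_last_strip;         // may be fewer than rows_per_strip
        unsigned int total_strips;               // totStrips
        unsigned long long initial_strip_bytes;  // encStripAllocSize
    };

    // Computes how the images are partitioned into strips and the initial
    // per-strip output allocation (the size of one uncompressed strip).
    StripLayout plan_strips(unsigned int nrow, unsigned int ncol,
                            unsigned int pixel_size, unsigned int rows_per_strip,
                            unsigned int n_subfiles) {
        assert(nrow > 0 && rows_per_strip > 0);
        StripLayout layout;
        layout.strips_per_image = div_up(nrow, rows_per_strip);
        layout.rows_in_last_strip = nrow - (layout.strips_per_image - 1) * rows_per_strip;
        layout.total_strips = n_subfiles * layout.strips_per_image;
        // Widen before multiplying so large images do not overflow 32 bits.
        layout.initial_strip_bytes = 1ULL * rows_per_strip * ncol * pixel_size;
        return layout;
    }

With 10 rows, 100 columns, 3 bytes per pixel, 4 rows per strip and 2 subfiles,
this gives 3 strips per image (the last one holding 2 rows), 6 strips in total
and an initial estimate of 1200 bytes per strip.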


2. Call the nvTiffEncode function to encode. Since we can't know in advance the
   size of the compressed strips, we first try to encode in the buffers allocated
   based on our initial estimate. If one or more strips require more memory than
   "encStripAllocSize" bytes then we need to restart the encoding process with
   a larger buffer. In such a case, after the encoding fails, the minimum size 
   required for the encoding to succeed is passed from the library to the user
   in the context field stripSizeMax. This way the encoding process can only
   fail once due to an output buffer being too small.
   After a successful encoding (nvTiffEncodeFinalize() returning NVTIFF_ENCODE_SUCCESS),
   the compressed strip data, offsets and sizes are returned in the buffers stripData_d,
   stripOffs_d and stripSize_d. In addition, the total size of the compressed strip data
   is also returned in ctx->stripSizeTot.
   Please note that you need to synchronize on stream ``stream`` before accessing those
   buffers.

.. code-block:: cpp   
       
   int i = 0;
   do {
           rv = nvTiffEncode(ctx,
                             nrow,
                             ncol,
                             pixelSize,
                             encRowsPerStrip,
                             nSubFiles,
                             imageOut_d,
                             encStripAllocSize,
                             stripSize_d,
                             stripOffs_d,
                             stripData_d,
                             stream);
           if (rv != NVTIFF_ENCODE_SUCCESS) {
                   // ERROR, WHILE ENCODING IMAGES!
           }
           rv = nvTiffEncodeFinalize(ctx, stream);
           if (rv != NVTIFF_ENCODE_SUCCESS) {
                   if (rv == NVTIFF_ENCODE_COMP_OVERFLOW) {
                           if (i == 1) {
                                   // UNKNOWN ERROR, nvTiffEncode() SHOULDN'T OVERFLOW TWICE!
                           }
                           encStripAllocSize = ctx->stripSizeMax;
                           nvTiffEncodeCtxDestroy(ctx);
                           cudaFree(stripData_d);
                           cudaMalloc(&stripData_d,
                                      sizeof(*stripData_d)*totStrips*encStripAllocSize);
                           ctx = nvTiffEncodeCtxCreate(devId, ...);
                           i++;
                   } else {
                           // ERROR WHILE FINALIZING COMPRESSED IMAGES
                   }
           }
   } while(rv == NVTIFF_ENCODE_COMP_OVERFLOW); 
   CHECK_CUDA(cudaStreamSynchronize(stream));
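
The control flow of this loop (try once, grow the buffer to the size the
encoder reports, retry at most once) can be sketched independently of the
library. Everything in the block below is a hypothetical stand-in:
``EncodeResult``, ``EncodeAttempt`` and ``encode_with_one_retry`` are not
nvTIFF names, and the callable plays the role of one ``nvTiffEncode`` plus
``nvTiffEncodeFinalize`` attempt, including any buffer reallocation.

.. code-block:: cpp

    #include <cstddef>
    #include <functional>

    enum class EncodeResult { Success, Overflow, Error };

    // One encode attempt with the given per-strip capacity. On Overflow the
    // callable must store the capacity it needs in `needed`.
    using EncodeAttempt =
        std::function<EncodeResult(std::size_t capacity, std::size_t &needed)>;

    // Runs the attempt, growing `capacity` once if the first try overflows.
    // Returns true on success; `capacity` then holds the size that worked.
    bool encode_with_one_retry(const EncodeAttempt &attempt, std::size_t &capacity) {
        for (int tries = 0; tries < 2; ++tries) {
            std::size_t needed = 0;
            const EncodeResult rv = attempt(capacity, needed);
            if (rv == EncodeResult::Success) {
                return true;
            }
            if (rv != EncodeResult::Overflow) {
                return false;  // unrelated encoding error
            }
            capacity = needed;  // use the size reported by the encoder
        }
        return false;  // overflowed twice, which should not happen
    }

Bounding the loop at two attempts makes the "can only fail once" guarantee
explicit instead of relying on a counter checked inside the loop body.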

3. Write the compressed image to a TIFF file.

.. code-block:: cpp

        // copy compressed data from the device to the host
        unsigned long long *stripSize_h = (unsigned long long *)Malloc(sizeof(*stripSize_h)*totStrips);
        CHECK_CUDA(cudaMemcpy(stripSize_h,
                              stripSize_d,
                              sizeof(*stripSize_h)*totStrips,
                              cudaMemcpyDeviceToHost));

        unsigned long long *stripOffs_h = (unsigned long long *)Malloc(sizeof(*stripOffs_h)*totStrips);
        CHECK_CUDA(cudaMemcpy(stripOffs_h,
                              stripOffs_d,
                              sizeof(*stripOffs_h)*totStrips,
                              cudaMemcpyDeviceToHost));

        unsigned char *stripData_h = (unsigned char *)Malloc(sizeof(*stripData_h)*ctx->stripSizeTot);
        CHECK_CUDA(cudaMemcpy(stripData_h,
                              stripData_d,
                              ctx->stripSizeTot,
                              cudaMemcpyDeviceToHost));

        // write output file
        nvTiffWriteFile("outFile.tif",
                        VER_REG_TIFF,
                        nSubFiles,
                        nrow,
                        ncol,
                        encRowsPerStrip,
                        samplesPerPixel,
                        bitsPerSample,
                        photometricInt,
                        planarConf,
                        stripSize_h,
                        stripOffs_h,
                        stripData_h);


Tiff Decode / Encode Demo example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The binary nvTiff_example provides a complete and detailed usage example
for the encoding and decoding capabilities of the nvTIFF library.

.. code-block:: text

	Usage:
	nvTiff_example [options] -f|--file <TIFF_FILE>

	General options:

		-d DEVICE_ID
		--device DEVICE_ID
		        Specifies the GPU to use for images decoding/encoding.
		        Default: device 0 is used.

		-v
		--verbose
		        Prints some information about the decoded TIFF file.

		-h
		--help
		        Prints this help

	Decoding options:

		-f TIFF_FILE
		--file TIFF_FILE
		        Specifies the TIFF file to decode. The code supports both single and multi-image
		        tiff files with the following limitations:                                      
		          * color space must be either Grayscale (PhotometricInterp.=1) or RGB (=2)     
		          * image data compressed with LZW (Compression=5) or uncompressed              
		          * pixel components stored in "chunky" format (RGB..., PlanarConfiguration=1)
		            for RGB images                                                              
		          * image data must be organized in Strips, not Tiles                           
		          * pixels of RGB images must be represented with at most 4 components 
		          * each component must be represented exactly with:
		              * 8 bits for LZW compressed images
		              * 8, 16 or 32 bits for uncompressed images
		          * all images in the file must have the same properties                        

		-b BEG_FRM
		--frame-beg BEG_FRM
		        Specifies the image id in the input TIFF file to start decoding from.  The image
		        id must be a value between 0 and the total number of images in the file minus 1.
		        Values less than 0 are clamped to 0.
		        Default: 0

		-e END_FRM
		--frame-end END_FRM
		        Specifies the image id in the input TIFF file to stop  decoding  at  (included).
		        The image id must be a value between 0 and the total number  of  images  in  the
		        file minus 1.  Values greater than num_images-1  are  clamped  to  num_images-1.
		        Default:  num_images-1.

		-m
		--memtype TYPE
		        Specifies the type of memory used to hold  the  TIFF  file  content:  pinned  or
		        pageable.  Pinned memory is used if 'p' is specified. Pageable memory is used if
		        'r' is specified.  In case of pinned memory,  file  content  is  not  copied  to
		        device memory before the decoding process (with a resulting performance  impact)
		        unless the option -c is also specified (see below).
		        Default: r (pageable)

		-c
		--copyh2d
		        Specifies to copy the file data to device memory in case the -m option specifies
		        to use pinned memory.  In case of pageable memory this  option  has  no  effect.
		        Default: off.

		--decode-out NUM_OUT
		        Enables the writing of selected images from the decoded  input  TIFF  file  into
		        separate BMP files for inspection.  If no argument is  passed,  only  the  first
		        image is written to disk,  otherwise  the  first  NUM_OUT  images  are  written.
		        Output files are named outImage_0.bmp, outImage_1.bmp...
		        Default: disabled.

	Encoding options:

		-E
		--encode
		        This option enables the encoding of the raster images obtained by  decoding  the
		        input TIFF file.  The images are divided into strips, compressed  with  LZW and,
		        optionally, written into an output TIFF file.
		        Default: disabled.

		-r
		--rowsxstrip
		        Specifies the number of consecutive rows  to  use  to  divide  the  images  into
		        strips.  Each image is divided in strips of the same size (except  possibly  the
		        last strip) and then the strips are  compressed  as  independent  byte  streams.
		        This option is ignored if -E is not specified.
		        Default: 1.

		-s
		--stripalloc
		        Specifies the initial estimate of the maximum size  of  compressed  strips.   If
		        during compression one or more strips require more  space,  the  compression  is
		        aborted and restarted automatically with a safe estimate. 
		        This option is ignored if -E is not specified.
		        Default: the size, in bytes, of a strip in the uncompressed images.

		--encode-out
		        Enables the writing of the compressed  images  to  an  output  TIFF  file named
		        outFile.tif.
		        This option is ignored if -E is not specified.
		        Default: disabled.


Python Tiff Decode example
^^^^^^^^^^^^^^^^^^^^^^^^^^

Prerequisites
"""""""""""""
The following Python packages are required:

1. cupy

.. code-block:: console

	$  pip install cupy

2. numpy

.. code-block:: console

	$  pip install numpy

3. tifffile

.. code-block:: console

	$  pip install tifffile

4. imagecodecs

.. code-block:: console

	$  pip install imagecodecs


Install nvTIFF Python Wheel 
"""""""""""""""""""""""""""

.. code-block:: console

	$  pip install nvtiff-0.1.0-cp36-cp36m-linux_x86_64.whl


Usage:
""""""
.. code-block:: console

	$ python3 nvtiff_test.py -h

.. code-block:: text

        usage: nvtiff_test.py [-h] [-o OUTPUT_FILE_PREFIX] [-s] [-c] [-p]
                        [-r SUBFILE_RANGE]
                        tiff_file
        positional arguments:
        tiff_file             tiff file to decode.

        optional arguments:
        -h, --help            show this help message and exit
        -o OUTPUT_FILE_PREFIX, --output_file_prefix OUTPUT_FILE_PREFIX
                                Output file prefix to save decoded data. Will save one
                                file per image in tiff file.
        -s, --return_single_array
                                Return single array from nvTiff instead of list of
                                arrays
        -c, --check_output    Compare nvTiff output to reference CPU result
        -p, --use_pinned_mem  Read TIFF data from pinned memory.
        -r SUBFILE_RANGE, --subfile_range SUBFILE_RANGE
                                comma separated list of starting and ending file
                                indices to decode, inclusive

Python Example
"""""""""""""""
.. code-block:: console

	$ python3 nvtiff_test.py bali_notiles.tif 

.. code-block:: text

        Command line arguments:
                tiff_file: bali_notiles.tif
                return_single_array: False
                output_file_prefix: None
                check_output: False
                use_pinned_mem: False
                subfile_range: None

        Time for tifffile:

                decode:   0.010347366333007812 s
                h2d copy: 0.0010058879852294922 s
                total:    0.011353254318237305 s

        Time for nvTiff:

                open: 0.002551555633544922 s
                decode: 0.0005545616149902344 s
                total:  0.0031061172485351562 s