Using the TensorFlow DALI plugin with sparse tensors¶
Overview¶
Using the DALI data loading and augmentation pipeline with TensorFlow is straightforward.
However, sometimes a batch of data that the user wants to extract from the pipeline cannot be represented as a dense tensor. In such a case, the DALI op uses a TensorFlow SparseTensor. Please keep in mind that SparseTensors are supported only for CPU-based pipelines.
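To make the dense-versus-sparse distinction concrete, here is a minimal pure-Python sketch (not DALI or TensorFlow code) of how a batch of variable-length label lists maps onto the indices, values and dense_shape triple that a SparseTensor stores. The helper name `to_coo` and the sample labels are made up for illustration.

```python
def to_coo(batch):
    """Convert a list of variable-length lists into COO-style components.

    Hypothetical helper for illustration only; it mirrors the three fields
    of a sparse tensor (indices, values, dense_shape).
    """
    indices, values = [], []
    for sample_idx, sample in enumerate(batch):
        for item_idx, value in enumerate(sample):
            # One index row per stored value: (position in batch, position in sample)
            indices.append([sample_idx, item_idx])
            values.append(value)
    # The dense shape is padded up to the longest sample in the batch
    longest = max((len(sample) for sample in batch), default=0)
    dense_shape = [len(batch), longest]
    return indices, values, dense_shape


# Three images with 1, 0 and 3 labels: no rectangular dense tensor fits this
labels_per_image = [[33], [], [20, 8, 34]]
indices, values, dense_shape = to_coo(labels_per_image)
print(indices)
print(values)
print(dense_shape)
```

The image with no labels contributes no rows to `indices` at all, which is exactly what a dense tensor cannot express without padding.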
Defining the data loading pipeline¶
First, we define a simple pipeline that returns data as sparse tensors. To achieve this, we will use the well-known COCO data set. Each image may have 0 or more bounding boxes with labels describing the objects present in it. We want to return the images in normalized form, while the labels and bounding boxes will be represented as sparse tensors. To begin, let us define some global parameters.
The DALI_EXTRA_PATH environment variable should point to the place where data from the DALI extra repository is downloaded. Please make sure that the proper release tag is checked out.
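Because a missing DALI_EXTRA_PATH otherwise surfaces as a bare KeyError, a small guard can give a clearer message. The following is a minimal sketch; the helper name `resolve_coco_paths` and the example path `/data/DALI_extra` are hypothetical, while the `db/coco` layout is the one used in this example.

```python
import os


def resolve_coco_paths(env=os.environ):
    """Return (images_dir, annotations_file) under DALI_EXTRA_PATH.

    Hypothetical helper: raises a descriptive error when the variable is unset.
    """
    root = env.get("DALI_EXTRA_PATH")
    if not root:
        raise RuntimeError(
            "DALI_EXTRA_PATH is not set; point it at a checkout of the DALI extra repository"
        )
    images_dir = os.path.join(root, "db", "coco", "images")
    annotations = os.path.join(root, "db", "coco", "instances.json")
    return images_dir, annotations


# Passing an explicit mapping keeps the example independent of the real environment
images_dir, annotations = resolve_coco_paths({"DALI_EXTRA_PATH": "/data/DALI_extra"})
print(images_dir)
print(annotations)
```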
[1]:
from __future__ import print_function
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types
import os.path
BATCH_SIZE = 32
DEVICES = 1
test_data_root = os.environ['DALI_EXTRA_PATH']
file_root = os.path.join(test_data_root, 'db', 'coco', 'images')
annotations_file = os.path.join(test_data_root, 'db', 'coco', 'instances.json')
Next, we create a pipeline with the COCO reader. Note that while the images are processed, the other data from COCO are passed through.
[2]:
class COCOPipeline(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, num_gpus):
        super(COCOPipeline, self).__init__(batch_size, num_threads, device_id, seed = 15)
        self.input = ops.COCOReader(file_root = file_root, annotations_file = annotations_file,
                                    shard_id = device_id, num_shards = num_gpus, ratio=False, save_img_ids=True)
        self.decode = ops.ImageDecoder(device = "cpu", output_type = types.RGB)
        self.resize = ops.Resize(device = "cpu",
                                 image_type = types.RGB,
                                 interp_type = types.INTERP_LINEAR)
        self.cmn = ops.CropMirrorNormalize(device = "cpu",
                                           output_dtype = types.FLOAT,
                                           crop = (224, 224),
                                           image_type = types.RGB,
                                           mean = [128., 128., 128.],
                                           std = [1., 1., 1.])
        self.res_uniform = ops.Uniform(range = (256.,480.))
        self.uniform = ops.Uniform(range = (0.0, 1.0))
        self.cast = ops.Cast(device = "cpu",
                             dtype = types.INT32)

    def define_graph(self):
        inputs, bboxes, labels, im_ids = self.input()
        images = self.decode(inputs)
        images = self.resize(images, resize_shorter = self.res_uniform())
        output = self.cmn(images, crop_pos_x = self.uniform(),
                          crop_pos_y = self.uniform())
        output = self.cast(output)
        return (output, bboxes, labels, im_ids)
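The pipeline above resizes each image so that its shorter side is a random value in [256, 480] and then takes a 224x224 crop at a random normalized position. The sketch below illustrates that geometry in plain Python. It assumes the crop origin is computed as `crop_pos * (image_size - crop_size)`, which is how the DALI documentation describes normalized crop positions; the rounding behaviour and the helper names are assumptions made for this illustration, not DALI's implementation.

```python
def resize_shorter(height, width, shorter):
    """Scale (height, width) so the shorter side equals `shorter`, keeping the aspect ratio."""
    scale = shorter / min(height, width)
    return int(round(height * scale)), int(round(width * scale))


def crop_window(height, width, crop_h, crop_w, pos_x, pos_y):
    """Return (x0, y0, x1, y1) for a crop placed at normalized position (pos_x, pos_y).

    Assumes origin = pos * (size - crop_size); rounding is an assumption.
    """
    x0 = int(round(pos_x * (width - crop_w)))
    y0 = int(round(pos_y * (height - crop_h)))
    return x0, y0, x0 + crop_w, y0 + crop_h


# A 480x640 image resized so that its shorter side becomes 256
h, w = resize_shorter(480, 640, 256)
print((h, w))
# Crop placed a quarter of the way across and at the very bottom
print(crop_window(h, w, 224, 224, 0.25, 1.0))
```

Since the shorter side is always at least 256 after resizing, a 224x224 window fits inside the image for any position in [0, 1].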
Next, we instantiate the pipelines with the right parameters. We will create one pipeline per GPU, by specifying the right device_id for each pipeline.
The difference from a standalone DALI workflow is that instead of calling pipeline.build and using the pipeline directly, we will pass the pipeline object to the TensorFlow operator.
[3]:
pipes = [COCOPipeline(batch_size=BATCH_SIZE, num_threads=2, device_id = device_id, num_gpus = DEVICES) for device_id in range(DEVICES)]
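Each pipeline is given its device_id as the shard_id and DEVICES as the number of shards, so each one reads a different slice of the data set. The sketch below shows one common way to split samples into contiguous shards using floor division. It is an illustration of the idea under that assumption, with a hypothetical helper name, and should not be read as the reader's exact implementation.

```python
def shard_range(num_samples, shard_id, num_shards):
    """Return the half-open [start, end) sample range assigned to one shard.

    Assumed floor-based split; shards are contiguous and cover every sample once.
    """
    start = num_samples * shard_id // num_shards
    end = num_samples * (shard_id + 1) // num_shards
    return start, end


# Ten samples split across three shards
for shard_id in range(3):
    print(shard_id, shard_range(10, shard_id, 3))
```

With DEVICES = 1, as in this example, the single shard covers the whole data set.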
Using the DALI TensorFlow plugin¶
Let’s start by importing TensorFlow and the DALI TensorFlow plugin as dali_tf.
[4]:
import tensorflow as tf
import nvidia.dali.plugin.tf as dali_tf
import time
try:
    from tensorflow.compat.v1 import GPUOptions
    from tensorflow.compat.v1 import ConfigProto
    from tensorflow.compat.v1 import Session
    from tensorflow.compat.v1 import placeholder
except:
    # Older TF versions don't have compat.v1 layer
    from tensorflow import GPUOptions
    from tensorflow import ConfigProto
    from tensorflow import Session
    from tensorflow import placeholder
try:
    tf.compat.v1.disable_eager_execution()
except:
    pass
We can now use nvidia.dali.plugin.tf.DALIIterator() to get the TensorFlow op that will produce the tensors we will use in the TensorFlow graph.
For each DALI pipeline, we use daliop, which returns a tuple of TensorFlow tensors that we store as the image, bounding boxes, labels and image ids. To enable sparse tensor generation, the sparse argument needs to be set to True for each output that is going to be represented as a sparse tensor.
[5]:
daliop = dali_tf.DALIIterator()
images = []
bboxes = []
labels = []
image_ids = []
for d in range(DEVICES):
    with tf.device('/cpu'):
        # One entry per pipeline output: images (dense), bboxes and labels (sparse), image ids (dense)
        image, bbox, label, image_id = daliop(pipeline = pipes[d],
            shapes = [(BATCH_SIZE, 3, 224, 224), (), (), ()],
            dtypes = [tf.int32, tf.float32, tf.int32, tf.int32],
            sparse = [False, True, True, False])
        images.append(image)
        bboxes.append(bbox)
        labels.append(label)
        image_ids.append(image_id)
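The shapes, dtypes and sparse arguments are parallel lists with one entry per pipeline output, so it is easy to get them out of step. The following sketch lines them up in a table before they are passed on. The helper `describe_outputs` is hypothetical, dtypes are written as plain strings to keep the example free of TensorFlow, and padding a short sparse list with False is a convention of this helper only, not a statement about the plugin.

```python
def describe_outputs(names, shapes, dtypes, sparse):
    """Pair up per-output settings and fail early if the lists disagree.

    Hypothetical helper; a short `sparse` list is padded with False here.
    """
    if not (len(names) == len(shapes) == len(dtypes)):
        raise ValueError("names, shapes and dtypes must have one entry per output")
    if len(sparse) > len(names):
        raise ValueError("more sparse flags than outputs")
    flags = list(sparse) + [False] * (len(names) - len(sparse))
    return [
        {"name": n, "shape": s, "dtype": d, "sparse": f}
        for n, s, d, f in zip(names, shapes, dtypes, flags)
    ]


outputs = describe_outputs(
    names=["images", "bboxes", "labels", "image_ids"],
    shapes=[(32, 3, 224, 224), (), (), ()],
    dtypes=["int32", "float32", "int32", "int32"],
    sparse=[False, True, True],
)
for row in outputs:
    print(row)
```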
Using the tensors in a simple TensorFlow graph¶
We will use the images, bboxes, labels and image_ids tensor lists in our TensorFlow graph definition. We then run a very simple one-op graph session that outputs a batch of data, and print the bounding boxes, labels and image ids.
[6]:
with Session() as sess:
    all_img_per_sec = []
    total_batch_size = BATCH_SIZE * DEVICES
    start_time = time.time()
    # The actual run with our dali_tf tensors
    res_cpu = sess.run([images, bboxes, labels, image_ids])
print(res_cpu[1])
print(res_cpu[2])
print(res_cpu[3])
[SparseTensorValue(indices=array([[ 0, 0, 0],
[ 0, 0, 1],
[ 0, 0, 2],
...,
[31, 4, 1],
[31, 4, 2],
[31, 4, 3]]), values=array([313., 168., 162., 120., 100., 216., 182., 237., 138., 15., 404.,
172., 215., 305., 69., 80., 248., 64., 344., 311., 123., 66.,
95., 176., 194., 209., 48., 207., 122., 178., 47., 248., 400.,
115., 176., 158., 88., 217., 91., 114., 49., 148., 257., 184.,
99., 40., 361., 130., 89., 84., 259., 246., 213., 455., 270.,
158., 144., 137., 92., 150., 275., 39., 286., 32., 185., 78.,
12., 90., 273., 39., 275., 220., 180., 311., 226., 12., 351.,
96., 85., 168., 178., 9., 23., 183., 167., 194., 355., 90.,
95., 193., 151., 226., 298., 315., 370., 63., 381., 311., 210.,
110., 247., 84., 385., 175., 137., 44., 161., 112., 282., 15.,
336., 130., 159., 332., 387., 97., 100., 285., 300., 116., 374.,
73., 142., 20., 272., 93., 348., 62., 22., 1., 266., 226.,
376., 79., 143., 157., 285., 69., 280., 232., 208., 143., 300.,
107., 62., 129., 350., 171., 166., 93., 331., 183., 334., 7.,
95., 125., 221., 54., 354., 84., 240., 131., 258., 22., 290.,
173., 337., 61., 460., 144., 52., 187., 157., 221., 279., 150.,
172., 306., 322., 38., 263., 143., 325., 114., 82., 61., 317.,
110., 280., 88., 162., 46., 222., 102., 258., 177., 103., 135.,
83., 200., 338., 105., 286., 288., 428., 229., 63., 30., 54.,
3., 392., 338., 498., 169., 63., 166., 86., 237., 61., 110.,
397., 130., 13., 32., 8., 30., 232., 142., 31., 189., 233.,
29., 183., 76., 339., 79., 254., 23., 309., 231., 234., 316.,
262., 61., 110., 152., 339., 11., 188., 19., 136., 202., 498.,
1., 159., 124., 392., 197., 155., 41., 44., 70., 335., 126.,
239., 159., 59., 344., 230., 8., 288., 324., 185., 88., 233.,
116., 124., 7., 90., 90., 24., 156., 363., 219., 484., 262.,
198., 186., 546., 381., 117., 60., 246., 96., 260., 248., 103.,
108., 17., 184., 134., 169., 236., 212., 177., 125., 268., 183.,
95., 220., 298., 124., 143., 116., 247., 222., 347., 44., 318.,
80., 353., 211., 293., 53., 76., 29., 52., 172., 192., 83.,
198., 185., 33., 221., 329., 149., 181., 298., 396., 102., 202.,
136., 269., 222., 13., 229., 236., 149., 311., 14., 309., 183.,
474., 359., 127., 79., 258., 143., 189., 170., 348., 222., 211.,
13., 129., 205., 190., 61., 391., 142., 14., 201., 12., 172.,
217., 16.], dtype=float32), dense_shape=array([32, 5, 4]))]
[SparseTensorValue(indices=array([[ 0, 0],
[ 1, 0],
[ 2, 0],
[ 3, 0],
[ 3, 1],
[ 3, 2],
[ 3, 3],
[ 4, 0],
[ 5, 0],
[ 5, 1],
[ 5, 2],
[ 6, 0],
[ 6, 1],
[ 6, 2],
[ 7, 0],
[ 7, 1],
[ 7, 2],
[ 7, 3],
[ 8, 0],
[ 8, 1],
[ 8, 2],
[ 8, 3],
[ 8, 4],
[ 9, 0],
[ 9, 1],
[10, 0],
[10, 1],
[10, 2],
[11, 0],
[11, 1],
[12, 0],
[12, 1],
[12, 2],
[12, 3],
[12, 4],
[13, 0],
[14, 0],
[14, 1],
[14, 2],
[14, 3],
[14, 4],
[15, 0],
[15, 1],
[15, 2],
[16, 0],
[16, 1],
[16, 2],
[16, 3],
[17, 0],
[17, 1],
[17, 2],
[17, 3],
[17, 4],
[18, 0],
[18, 1],
[19, 0],
[19, 1],
[19, 2],
[19, 3],
[20, 0],
[20, 1],
[20, 2],
[20, 3],
[20, 4],
[21, 0],
[21, 1],
[22, 0],
[22, 1],
[22, 2],
[23, 0],
[24, 0],
[24, 1],
[24, 2],
[25, 0],
[25, 1],
[25, 2],
[26, 0],
[27, 0],
[27, 1],
[27, 2],
[27, 3],
[28, 0],
[29, 0],
[29, 1],
[29, 2],
[29, 3],
[29, 4],
[30, 0],
[30, 1],
[31, 0],
[31, 1],
[31, 2],
[31, 3],
[31, 4]]), values=array([33, 34, 12, 20, 8, 34, 28, 49, 36, 70, 56, 23, 25, 24, 64, 1, 42,
44, 73, 72, 5, 39, 8, 10, 14, 75, 50, 22, 77, 71, 31, 63, 32, 70,
59, 27, 69, 74, 37, 14, 22, 45, 16, 60, 16, 78, 15, 30, 29, 58, 38,
25, 79, 28, 74, 47, 67, 28, 1, 27, 11, 25, 17, 39, 31, 16, 32, 75,
59, 72, 15, 58, 11, 18, 25, 72, 32, 44, 17, 45, 80, 77, 61, 68, 3,
20, 45, 70, 47, 2, 42, 73, 51, 64], dtype=int32), dense_shape=array([32, 5]))]
[array([[ 0],
[ 1],
[ 2],
[ 3],
[ 4],
[ 5],
[ 6],
[ 7],
[ 8],
[ 9],
[10],
[11],
[12],
[13],
[14],
[15],
[16],
[17],
[18],
[19],
[20],
[21],
[22],
[23],
[24],
[25],
[26],
[27],
[28],
[29],
[30],
[31]], dtype=int32)]
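In the bounding-box output above, each row of indices is (image, box, coordinate) and dense_shape is [32, 5, 4], that is, up to 5 boxes of 4 coordinates per image. A minimal sketch of regrouping such a triple into one list of boxes per image follows. The helper name `boxes_per_image` and the small toy input are made up for illustration; they are not taken from the run above.

```python
def boxes_per_image(indices, values, dense_shape):
    """Group sparse (image, box, coordinate) entries into per-image box lists."""
    batch_size, _max_boxes, num_coords = dense_shape
    per_image = [dict() for _ in range(batch_size)]
    for (image_idx, box_idx, coord_idx), value in zip(indices, values):
        # Create the box on first sight, then fill in one coordinate at a time
        box = per_image[image_idx].setdefault(box_idx, [0.0] * num_coords)
        box[coord_idx] = value
    return [[boxes[i] for i in sorted(boxes)] for boxes in per_image]


# Toy batch: image 0 has one box, image 1 has none, image 2 has two
toy_indices = [[0, 0, c] for c in range(4)] \
            + [[2, 0, c] for c in range(4)] \
            + [[2, 1, c] for c in range(4)]
toy_values = [313., 168., 162., 120.,
              100., 216., 182., 237.,
              138., 15., 404., 172.]
grouped = boxes_per_image(toy_indices, toy_values, [3, 2, 4])
for image_idx, boxes in enumerate(grouped):
    print(image_idx, boxes)
```

The same function should work on the NumPy arrays held in the indices, values and dense_shape fields of a SparseTensorValue, since it only iterates over rows and unpacks them.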
Let us check the output images with their augmentations. TensorFlow returns NumPy arrays, so we can visualize them easily with matplotlib.
We define a show_images helper function that will display a sample of our batch.
The batch layout is NCHW, so we use transpose to get HWC images that matplotlib can show.
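To make the index mapping behind that transpose explicit, here is a small sketch on nested Python lists, written without NumPy on purpose. It moves the channel axis last and adds back the mean of 128 that the pipeline subtracted. The helper name `chw_to_hwc` and the tiny 1x2 image are illustrative only; on real arrays, transpose((1, 2, 0)) does the same reordering.

```python
def chw_to_hwc(image, offset=128):
    """Reorder a [channel][row][column] image to [row][column][channel] and add `offset`."""
    channels = len(image)
    height = len(image[0])
    width = len(image[0][0])
    return [
        [[image[c][h][w] + offset for c in range(channels)] for w in range(width)]
        for h in range(height)
    ]


# A 3-channel image with 1 row and 2 columns, already mean-subtracted
chw = [
    [[-128, 0]],    # R plane
    [[-118, 10]],   # G plane
    [[-108, 127]],  # B plane
]
hwc = chw_to_hwc(chw)
print(hwc)
```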
[7]:
from __future__ import division
import matplotlib.gridspec as gridspec
import matplotlib.pyplot as plt
%matplotlib inline

def show_images(image_batch, nb_images):
    columns = 4
    # Ceiling division, so a partially filled last row still gets a grid row
    rows = (nb_images + columns - 1) // columns
    fig = plt.figure(figsize = (32,(32 // columns) * rows))
    gs = gridspec.GridSpec(rows, columns)
    for j in range(nb_images):
        plt.subplot(gs[j])
        plt.axis("off")
        # image_batch[0] is the batch from the first device; CHW -> HWC, then undo the mean subtraction
        img = image_batch[0][j].transpose((1,2,0)) + 128
        plt.imshow(img.astype('uint8'))

show_images(res_cpu[0], 8)