tensorflow_quantization.QuantizationSpec

class tensorflow_quantization.QuantizationSpec

Helper class holding config objects for all layers to quantize.

add(name: Union[str, List], is_keras_class: Union[bool, List] = False, quantize_input: Union[bool, List] = True, quantize_weight: Union[bool, List] = True, quantization_index: Optional[Union[List, List[List]]] = None) -> None

Takes the user's parameters and appends a LayerConfig object to a list on each add call.

Parameters
  • name (Union[str, List]) -- Name of the layer, as shown by utilities such as model.summary(). A list of names may also be passed.

  • is_keras_class (Union[bool, List]) -- List or a single value. Set this to True if the name passed is a Keras layer class (e.g. 'Conv2D') rather than a layer name. Default is False.

  • quantize_input (Union[bool, List]) -- List or a single value. Set this to True if the input to the layer should be quantized. Default is True, following the NVIDIA quantization recipe.

  • quantize_weight (Union[bool, List]) -- List or a single value. Set this to True if the weights of the layer should be quantized. Default is True, following the NVIDIA quantization recipe. The value is ignored for layers without weights.

  • quantization_index (Optional[Union[List, List[List]]]) -- List or list of lists. For layers with multiple inputs (e.g. Add, Concatenate), the indices of the inputs to which quantization is applied. Default is None.

Returns

None
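
The list-or-single-value convention in the parameters above can be pictured as a per-layer expansion. The sketch below is a plain-Python illustration of that reading, not the toolkit's implementation: `expand_spec_args` and the dictionary keys are hypothetical names, and it assumes that a single value is reused for every layer in `name`.

```python
from typing import Any, Dict, List, Optional, Union


def expand_spec_args(
    name: Union[str, List[str]],
    is_keras_class: Union[bool, List[bool]] = False,
    quantize_input: Union[bool, List[bool]] = True,
    quantize_weight: Union[bool, List[bool]] = True,
    quantization_index: Optional[List] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: return one config dict per layer name."""
    names = [name] if isinstance(name, str) else list(name)
    n = len(names)

    def per_layer(value, label):
        # Assumption: a single value applies to every layer in `names`.
        if not isinstance(value, list):
            return [value] * n
        if len(value) != n:
            raise ValueError(f"{label} has {len(value)} entries, expected {n}")
        return value

    # For a single layer, quantization_index is already a list of ints, so it
    # is only split per layer when `name` itself is a list.
    if isinstance(name, str) or quantization_index is None:
        indices = [quantization_index] * n
    else:
        indices = per_layer(quantization_index, "quantization_index")

    columns = zip(
        names,
        per_layer(is_keras_class, "is_keras_class"),
        per_layer(quantize_input, "quantize_input"),
        per_layer(quantize_weight, "quantize_weight"),
        indices,
    )
    keys = ("name", "is_keras_class", "quantize_input",
            "quantize_weight", "quantization_index")
    return [dict(zip(keys, column)) for column in columns]


configs = expand_spec_args(
    name=["conv2d_1", "conv2d_3", "dense"],
    quantize_input=[False, True, True],
)
for cfg in configs:
    print(cfg)
```

Under this reading, passing lists is equivalent to calling `add` once per layer with the matching entry from each list, which is how the examples below present the two forms.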

Examples

Let's write a simple network to use in all examples.

import tensorflow as tf
# Import necessary methods from the Quantization Toolkit
from tensorflow_quantization.quantize import quantize_model, QuantizationSpec

# 1. Create a small network
input_img = tf.keras.layers.Input(shape=(28, 28))
x = tf.keras.layers.Reshape(target_shape=(28, 28, 1))(input_img)
x = tf.keras.layers.Conv2D(filters=126, kernel_size=(3, 3))(x)
x = tf.keras.layers.ReLU()(x)
x = tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3))(x)
x = tf.keras.layers.ReLU()(x)
x = tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3))(x)
x = tf.keras.layers.ReLU()(x)
x = tf.keras.layers.Conv2D(filters=16, kernel_size=(3, 3))(x)
x = tf.keras.layers.ReLU()(x)
x = tf.keras.layers.Conv2D(filters=8, kernel_size=(3, 3))(x)
x = tf.keras.layers.ReLU()(x)
x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(100)(x)
x = tf.keras.layers.ReLU()(x)
x = tf.keras.layers.Dense(10)(x)
model = tf.keras.Model(input_img, x)
  1. Select layers based on layer names

    Goal: Quantize the 2nd Conv2D, 4th Conv2D and 1st Dense layer in the network above.

    # 1. Find out layer names (summary() prints the table itself)
    model.summary()
    
    # 2. Create quantization spec and add layer names
    q_spec = QuantizationSpec()
    layer_name = ['conv2d_1', 'conv2d_3', 'dense']
    
    """
    # Alternatively, each layer configuration can be added one at a time:
    q_spec.add('conv2d_1')
    q_spec.add('conv2d_3')
    q_spec.add('dense')
    """
    
    q_spec.add(name=layer_name)
    
    # 3. Quantize model
    q_model = quantize_model(model, quantization_mode='partial', quantization_spec=q_spec)
    q_model.summary()
    
    tf.keras.backend.clear_session()
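
The mapping from "2nd Conv2D" to 'conv2d_1' in example 1 follows from the order in which layers of each class appear in the summary. A minimal plain-Python sketch of that lookup is shown here; `nth_layer_of_class` is a hypothetical helper, and the hard-coded names are the assumed default names of the Conv2D and Dense layers in the example network.

```python
from typing import List, Tuple

# (layer name, layer class) pairs in model order. In a live session they could
# be collected with: [(l.name, type(l).__name__) for l in model.layers]
# Assumed default names; layers of other classes are omitted for brevity.
LAYERS: List[Tuple[str, str]] = [
    ("conv2d", "Conv2D"),
    ("conv2d_1", "Conv2D"),
    ("conv2d_2", "Conv2D"),
    ("conv2d_3", "Conv2D"),
    ("conv2d_4", "Conv2D"),
    ("dense", "Dense"),
    ("dense_1", "Dense"),
]


def nth_layer_of_class(layers: List[Tuple[str, str]], class_name: str, n: int) -> str:
    """Return the name of the n-th (1-based) layer of the given class."""
    matches = [name for name, cls in layers if cls == class_name]
    if not 1 <= n <= len(matches):
        raise IndexError(f"model has {len(matches)} {class_name} layers, asked for #{n}")
    return matches[n - 1]


layer_name = [
    nth_layer_of_class(LAYERS, "Conv2D", 2),
    nth_layer_of_class(LAYERS, "Conv2D", 4),
    nth_layer_of_class(LAYERS, "Dense", 1),
]
print(layer_name)  # ['conv2d_1', 'conv2d_3', 'dense']
```

Default Keras names carry a counter that keeps increasing within a session, so the names may differ if other layers were created earlier. Checking the summary output before filling in the spec avoids relying on the assumed names.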
    
  2. Select layers based on layer class

    Goal: Quantize all Conv2D layers.

    # 1. Create QuantizationSpec object and add layer class
    q_spec = QuantizationSpec()
    q_spec.add(name='Conv2D', is_keras_class=True)
    
    # 2. Quantize model
    q_model = quantize_model(model, quantization_mode='partial', quantization_spec=q_spec)
    q_model.summary()
    
    tf.keras.backend.clear_session()
    
  3. Select layers based on both layer name and layer class

    Goal: Quantize all Dense layers and the 3rd Conv2D layer.

    # 1. Create QuantizationSpec object and add layer information
    q_spec = QuantizationSpec()
    
    layer_name = ['Dense', 'conv2d_2']
    layer_is_keras_class = [True, False]
    
    """
    # Alternatively, each layer configuration can be added one at a time:
    q_spec.add(name='Dense', is_keras_class=True)
    q_spec.add(name='conv2d_2')
    """
    
    q_spec.add(name=layer_name, is_keras_class=layer_is_keras_class)
    
    # 2. Quantize model
    q_model = quantize_model(model, quantization_mode='partial', quantization_spec=q_spec)
    q_model.summary()
    
    tf.keras.backend.clear_session()
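
To make the combined name and class selection of example 3 concrete, the sketch below resolves a mixed spec against a list of layers in plain Python. It only illustrates the intended selection; `select_layers` is a hypothetical function, the layer names are assumed defaults, and the toolkit's own matching logic may differ.

```python
from typing import List, Tuple

# Assumed default (name, class) pairs for the Conv2D and Dense layers of the
# example network; layers of other classes are omitted for brevity.
LAYERS: List[Tuple[str, str]] = [
    ("conv2d", "Conv2D"),
    ("conv2d_1", "Conv2D"),
    ("conv2d_2", "Conv2D"),
    ("conv2d_3", "Conv2D"),
    ("conv2d_4", "Conv2D"),
    ("dense", "Dense"),
    ("dense_1", "Dense"),
]


def select_layers(
    layers: List[Tuple[str, str]],
    names: List[str],
    is_keras_class: List[bool],
) -> List[str]:
    """Return the names of layers matched by any (name, is_keras_class) entry."""
    selected = []
    for layer_name, layer_class in layers:
        for spec_name, by_class in zip(names, is_keras_class):
            # A class entry is compared with the layer's class, a name entry
            # with the layer's own name.
            target = layer_class if by_class else layer_name
            if target == spec_name:
                selected.append(layer_name)
                break
    return selected


print(select_layers(LAYERS, ["Dense", "conv2d_2"], [True, False]))
# ['conv2d_2', 'dense', 'dense_1']
```

With `is_keras_class=[True, False]`, the 'Dense' entry selects every Dense layer while 'conv2d_2' selects exactly one layer.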
    
  4. Select inputs at specific index for multi-input layers

    For layers with multiple inputs, the user can choose which ones need to be quantized. Assume a network that has two layers of class Add.

    Goal: Quantize index 1 of add layer, index 0 of add_1 layer and the 3rd Conv2D layer.

    # 1. Create QuantizationSpec object and add layer information
    q_spec = QuantizationSpec()
    
    layer_name = ['add', 'add_1', 'conv2d_2']
    layer_q_indices = [[1], [0], None]
    
    """
    # Alternatively, each layer configuration can be added one at a time:
    q_spec.add(name='add', quantization_index=[1])
    q_spec.add(name='add_1', quantization_index=[0])
    q_spec.add(name='conv2d_2')
    """
    
    q_spec.add(name=layer_name, quantization_index=layer_q_indices)
    
    # 2. Quantize model
    q_model = quantize_model(model, quantization_mode='partial', quantization_spec=q_spec)
    q_model.summary()
    
    tf.keras.backend.clear_session()
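
Example 4 assumes a network with two Add layers, which the simple network defined at the start of the examples does not have. A minimal sketch of such a network is given here. The architecture is purely illustrative, and the explicit `name=` arguments are an assumption made so that the names used in the example ('add', 'add_1', 'conv2d_2') do not depend on session state.

```python
import tensorflow as tf

# Illustrative network with two residual-style Add layers.
inputs = tf.keras.layers.Input(shape=(28, 28, 1))
x = tf.keras.layers.Conv2D(8, 3, padding="same", name="conv2d")(inputs)
y = tf.keras.layers.Conv2D(8, 3, padding="same", name="conv2d_1")(x)
a = tf.keras.layers.Add(name="add")([x, y])        # inputs: index 0 = x, index 1 = y
z = tf.keras.layers.Conv2D(8, 3, padding="same", name="conv2d_2")(a)
b = tf.keras.layers.Add(name="add_1")([a, z])      # inputs: index 0 = a, index 1 = z
out = tf.keras.layers.Flatten()(b)
out = tf.keras.layers.Dense(10)(out)
model = tf.keras.Model(inputs, out)

# Confirm the names that the quantization spec will refer to.
add_names = [l.name for l in model.layers if isinstance(l, tf.keras.layers.Add)]
print(add_names)
```

The position of a tensor in the list passed to an Add layer is the index that `quantization_index` refers to, so `[1]` for 'add' targets `y` and `[0]` for 'add_1' targets `a` in this sketch.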
    
  5. Quantize only weight and NOT input

    Goal: Quantize the 2nd Conv2D, 4th Conv2D and 1st Dense layer in the network above. For the 2nd Conv2D, quantize only the weights and not the input.

    # 1. Find out layer names (summary() prints the table itself)
    model.summary()
    
    # 2. Create quantization spec and add layer names
    q_spec = QuantizationSpec()
    layer_name = ['conv2d_1', 'conv2d_3', 'dense']
    layer_q_input = [False, True, True]
    
    """
    # Alternatively, each layer configuration can be added one at a time:
    q_spec.add('conv2d_1', quantize_input=False)
    q_spec.add('conv2d_3')
    q_spec.add('dense')
    """
    
    q_spec.add(name=layer_name, quantize_input=layer_q_input)
    
    # 3. Quantize model
    q_model = quantize_model(model, quantization_mode='partial', quantization_spec=q_spec)
    q_model.summary()
    
    tf.keras.backend.clear_session()