tensorflow_quantization
- tensorflow_quantization.G_NUM_BITS
8-bit quantization is used by default. It can be changed through the G_NUM_BITS global variable. The following code snippet performs 4-bit quantization:

```python
import tensorflow_quantization

# get pretrained model
.....

# perform 4 bit quantization
tensorflow_quantization.G_NUM_BITS = 4
q_model = quantize_model(nn_model_original)

# fine-tune model
.....
```

Check the test_end_to_end_workflow_4bit() test case in the quantize_test.py test module.
- tensorflow_quantization.G_NARROW_RANGE
If True, the absolute value of the quantized minimum equals the quantized maximum. For example, a minimum of -127 is used for 8-bit quantization instead of -128. TensorRT™ only supports G_NARROW_RANGE=True.
- tensorflow_quantization.G_SYMMETRIC
If True, 0.0 is always at the center of the real [min, max] range, i.e. the zero point is always 0. TensorRT™ only supports G_SYMMETRIC=True.
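To make these two settings concrete, the sketch below (plain NumPy, not part of the toolkit; the `fake_quantize` helper, `num_bits` argument, and sample tensor are hypothetical names chosen for this illustration) shows symmetric, narrow-range fake quantization: the zero point is fixed at 0, and the integer grid is [-(2^(b-1)-1), 2^(b-1)-1], e.g. [-127, 127] for 8 bits rather than [-128, 127].

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Illustrative symmetric, narrow-range fake quantization."""
    # Narrow range: qmin = -qmax, so 8 bits uses [-127, 127], not [-128, 127].
    qmax = 2 ** (num_bits - 1) - 1
    # Symmetric: one scale, zero point 0; the grid is centered on 0.0.
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)  # integer codes
    return q * scale  # dequantized ("fake quant") values

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
xq = fake_quantize(x, num_bits=8)
```

Because the range is symmetric and narrow, a tensor whose extremes are -1.0 and 1.0 maps to integer codes -127 and 127; the code -128 is never produced.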
Attention
When overriding these global variables, set them immediately before the quantize_model function call.