Common API

Classes

class transformer_engine.common.recipe.Format(value)

Supported FP8 formats.

Values
  • E4M3 – All FP8 tensors are in e4m3 format

  • E5M2 – All FP8 tensors are in e5m2 format

  • HYBRID – FP8 tensors in the forward pass are in e4m3 format, FP8 tensors in the backward pass are in e5m2 format
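
A minimal sketch of selecting a format: a Format member is not used on its own but is passed to a recipe (see DelayedScaling below) via its fp8_format argument.

    from transformer_engine.common import recipe

    # HYBRID pairs e4m3 (forward tensors) with e5m2 (backward tensors).
    fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)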

class transformer_engine.common.recipe.DelayedScaling(margin=0, interval=1, fp8_format=Format.HYBRID, amax_history_len=1, amax_compute_algo='most_recent', scaling_factor_compute_algo=None, override_linear_precision=(False, False, False))

Use the delayed scaling factor strategy: use the scaling factor from the previous iteration, recompute it once every interval iterations, and record the amax history of the last amax_history_len steps.
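
A minimal usage sketch, assuming the PyTorch integration (transformer_engine.pytorch), a CUDA device with FP8 support, and illustrative layer and batch sizes:

    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common import recipe

    fp8_recipe = recipe.DelayedScaling(
        margin=0,
        interval=1,
        fp8_format=recipe.Format.HYBRID,
        amax_history_len=16,
        amax_compute_algo="max",
    )

    # Run a Transformer Engine module under the recipe; FP8 casts use the
    # scaling factors maintained according to the recipe.
    linear = te.Linear(768, 768).cuda()
    inp = torch.randn(16, 768, device="cuda")
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        out = linear(inp)
    out.sum().backward()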

Parameters
  • margin (int, default = 0) – Margin for the scaling factor computation.

  • interval (int, default = 1) – Controls how often the scaling factor is recomputed.

  • fp8_format ({Format.E4M3, Format.HYBRID}, default = Format.HYBRID) – Controls the FP8 data format used during the forward and backward passes.

  • amax_history_len (int, default = 1) – The length of the amax history window used for scaling factor computation.

  • amax_compute_algo ({'max', 'most_recent', Callable}, default = 'most_recent') –

    Algorithm used for choosing the amax value for the scaling factor computation. There are two predefined choices: max chooses the largest amax in the history window, while most_recent always chooses the most recently seen value. Alternatively, one may pass a function with the signature:

    def amax_compute(amax_history: Tensor) -> Tensor
    

    where Tensor is a framework tensor type; a sketch of such a callable follows this parameter list.

  • scaling_factor_compute_algo (Callable, default = None) –

    Algorithm used for computing the new scaling factor based on the value of amax. It should be a function with the signature:

    def scaling_factor_compute(amax: Tensor,
                               old_scaling_factor: Tensor,
                               fp8_max: Tensor,
                               recipe: DelayedScaling) -> Tensor
    

    where Tensor is a framework tensor type; see the sketch after this parameter list.

  • override_linear_precision (Tuple(bool, bool, bool), default=(False, False, False)) – Whether or not to execute the fprop, dgrad, and wgrad GEMMs (respectively) in higher precision when using FP8.
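
As referenced in the parameter descriptions above, a hypothetical custom amax_compute_algo; the assumption that dimension 0 of amax_history indexes the recorded steps is mine and not part of the documented signature:

    import torch
    from transformer_engine.common.recipe import DelayedScaling

    def amax_compute(amax_history: torch.Tensor) -> torch.Tensor:
        # Hypothetical reduction: average the recorded amax values instead of
        # the predefined 'max' or 'most_recent' choices.
        return amax_history.mean(dim=0)

    fp8_recipe = DelayedScaling(amax_history_len=16, amax_compute_algo=amax_compute)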
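
Similarly, a hedged sketch of a custom scaling_factor_compute_algo that mirrors the default power-of-two rule from the notes below but keeps the previous factor when amax is zero or non-finite:

    import torch
    from transformer_engine.common.recipe import DelayedScaling

    def scaling_factor_compute(amax: torch.Tensor,
                               old_scaling_factor: torch.Tensor,
                               fp8_max: torch.Tensor,
                               recipe: DelayedScaling) -> torch.Tensor:
        # Power-of-two factor as in the default formula (see Notes), falling
        # back to the previous factor for zero or non-finite amax values.
        exp = torch.floor(torch.log2(fp8_max / amax)) - recipe.margin
        new_sf = torch.pow(torch.full_like(amax, 2.0), exp)
        ok = torch.isfinite(new_sf) & (amax > 0)
        return torch.where(ok, new_sf, old_scaling_factor)

    fp8_recipe = DelayedScaling(scaling_factor_compute_algo=scaling_factor_compute)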

Notes

  • By default (when scaling_factor_compute_algo is left as None), the scaling factor is computed from the final amax value using the following formula (a Python sketch appears after these notes):

    FP8_MAX = maximum_representable_value(fp8_format)
    exp = get_exponent(FP8_MAX / amax) - margin
    new_scaling_factor = 2.0 ^ exp
    
  • The scaling factor should always be a power of 2 so that no numerical error is introduced during the conversion from FP8 to a higher-precision format.
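
As a companion to the first note, a Python sketch of the documented default computation, assuming get_exponent in the pseudocode above means the floor of the base-2 logarithm:

    import torch

    def default_scaling_factor(amax: torch.Tensor, fp8_max: float, margin: int = 0) -> torch.Tensor:
        # scale = 2 ** (floor(log2(FP8_MAX / amax)) - margin); flooring the
        # exponent keeps the factor an exact power of two, per the second note.
        exp = torch.floor(torch.log2(fp8_max / amax)) - margin
        return torch.pow(torch.full_like(amax, 2.0), exp)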