Comparison Functions

Module: polygraphy.comparator

class OutputCompareResult(passed, max_absdiff, max_reldiff, mean_absdiff, mean_reldiff, median_absdiff, median_reldiff, quantile_absdiff, quantile_reldiff)

Bases: object

Represents the result of comparing a single output of a single iteration between two runners.

Records the required tolerances and other statistics gathered during comparison.

Parameters:
  • passed (bool) – Whether the error was within acceptable limits.

  • max_absdiff (float) – The minimum required absolute tolerance to consider the outputs equivalent.

  • max_reldiff (float) – The minimum required relative tolerance to consider the outputs equivalent.

  • mean_absdiff (float) – The mean absolute error between the outputs.

  • mean_reldiff (float) – The mean relative error between the outputs.

  • median_absdiff (float) – The median absolute error between the outputs.

  • median_reldiff (float) – The median relative error between the outputs.

  • quantile_absdiff (float) – The q-th quantile absolute error between the outputs.

  • quantile_reldiff (float) – The q-th quantile relative error between the outputs.

__bool__()

Whether the output matched.

Returns:

bool
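Because the result defines __bool__, it can be used directly in a condition. The following is a minimal sketch using a hypothetical stand-in class (CompareResultSketch is not part of polygraphy and carries only three of the documented fields) to show the pattern:

```python
from dataclasses import dataclass


@dataclass
class CompareResultSketch:
    """Hypothetical stand-in for illustration only; not the polygraphy class."""

    passed: bool
    max_absdiff: float
    max_reldiff: float

    def __bool__(self):
        # Truthiness mirrors the `passed` field.
        return self.passed


result = CompareResultSketch(passed=False, max_absdiff=0.5, max_reldiff=0.005)

# The object can be tested like a boolean while still exposing its statistics.
status = "match" if result else "mismatch"
```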

class CompareFunc

Bases: object

Provides functions that can be used to compare two IterationResults.

static simple(check_shapes=None, rtol=None, atol=None, fail_fast=None, find_output_func=None, check_error_stat=None, infinities_compare_equal=None, save_heatmaps=None, show_heatmaps=None, save_error_metrics_plot=None, show_error_metrics_plot=None, error_quantile=None)

Creates a function that compares two IterationResults, and can be used as the compare_func argument in Comparator.compare_accuracy.

Parameters:
  • check_shapes (bool) – Whether shapes must match exactly. If this is False, this function may permute or reshape outputs before comparison. Defaults to True.

  • rtol (Union[float, Dict[str, float]]) –

    The relative tolerance to use when checking accuracy. This is expressed as a fraction of the second set of output values. For example, a value of 0.01 would check that the first set of outputs is within 1% of the second.

    This can be provided on a per-output basis using a dictionary. In that case, use an empty string (“”) as the key to specify default tolerance for outputs not explicitly listed. Defaults to 1e-5.

  • atol (Union[float, Dict[str, float]]) – The absolute tolerance to use when checking accuracy. This can be provided on a per-output basis using a dictionary. In that case, use an empty string (“”) as the key to specify default tolerance for outputs not explicitly listed. Defaults to 1e-5.

  • fail_fast (bool) – Whether the function should exit immediately after the first failure. Defaults to False.

  • find_output_func (Callable(str, int, IterationResult) -> List[str]) – A callback that returns a list of output names to compare against from the provided IterationResult, given an output name and index from another IterationResult. The comparison function will always iterate over the output names of the first IterationResult, expecting names from the second. A return value of [] or None indicates that the output should be skipped.

  • check_error_stat (Union[str, Dict[str, str]]) –

    The error statistic to check. Possible values are:

    • “elemwise”: Checks each element in the output to determine if it exceeds both tolerances specified.

      The minimum required tolerances displayed in this mode are only applicable when just one type of tolerance is set. Because an element fails only when it exceeds both tolerances, the minimum tolerances actually required may be lower when both absolute and relative tolerances are specified.

    • “max”: Checks the maximum absolute/relative errors against the respective tolerances. This is the strictest possible check.

    • “mean”: Checks the mean absolute/relative errors against the respective tolerances.

    • “median”: Checks the median absolute/relative errors against the respective tolerances.

    • “quantile”: Checks the absolute/relative errors at the quantile given by error_quantile against the respective tolerances.

    This can be provided on a per-output basis using a dictionary. In that case, use an empty string (“”) as the key to specify default error stat for outputs not explicitly listed. Defaults to “elemwise”.

  • infinities_compare_equal (bool) – If True, then matching ±inf values in the output have an absdiff of 0. If False, then matching ±inf values in the output have an absdiff of NaN. Defaults to False.

  • save_heatmaps (str) – [EXPERIMENTAL] Path to a directory in which to save figures of heatmaps of the absolute and relative error. Defaults to None.

  • show_heatmaps (bool) – [EXPERIMENTAL] Whether to display heatmaps of the absolute and relative error. Defaults to False.

  • save_error_metrics_plot (str) – [EXPERIMENTAL] Path to a directory in which to save the error metrics plots. Defaults to None.

  • show_error_metrics_plot (bool) – [EXPERIMENTAL] Whether to display the error metrics plot.

  • error_quantile (Union[float, Dict[str, float]]) – Quantile error to compute when checking accuracy. This is expressed as a float in range [0, 1]. For example, error_quantile=0.5 is the median. Defaults to 0.99.
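To make the difference between the “elemwise” and “max” statistics concrete, here is a minimal pure-Python sketch. It is not the library's implementation: the helper names are hypothetical, the handling of zero reference values is a simplification, and the reading of “max” (fail only when both maxima exceed their tolerances) is an assumption made for illustration.

```python
import math


def abs_and_rel_diffs(out0, out1):
    """Hypothetical helper: per-element absolute and relative differences.

    Relative error is measured against the second output, matching how
    rtol is described above.
    """
    absdiffs = [abs(a - b) for a, b in zip(out0, out1)]
    reldiffs = [
        (d / abs(b)) if b != 0 else (0.0 if d == 0 else math.inf)
        for d, b in zip(absdiffs, out1)
    ]
    return absdiffs, reldiffs


def elemwise_passes(out0, out1, rtol=1e-5, atol=1e-5):
    """An element counts as a mismatch only if it exceeds BOTH tolerances."""
    absdiffs, reldiffs = abs_and_rel_diffs(out0, out1)
    return not any(a > atol and r > rtol for a, r in zip(absdiffs, reldiffs))


def max_passes(out0, out1, rtol=1e-5, atol=1e-5):
    """Assumed reading of "max": fail if both maxima exceed their tolerances."""
    absdiffs, reldiffs = abs_and_rel_diffs(out0, out1)
    return not (max(absdiffs) > atol and max(reldiffs) > rtol)


out0 = [100.5, 0.001]
out1 = [100.0, 0.0001]

# Element 0 has a large absolute but small relative error; element 1 is the
# reverse. Each element satisfies one tolerance, so the elementwise check
# passes, while the maxima (taken from different elements) exceed both.
elemwise_ok = elemwise_passes(out0, out1, rtol=0.01, atol=0.01)
max_ok = max_passes(out0, out1, rtol=0.01, atol=0.01)
```

Under these assumptions, the same pair of outputs passes the elementwise check but fails the max check, which is consistent with “max” being described as the strictest option.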

Returns:

A callable that returns a mapping of output names to OutputCompareResults, indicating whether the corresponding output matched.

Return type:

Callable(IterationResult, IterationResult) -> OrderedDict[str, OutputCompareResult]
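Several parameters of simple accept either a single value or a dictionary keyed by output name, with the empty string as the default key. The sketch below shows one way such a value could be resolved for a given output. The helper tolerance_for and the output names "logits" and "scores" are hypothetical, and the library's own lookup logic may differ.

```python
def tolerance_for(output_name, tol, fallback=1e-5):
    """Hypothetical helper: resolve a float-or-dict tolerance for one output."""
    if tol is None:
        # Nothing was provided, so use the fallback value.
        return fallback
    if not isinstance(tol, dict):
        # A single value applies to every output.
        return tol
    if output_name in tol:
        return tol[output_name]
    # The empty-string key supplies the default for unlisted outputs.
    return tol.get("", fallback)


rtol = {"logits": 1e-3, "": 1e-5}

logits_rtol = tolerance_for("logits", rtol)
scores_rtol = tolerance_for("scores", rtol)
```

Here "logits" gets its explicitly listed tolerance, while "scores" is not listed and falls back to the value stored under the empty-string key.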

static indices(index_tolerance=None, fail_fast=None)

Creates a function that compares two IterationResults containing indices, and can be used as the compare_func argument in Comparator.compare_accuracy. This can be useful to compare, for example, the outputs of a Top-K operation.

Outputs with more than one dimension are treated like multiple batches of values. For example, an output of shape (3, 4, 5, 10) would be treated like 60 batches (3 x 4 x 5) of 10 values each.
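The batch arithmetic in that example can be checked with a short sketch. The variable names are illustrative only, and the snippet assumes that every dimension except the last is treated as a batch dimension, as the example implies.

```python
import math

shape = (3, 4, 5, 10)

# All leading dimensions are folded together into batches;
# the last dimension holds the values within each batch.
num_batches = math.prod(shape[:-1])
values_per_batch = shape[-1]
```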

Parameters:
  • index_tolerance (Union[int, Dict[str, int]]) –

    The tolerance to use when comparing indices. This is an integer indicating the maximum distance between values before it is considered a mismatch. For example, consider two outputs:

    output0 = [0, 1, 2]
    output1 = [1, 0, 2]
    

    With an index tolerance of 0, this would be considered a mismatch, since the positions of 0 and 1 are flipped between the two outputs. However, with an index tolerance of 1, it would pass since the mismatched values are only 1 spot apart. If instead the outputs were:

    output0 = [0, 1, 2]
    output1 = [1, 2, 0]
    

    Then we would require an index tolerance of 2, since the 0 value in the two outputs is 2 spots apart.

    When this value is set, the final ‘index_tolerance’ number of values are ignored for each batch. For example, with an index tolerance of 1, mismatches in the final element are not considered. If used with a Top-K output, you can compensate for this by instead using a Top-(K + index_tolerance).

    This can be provided on a per-output basis using a dictionary. In that case, use an empty string (“”) as the key to specify default tolerance for outputs not explicitly listed.

  • fail_fast (bool) – Whether the function should exit immediately after the first failure. Defaults to False.

Returns:

A callable that returns a mapping of output names to bools, indicating whether the corresponding output matched.

Return type:

Callable(IterationResult, IterationResult) -> OrderedDict[str, bool]
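The index tolerance examples above can be reproduced with a small pure-Python sketch for a single batch. This is not the library's implementation: indices_match is a hypothetical helper, and it assumes that the trailing index_tolerance values are dropped from the first output only, while the second output is searched in full. That choice was made here so that the documented examples behave as described.

```python
def indices_match(out0, out1, index_tolerance=0):
    """Hypothetical helper: compare one batch of indices.

    Returns True if every compared value of out0 appears in out1 at most
    `index_tolerance` positions away from its position in out0.
    """
    if len(out0) != len(out1):
        return False

    # Ignore the final `index_tolerance` values of the first output.
    compared = out0[: max(0, len(out0) - index_tolerance)]

    for pos0, value in enumerate(compared):
        if out1[pos0] == value:
            continue
        if value not in out1:
            return False
        pos1 = out1.index(value)
        if abs(pos1 - pos0) > index_tolerance:
            return False
    return True


# First documented example: 0 and 1 are swapped, one spot apart.
swapped_tol0 = indices_match([0, 1, 2], [1, 0, 2], index_tolerance=0)
swapped_tol1 = indices_match([0, 1, 2], [1, 0, 2], index_tolerance=1)

# Second documented example: the value 0 moves two spots.
rotated_tol1 = indices_match([0, 1, 2], [1, 2, 0], index_tolerance=1)
rotated_tol2 = indices_match([0, 1, 2], [1, 2, 0], index_tolerance=2)
```

Under these assumptions, the swapped pair fails with a tolerance of 0 and passes with 1, while the rotated pair fails with 1 and passes with 2.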