Comparison Functions
Module: polygraphy.comparator
- class OutputCompareResult(passed, max_absdiff, max_reldiff, mean_absdiff, mean_reldiff, median_absdiff, median_reldiff, quantile_absdiff, quantile_reldiff)[source]
Bases: object
Represents the result of comparing a single output of a single iteration between two runners.
Records the required tolerances and other statistics gathered during comparison.
- Parameters:
passed (bool) – Whether the error was within acceptable limits.
max_absdiff (float) – The minimum required absolute tolerance to consider the outputs equivalent.
max_reldiff (float) – The minimum required relative tolerance to consider the outputs equivalent.
mean_absdiff (float) – The mean absolute error between the outputs.
mean_reldiff (float) – The mean relative error between the outputs.
median_absdiff (float) – The median absolute error between the outputs.
median_reldiff (float) – The median relative error between the outputs.
quantile_absdiff (float) – The q-th quantile absolute error between the outputs.
quantile_reldiff (float) – The q-th quantile relative error between the outputs.
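For example, a minimal sketch of inspecting these statistics (assuming run_results was produced by Comparator.run() with two runners; the variable names are illustrative):

    from polygraphy.comparator import Comparator, CompareFunc

    # compare_accuracy returns an ordered mapping of runner-name pairs to a
    # list of per-iteration OrderedDicts of {output_name: OutputCompareResult}.
    accuracy = Comparator.compare_accuracy(run_results, compare_func=CompareFunc.simple())
    for (runner0_name, runner1_name), iterations in accuracy.items():
        for iteration in iterations:
            for output_name, result in iteration.items():
                print(output_name, result.passed, result.max_absdiff, result.mean_reldiff)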
- class DistanceMetricsResult(passed, l2_norm, cosine_similarity)[source]
Bases: object
Represents the result of comparing a single output using distance metrics between two runners.
Records the distance metrics gathered during comparison.
- Parameters:
passed (bool) – Whether the output passed all enabled metric comparisons.
l2_norm (float) – The L2 norm (Euclidean distance) between the outputs.
cosine_similarity (float) – The cosine similarity between the outputs.
- class QualityMetricsResult(passed, psnr=None, snr=None)[source]
Bases: object
Represents the result of comparing a single output using quality metrics between two runners.
Records the quality metrics gathered during comparison.
- Parameters:
passed (bool) – Whether the output passed all enabled quality metric comparisons.
psnr (float) – The Peak Signal-to-Noise Ratio between the outputs. May be None if PSNR comparison was not enabled.
snr (float) – The Signal-to-Noise Ratio between the outputs. May be None if SNR comparison was not enabled.
- class PerceptualMetricsResult(passed, lpips=None)[source]
Bases: object
Represents the result of comparing a single output using perceptual metrics between two runners.
Records the perceptual metrics gathered during comparison.
- Parameters:
passed (bool) – Whether the output passed all enabled perceptual metric comparisons.
lpips (float) – The Learned Perceptual Image Patch Similarity score between the outputs. Lower values indicate more perceptually similar outputs. May be None if LPIPS computation failed.
- class CompareFunc[source]
Bases: object
Provides functions that can be used to compare two IterationResults.
- static simple(check_shapes=None, rtol=None, atol=None, fail_fast=None, find_output_func=None, check_error_stat=None, infinities_compare_equal=None, save_heatmaps=None, show_heatmaps=None, save_error_metrics_plot=None, show_error_metrics_plot=None, error_quantile=None)[source]
Creates a function that compares two IterationResults and can be used as the compare_func argument in Comparator.compare_accuracy.
- Parameters:
check_shapes (bool) – Whether shapes must match exactly. If this is False, this function may permute or reshape outputs before comparison. Defaults to True.
rtol (Union[float, Dict[str, float]]) –
The relative tolerance to use when checking accuracy. This is expressed as a percentage of the second set of output values. For example, a value of 0.01 would check that the first set of outputs is within 1% of the second.
This can be provided on a per-output basis using a dictionary. In that case, use an empty string (“”) as the key to specify default tolerance for outputs not explicitly listed. Defaults to 1e-5.
atol (Union[float, Dict[str, float]]) – The absolute tolerance to use when checking accuracy. This can be provided on a per-output basis using a dictionary. In that case, use an empty string (“”) as the key to specify default tolerance for outputs not explicitly listed. Defaults to 1e-5.
fail_fast (bool) – Whether the function should exit immediately after the first failure. Defaults to False.
find_output_func (Callable(str, int, IterationResult) -> List[str]) – A callback that returns a list of output names to compare against from the provided IterationResult, given an output name and index from another IterationResult. The comparison function will always iterate over the output names of the first IterationResult, expecting names from the second. A return value of [] or None indicates that the output should be skipped.
check_error_stat (Union[str, Dict[str, str]]) –
The error statistic to check. Possible values are:
- ”elemwise”: Checks each element in the output to determine whether it exceeds both of the specified tolerances. The minimum required tolerances displayed in this mode are only applicable when just one type of tolerance is set; because of the nature of the check, when both absolute and relative tolerances are specified, the required minimum tolerances may be lower.
- ”max”: Checks the maximum absolute/relative errors against the respective tolerances. This is the strictest possible check.
- ”mean”: Checks the mean absolute/relative errors against the respective tolerances.
- ”median”: Checks the median absolute/relative errors against the respective tolerances.
- ”quantile”: Checks the quantile absolute/relative errors against the respective tolerances.
This can be provided on a per-output basis using a dictionary. In that case, use an empty string (“”) as the key to specify the default error stat for outputs not explicitly listed. Defaults to “elemwise”.
infinities_compare_equal (bool) – If True, then matching +-inf values in the output have an absdiff of 0. If False, then matching +-inf values in the output have an absdiff of NaN. Defaults to False.
save_heatmaps (str) – [EXPERIMENTAL] Path to a directory in which to save figures of heatmaps of the absolute and relative error. Defaults to None.
show_heatmaps (bool) – [EXPERIMENTAL] Whether to display heatmaps of the absolute and relative error. Defaults to False.
save_error_metrics_plot (str) – [EXPERIMENTAL] Path to a directory in which to save the error metrics plots. Defaults to None.
show_error_metrics_plot (bool) – [EXPERIMENTAL] Whether to display the error metrics plot. Defaults to False.
error_quantile (Union[float, Dict[str, float]]) – Quantile error to compute when checking accuracy. This is expressed as a float in range [0, 1]. For example, error_quantile=0.5 is the median. Defaults to 0.99.
- Returns:
A callable that returns a mapping of output names to OutputCompareResults, indicating whether the corresponding output matched.
- Return type:
Callable(IterationResult, IterationResult) -> OrderedDict[str, OutputCompareResult]
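For example, a minimal sketch of using this with Comparator.compare_accuracy (assumes run_results was produced by Comparator.run(); the output name "output0" is illustrative):

    from polygraphy.comparator import Comparator, CompareFunc

    compare_func = CompareFunc.simple(
        rtol={"output0": 1e-3, "": 1e-5},  # per-output tolerance; "" sets the default
        atol=1e-4,
        check_error_stat="median",  # check median errors rather than elementwise
    )
    # The result evaluates to True only if all outputs matched.
    assert bool(Comparator.compare_accuracy(run_results, compare_func=compare_func))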
- static indices(index_tolerance=None, fail_fast=None)[source]
Creates a function that compares two IterationResults containing indices, and can be used as the compare_func argument in Comparator.compare_accuracy. This can be useful to compare, for example, the outputs of a Top-K operation.
Outputs with more than one dimension are treated like multiple batches of values. For example, an output of shape (3, 4, 5, 10) would be treated like 60 batches (3 x 4 x 5) of 10 values each.
- Parameters:
index_tolerance (Union[int, Dict[str, int]]) –
The tolerance to use when comparing indices. This is an integer indicating the maximum distance between values before it is considered a mismatch. For example, consider two outputs:
    output0 = [0, 1, 2]
    output1 = [1, 0, 2]
With an index tolerance of 0, this would be considered a mismatch, since the positions of 0 and 1 are flipped between the two outputs. However, with an index tolerance of 1, it would pass since the mismatched values are only 1 spot apart. If instead the outputs were:
    output0 = [0, 1, 2]
    output1 = [1, 2, 0]
Then we would require an index tolerance of 2, since the 0 value in the two outputs is 2 spots apart.
When this value is set, the final ‘index_tolerance’ number of values are ignored for each batch. For example, with an index tolerance of 1, mismatches in the final element are not considered. If used with a Top-K output, you can compensate for this by instead using a Top-(K + index_tolerance).
This can be provided on a per-output basis using a dictionary. In that case, use an empty string (“”) as the key to specify default tolerance for outputs not explicitly listed.
fail_fast (bool) – Whether the function should exit immediately after the first failure. Defaults to False.
- Returns:
A callable that returns a mapping of output names to bools, indicating whether the corresponding output matched.
- Return type:
Callable(IterationResult, IterationResult) -> OrderedDict[str, bool]
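For example, to compare Top-K style index outputs (a sketch; assumes run_results was produced by Comparator.run() and contains index outputs):

    from polygraphy.comparator import Comparator, CompareFunc

    # Treat indices as matching if they appear within 1 position of each other.
    compare_func = CompareFunc.indices(index_tolerance=1)
    assert bool(Comparator.compare_accuracy(run_results, compare_func=compare_func))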
- static distance_metrics(l2_tolerance=None, cosine_similarity_threshold=None, check_shapes=None, fail_fast=None, find_output_func=None)[source]
Creates a function that compares two IterationResults using distance metrics (L2 norm and cosine similarity), and can be used as the compare_func argument in Comparator.compare_accuracy.
- Parameters:
l2_tolerance (Union[float, Dict[str, float]]) – The tolerance to use when checking L2 norm (Euclidean distance). This can be provided on a per-output basis using a dictionary. In that case, use an empty string (“”) as the key to specify default tolerance for outputs not explicitly listed. Defaults to 1e-5.
cosine_similarity_threshold (Union[float, Dict[str, float]]) – The minimum cosine similarity required for outputs to be considered matching. Cosine similarity measures the cosine of the angle between two vectors, with values between -1 and 1. A value of 1 means vectors are identical or parallel, 0 means they are orthogonal, and -1 means they point in opposite directions. This can be provided on a per-output basis using a dictionary. In that case, use an empty string (“”) as the key to specify default threshold for outputs not explicitly listed. Defaults to 0.997 (which corresponds to a cosine distance of 0.003).
check_shapes (bool) – Whether shapes must match exactly. If this is False, this function may permute or reshape outputs before comparison. Defaults to True.
fail_fast (bool) – Whether the function should exit immediately after the first failure. Defaults to False.
find_output_func (Callable(str, int, IterationResult) -> List[str]) – A callback that returns a list of output names to compare against from the provided IterationResult, given an output name and index from another IterationResult. The comparison function will always iterate over the output names of the first IterationResult, expecting names from the second. A return value of [] or None indicates that the output should be skipped.
- Returns:
A callable that returns a mapping of output names to DistanceMetricsResults, indicating whether the corresponding output matched based on the distance metrics.
- Return type:
Callable(IterationResult, IterationResult) -> OrderedDict[str, DistanceMetricsResult]
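A sketch of usage (assumes run_results from Comparator.run(); the thresholds shown are illustrative):

    from polygraphy.comparator import Comparator, CompareFunc

    compare_func = CompareFunc.distance_metrics(
        l2_tolerance=1e-4,                  # maximum allowed Euclidean distance
        cosine_similarity_threshold=0.999,  # minimum required cosine similarity
    )
    assert bool(Comparator.compare_accuracy(run_results, compare_func=compare_func))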
- static quality_metrics(psnr_tolerance=None, snr_tolerance=None, check_shapes=None, fail_fast=None, find_output_func=None)[source]
Creates a function that compares two IterationResults using quality metrics (PSNR and SNR), and can be used as the compare_func argument in Comparator.compare_accuracy.
- Parameters:
psnr_tolerance (Union[float, Dict[str, float]]) – The minimum PSNR (Peak Signal-to-Noise Ratio) value required for outputs to be considered matching. Higher values of PSNR indicate better quality matches. Typical acceptable values are 30 dB or above. This can be provided on a per-output basis using a dictionary. In that case, use an empty string (“”) as the key to specify default tolerance for outputs not explicitly listed. If None, PSNR check will be skipped. Defaults to 30.0.
snr_tolerance (Union[float, Dict[str, float]]) – The minimum SNR (Signal-to-Noise Ratio) value required for outputs to be considered matching. Higher values of SNR indicate better quality matches. This can be provided on a per-output basis using a dictionary. In that case, use an empty string (“”) as the key to specify default tolerance for outputs not explicitly listed. If None, SNR check will be skipped. Defaults to 20.0.
check_shapes (bool) – Whether shapes must match exactly. If this is False, this function may permute or reshape outputs before comparison. Defaults to True.
fail_fast (bool) – Whether the function should exit immediately after the first failure. Defaults to False.
find_output_func (Callable(str, int, IterationResult) -> List[str]) – A callback that returns a list of output names to compare against from the provided IterationResult, given an output name and index from another IterationResult. The comparison function will always iterate over the output names of the first IterationResult, expecting names from the second. A return value of [] or None indicates that the output should be skipped.
- Returns:
A callable that returns a mapping of output names to QualityMetricsResults, indicating whether the corresponding output matched based on the quality metrics.
- Return type:
Callable(IterationResult, IterationResult) -> OrderedDict[str, QualityMetricsResult]
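A sketch of usage (assumes run_results from Comparator.run(); the tolerances shown are illustrative):

    from polygraphy.comparator import Comparator, CompareFunc

    compare_func = CompareFunc.quality_metrics(
        psnr_tolerance=35.0,  # require at least 35 dB PSNR
        snr_tolerance=25.0,   # require at least 25 dB SNR
    )
    assert bool(Comparator.compare_accuracy(run_results, compare_func=compare_func))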
- static perceptual_metrics(lpips_threshold=None, check_shapes=None, fail_fast=None, find_output_func=None)[source]
Creates a function that compares two IterationResults using perceptual metrics (LPIPS), and can be used as the compare_func argument in Comparator.compare_accuracy.
This function specifically targets image-like data and uses perceptual similarity metrics that correlate better with human perception than traditional distance metrics.
- Parameters:
lpips_threshold (Union[float, Dict[str, float]]) – The maximum LPIPS (Learned Perceptual Image Patch Similarity) score allowed for outputs to be considered matching. Lower values indicate more perceptually similar outputs. Typical values are below 0.1. This can be provided on a per-output basis using a dictionary. In that case, use an empty string (“”) as the key to specify default threshold for outputs not explicitly listed. If None, a default value of 0.1 will be used.
check_shapes (bool) – Whether shapes must match exactly. If this is False, this function may permute or reshape outputs before comparison. Defaults to True.
fail_fast (bool) – Whether the function should exit immediately after the first failure. Defaults to False.
find_output_func (Callable(str, int, IterationResult) -> List[str]) – A callback that returns a list of output names to compare against from the provided IterationResult, given an output name and index from another IterationResult. The comparison function will always iterate over the output names of the first IterationResult, expecting names from the second. A return value of [] or None indicates that the output should be skipped.
- Returns:
A callable that returns a mapping of output names to PerceptualMetricsResults, indicating whether the corresponding output matched based on the perceptual metrics.
- Return type:
Callable(IterationResult, IterationResult) -> OrderedDict[str, PerceptualMetricsResult]
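A sketch of usage (assumes run_results contains image-like outputs and that the optional LPIPS dependency is available in the environment):

    from polygraphy.comparator import Comparator, CompareFunc

    # Require an LPIPS score of at most 0.05 for outputs to be considered matching.
    compare_func = CompareFunc.perceptual_metrics(lpips_threshold=0.05)
    assert bool(Comparator.compare_accuracy(run_results, compare_func=compare_func))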