dataeval.metrics.stats.hashstats

dataeval.metrics.stats.hashstats(dataset, *, per_box=False)

Calculates hashes for each image.

This function computes two kinds of hash for each image: an exact hash and a perception-based hash. Images with identical exact hashes are exact matches, while images with identical perception-based hashes are near matches.

Parameters:
dataset : Dataset

Dataset to perform calculations on.

per_box : bool, default False

If True, perform calculations on each bounding box.

Returns:

A dictionary-like object containing the computed hashes for each image.

Return type:

HashStatsOutput

See also

Duplicates

Examples

Calculate the hashes for a dataset of images, each with shape (C, H, W):

>>> from dataeval.metrics.stats import hashstats
>>> results = hashstats(dataset)
>>> print(results.xxhash[:5])
['69b50a5f06af238c', '5a861d7a23d1afe7', '7ffdb4990ad44ac6', '4f0c366a3298ceac', 'c5519e36ac1f8839']
>>> print(results.pchash[:5])
['e666999999266666', 'e666999999266666', 'e666999966666299', 'e666999999266666', '96e91656e91616e9']
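
The hash lists shown above can be compared directly to find candidate matches. The following is a minimal sketch using only the standard library and the sample values printed above; it is not DataEval's own duplicate detection (see Duplicates for that). The helper names group_identical and hamming_distance are hypothetical, and using the bitwise Hamming distance between two pchash strings as a rough similarity measure is an assumption, not documented behavior.

```python
from collections import defaultdict


def group_identical(hashes):
    """Return lists of indices that share the same hash string (hypothetical helper)."""
    groups = defaultdict(list)
    for index, value in enumerate(hashes):
        groups[value].append(index)
    # Keep only hashes that occur more than once.
    return [indices for indices in groups.values() if len(indices) > 1]


def hamming_distance(hex_a, hex_b):
    """Count differing bits between two equal-length hex strings (hypothetical helper)."""
    if len(hex_a) != len(hex_b):
        raise ValueError("hash strings must have the same length")
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Sample values copied from the example output above.
xxhash = ['69b50a5f06af238c', '5a861d7a23d1afe7', '7ffdb4990ad44ac6',
          '4f0c366a3298ceac', 'c5519e36ac1f8839']
pchash = ['e666999999266666', 'e666999999266666', 'e666999966666299',
          'e666999999266666', '96e91656e91616e9']

exact_groups = group_identical(xxhash)  # no two exact hashes are equal here
near_groups = group_identical(pchash)   # images 0, 1 and 3 share a pchash
distance = hamming_distance(pchash[0], pchash[2])  # bits that differ between images 0 and 2

print(exact_groups)
print(near_groups)
print(distance)
```

In this sample, no images are exact matches, while images 0, 1, and 3 are near-match candidates because their perception-based hashes are identical.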