Duplicates

class dataeval.detectors.Duplicates

Finds duplicate images in a dataset, using xxhash to detect exact duplicates and pchash to detect near duplicates.
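The two-hash idea can be illustrated with a minimal, standard-library-only sketch. It is not DataEval's implementation: `hashlib.sha256` stands in for xxhash, a toy average hash stands in for pchash, and the helper names (`group_by_key`, `exact_key`, `average_hash_key`) and the tiny flattened images are hypothetical.

```python
import hashlib
from collections import defaultdict


def group_by_key(keys):
    """Group indices that share a key; keep only groups with 2+ members."""
    groups = defaultdict(list)
    for index, key in enumerate(keys):
        groups[key].append(index)
    return [members for members in groups.values() if len(members) > 1]


def exact_key(pixels):
    # Stand-in for xxhash: any hash of the raw bytes detects exact copies.
    return hashlib.sha256(bytes(pixels)).hexdigest()


def average_hash_key(pixels):
    # Toy perceptual hash: one bit per pixel, set when the pixel exceeds the mean.
    mean = sum(pixels) / len(pixels)
    return tuple(p > mean for p in pixels)


# Hypothetical 2x2 grayscale images, flattened to 4 pixel values in 0..255.
images = [
    [10, 200, 30, 220],  # index 0
    [10, 200, 30, 220],  # index 1: byte-for-byte copy of index 0
    [12, 198, 33, 215],  # index 2: slightly perturbed version of index 0
    [200, 10, 220, 30],  # index 3: a different pattern
]

exact = group_by_key([exact_key(img) for img in images])
near = group_by_key([average_hash_key(img) for img in images])

print(exact)  # [[0, 1]]
print(near)   # [[0, 1, 2]]
```

In this sketch the byte-level hash only groups identical images, while the coarser hash also catches the perturbed copy, so every exact group shows up inside a near group.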

stats

The base stats class, configured with the flags used for duplicate checking

Type:

ImageStats(flags=ImageHash.ALL)

Example

Initialize the Duplicates class:

>>> dups = Duplicates()

evaluate(images: Iterable[ArrayLike]) → Dict[Literal['exact', 'near'], List[int]]

Returns the indices of duplicate images, grouped into exact matches and near matches.

Parameters:

images (Iterable[ArrayLike], shape - (N, C, H, W)) – A set of images in an ArrayLike format

Returns:

exact – List of groups, where each group is a list of indices of images that are exact matches

near – List of groups, where each group is a list of indices of images that are near matches

Return type:

Dict[str, List[int]]

See also

ImageStats

Example

>>> dups.evaluate(images)
{'exact': [[3, 20], [16, 37]], 'near': [[3, 20, 22], [12, 18], [13, 36], [14, 31], [17, 27], [19, 38, 47]]}
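One way to act on this result is to keep the first index of each group and drop the rest. The sketch below works on a plain dictionary copied from the output above; `indices_to_drop` is a hypothetical helper, not part of DataEval, and keeping the first member of a group is an arbitrary policy chosen for illustration.

```python
# Output copied from the evaluate() example above.
result = {
    "exact": [[3, 20], [16, 37]],
    "near": [[3, 20, 22], [12, 18], [13, 36], [14, 31], [17, 27], [19, 38, 47]],
}


def indices_to_drop(groups):
    """Keep the first index of every group and collect the remaining ones."""
    drop = set()
    for group in groups:
        drop.update(group[1:])
    return sorted(drop)


# Drop only byte-identical copies.
drop_exact = indices_to_drop(result["exact"])

# Drop both exact and near duplicates; the set removes repeated indices.
drop_all = indices_to_drop(result["exact"] + result["near"])

print(drop_exact)  # [20, 37]
print(drop_all)    # [18, 20, 22, 27, 31, 36, 37, 38, 47]
```

Near matches are judged by a perceptual hash, so it may be worth reviewing those groups by eye before removing anything.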