Duplicates

class dataeval.detectors.Duplicates(find_exact: bool = True, find_near: bool = True)

Finds duplicate images in a dataset, using xxhash to detect exact duplicates and pchash (a perceptual hash) to detect near duplicates.
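The exact-match half of this idea can be illustrated with a minimal sketch: hash each image's raw bytes and group the indices that share a digest. This is not the library's implementation. It assumes images are supplied as bytes-like objects, uses blake2b from the standard library as a stand-in for xxhash, and the helper name group_exact_duplicates is hypothetical. Near-duplicate detection with a perceptual hash is not covered here.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(images):
    """Group the indices of byte-identical images.

    `images` is an iterable of bytes-like objects. This input format is an
    assumption of the sketch; the real class accepts ArrayLike images.
    """
    buckets = defaultdict(list)
    for index, image in enumerate(images):
        # Stand-in hash: blake2b from the standard library, not xxhash.
        digest = hashlib.blake2b(bytes(image), digest_size=16).hexdigest()
        buckets[digest].append(index)
    # Only digests shared by two or more images indicate duplicates.
    return [indices for indices in buckets.values() if len(indices) > 1]


# Toy "images": items 0 and 2 are identical, as are items 1 and 4.
sample = [b"\x00\x01\x02", b"\xff\xff\xff", b"\x00\x01\x02", b"\x10\x20\x30", b"\xff\xff\xff"]
groups = group_exact_duplicates(sample)
print(groups)  # [[0, 2], [1, 4]]
```

A fast non-cryptographic hash such as xxhash suits this job because the goal is to bucket identical inputs quickly, not to resist deliberate collisions.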

stats

Output class of stats

Type:

StatsOutput

Example

Initialize the Duplicates class:

>>> dups = Duplicates()

evaluate(images: Iterable[ArrayLike]) → DuplicatesOutput

Returns the indices of duplicate images, grouped separately for exact matches and near matches.

Parameters:

images (Iterable[ArrayLike], shape - (N, C, H, W)) – A set of images in an ArrayLike format

Returns:

Lists of index groups: exact holds the groups of exact matches and near holds the groups of near matches

Return type:

DuplicatesOutput

See also

imagestats

Example

>>> dups.evaluate(images)
DuplicatesOutput(exact=[[3, 20], [16, 37]], near=[[3, 20, 22], [12, 18], [13, 36], [14, 31], [17, 27], [19, 38, 47]])
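One common follow-up is to turn these groups into a list of indices to remove, keeping one representative per group. The sketch below works on plain lists copied from the output above, so it runs without the library installed. The helper name indices_to_drop is hypothetical, and keeping the lowest index of each group is an arbitrary choice made for illustration.

```python
def indices_to_drop(groups):
    """Keep the lowest index of each group and return the rest, sorted."""
    drop = set()
    for group in groups:
        # The first element after sorting is kept; the others are redundant.
        _keep, *rest = sorted(group)
        drop.update(rest)
    return sorted(drop)


# Groups copied from the example output above.
exact = [[3, 20], [16, 37]]
near = [[3, 20, 22], [12, 18], [13, 36], [14, 31], [17, 27], [19, 38, 47]]

drop_exact = indices_to_drop(exact)
drop_all = indices_to_drop(exact + near)
print(drop_exact)  # [20, 37]
print(drop_all)    # [18, 20, 22, 27, 31, 36, 37, 38, 47]
```

Because an index can appear in more than one group (3 and 20 appear in both an exact and a near group here), collecting the dropped indices in a set avoids listing any index twice.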