Duplicates
- class dataeval.detectors.Duplicates
Finds duplicate images in a dataset, using xxhash for exact duplicates and pchash (a perceptual hash) for near duplicates
- stats
Base stats class with the flags for checking duplicates
- Type:
ImageStats(flags=ImageHash.ALL)
Example
Initialize the Duplicates class:
>>> dups = Duplicates()
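
The exact-duplicate idea described above (hash each image, then group the indices that share a hash) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the dataeval implementation: `group_exact_duplicates` is a hypothetical helper, images are represented as raw bytes, and `hashlib.sha256` stands in for xxhash so that no third-party package is needed.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List


def group_exact_duplicates(images: Iterable[bytes]) -> List[List[int]]:
    """Group the indices of byte-identical images (hypothetical helper)."""
    buckets: Dict[str, List[int]] = defaultdict(list)
    for index, raw in enumerate(images):
        # Assumption: sha256 stands in for xxhash to keep the sketch stdlib-only.
        buckets[hashlib.sha256(raw).hexdigest()].append(index)
    # A hash shared by two or more images marks a group of exact duplicates.
    return [indices for indices in buckets.values() if len(indices) > 1]


# Toy "images": items 0, 2 and 5 are identical, as are items 1 and 4.
images = [b"cat", b"dog", b"cat", b"bird", b"dog", b"cat"]
groups = group_exact_duplicates(images)
print(groups)  # [[0, 2, 5], [1, 4]]
```

Near-duplicate detection needs a hash that changes little when the image changes little, which a cryptographic hash such as sha256 does not provide; that is the role of the perceptual hash.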
- evaluate(images: Iterable[ArrayLike]) → Dict[Literal['exact', 'near'], List[int]]
Returns duplicate image indices for both exact matches and near matches
- Parameters:
images (Iterable[ArrayLike], shape (N, C, H, W)) – The images to check, each in an ArrayLike format
- Returns:
- exact :
List of groups of indices that are exact matches
- near :
List of groups of indices that are near matches
- Return type:
Dict[str, List[int]]
See also
ImageStats
Example
>>> dups.evaluate(images)
{'exact': [[3, 20], [16, 37]], 'near': [[3, 20, 22], [12, 18], [13, 36], [14, 31], [17, 27], [19, 38, 47]]}
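
One possible way to act on such a result is to keep one image per group and drop the rest. The sketch below works on a plain dictionary copied from the example output, so it runs without dataeval installed. The helper `indices_to_drop` and the "keep the smallest index" policy are assumptions made for illustration, not part of the library.

```python
from typing import Dict, List, Set

# Result copied from the example output of dups.evaluate(images).
result: Dict[str, List[List[int]]] = {
    "exact": [[3, 20], [16, 37]],
    "near": [[3, 20, 22], [12, 18], [13, 36], [14, 31], [17, 27], [19, 38, 47]],
}


def indices_to_drop(groups: List[List[int]]) -> List[int]:
    """Keep the smallest index in each group; return the others, sorted.

    Hypothetical helper: the retention policy is an illustrative choice.
    """
    drop: Set[int] = set()
    for group in groups:
        keep = min(group)
        drop.update(i for i in group if i != keep)
    return sorted(drop)


exact_drop = indices_to_drop(result["exact"])
near_drop = indices_to_drop(result["near"])
print(exact_drop)  # [20, 37]
print(near_drop)   # [18, 20, 22, 27, 31, 36, 38, 47]
```

In this example the exact group [3, 20] also appears inside the near group [3, 20, 22], so dropping by the near groups alone already removes index 20.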