Duplicates
- class dataeval.detectors.linters.Duplicates(only_exact: bool = False)
Finds duplicate images in a dataset, using xxhash to detect exact duplicates and pchash to detect near duplicates.
- stats
Output class of stats
- Type:
StatsOutput
- Parameters:
only_exact (bool, default False) – Only inspect the dataset for exact image matches
Example
Initialize the Duplicates class:
>>> all_dupes = Duplicates()
>>> exact_dupes = Duplicates(only_exact=True)
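To illustrate the idea behind exact-duplicate detection, here is a minimal sketch that groups images by a hash of their raw bytes. It is not dataeval's implementation: `group_exact_duplicates` is a hypothetical helper, and `hashlib.sha256` stands in for xxhash so that the sketch needs only the standard library.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(images):
    """Group indices of byte-identical images (hypothetical helper, not part of dataeval)."""
    by_digest = defaultdict(list)
    for index, image_bytes in enumerate(images):
        # Assumption: sha256 is used here in place of xxhash; any stable hash works for the sketch.
        digest = hashlib.sha256(image_bytes).hexdigest()
        by_digest[digest].append(index)
    # Keep only digests shared by two or more images.
    return [indices for indices in by_digest.values() if len(indices) > 1]


# Four toy "images" as raw bytes; indices 0 and 2 are identical.
images = [b"\x00\x01\x02", b"\x09\x09\x09", b"\x00\x01\x02", b"\x05"]
groups = group_exact_duplicates(images)
print(groups)  # [[0, 2]]
```

Each returned group is a list of indices, which matches the shape of the `exact` field shown in the examples on this page.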
- evaluate(data: Iterable[_SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes]]) → DuplicatesOutput
Returns duplicate image indices for both exact matches and near matches
- Parameters:
data (Iterable[ArrayLike], shape - (N, C, H, W) | StatsOutput | Sequence[StatsOutput]) – A dataset of images in an ArrayLike format or the output(s) from a hashstats analysis
- Returns:
List of groups of indices that are exact and near matches
- Return type:
DuplicatesOutput
See also
hashstats
Example
>>> all_dupes.evaluate(images)
DuplicatesOutput(exact=[[3, 20], [16, 37]], near=[[3, 20, 22], [12, 18], [13, 36], [14, 31], [17, 27], [19, 38, 47]])
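Near matches rely on a hash that stays the same under small pixel changes. The sketch below shows that idea with a simple average hash over tiny grayscale grids. This is an assumption made for illustration only: the algorithm behind pchash is not described on this page, and `average_hash` and `group_near_duplicates` are hypothetical names.

```python
from collections import defaultdict


def average_hash(pixels):
    """Return a tuple of bits: 1 where a pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(int(value > mean) for value in flat)


def group_near_duplicates(images):
    """Group indices of images that share the same average hash (illustrative only)."""
    by_hash = defaultdict(list)
    for index, pixels in enumerate(images):
        by_hash[average_hash(pixels)].append(index)
    return [indices for indices in by_hash.values() if len(indices) > 1]


original = [[10, 200], [220, 30]]
brighter = [[20, 210], [230, 40]]   # same picture with every pixel raised by 10
different = [[200, 10], [30, 220]]  # bright and dark regions swapped

near_groups = group_near_duplicates([original, brighter, different])
print(near_groups)  # [[0, 1]]
```

The brightened copy is not byte-identical to the original, so an exact hash would miss it, yet both produce the same bit pattern relative to their own mean.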
- from_stats(hashes: HashStatsOutput | Sequence[HashStatsOutput]) → DuplicatesOutput
Returns duplicate image indices for both exact matches and near matches
- Parameters:
hashes (HashStatsOutput | Sequence[HashStatsOutput]) – The output(s) from a hashstats analysis
- Returns:
List of groups of indices that are exact and near matches
- Return type:
DuplicatesOutput
See also
hashstats
Example
>>> exact_dupes.from_stats([hashes1, hashes2])
DuplicatesOutput(exact=[{0: [3, 20]}, {0: [16], 1: [12]}], near=[])
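When several hash outputs are passed, each group in the example above is a dict rather than a flat list. Reading that output, the keys appear to be positions in the input sequence and the values the image indices within that input; this interpretation is an assumption drawn from the example, not a documented guarantee. A minimal sketch of grouping in that shape, using plain lists of hash strings as a hypothetical stand-in for HashStatsOutput:

```python
from collections import defaultdict


def group_across_datasets(hash_lists):
    """Group matching hashes across datasets (hypothetical helper).

    hash_lists holds one list of hash strings per dataset.
    Returns a list of {dataset_index: [image_indices]} dicts.
    """
    by_hash = defaultdict(lambda: defaultdict(list))
    for dataset_index, hashes in enumerate(hash_lists):
        for image_index, value in enumerate(hashes):
            by_hash[value][dataset_index].append(image_index)

    groups = []
    for per_dataset in by_hash.values():
        # A group needs at least two images in total, within or across datasets.
        total = sum(len(indices) for indices in per_dataset.values())
        if total > 1:
            groups.append(dict(per_dataset))
    return groups


hashes1 = ["a", "b", "a", "c"]  # images 0 and 2 match within the first dataset
hashes2 = ["d", "c"]            # image 1 matches image 3 of the first dataset
cross_groups = group_across_datasets([hashes1, hashes2])
print(cross_groups)  # [{0: [0, 2]}, {0: [3], 1: [1]}]
```

The first group lies entirely within dataset 0, while the second spans both datasets, mirroring the two kinds of entries in the example output.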