dataeval.detectors.linters.Duplicates
- class dataeval.detectors.linters.Duplicates(only_exact=False)
Finds duplicate images in a dataset, using xxhash for exact duplicates and pchash for near duplicates.
- evaluate(data)
Returns duplicate image indices for both exact matches and near matches.
See also
hashstats
Example
>>> all_dupes = Duplicates()
>>> all_dupes.evaluate(duplicate_images)
DuplicatesOutput(exact=[[3, 20], [16, 37]], near=[[3, 20, 22], [12, 18], [13, 36], [14, 31], [17, 27], [19, 38, 47]])
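The grouping behind the exact-match result can be illustrated with a minimal sketch. It is not the library's implementation: `hashlib.sha256` stands in for xxhash so that the example runs on the standard library alone, and `group_exact_duplicates` and the byte strings are hypothetical names and data invented for illustration.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(items):
    """Group indices of byte-identical items; keep only groups of two or more."""
    groups = defaultdict(list)
    for index, raw in enumerate(items):
        # Stand-in digest (assumption): the library uses xxhash; sha256 is
        # used here only to avoid a third-party dependency.
        groups[hashlib.sha256(raw).hexdigest()].append(index)
    return sorted(g for g in groups.values() if len(g) > 1)


# Hypothetical "images" represented as raw bytes.
images = [b"cat", b"dog", b"cat", b"bird", b"dog", b"cat"]
duplicate_groups = group_exact_duplicates(images)
print(duplicate_groups)  # [[0, 2, 5], [1, 4]]
```

Items with identical content share a digest, so each surviving list holds the indices of one duplicate group, which mirrors the shape of the `exact` field in the output above.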
- from_stats(hashes: dataeval.outputs.HashStatsOutput) -> dataeval.outputs.DuplicatesOutput[dataeval.outputs._linters.DuplicateGroup]
- from_stats(hashes: collections.abc.Sequence[dataeval.outputs.HashStatsOutput]) -> dataeval.outputs.DuplicatesOutput[dataeval.outputs._linters.DatasetDuplicateGroupMap]
Returns duplicate image indices for both exact matches and near matches.
- Parameters:
- hashes : HashStatsOutput | Sequence[HashStatsOutput]
The output(s) from a hashstats analysis.
- Returns:
List of groups of indices that are exact and near matches.
- Return type:
DuplicatesOutput
See also
hashstats
Example
>>> exact_dupes = Duplicates(only_exact=True)
>>> exact_dupes.from_stats([hashes1, hashes2])
DuplicatesOutput(exact=[{0: [3, 20]}, {0: [16], 1: [12]}], near=[])
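When several hashstats outputs are passed, each group in the result is a mapping rather than a flat list. The sketch below shows one way to read such groups. It assumes that each key is the position of a dataset in the input sequence and that each value lists indices within that dataset. Plain lists and dicts copied from the example output stand in for the `DuplicatesOutput` object, and `flatten_group` and `spans_datasets` are hypothetical helpers, not part of dataeval.

```python
# Data copied from the example output above; plain containers stand in
# for the DuplicatesOutput object.
exact = [{0: [3, 20]}, {0: [16], 1: [12]}]


def flatten_group(group):
    """Turn one {dataset: [indices]} group into sorted (dataset, index) pairs."""
    return sorted((ds, idx) for ds, indices in group.items() for idx in indices)


def spans_datasets(group):
    """Return True when a group's duplicates come from more than one dataset."""
    return len(group) > 1


pairs = [flatten_group(g) for g in exact]
cross_dataset = [g for g in exact if spans_datasets(g)]

print(pairs)          # [[(0, 3), (0, 20)], [(0, 16), (1, 12)]]
print(cross_dataset)  # [{0: [16], 1: [12]}]
```

Under that reading, the first group is a duplicate pair inside dataset 0, while the second links index 16 of dataset 0 to index 12 of dataset 1.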