dataeval.detectors.linters.Duplicates
=====================================

.. py:class:: dataeval.detectors.linters.Duplicates(only_exact = False)

   Finds duplicate images in a dataset, using xxhash for exact :term:`duplicates` and pchash for near duplicates.

   .. attribute:: stats

      Output class of stats

      :type: StatsOutput

   :param only_exact: Only inspect the dataset for exact image matches
   :type only_exact: bool, default False

   .. py:method:: evaluate(data)

      Returns duplicate image indices for both exact matches and near matches.

      :param data: A dataset of images in an ArrayLike format, or the output(s) from a hashstats analysis
      :type data: Iterable[ArrayLike], shape - (N, C, H, W) | StatsOutput | Sequence[StatsOutput]

      :returns: Groups of indices that are exact matches and groups of indices that are near matches
      :rtype: DuplicatesOutput

      .. seealso:: :obj:`hashstats`

      .. rubric:: Example

      >>> all_dupes = Duplicates()
      >>> all_dupes.evaluate(duplicate_images)
      DuplicatesOutput(exact=[[3, 20], [16, 37]], near=[[3, 20, 22], [12, 18], [13, 36], [14, 31], [17, 27], [19, 38, 47]])

   .. py:method:: from_stats(hashes: dataeval.metrics.stats.hashstats.HashStatsOutput) -> DuplicatesOutput[DuplicateGroup]
                  from_stats(hashes: Sequence[dataeval.metrics.stats.hashstats.HashStatsOutput]) -> DuplicatesOutput[DatasetDuplicateGroupMap]

      Returns duplicate image indices for both exact matches and near matches, computed from existing hashstats output.

      :param hashes: The output(s) from a hashstats analysis
      :type hashes: HashStatsOutput | Sequence[HashStatsOutput]

      :returns: Groups of indices that are exact matches and groups of indices that are near matches. As the example shows, when a sequence of outputs is given, each group maps a dataset index to the matching indices within that dataset.
      :rtype: DuplicatesOutput

      .. seealso:: :obj:`hashstats`

      .. rubric:: Example

      >>> exact_dupes = Duplicates(only_exact=True)
      >>> exact_dupes.from_stats([hashes1, hashes2])
      DuplicatesOutput(exact=[{0: [3, 20]}, {0: [16], 1: [12]}], near=[])
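.. rubric:: Conceptual sketch

The ``exact`` groups shown in the examples above can be understood as the result of bucketing item indices by a content hash. The following is a minimal sketch of that idea only, not dataeval's implementation: it uses ``hashlib.sha256`` from the standard library as a stand-in for xxhash, raw ``bytes`` objects as a stand-in for image arrays, and a hypothetical helper named ``exact_duplicate_groups``. Near-duplicate detection with a perceptual hash is not covered.

```python
import hashlib
from collections import defaultdict


def exact_duplicate_groups(items):
    """Group the indices of byte-identical items (hypothetical helper).

    Assumption: each item can be converted to bytes. sha256 stands in
    for xxhash here; any stable content hash gives the same grouping.
    """
    buckets = defaultdict(list)
    for index, item in enumerate(items):
        digest = hashlib.sha256(bytes(item)).hexdigest()
        buckets[digest].append(index)
    # Only buckets holding more than one index represent duplicates.
    return [indices for indices in buckets.values() if len(indices) > 1]


# Toy "images": items 0 and 2 are identical, as are items 1 and 4.
items = [b"\x00\x01\x02", b"\x09\x09\x09", b"\x00\x01\x02", b"\x05", b"\x09\x09\x09"]
groups = exact_duplicate_groups(items)
print(groups)
```

Each inner list plays the same role as one entry of ``DuplicatesOutput.exact``: a set of indices whose contents hash to the same value.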