dataeval.quality.Duplicates.Config
- class dataeval.quality.Duplicates.Config
Configuration for the Duplicates detector.
- flags
Statistics to compute for hash-based duplicate detection.
- Type:
ImageStats, default ImageStats.HASH_DUPLICATES_BASIC
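The general idea behind hash-based detection can be sketched as follows. This is a minimal, hypothetical illustration, not DataEval's implementation: it finds exact duplicates by grouping images on a SHA-256 digest of their raw bytes, while which hashes ImageStats.HASH_DUPLICATES_BASIC actually computes is not specified here. The helper name exact_duplicate_groups is an assumption for illustration only.

```python
import hashlib
from collections import defaultdict


def exact_duplicate_groups(images: list[bytes]) -> list[list[int]]:
    """Group indices of images whose raw bytes hash identically.

    Hypothetical helper for illustration; not part of DataEval.
    """
    buckets: dict[str, list[int]] = defaultdict(list)
    for index, data in enumerate(images):
        # Identical bytes always produce the same digest, so exact copies share a bucket.
        digest = hashlib.sha256(data).hexdigest()
        buckets[digest].append(index)
    # Only buckets holding two or more images represent duplicates.
    return [group for group in buckets.values() if len(group) > 1]


# Images 0 and 2 are byte-identical, as are images 1 and 4.
print(exact_duplicate_groups([b"cat", b"dog", b"cat", b"fox", b"dog"]))  # [[0, 2], [1, 4]]
```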
- cluster_sensitivity
Distance factor for cluster-based near-duplicate detection. It scales the cluster’s standard deviation to set the duplicate cutoff. It must be provided together with extractor to enable clustering.
- Type:
float or None, default None
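One possible reading of "scales the cluster’s standard deviation to set the duplicate cutoff" is sketched below. It assumes, hypothetically, that the standard deviation is taken over the pairwise Euclidean distances within a single cluster and that pairs closer than cluster_sensitivity times that value count as near duplicates; DataEval's exact statistic and comparison may differ. In practice the feature vectors would come from the configured extractor, and near_duplicate_pairs is an illustrative name, not a library function.

```python
import itertools
import math
import statistics


def near_duplicate_pairs(
    features: list[list[float]], cluster_sensitivity: float
) -> list[tuple[int, int]]:
    """Return index pairs within one cluster whose distance is below the cutoff.

    Hypothetical reading of cluster_sensitivity: cutoff = cluster_sensitivity
    * population standard deviation of the cluster's pairwise distances.
    """
    pairs = list(itertools.combinations(range(len(features)), 2))
    if not pairs:
        return []  # fewer than two members: nothing to compare
    distances = [math.dist(features[i], features[j]) for i, j in pairs]
    cutoff = cluster_sensitivity * statistics.pstdev(distances)
    return [pair for pair, dist in zip(pairs, distances) if dist < cutoff]


cluster = [[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [10.0, 0.0]]
print(near_duplicate_pairs(cluster, 1.0))   # [(0, 1)]
print(near_duplicate_pairs(cluster, 0.01))  # []
```

Under this reading, a larger cluster_sensitivity raises the cutoff and flags more pairs, while a smaller value is stricter.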
- merge_near_duplicates
Whether to merge overlapping near-duplicate groups.
- Type:
bool, default True
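Read literally, merging overlapping groups means that if one group contains images 0 and 1 and another contains images 1 and 2, they collapse into a single group containing 0, 1, and 2. The hypothetical helper below shows that transitive merge with a small union-find; it illustrates the behavior the flag describes and is not taken from DataEval.

```python
def merge_overlapping_groups(groups: list[list[int]]) -> list[list[int]]:
    """Merge any groups that share a member (transitively) via union-find.

    Hypothetical helper for illustration; not part of DataEval.
    """
    parent: dict[int, int] = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for group in groups:
        if not group:
            continue
        find(group[0])  # register the first member even if it has no partners
        for member in group[1:]:
            union(group[0], member)

    # Collect members under their root, then sort for a deterministic result.
    merged: dict[int, list[int]] = {}
    for item in list(parent):
        merged.setdefault(find(item), []).append(item)
    return sorted(sorted(members) for members in merged.values())


print(merge_overlapping_groups([[0, 1], [1, 2], [5, 6]]))  # [[0, 1, 2], [5, 6]]
```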
- extractor
Feature extractor for cluster-based duplicate detection.
- Type:
FeatureExtractor or None, default None
- batch_size
Batch size for feature extraction during cluster-based detection. If None, the DataEval global default is used. When extractor is provided, a batch size must be available from either this parameter or the global default.
- Type:
int or None, default None
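How the cluster-related fields interact can be summarized in a short sketch. The ClusterSettings dataclass and the GLOBAL_BATCH_SIZE constant are hypothetical stand-ins, not DataEval names. The sketch encodes only the rules stated above: clustering needs both cluster_sensitivity and extractor, an explicit batch_size overrides the global default, and an extractor requires some batch size. Raising ValueError when none is available is an assumption about how that requirement might be enforced.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical stand-in for DataEval's global batch-size default (unset here).
GLOBAL_BATCH_SIZE: Optional[int] = None


@dataclass
class ClusterSettings:
    """Hypothetical mirror of the cluster-related Config fields; not a DataEval class."""

    cluster_sensitivity: Optional[float] = None
    extractor: Any = None
    batch_size: Optional[int] = None

    def clustering_enabled(self) -> bool:
        # Cluster-based detection runs only when both fields are provided.
        return self.cluster_sensitivity is not None and self.extractor is not None

    def resolve_batch_size(self) -> Optional[int]:
        # An explicit batch_size takes precedence over the global default.
        size = self.batch_size if self.batch_size is not None else GLOBAL_BATCH_SIZE
        # Assumed enforcement: an extractor with no resolvable batch size is an error.
        if self.extractor is not None and size is None:
            raise ValueError("batch_size must be set when an extractor is provided")
        return size


settings = ClusterSettings(cluster_sensitivity=1.0, extractor=object(), batch_size=32)
print(settings.clustering_enabled(), settings.resolve_batch_size())  # True 32
```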