dataeval.detectors.linters

Linters identify potential issues in training and test data, such as duplicate or outlying samples, and are an important part of data cleaning.

Classes

Clusterer

Uses hierarchical clustering to flag dataset properties of interest, such as outliers and duplicates.
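To make the idea concrete, here is a minimal sketch of how hierarchical clustering can surface outliers and duplicates. It is not the Clusterer implementation: the function `single_linkage`, the `max_distance` parameter, and the rules "singleton cluster = potential outlier" and "zero distance = potential duplicate" are assumptions chosen for illustration, and the naive loop only suits small inputs.

```python
from math import dist

def single_linkage(points, max_distance):
    """Hypothetical helper: agglomerative clustering with single linkage.

    Repeatedly merges the two closest clusters until no pair of clusters
    has members within max_distance of each other.
    """
    clusters = [[i] for i in range(len(points))]
    while True:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: cluster distance is the closest member pair.
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if d <= max_distance and (best is None or d < best[0]):
                    best = (d, a, b)
        if best is None:
            return clusters
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]

# Toy 2-D embeddings; indices 1 and 2 are identical, index 4 is far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0)]
clusters = single_linkage(points, max_distance=0.5)

# Assumed rule: a sample left alone in its cluster is a potential outlier.
potential_outliers = [c[0] for c in clusters if len(c) == 1]

# Assumed rule: samples in the same cluster at zero distance are duplicates.
potential_duplicates = [
    (i, j)
    for c in clusters
    for i in c
    for j in c
    if i < j and dist(points[i], points[j]) == 0.0
]
```

In this toy data the four nearby points merge into one cluster, the distant point stays alone, and the identical pair is reported as a duplicate.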

Duplicates

Finds the duplicate images in a dataset using xxhash for exact duplicates and pchash for near duplicates.
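The two kinds of hash serve different purposes, which a small sketch can show. This is not the Duplicates implementation: `hashlib.sha256` stands in for xxhash, and the hypothetical `average_hash` is a much simpler perceptual hash than pchash. Both are assumptions used only to keep the example free of third-party dependencies.

```python
import hashlib
from collections import defaultdict

def exact_hash(pixels):
    # Stand-in for xxhash: a deterministic digest of the raw bytes.
    # Any change to any byte produces a different digest.
    return hashlib.sha256(pixels).hexdigest()

def average_hash(gray):
    # Simplified perceptual hash (not pchash): one bit per pixel,
    # set when the pixel is brighter than the image mean.
    flat = [p for row in gray for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | int(p > mean)
    return bits

def group_by_hash(hashes):
    # Collect indices that share a hash; keep groups with 2+ members.
    groups = defaultdict(list)
    for idx, h in enumerate(hashes):
        groups[h].append(idx)
    return [idxs for idxs in groups.values() if len(idxs) > 1]

# Tiny 2x2 grayscale "images" as nested lists of 0-255 values.
images = [
    [[10, 200], [220, 15]],
    [[10, 200], [220, 15]],  # byte-identical to image 0
    [[12, 198], [221, 14]],  # slightly perturbed copy of image 0
    [[200, 10], [15, 220]],  # different image
]

exact = group_by_hash(
    exact_hash(bytes(p for row in img for p in row)) for img in images
)
near = group_by_hash(average_hash(img) for img in images)
```

Here `exact` groups only the byte-identical pair, while `near` also picks up the perturbed copy. Practical perceptual hashing usually compares hashes by Hamming distance rather than strict equality; equality is used here only for brevity.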

Outliers

Calculates statistical outliers of a dataset using various statistical tests applied to each image.
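As an illustration of one such statistical test, the sketch below applies a modified z-score to a single per-image statistic. It is not the Outliers implementation: the function name `modified_zscore_outliers`, the 3.5 threshold, and the choice of mean brightness as the statistic are assumptions for the example.

```python
from statistics import median

def modified_zscore_outliers(values, threshold=3.5):
    """Hypothetical helper: return indices whose modified z-score
    exceeds the threshold.

    The modified z-score uses the median and the median absolute
    deviation (MAD), so a few extreme values do not distort the
    baseline the way they would with the mean and standard deviation.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All deviations are (nearly) zero; nothing can be scored.
        return []
    return [
        i
        for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]

# One statistic per image, e.g. mean brightness scaled to [0, 1].
mean_brightness = [0.48, 0.52, 0.50, 0.49, 0.51, 0.95]
flagged = modified_zscore_outliers(mean_brightness)
```

Only the last image, whose brightness is far from the rest, is flagged. The same pattern extends to other per-image statistics by running the test once per statistic.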

Output Classes

ClustererOutput

Output class for the Clusterer lint detector.

DuplicatesOutput

Output class for the Duplicates lint detector.

OutliersOutput

Output class for the Outliers lint detector.