linters

Linters help identify potential issues in training and test data and are an important aspect of data cleaning.

Detector Classes

Clusterer(dataset)

Uses hierarchical clustering to flag dataset properties of interest, such as outliers and duplicates.

Duplicates([only_exact])

Finds duplicate images in a dataset, using xxhash for exact duplicates and pchash for near duplicates.

Outliers([use_dimension, use_pixel, ...])

Identifies statistical outliers in a dataset by applying various statistical tests to each image.
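The two ideas behind the Duplicates and Outliers detectors can be illustrated with a minimal, standard-library-only sketch. It does not use or reproduce this package's API: the function names find_exact_duplicates and find_outliers are hypothetical, hashlib.blake2b stands in for xxhash, and the modified z-score is only one example of a statistical test that could be applied to a per-image value such as mean brightness.

```python
from __future__ import annotations

import hashlib
import statistics
from collections import defaultdict


def find_exact_duplicates(images: list[bytes]) -> list[list[int]]:
    """Group the indices of byte-identical images by content hash.

    Hypothetical helper: hashlib.blake2b is an assumption standing in
    for xxhash; any stable content hash illustrates the same idea.
    """
    groups: dict[str, list[int]] = defaultdict(list)
    for index, data in enumerate(images):
        digest = hashlib.blake2b(data, digest_size=16).hexdigest()
        groups[digest].append(index)
    # Only hashes shared by more than one image indicate duplicates.
    return [indices for indices in groups.values() if len(indices) > 1]


def find_outliers(values: list[float], threshold: float = 3.5) -> list[int]:
    """Return indices whose modified z-score exceeds the threshold.

    Hypothetical helper: the modified z-score (median and median
    absolute deviation) is an illustrative choice of test, not
    necessarily the one the Outliers detector applies.
    """
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        # All values sit at the median; nothing can be ranked as extreme.
        return []
    return [
        i
        for i, v in enumerate(values)
        if 0.6745 * abs(v - median) / mad > threshold
    ]
```

In this sketch each image is reduced to a single number before testing, and images are compared as raw bytes. Near-duplicate detection would need a perceptual hash that tolerates small pixel changes, which is not shown here.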

Output Classes

ClustererOutput(outliers, ...)

Output class for the Clusterer lint detector.

DuplicatesOutput(exact, near)

Output class for the Duplicates lint detector.

OutliersOutput(issues)

Output class for the Outliers lint detector.