Reference Guide

Detectors

Detectors determine whether a dataset, or individual images within it, are indicative of a specific issue.

Data Exploration

`detectors.Clusterer`(dataset)	Uses hierarchical clustering to flag dataset properties of interest like outliers and duplicates
`detectors.Duplicates`()	Finds the duplicate images in a dataset using xxhash for exact duplicates and pchash for near duplicates
`detectors.Linter`([flags, outlier_method, ...])	Calculates statistical outliers of a dataset using various statistical tests applied to each image
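
As a rough illustration of the exact-duplicate idea behind `detectors.Duplicates`, the sketch below groups byte-identical images by a content hash. It is not DataEval's implementation: `hashlib.blake2b` stands in for xxhash so the example needs only the standard library, and the function name `find_exact_duplicates` is hypothetical. Near-duplicate detection with a perceptual hash is not covered here.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(images: list[bytes]) -> list[list[int]]:
    """Group the indices of byte-identical images by their content hash."""
    groups: dict[str, list[int]] = defaultdict(list)
    for index, data in enumerate(images):
        # blake2b is a stdlib stand-in for xxhash (an assumption of this sketch)
        digest = hashlib.blake2b(data, digest_size=16).hexdigest()
        groups[digest].append(index)
    # Only digests shared by two or more images indicate duplicates
    return [indices for indices in groups.values() if len(indices) > 1]


# Toy "images" given as raw bytes; items 0 and 2 are identical
images = [b"\x00\x01\x02", b"\xff\xee", b"\x00\x01\x02", b"\x10"]
duplicates = find_exact_duplicates(images)
```

Each inner list in `duplicates` holds the indices of one group of identical images.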

Data Monitoring

Drift

`detectors.DriftCVM`(x_ref[, p_val, ...])	Cramér-von Mises (CVM) data drift detector, which tests for any change in the distribution of continuous univariate data.
`detectors.DriftKS`(x_ref[, p_val, ...])	Kolmogorov-Smirnov (K-S) data drift detector with Bonferroni or False Discovery Rate (FDR) correction for multivariate data.
`detectors.DriftUncertainty`(x_ref, model[, ...])	Test for a change in the number of instances falling into regions on which the model is uncertain.
`detectors.DriftMMD`(x_ref, p_val, ...)	Maximum Mean Discrepancy (MMD) data drift detector using a permutation test.
`detectors.GaussianRBF`([sigma, ...])	Gaussian RBF kernel: k(x, y) = exp(-(1/(2*sigma^2)) * ||x - y||^2).
`detectors.LastSeenUpdate`(n)	Updates reference dataset for drift detector using last seen method.
`detectors.ReservoirSamplingUpdate`(n)	Updates reference dataset for drift detector using reservoir sampling method.
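
The two reference-update strategies named above can be sketched in plain Python. This is an illustration of the general techniques only, not the DataEval classes: the function names, the `seen` counter, and the `rng` argument are assumptions made for the example. The reservoir variant uses the classic Algorithm R, which keeps every instance seen so far in the reference set with equal probability.

```python
import random


def last_seen_update(reference: list, batch: list, n: int) -> list:
    """Keep only the n most recently seen instances (n must be positive)."""
    return (reference + batch)[-n:]


def reservoir_sampling_update(reference: list, batch: list, n: int,
                              seen: int, rng: random.Random):
    """Algorithm R: maintain a uniform random sample of size n.

    `seen` is the number of instances observed before this batch,
    including those already in `reference`.
    Returns the updated reservoir and the updated count.
    """
    reservoir = list(reference)
    for item in batch:
        seen += 1
        if len(reservoir) < n:
            # Reservoir not yet full: always keep the new instance
            reservoir.append(item)
        else:
            # Replace a random slot with probability n / seen
            slot = rng.randrange(seen)
            if slot < n:
                reservoir[slot] = item
    return reservoir, seen


rng = random.Random(0)
reference = list(range(5))
recent = last_seen_update(reference, [5, 6, 7], n=5)
sample, seen = reservoir_sampling_update(reference, [5, 6, 7], n=5, seen=5, rng=rng)
```

The last-seen strategy tracks the most recent data closely, while reservoir sampling keeps the reference set representative of everything observed so far.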

Out of Distribution

`detectors.OOD_AE`(model)
`detectors.OOD_AEGMM`(model)
`detectors.OOD_LLR`(model[, model_background, ...])
`detectors.OOD_VAE`(model[, samples])
`detectors.OOD_VAEGMM`(model[, samples])
`detectors.OODScore`(instance_score[, ...])	NamedTuple containing the instance and (optionally) feature score.
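
To make the shape of such a score container concrete, here is a minimal sketch of a NamedTuple with a required instance score and an optional feature score. The class name `ScoreSketch`, the field name `feature_score`, and the thresholding helper `flag_ood` are hypothetical and are not taken from the DataEval API; only `instance_score` appears in the signature above.

```python
from typing import NamedTuple, Optional, Sequence


class ScoreSketch(NamedTuple):
    """Hypothetical stand-in for an out-of-distribution score container."""
    instance_score: Sequence[float]
    # Per-feature scores are optional and default to None (assumed field name)
    feature_score: Optional[Sequence[Sequence[float]]] = None


def flag_ood(score: ScoreSketch, threshold: float) -> list[bool]:
    """Mark instances whose score exceeds the threshold as out of distribution."""
    return [s > threshold for s in score.instance_score]


score = ScoreSketch(instance_score=[0.1, 0.9, 0.4])
flags = flag_ood(score, threshold=0.5)
```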

Metrics

Metrics measure the performance of your models or datasets; the results can then be analyzed in the context of a given problem.

Data Exploration

`metrics.ChannelStats`([flags])
`metrics.ImageStats`([flags])	Calculates various image property statistics

Metadata/Label Exploration

`metrics.Coverage`([radius_type, k, percent])	Class for evaluating coverage and identifying images/samples that are in undercovered regions.
`metrics.Divergence`([method])	Calculates the estimated HP divergence between two datasets
`metrics.Parity`()	Class for evaluating statistics of observed and expected class labels

Data Performance

`metrics.BER`([method, k])	Estimator for the multi-class Bayes Error Rate using an FR or KNN test statistic basis
`metrics.UAP`()	FR Test Statistic based estimate of the empirical mean precision

Flags

Flags are used by the ImageStats, ChannelStats, Duplicates, and Linter classes.

`flags.ImageHash`(value[, names, module, ...])
`flags.ImageProperty`(value[, names, module, ...])
`flags.ImageStatistics`(value[, names, ...])
`flags.ImageVisuals`(value[, names, module, ...])
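
The signatures above resemble those of Python's `enum` classes. Assuming the flags behave like `enum.Flag` members, they can be combined with the `|` operator and tested with `in`. The class and member names below are hypothetical and only illustrate the mechanism; consult the flag classes themselves for the real member names.

```python
from enum import Flag, auto


class StatFlagSketch(Flag):
    """Hypothetical flag set; the member names are illustrative only."""
    MEAN = auto()
    STD = auto()
    ENTROPY = auto()


# Combine several flags into a single value with the bitwise OR operator
selected = StatFlagSketch.MEAN | StatFlagSketch.STD

# Membership tests show which individual flags are set
wants_mean = StatFlagSketch.MEAN in selected
wants_entropy = StatFlagSketch.ENTROPY in selected
```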

Workflows

Workflows perform a sequence of actions to analyze the dataset and make predictions.

`workflows.Sufficiency`(model, train_ds, ...)	Projects dataset sufficiency given a model and evaluation criteria

Supported Model Backends

The models and model trainers provided by DataEval are meant to assist users in setting up architectures that are guaranteed to work with applicable DataEval metrics. Below is a list of backends with available trainers and models.