Reference Guide
Detectors
Detectors can determine whether a dataset, or individual images in a dataset, are indicative of a specific issue.
Data Exploration
Uses hierarchical clustering to flag dataset properties of interest, such as outliers and duplicates.
Finds duplicate images in a dataset, using xxhash for exact duplicates and pchash for near duplicates.
Calculates statistical outliers of a dataset using various statistical tests applied to each image.
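The exact-duplicate idea described above (hash every image, then group identical hashes) can be sketched as follows. This is a minimal illustration, not DataEval's implementation: `hashlib.sha256` stands in for xxhash so the example runs with the standard library only, the helper name `find_exact_duplicates` is hypothetical, and near-duplicate detection with a perceptual hash is not shown.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Sequence


def find_exact_duplicates(images: Sequence[bytes]) -> List[List[int]]:
    """Group indices of byte-identical images (hypothetical helper).

    Assumption: each image is available as raw bytes. sha256 is used here
    in place of xxhash purely to avoid a third-party dependency.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for index, data in enumerate(images):
        digest = hashlib.sha256(data).hexdigest()
        groups[digest].append(index)
    # Only hashes shared by two or more images indicate duplicates.
    return [indices for indices in groups.values() if len(indices) > 1]


images = [b"aaa", b"bbb", b"aaa", b"ccc", b"bbb"]
duplicate_groups = find_exact_duplicates(images)
```

Grouping by digest keeps the search linear in the number of images, since no pairwise comparison is needed.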
Data Monitoring
Drift
Cramér-von Mises (CVM) data drift detector, which tests for any change in the distribution of continuous univariate data.
Kolmogorov-Smirnov (K-S) data drift detector with Bonferroni or False Discovery Rate (FDR) correction for multivariate data.
Tests for a change in the number of instances falling into regions on which the model is uncertain.
Maximum Mean Discrepancy (MMD) data drift detector using a permutation test.
Gaussian RBF kernel: k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).
Updates the reference dataset for a drift detector using the last-seen method.
Updates the reference dataset for a drift detector using the reservoir sampling method.
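The two reference-update strategies listed above can be sketched as follows. This is an illustrative sketch under stated assumptions, not DataEval's code: the function names, the list-based storage, and the `n_seen` counter are all hypothetical, and the reservoir variant shown is the standard "Algorithm R".

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def update_last_seen(reference: Sequence[T], batch: Sequence[T], size: int) -> List[T]:
    """Keep only the `size` most recently seen instances."""
    if size <= 0:
        raise ValueError("size must be positive")
    combined = list(reference) + list(batch)
    return combined[-size:]


def update_reservoir(
    reference: Sequence[T],
    batch: Sequence[T],
    size: int,
    n_seen: int,
    rng: random.Random,
) -> Tuple[List[T], int]:
    """Keep a uniform random sample of everything seen so far (Algorithm R).

    Assumptions: `reference` holds at most `size` items, and `n_seen` counts
    every instance observed so far, including those already in `reference`.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    reservoir = list(reference)
    for item in batch:
        n_seen += 1
        if len(reservoir) < size:
            reservoir.append(item)
        else:
            # Replace a random slot with probability size / n_seen.
            slot = rng.randrange(n_seen)
            if slot < size:
                reservoir[slot] = item
    return reservoir, n_seen


reference = [1, 2, 3]
latest = update_last_seen(reference, [4, 5], size=3)
sample, seen = update_reservoir(
    reference, list(range(4, 104)), size=3, n_seen=3, rng=random.Random(0)
)
```

The last-seen update tracks recent data closely but forgets older instances, while reservoir sampling keeps every instance seen so far equally likely to remain in the reference set.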
Out of Distribution
NamedTuple containing the instance and (optionally) feature score.
Metrics
Metrics measure the performance of your models or datasets; the results can then be analyzed in the context of a given problem.
Data Exploration
Calculates various image property statistics.
Metadata/Label Exploration
Class for evaluating coverage and identifying images/samples that are in undercovered regions.
Calculates the estimated Henze-Penrose (HP) divergence between two datasets.
Class for evaluating statistics of observed and expected class labels, including:
Data Performance
An estimator for the multi-class Bayes Error Rate (BER), based on either the Friedman-Rafsky (FR) or the KNN test statistic.
FR test statistic-based estimate of the empirical mean precision.
Flags
Flags are used by the ImageStats, ChannelStats, Duplicates, and Linter classes.
Workflows
Workflows perform a sequence of actions to analyze the dataset and make predictions.
Projects dataset sufficiency given a model and evaluation criteria.
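To make the idea of projecting sufficiency concrete, the sketch below fits a power-law learning curve to measured error rates and extrapolates it to a larger dataset size. It rests on assumptions not stated in this guide: the curve shape `error = a * n ** (-b)`, the log-log least-squares fit, and the function names are all illustrative, and the workflow's actual fitting procedure may differ.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[int], errors: Sequence[float]) -> Tuple[float, float]:
    """Fit error = a * n ** (-b) by least squares in log-log space.

    Assumption: all sizes and errors are strictly positive.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def project_error(a: float, b: float, n: int) -> float:
    """Extrapolate the fitted curve to a dataset of size n."""
    return a * n ** (-b)


# Synthetic measurements that follow error = 2 * n ** -0.5 exactly.
sizes = [100, 200, 400, 800]
errors = [2.0 * n ** -0.5 for n in sizes]
a, b = fit_power_law(sizes, errors)
projected = project_error(a, b, 1600)
```

In practice the measurements would come from training the model on increasing subsets of the data and evaluating it with the chosen criteria; extrapolations far beyond the measured sizes should be treated with caution.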
Supported Model Backends
The models and model trainers provided by DataEval are meant to assist users in setting up architectures that are guaranteed to work with applicable DataEval metrics. Below is a list of backends with available trainers and models.