Reference Guide
Detectors
Detectors determine whether a dataset, or individual images within it, show signs of a specific issue.
Data Exploration
|
Uses hierarchical clustering to flag dataset properties of interest like outliers and duplicates |
|
Finds the duplicate images in a dataset using xxhash for exact duplicates and pchash for near duplicates |
|
Calculates statistical outliers of a dataset using various statistical tests applied to each image |
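The exact-duplicate case described above can be sketched as grouping images by a hash of their raw bytes. This is a minimal illustration, not DataEval's implementation: it substitutes `hashlib.blake2b` from the standard library for xxhash, and `find_exact_duplicates` is a hypothetical helper name. Near-duplicate detection with a perceptual hash is not shown.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(images):
    """Group indices of byte-identical images (hypothetical helper)."""
    groups = defaultdict(list)
    for index, image_bytes in enumerate(images):
        # Stand-in for xxhash: any fast, well-distributed hash works here.
        digest = hashlib.blake2b(image_bytes, digest_size=16).hexdigest()
        groups[digest].append(index)
    # Keep only hashes shared by two or more images.
    return [indices for indices in groups.values() if len(indices) > 1]


# Toy "images" as raw byte strings; items 0 and 2 are identical.
images = [b"\x00\x01\x02", b"\x09\x09\x09", b"\x00\x01\x02", b"\x05"]
duplicate_groups = find_exact_duplicates(images)
print(duplicate_groups)
```

Hashing raw bytes only catches identical content; a single changed pixel produces a different digest, which is why near duplicates need a perceptual hash instead.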
Data Monitoring
Drift
|
Cramér-von Mises (CVM) data drift detector, which tests for any change in the distribution of continuous univariate data. |
|
Kolmogorov-Smirnov (K-S) data drift detector with Bonferroni or False Discovery Rate (FDR) correction for multivariate data. |
|
Test for a change in the number of instances falling into regions on which the model is uncertain. |
|
Maximum Mean Discrepancy (MMD) data drift detector using a permutation test. |
|
Gaussian RBF kernel: k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)). |
Updates reference dataset for drift detector using last seen method. |
|
Updates reference dataset for drift detector using reservoir sampling method. |
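The two reference-update strategies listed above differ in which instances they retain. The sketch below contrasts them on plain Python lists; the function names, signatures, and the `seen` counter are assumptions made for illustration and are not DataEval's API. The reservoir variant follows the classic Algorithm R.

```python
import random


def last_seen_update(reference, batch, size):
    """Keep only the `size` most recent instances (assumes size >= 1)."""
    return (list(reference) + list(batch))[-size:]


def reservoir_update(reference, batch, size, seen, rng):
    """Algorithm R: every instance seen so far is kept with equal probability.

    `seen` is the number of instances observed before this batch.
    Returns the updated reference and the new count.
    """
    reference = list(reference)
    for item in batch:
        seen += 1
        if len(reference) < size:
            # Reservoir not yet full: always keep the new instance.
            reference.append(item)
        else:
            # Overwrite a random slot with probability size / seen.
            slot = rng.randrange(seen)
            if slot < size:
                reference[slot] = item
    return reference, seen


rng = random.Random(0)
reference = [0, 1, 2, 3]
recent = last_seen_update(reference, [4, 5], size=4)
sampled, seen = reservoir_update(reference, range(4, 100), size=4, seen=4, rng=rng)
print(recent)
print(sampled, seen)
```

The last-seen strategy tracks the most recent data, so the reference follows gradual drift; reservoir sampling keeps a uniform sample of everything observed, so older data stays represented.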
Out of Distribution
|
NamedTuple containing the instance and (optionally) feature score. |
Metrics
Metrics measure the performance of your models or datasets; the results can then be analyzed in the context of a given problem.
Data Exploration
|
Calculates pixel statistics for each image per channel |
|
Calculates image and pixel statistics for each image |
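As a rough illustration of per-channel pixel statistics, the sketch below computes the mean and population standard deviation of each channel of a single image. The channels-first nested-list layout and the `channel_stats` name are assumptions for this example; the library's own functions may compute more statistics and accept different input types.

```python
import statistics


def channel_stats(image):
    """Return per-channel mean and population std (hypothetical helper).

    `image` is a nested list shaped [channels][rows][cols].
    """
    results = []
    for channel in image:
        # Flatten the 2-D channel into one list of pixel values.
        pixels = [value for row in channel for value in row]
        results.append(
            {"mean": statistics.fmean(pixels), "std": statistics.pstdev(pixels)}
        )
    return results


# A 2-channel, 2x2 toy image.
image = [
    [[0, 2], [4, 6]],
    [[5, 5], [5, 5]],
]
stats = channel_stats(image)
print(stats)
```

A channel with zero standard deviation, like the second one here, is constant across the image, which is the kind of property a per-channel statistic can surface.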
Metadata/Label Exploration
|
Mutual information (MI) between factors (class label, metadata, label/image properties) |
|
Compute mutual information (analogous to correlation) between metadata factors (class label, metadata, label/image properties) and individual class labels. |
|
Class for evaluating coverage and identifying images/samples that are in undercovered regions. |
|
Calculates the divergence and any errors between the datasets |
|
Compute diversity for discrete/categorical variables and, through standard histogram binning, for continuous variables. |
|
Compute diversity for discrete/categorical variables and, through standard histogram binning, for continuous variables. |
|
Perform a one-way chi-squared test between observation frequencies and expected frequencies that tests the null hypothesis that the observed data has the expected frequencies. |
|
Evaluates the statistical independence of metadata factors from class labels. |
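The one-way chi-squared test mentioned above compares observed class counts with the counts expected under the null hypothesis. The following is a minimal standard-library sketch, not the library's implementation: the function names are hypothetical, and the p-value uses the closed form of the chi-squared survival function that holds only for an even number of degrees of freedom.

```python
import math


def chi_squared_statistic(observed, expected):
    """Pearson statistic: sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_even_dof(statistic, dof):
    """Chi-squared survival function, closed form for even degrees of freedom."""
    if dof <= 0 or dof % 2:
        raise ValueError("closed form applies only to positive even dof")
    half = statistic / 2
    terms = sum(half ** k / math.factorial(k) for k in range(dof // 2))
    return math.exp(-half) * terms


# Three classes observed 50, 30, 20 times; null hypothesis: balanced classes.
observed = [50, 30, 20]
expected = [sum(observed) / len(observed)] * len(observed)
statistic = chi_squared_statistic(observed, expected)
p_value = p_value_even_dof(statistic, dof=len(observed) - 1)
print(statistic, p_value)
```

A small p-value suggests the observed frequencies differ from the expected ones, so the null hypothesis of balanced classes would be rejected at the usual 0.05 level in this toy case.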
Data Performance
|
An estimator for the multi-class Bayes error rate based on the FR or KNN test statistic |
|
FR test statistic-based estimate of the empirical mean precision for the upper bound on average precision |
Flags
Flags are used by the imagestats and channelstats functions, as well as the Linter class.
|
Flags for calculating image and channel statistics |
Workflows
Workflows perform a sequence of actions to analyze the dataset and make predictions.
|
Project dataset sufficiency given a model and evaluation criteria |
Supported Model Backends
The models and model trainers provided by DataEval are meant to assist users in setting up architectures that are guaranteed to work with applicable DataEval metrics. Below is a list of backends with available trainers and models.