Functional Overview

The following tables summarize the recommended use cases and technical requirements for the algorithms provided by the DataEval library. Each algorithm targets different types of data or problem domains. For more detail, refer to the method-specific page linked from each algorithm name.

Computer Vision Task Compatibility

The following tables show which computer vision tasks each DataEval algorithm supports. The tables are grouped into categories by usage, following DataEval’s public API.

Metrics

Algorithm	Description	Image Classification	Object Detection	Unsupervised
`Bayes error rate (KNN)` `Bayes error rate (MST)`	Determines feasibility of image classification by estimating the Bayes error rate	✔
`Box to Image ratio statistics`	Computes statistical summaries of target box-to-image ratios		✔
`Completeness`	Measures the degree to which images span the learned embedding space	✔	✔	✔
`Coverage (Adaptive)` `Coverage (Naive)`	Measures how well the distribution of images in a dataset covers the input space	✔	✔	✔
`Divergence (FNN)` `Divergence (MST)`	Measures the difference between dataset distributions	✔	✔	✔
`Feature distance`	Measures the feature-wise distance between two continuous distributions	✔	✔	✔
`Image and Target statistics`	Computes statistical summaries of images and/or targets in a dataset	✔	✔	✔
`Label errors`	Computes potential label errors in a dataset using embeddings	✔	✔
`Label parity`	Assesses equivalence in label frequency between datasets	✔	✔
`Label stats`	Computes statistical summaries of labels in a dataset	✔	✔
`Null model metrics`	Calculates performance metrics for random classifiers on training and testing labels based on the class distributions	✔	✔
`Parity`	Detects if there is a significant relationship between the factor values and class labels	✔	✔
`UAP`	Determines feasibility of an object detection task by estimating an upper bound on average precision		✔

Evaluators

Algorithm	Description	Image Classification	Object Detection	Unsupervised
`Balance`	Assesses the mutual information between factors	✔	✔
`Diversity`	Measures the distribution of metadata factors in the dataset	✔	✔
`Drift Domain Classifier` `Drift K-Nearest Neighbors` `Drift MMD` `Drift Reconstruction` `Drift Univariate`	Detects data distribution shifts from training data	✔	✔	✔
`Duplicate Detection`	Identifies duplicate data entries	✔	✔	✔
`Out-of-Distribution Domain Classifier` `Out-of-Distribution K-Nearest Neighbors` `Out-of-Distribution Reconstruction`	Detects data points that fall outside the training distribution	✔	✔	✔
`Outliers`	Identifies anomalous data points based on deviations from the mean	✔	✔	✔
`Prioritization`	Orders samples based on embeddings	✔	✔	✔

Metadata

Algorithm	Description	Image Classification	Object Detection	Unsupervised
`Factor Deviation`	Computes greatest deviation in metadata features per sample	✔	✔	✔
`Factor Predictors`	Measures the most impactful metadata factors correlated with a flagged sample	✔	✔	✔

Workflows

Algorithm	Description	Image Classification	Object Detection	Unsupervised
`Sufficiency`	Determines data needs for performance standards	✔	✔

Data Selection

Algorithm	Description	Image Classification	Object Detection	Unsupervised
`Dataset Splitter`	Generates train, val, and test splits based on information such as labels and metadata	✔	✔	✔
`Select`	A set of dataset filters that enable rapid development of various datasets	✔	✔	✔

Input Requirements

The following table shows the input parameters used by each of DataEval’s core functionalities.

Note

DataEval imposes no restrictions on image type. It accepts any image modality (RGB, IR, EO, multispectral, greyscale, and others) at any bit depth (8-bit, 16-bit, 32-bit, etc.) and channel count (1+).
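To make the note concrete, here is a minimal NumPy sketch of what images with different modalities, bit depths, and channel counts might look like as arrays. It is an illustration only and does not call DataEval: the channels-first (C, H, W) layout, the sample names, and the `describe` helper are all assumptions introduced for this example.

```python
import numpy as np

# Hypothetical sample images. The channels-first (C, H, W) layout is an
# assumption made for illustration, not something stated in the note above.
images = {
    "rgb_8bit": np.zeros((3, 32, 32), dtype=np.uint8),
    "ir_16bit": np.zeros((1, 32, 32), dtype=np.uint16),
    "multispectral_float32": np.zeros((8, 32, 32), dtype=np.float32),
}


def describe(image: np.ndarray) -> tuple[int, int]:
    """Return (channel count, bits per value) for a channels-first array."""
    return image.shape[0], image.dtype.itemsize * 8


# Summarize each image so the differing channel counts and bit depths are visible.
summary = {name: describe(img) for name, img in images.items()}

for name, (channels, bits) in summary.items():
    print(f"{name}: {channels} channel(s), {bits}-bit")
```

The three arrays differ only in channel count and dtype, which are the two properties the note says are unrestricted.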

Metrics