Functional Overview

The following tables summarize the recommended use cases and technical requirements for the algorithms provided by the DataEval library. Each algorithm targets a different type of data or problem domain. For details on a specific algorithm, see its method-specific page, linked from the algorithm name.

Computer Vision Task Compatibility

The following tables show which computer vision tasks each DataEval algorithm supports. The tables are grouped into categories by usage and follow DataEval’s public API.

Algorithm

Description

Image Classification

Object Detection

Unsupervised

Bayes error rate (KNN)
Bayes error rate (MST)

Determines the feasibility of an image classification task by estimating the Bayes error rate

Box to Image ratio statistics

Computes statistical summaries of the ratios of target boxes to their images

Completeness

Measures the degree to which images span the learned embedding space

Coverage (Adaptive)
Coverage (Naive)

Measures how well the distribution of images in a dataset covers the input space

Divergence (FNN)
Divergence (MST)

Measures the difference between dataset distributions

Feature distance

Measures the feature-wise distance between two continuous distributions

Image and Target statistics

Computes statistical summaries of images and/or targets in a dataset

Label errors

Identifies potential label errors in a dataset using embeddings

Label parity

Assesses equivalence in label frequency between datasets

Label stats

Computes statistical summaries of labels in a dataset

Null model metrics

Calculates performance metrics for random classifiers on training and testing labels based on the class distributions

Parity

Detects whether there is a significant relationship between factor values and class labels

UAP

Determines the feasibility of an object detection task by estimating an upper bound on average precision
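To make the null model metrics entry in the table above more concrete, here is a minimal sketch in plain Python. It does not use DataEval's API: the function names `class_distribution` and `null_model_accuracy` are hypothetical, and the two baselines shown (always predicting the most frequent training class, and sampling predictions from the training class distribution) are assumed examples of label-only "random classifiers", not necessarily the set DataEval reports.

```python
from collections import Counter


def class_distribution(labels):
    """Return {class: relative frequency} for a sequence of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


def null_model_accuracy(train_labels, test_labels):
    """Expected test accuracy of two baselines that only look at labels."""
    p_train = class_distribution(train_labels)
    p_test = class_distribution(test_labels)

    # Baseline 1 (assumed): always predict the most frequent training class.
    majority = max(p_train, key=p_train.get)
    dominant = p_test.get(majority, 0.0)

    # Baseline 2 (assumed): draw each prediction from the training distribution.
    # The chance of a correct guess is the sum over classes of p_train * p_test.
    proportional = sum(p * p_test.get(cls, 0.0) for cls, p in p_train.items())

    return {"dominant_class": dominant, "proportional_random": proportional}


train = [0, 0, 0, 1]
test = [0, 1, 1, 1]
result = null_model_accuracy(train, test)
print(result)
```

A trained model that does not clearly beat these label-only baselines has probably learned little beyond the class frequencies.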

Algorithm

Description

Image Classification

Object Detection

Unsupervised

Balance

Assesses the mutual information between factors

Diversity

Measures the distribution of metadata factors in the dataset

Drift Domain Classifier
Drift K-Nearest Neighbors
Drift MMD
Drift Reconstruction
Drift Univariate

Detects data distribution shifts from training data

Duplicate Detection

Identifies duplicate data entries

Out-of-Distribution Domain Classifier
Out-of-Distribution K-Nearest Neighbors
Out-of-Distribution Reconstruction

Detects data points that fall outside the training distribution

Outliers

Identifies anomalous data points based on deviations from the mean

Prioritization

Orders samples based on embeddings
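The "deviations from the mean" idea behind the Outliers entry above can be illustrated with a simple z-score rule. This is a standard-library sketch under stated assumptions: the function name `zscore_outliers`, the threshold of 3 standard deviations, and the per-image brightness values are all illustrative, and DataEval's own detector may use different statistics and thresholds.

```python
from statistics import fmean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return indices of values more than `threshold` std devs from the mean."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values are identical, so nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


# Hypothetical per-image brightness statistic: 20 typical images and one extreme.
brightness = [0.5] * 20 + [5.0]
flagged = zscore_outliers(brightness)
print(flagged)
```

Because the mean and standard deviation are themselves pulled towards extreme values, a rule like this works best when outliers are a small fraction of the data.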

Algorithm

Description

Image Classification

Object Detection

Unsupervised

Factor Deviation

Computes greatest deviation in metadata features per sample

Factor Predictors

Identifies the metadata factors most strongly correlated with a flagged sample

Algorithm

Description

Image Classification

Object Detection

Unsupervised

Sufficiency

Estimates how much data is needed to meet a performance standard

Algorithm

Description

Image Classification

Object Detection

Unsupervised

Dataset Splitter

Generates train, validation, and test splits based on information such as labels and metadata

Select

A set of dataset filters that enable rapid development of various datasets
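As a rough illustration of what label-based splitting (the Dataset Splitter entry above) involves, the sketch below builds stratified train, validation, and test index lists with the Python standard library. It is not DataEval's implementation: the function name `stratified_split`, the default fractions, and the seed handling are assumptions chosen for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Return (train, val, test) index lists that keep per-class proportions."""
    rng = random.Random(seed)  # local generator so results are repeatable

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test split
    return sorted(train), sorted(val), sorted(test)


labels = [0] * 10 + [1] * 10
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

Splitting within each class keeps rare classes represented in every split, which a purely random split does not guarantee.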

Input Requirements

The following tables show the input parameters used by each of DataEval’s core functionalities.

Note

DataEval imposes no restrictions on image type. It accepts any image modality (RGB, IR, EO, multispectral, greyscale, and others) at any bit depth (8-bit, 16-bit, 32-bit, etc.) and channel count (1+).

For more information on a specific algorithm, click the name in the table.

Algorithm

Images

Labels

Bounding Boxes

Metadata

Scores

Model/Extractor

Bayes error rate (KNN)
Bayes error rate (MST)

Required [1]

Required

Box to Image ratio statistics

Required [2]

Required

Completeness

Required [1]

Coverage (Adaptive)
Coverage (Naive)

Required [1]

Divergence (FNN)
Divergence (MST)

Required [1]

Feature distance

Required [1]

Image and Target statistics

Required [2]

Label errors

Required [1]

Required

Label parity

Required

Label stats

Required

Null model metrics

Required

Parity

Required

Required

UAP

Required

Required [4]

Algorithm

Images

Labels

Bounding Boxes

Metadata

Scores

Model/Extractor

Balance

Required

Required

Diversity

Required

Required

Drift Domain Classifier
Drift K-Nearest Neighbors
Drift MMD
Drift Reconstruction
Drift Univariate

Required

Required (Reconstruction)
Optional (Univariate)

Duplicate Detection

Required [2]

Optional

Out-of-Distribution Domain Classifier
Out-of-Distribution K-Nearest Neighbors
Out-of-Distribution Reconstruction

Required

Required (Reconstruction)

Outliers

Required

Optional

Prioritization

Required

Optional

Algorithm

Images

Labels

Bounding Boxes

Metadata [3]

Scores [5]

Model/Extractor

Factor Deviation

Required

Required

Factor Predictors

Required

Required

Algorithm

Images

Labels

Bounding Boxes

Metadata

Scores

Model/Extractor

Sufficiency [2]

Required

Required

Object Detection only

Task specific

Algorithm

Images

Labels

Bounding Boxes

Metadata

Scores

Model/Extractor

Dataset Splitter [2]

Optional

Optional [3]

Select [2]

Optional

Optional

Optional

Note

[1] Providing embeddings, generated with Embeddings, is highly recommended over providing raw images.
[2] Input data must be wrapped together in a Dataset.
[3] Metadata supplied on its own must be wrapped in DataEval’s Metadata class.
[4] These scores are the raw outputs of a model.
[5] These scores are retrieved from DataEval’s Out-of-Distribution (OOD) functions.
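As a rough illustration of note 2, the sketch below shows one way images, labels, and per-sample metadata can be kept aligned behind a single indexable object. The class name `PairedDataset` and the (image, label, metadata) return shape are assumptions made for this example. The exact dataset protocol DataEval expects is not described in this section, so check the library documentation before relying on this layout.

```python
from typing import Any, Optional, Sequence


class PairedDataset:
    """Hypothetical map-style dataset that keeps inputs aligned by index."""

    def __init__(
        self,
        images: Sequence[Any],
        labels: Sequence[int],
        metadata: Optional[Sequence[dict]] = None,
    ) -> None:
        if len(images) != len(labels):
            raise ValueError("images and labels must have the same length")
        if metadata is not None and len(metadata) != len(images):
            raise ValueError("metadata must have one entry per image")
        self._images = images
        self._labels = labels
        self._metadata = metadata

    def __len__(self) -> int:
        return len(self._images)

    def __getitem__(self, index: int) -> tuple:
        # Fall back to an empty dict when no metadata was supplied.
        meta = self._metadata[index] if self._metadata is not None else {}
        return self._images[index], self._labels[index], meta


# Tiny stand-in "images" (nested lists) with labels and one metadata factor.
dataset = PairedDataset(
    images=[[[0, 1], [2, 3]], [[4, 5], [6, 7]]],
    labels=[0, 1],
    metadata=[{"sensor": "EO"}, {"sensor": "IR"}],
)
image, label, meta = dataset[1]
print(len(dataset), label, meta)
```

Validating lengths at construction time catches misaligned inputs early, before any statistics are computed on mismatched image and label pairs.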