Functional Overview

The following tables summarize the recommended use cases and technical requirements for the algorithms in the DataEval library. Each algorithm targets a different type of data or problem domain. Click an algorithm's name to open its method-specific page for more detailed information.

Computer Vision Task Compatibility

The following tables show which computer vision tasks each DataEval algorithm supports. The tables are split into categories based on usage and follow the structure of DataEval's public API.

Algorithm

Description

Image Classification

Object Detection

Unsupervised

Balance

Assesses the mutual information between factors

Bayes error rate

Determines the feasibility of an image classification task by estimating the Bayes error rate

Box statistics

Computes statistical summaries of target boxes

Completeness

Measures the degree to which images span the learned embedding space

Coverage

Measures how well the distribution of images in a dataset covers the input space

Dimension stats

Computes statistical summaries of image and target box dimensions

Divergence

Measures the difference between dataset distributions

Diversity

Measures the distribution of metadata factors in the dataset

Image statistics

Computes statistical summaries of images in a dataset

Label parity

Assesses equivalence in label frequency between datasets

Label stats

Computes statistical summaries of labels in a dataset

Null model metrics

Calculates performance metrics for random classifiers based on the class distributions of the training and testing labels

Parity

Detects whether there is a significant relationship between factor values and class labels

UAP

Determines the feasibility of an object detection task by estimating an upper bound on average precision
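The Bayes error rate entry can be illustrated with a classic nearest-neighbor estimator. The sketch below is not DataEval's implementation: it assumes embeddings are small tuples of floats, computes the leave-one-out 1-nearest-neighbor error R_nn, and converts it into the asymptotic Cover-Hart bounds on the Bayes error R* for K classes, (K-1)/K * (1 - sqrt(1 - K/(K-1) * R_nn)) <= R* <= R_nn. The function names are hypothetical, and the brute-force O(n^2) search only suits small examples.

```python
import math


def nn_error_rate(embeddings, labels):
    """Leave-one-out 1-nearest-neighbor error rate using Euclidean distance."""
    n = len(embeddings)
    errors = 0
    for i in range(n):
        best_j, best_d = None, math.inf
        for j in range(n):
            if i == j:
                continue  # leave the query point out of its own neighbor search
            d = math.dist(embeddings[i], embeddings[j])
            if d < best_d:
                best_j, best_d = j, d
        if labels[best_j] != labels[i]:
            errors += 1
    return errors / n


def ber_bounds(embeddings, labels):
    """Return (lower, upper) Cover-Hart bounds on the Bayes error rate."""
    k = len(set(labels))  # number of classes
    if k < 2:
        raise ValueError("need at least two classes")
    r_nn = nn_error_rate(embeddings, labels)
    # Invert R_nn <= R* (2 - K/(K-1) R*) to get a lower bound on R*.
    inner = max(0.0, 1.0 - k / (k - 1) * r_nn)
    lower = (k - 1) / k * (1.0 - math.sqrt(inner))
    return lower, r_nn  # the 1-NN error itself is the upper bound


# Two well-separated clusters: every nearest neighbor shares its label.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = [0, 0, 0, 1, 1, 1]
print(ber_bounds(points, labels))  # (0.0, 0.0)
```

A bound near zero suggests the classes are separable in the chosen embedding space; a high lower bound suggests no classifier on those features can reach high accuracy.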

Algorithm

Description

Image Classification

Object Detection

Unsupervised

Drift

Detects shifts in the data distribution relative to the training data

Duplicate Detection

Identifies duplicate data entries

Out-of-Distribution

Detects data points that fall outside the training distribution

Outliers

Identifies anomalous data points based on their deviation from the mean
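The Outliers entry describes flagging points by their deviation from the mean. A minimal z-score sketch, assuming a single per-image statistic (for example mean brightness) has already been computed for each image; the function name `zscore_outliers` and its default threshold are illustrative choices, not DataEval's API:

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return indices whose absolute z-score exceeds `threshold`."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical: nothing deviates from the mean
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


# Hypothetical per-image mean brightness values; the last image is unusually bright.
brightness = [0.50, 0.52, 0.49, 0.51, 0.50, 0.48, 0.95]
print(zscore_outliers(brightness, threshold=2.0))  # [6]
```

With the population standard deviation, no value in a sample of n points can have a z-score above sqrt(n - 1), so small samples need a lower threshold; that is why the example uses 2.0.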

Algorithm

Description

Image Classification

Object Detection

Unsupervised

Most Deviated Factors

Identifies the metadata factors that deviate most for samples detected as out-of-distribution

OOD Predictors

Measures the most impactful metadata factors for samples detected as out-of-distribution
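The Most Deviated Factors entry can be sketched as a robust per-factor comparison between one out-of-distribution sample and the in-distribution reference metadata. This is an illustration under stated assumptions, not DataEval's method: it scores each factor by the sample's distance from the reference median in units of the median absolute deviation (MAD), and the names `most_deviated_factor`, `altitude`, and `time_of_day` are hypothetical.

```python
import statistics


def most_deviated_factor(reference, sample):
    """Return (factor, score) for the factor where `sample` deviates most.

    reference: dict mapping factor name -> list of in-distribution values
    sample: dict mapping factor name -> value for one OOD-flagged sample
    """
    best_factor, best_score = None, -1.0
    for factor, values in reference.items():
        med = statistics.median(values)
        mad = statistics.median([abs(v - med) for v in values])
        scale = mad if mad > 0 else 1.0  # avoid dividing by zero for constant factors
        score = abs(sample[factor] - med) / scale
        if score > best_score:
            best_factor, best_score = factor, score
    return best_factor, best_score


reference = {
    "altitude": [100, 110, 105, 95, 100],
    "time_of_day": [12, 13, 11, 12, 14],
}
ood_sample = {"altitude": 104, "time_of_day": 23}
print(most_deviated_factor(reference, ood_sample))  # ('time_of_day', 11.0)
```

Median and MAD are used instead of mean and standard deviation so that a few extreme reference values do not mask the deviation.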

Algorithm

Description

Image Classification

Object Detection

Unsupervised

Sufficiency

Determines how much data is needed to meet performance standards
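Sufficiency analysis asks how much data is needed to reach a performance target. One common way to sketch this, assumed here rather than taken from DataEval, is to measure error at several training-set sizes, fit an inverse power law error = a * n**(-b) by least squares in log-log space, and invert the fit for a target error. The function names below are hypothetical.

```python
import math


def fit_power_law(sizes, errors):
    """Least-squares fit of log(error) = log(a) - b * log(n); returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    a = math.exp(my - slope * mx)  # intercept in log space gives log(a)
    return a, -slope


def samples_needed(a, b, target_error):
    """Invert error = a * n**(-b) to estimate the dataset size reaching target_error."""
    return (a / target_error) ** (1.0 / b)


# Synthetic learning curve that follows error = 2 * n**-0.5 exactly.
sizes = [100, 200, 400, 800]
errors = [2.0 * n ** -0.5 for n in sizes]
a, b = fit_power_law(sizes, errors)
print(round(samples_needed(a, b, 0.02)))  # 10000
```

Real learning curves are noisy and often flatten out, so an extrapolated size is an estimate that should be checked against further measurements.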

Algorithm

Description

Image Classification

Object Detection

Unsupervised

Dataset Splitter

Generates train, validation, and test splits based on information such as labels and metadata

Select

A set of dataset filters that enable rapid construction of different dataset variants
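The Dataset Splitter entry mentions splits informed by labels. A minimal stratified-split sketch using only the standard library, assuming labels are hashable class IDs; it preserves per-class proportions but ignores metadata, and `stratified_split` is a hypothetical name, not DataEval's splitter API:

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists that preserve per-class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)  # seeded so the split is reproducible
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


labels = [0] * 20 + [1] * 20
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))  # 28 6 6
```

Splitting each class separately keeps rare classes represented in every split, which a single global shuffle does not guarantee.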

Input Requirements

The following tables show the inputs used by each of DataEval's core functions.

For more information on a specific algorithm, click the name in the table.
For an overview, see the metrics page.

Algorithm

Images

Labels

Bounding Boxes

Metadata

Scores

Balance

Required

Required

Bayes error rate

Required¹

Required

Box statistics

Required

Required

Completeness

Required¹

Coverage

Required¹

Dimension stats

Required²

Divergence

Required¹

Diversity

Required

Required

Image statistics

Required²

Label parity

Required

Label stats

Required

Null model metrics

Required

Parity

Required

Required

UAP

Required

Required⁴
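Null model metrics are computed from class distributions alone. As a sketch of the idea (the three baselines and the function name are illustrative assumptions, not DataEval's exact set of metrics), the expected test accuracy of label-only guessers can be derived from training and testing label frequencies:

```python
from collections import Counter


def null_model_accuracies(train_labels, test_labels):
    """Expected test accuracy of three label-only baselines that never look at images."""
    train_freq, test_freq = Counter(train_labels), Counter(test_labels)
    classes = set(train_freq) | set(test_freq)
    p_train = {c: train_freq[c] / len(train_labels) for c in classes}
    p_test = {c: test_freq[c] / len(test_labels) for c in classes}
    majority = max(p_train, key=p_train.get)  # most frequent training class
    return {
        # Guess uniformly at random among all observed classes.
        "uniform": 1.0 / len(classes),
        # Guess class c with probability equal to its training frequency.
        "proportional": sum(p_train[c] * p_test[c] for c in classes),
        # Always predict the most frequent training class.
        "majority": p_test[majority],
    }


result = null_model_accuracies([0] * 6 + [1] * 4, [0] * 8 + [1] * 2)
print({name: round(acc, 2) for name, acc in result.items()})
# {'uniform': 0.5, 'proportional': 0.56, 'majority': 0.8}
```

A trained model that does not clearly beat these numbers has learned little beyond the label distribution.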

For more information on a specific algorithm, click the name in the table.
For an overview, see the detectors page.

Algorithm

Images

Labels

Bounding Boxes

Metadata

Scores

Drift

Required

Duplicate Detection

Required²

Out-of-Distribution

Required

Outliers

Required
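For the Duplicate Detection entry, the simplest case is exact duplicates, which can be found by hashing each image's raw bytes and grouping equal digests. This sketch assumes images are already `bytes` objects and handles only exact matches; near duplicates would need a perceptual hash, which it does not attempt. `exact_duplicates` is a hypothetical name, not DataEval's API.

```python
import hashlib
from collections import defaultdict


def exact_duplicates(images):
    """Group indices of images whose raw bytes are identical."""
    groups = defaultdict(list)
    for idx, img in enumerate(images):
        digest = hashlib.sha256(img).hexdigest()
        groups[digest].append(idx)
    # Only groups with more than one member are duplicates.
    return [idxs for idxs in groups.values() if len(idxs) > 1]


images = [b"\x00\x01", b"\x02\x03", b"\x00\x01", b"\xff"]
print(exact_duplicates(images))  # [[0, 2]]
```

Hashing makes the check linear in the number of images instead of comparing every pair.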

For more information on a specific algorithm, click the name in the table.
For an overview, see the metadata page.

Algorithm

Images

Labels

Bounding Boxes

Metadata³

Scores⁵

Most Deviated Factors

Required

Required

OOD Predictors

Required

Required

For more information on a specific algorithm, click the name in the table.
For an overview, see the workflows page.

Algorithm

Images

Labels

Bounding Boxes

Metadata

Scores

Model

Sufficiency²

Required

Required

Object detection only

Task-specific

For more information on a specific algorithm, click the name in the table.
For an overview, see the data page.

Algorithm

Images

Labels

Bounding Boxes

Metadata

Scores

Model

Dataset Splitter²

Optional

Optional³

Select²

Optional

Optional

Optional

Optional

Note

¹ It is highly recommended to provide embeddings, generated with Embeddings, rather than raw images.
² Input data must be wrapped together in a Dataset.
³ When using only metadata, it must be wrapped in DataEval's Metadata class.
⁴ These scores are the raw outputs of a model.
⁵ These scores are returned by DataEval's Out-of-Distribution functions.
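Some inputs must be wrapped together in a Dataset. DataEval's exact dataset protocol is not described on this page, so the class below is a hypothetical sketch that assumes a map-style container (`__len__` and `__getitem__`) whose items are `(image, target, metadata)` tuples; check DataEval's API reference for the interface it actually expects.

```python
from typing import Any, Dict, Sequence, Tuple


class ListDataset:
    """Hypothetical map-style dataset: each item is (image, target, metadata)."""

    def __init__(
        self,
        images: Sequence[Any],
        targets: Sequence[Any],
        metadata: Sequence[Dict[str, Any]],
    ) -> None:
        # All three sequences must describe the same datums, index for index.
        if not (len(images) == len(targets) == len(metadata)):
            raise ValueError("images, targets, and metadata must have the same length")
        self._images = images
        self._targets = targets
        self._metadata = metadata

    def __len__(self) -> int:
        return len(self._images)

    def __getitem__(self, index: int) -> Tuple[Any, Any, Dict[str, Any]]:
        return self._images[index], self._targets[index], self._metadata[index]


ds = ListDataset([[0.0, 1.0]], [1], [{"id": 0}])
print(len(ds), ds[0])  # 1 ([0.0, 1.0], 1, {'id': 0})
```

Keeping images, targets, and metadata in one object guarantees they stay aligned when a function iterates over the dataset.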