Algorithm Summary

The following tables summarize the recommended use cases and data requirements for the algorithms provided by the DataEval library. Each algorithm targets a different type of data or problem domain. Refer to the method-specific pages for more detailed information.

DataEval Algorithms

| Algorithm | Description | Image Classification | Object Detection | Unsupervised |
| --- | --- | --- | --- | --- |
| Balance | Assesses the metadata distribution across classes | | | |
| BER | Determines feasibility by estimating the error rate | | | |
| Clusterer | Groups data to detect outliers and duplicates | | | |
| Coverage | Measures how well the dataset covers the input space | | | |
| Divergence | Detects differences between dataset distributions | | | |
| Diversity | Assesses the spread of metadata factors | | | |
| Drift | Detects shifts in the data distribution relative to the training data | | | |
| Duplicates | Identifies duplicate data entries | | | |
| ImageStats | Computes statistical summaries of datasets | | | |
| Label Parity | Detects differences between label distributions | | | |
| Out-of-Distribution | Detects data points that fall outside the training distribution | | | |
| Outliers | Identifies anomalous data points based on deviations from the mean | | | |
| Parity | Detects differences between metadata distributions | | | |
| Sufficiency | Determines how much data is needed to meet performance standards | | | |
| UAP | Determines feasibility by estimating an upper bound on average precision | | | |
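As an illustration of the Outliers entry, the idea of flagging points "based on deviations from the mean" can be sketched with a plain z-score test. This is a minimal sketch using only the Python standard library, not DataEval's implementation or API; the function name `zscore_outliers`, the `threshold` parameter, and the sample `brightness` values are all hypothetical and chosen for illustration.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return indices of values whose absolute z-score exceeds the threshold.

    Hypothetical helper for illustration only; not part of DataEval.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Made-up per-image statistic (e.g. mean brightness), with one unusual image.
brightness = [0.48, 0.52, 0.50, 0.49, 0.51, 0.47, 0.53, 0.50, 0.49, 0.51, 0.98]
flagged = zscore_outliers(brightness, threshold=2.5)
print(flagged)
```

With these sample values only the last entry is flagged. The threshold is a judgment call: a lower value flags more points, and in small samples a single extreme value inflates the standard deviation, so a cutoff somewhat below the conventional 3.0 may be needed.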

Algorithm Requirements

A red checkmark means the algorithm accepts multiple data types.

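| Algorithm | Images | Labels | Bounding Boxes | Metadata | Scores |
| --- | --- | --- | --- | --- | --- |
| Balance | | | | | |
| BER | | | | | |
| Clusterer | | | | | |
| Coverage | | | | | |
| Divergence | | | | | |
| Diversity | | | | | |
| Drift | | | | | |
| Duplicates | | | | | |
| ImageStats | | | | | |
| Label Parity | | | | | |
| Out-of-Distribution | | | | | |
| Outliers | | | | | |
| Parity | | | | | |
| Sufficiency | | | | | |
| UAP | | | | | |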