Algorithm Summary

The following tables summarize the recommended use cases and input data requirements for the algorithms provided by the DataEval library. Each algorithm targets a different type of data or problem domain. Refer to each algorithm's dedicated page for more detailed information.

DataEval Algorithms

Algorithm	Description	Image Classification	Object Detection	Unsupervised
Balance	Assesses the metadata distribution across classes	✔	✔
BER	Determines feasibility by estimating the Bayes error rate	✔
Clusterer	Groups data to detect outliers and duplicates	✔	✔	✔
Coverage	Measures how well the dataset covers the input space	✔	✔	✔
Divergence	Detects differences between dataset distributions	✔	✔
Diversity	Assesses the spread of metadata factors	✔	✔
Drift	Detects shifts in the data distribution relative to the training data	✔	✔
Duplicates	Identifies duplicate data entries	✔	✔	✔
ImageStats	Computes statistical summaries of datasets	✔	✔	✔
Label Parity	Detects differences between label distributions	✔	✔
Out-of-Distribution	Detects data points that fall outside the training distribution	✔	✔
Outliers	Identifies anomalous data points based on their deviation from the mean	✔	✔	✔
Parity	Detects differences between metadata distributions	✔	✔
Sufficiency	Estimates how much data is needed to meet a performance target	✔	✔
UAP	Determines feasibility by estimating an upper bound on average precision		✔
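The Outliers row describes flagging data points by how far they deviate from the mean. A minimal sketch of that idea in Python, assuming a plain z-score rule applied to one scalar statistic per image (such as mean pixel brightness); the function name `zscore_outliers`, the default threshold of 3.0, and the sample values are illustrative choices, not DataEval's API or implementation:

```python
import numpy as np


def zscore_outliers(values, threshold: float = 3.0) -> np.ndarray:
    """Return indices of values whose |z-score| exceeds `threshold`.

    `values` holds one scalar statistic per image, e.g. mean brightness.
    The z-score uses the population standard deviation (NumPy's default).
    This is a hypothetical helper, not part of DataEval.
    """
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        # All values are identical, so nothing deviates from the mean.
        return np.array([], dtype=int)
    z = np.abs(values - values.mean()) / std
    return np.flatnonzero(z > threshold)


# Hypothetical per-image mean brightness; the last image is much brighter.
brightness = [0.50, 0.52, 0.49, 0.51, 0.48, 0.95]
print(zscore_outliers(brightness, threshold=2.0))  # [5]
```

The example uses a threshold of 2.0 because, with the population standard deviation, no value in a sample of n values can have a z-score magnitude above sqrt(n - 1) (Samuelson's inequality). A threshold of 3.0 can therefore only flag anything once there are at least 11 values.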

Algorithm Requirements

A red checkmark means the algorithm accepts multiple data types.

Algorithm	Images	Labels	Bounding Boxes	Metadata	Scores
Balance		✔		✔
BER	✔	✔
Clusterer	✔
Coverage	✔
Divergence	✔
Diversity		✔		✔
Drift	✔
Duplicates	✔
ImageStats	✔	✔	✔	✔
Label Parity		✔
Out-of-Distribution	✔
Outliers	✔		✔
Parity		✔		✔
Sufficiency	✔	✔
UAP		✔			✔
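Each row of the requirements table is a set of checked input columns, so the table can also be read programmatically. The following minimal sketch in Python transcribes the checkmarks above into a dictionary and lists the algorithms that use a given input type. The identifiers `REQUIREMENTS` and `algorithms_using` and the lowercase column keys are hypothetical. The sketch also does not distinguish red checkmarks from ordinary ones.

```python
# Checkmarks transcribed from the Algorithm Requirements table.
# Column keys are illustrative names; red checkmarks are not distinguished.
REQUIREMENTS: dict[str, set[str]] = {
    "Balance": {"labels", "metadata"},
    "BER": {"images", "labels"},
    "Clusterer": {"images"},
    "Coverage": {"images"},
    "Divergence": {"images"},
    "Diversity": {"labels", "metadata"},
    "Drift": {"images"},
    "Duplicates": {"images"},
    "ImageStats": {"images", "labels", "bounding_boxes", "metadata"},
    "Label Parity": {"labels"},
    "Out-of-Distribution": {"images"},
    "Outliers": {"images", "bounding_boxes"},
    "Parity": {"labels", "metadata"},
    "Sufficiency": {"images", "labels"},
    "UAP": {"labels", "scores"},
}


def algorithms_using(data_type: str) -> list[str]:
    """Return, alphabetically, the algorithms with a checkmark in a column."""
    return sorted(name for name, inputs in REQUIREMENTS.items() if data_type in inputs)


print(algorithms_using("metadata"))  # ['Balance', 'Diversity', 'ImageStats', 'Parity']
print(algorithms_using("scores"))    # ['UAP']
```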