Algorithm Summary
The following tables summarize the advised use cases and technical requirements for the algorithms provided by the DataEval library. Each algorithm targets different types of data or problem domains. Refer to the method-specific pages for more detailed information.
DataEval Algorithms
| Algorithm | Description | Image Classification | Object Detection | Unsupervised |
|---|---|---|---|---|
| | Assesses the metadata distribution across classes | ✔ | ✔ | |
| | Determines feasibility by estimating the error rate | ✔ | | |
| | Groups data to detect outliers and duplicates | ✔ | ✔ | ✔ |
| | Measures how well the dataset covers the input space | ✔ | ✔ | ✔ |
| | Detects differences between dataset distributions | ✔ | ✔ | |
| | Assesses the spread of metadata factors | ✔ | ✔ | |
| | Detects data distribution shifts from training data | ✔ | ✔ | |
| | Identifies duplicate data entries | ✔ | ✔ | ✔ |
| | Computes statistical summaries of datasets | ✔ | ✔ | ✔ |
| | Detects differences between label distributions | ✔ | ✔ | |
| | Detects data points that fall outside training distribution | ✔ | ✔ | |
| | Identifies anomalous data points based on deviations from mean | ✔ | ✔ | ✔ |
| | Detects differences between metadata distributions | ✔ | ✔ | |
| | Determines data needs for performance standards | ✔ | ✔ | |
| | Determines feasibility by estimating upper bound on average precision | | ✔ | |
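As a rough illustration of what "identifies anomalous data points based on deviations from mean" can look like in practice, the sketch below flags values whose z-score exceeds a threshold. It is not DataEval's API: the function name `zscore_outliers`, the `threshold` parameter, and the toy `brightness` values are all hypothetical, chosen only to show the general idea on a single per-image statistic.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return indices of values whose absolute z-score exceeds `threshold`.

    Hypothetical helper for illustration only; not part of DataEval.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values are identical, so nothing deviates from the mean.
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Toy per-image brightness values (made up); the last one is far from the rest.
brightness = [0.48, 0.50, 0.52, 0.49, 0.51, 0.50, 0.47, 0.53, 0.50, 0.95]
print(zscore_outliers(brightness, threshold=2.5))  # prints [9]
```

With only ten samples, the largest possible population z-score is 3, which is why the example uses a threshold of 2.5 rather than the default. Refer to the method-specific pages for how the library itself defines and configures outlier detection.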
Algorithm Requirements
A red checkmark means the algorithm accepts multiple data types.
Algorithm |
Images |
Labels |
Bounding Boxes |
Metadata |
Scores |
|---|---|---|---|---|---|
✔ |
✔ |
||||
✔ |
✔ |
||||
✔ |
|||||
✔ |
|||||
✔ |
|||||
✔ |
✔ |
||||
✔ |
|||||
✔ |
|||||
✔ |
✔ |
✔ |
✔ |
||
✔ |
|||||
✔ |
|||||
✔ |
✔ |
||||
✔ |
✔ |
||||
✔ |
✔ |
||||
✔ |
✔ |