Functional Overview
The following tables summarize the recommended use cases and technical requirements for the algorithms provided by the DataEval library. Each algorithm targets a different type of data or problem domain. For more detail, click an algorithm name to open its method-specific page.
Computer Vision Task Compatibility
The following tables show which computer vision tasks each DataEval algorithm supports. The tables are grouped by usage category and follow the structure of DataEval’s public API.
Algorithm |
Description |
Image Classification |
Object Detection |
Unsupervised |
|---|---|---|---|---|
Determines feasibility of image classification by estimating the bayes error rate |
✔ |
|||
Computes statistical summaries of target boxes to image ratios |
✔ |
|||
Measures the degree to which images span the learned embedding space |
✔ |
✔ |
✔ |
|
Measures how well the distribution of images in a dataset covers the input space |
✔ |
✔ |
✔ |
|
Measures the difference between dataset distributions |
✔ |
✔ |
✔ |
|
Measures the feature-wise distance between two continuous distributions |
✔ |
✔ |
✔ |
|
Computes statistical summaries of images and/or targets in a dataset |
✔ |
✔ |
✔ |
|
Computes potential label errors in a dataset using embeddings |
✔ |
✔ |
||
Assesses equivalence in label frequency between datasets |
✔ |
✔ |
||
Computes statistical summaries of labels in a dataset |
✔ |
✔ |
||
Calculates performance metrics for random classifiers on training and testing labels based on the class distributions |
✔ |
✔ |
||
Detects if there is a significant relationship between the factor values and class labels |
✔ |
✔ |
||
Determines feasibility of an object detection task by estimating upper bound on average precision |
✔ |
Algorithm |
Description |
Image Classification |
Object Detection |
Unsupervised |
|---|---|---|---|---|
Assesses the mutual information between factors |
✔ |
✔ |
||
Measures the distribution of metadata factors in the dataset |
✔ |
✔ |
||
|
Detects data distribution shifts from training data |
✔ |
✔ |
✔ |
Identifies duplicate data entries |
✔ |
✔ |
✔ |
|
|
Detects data points that fall outside the training distribution |
✔ |
✔ |
✔ |
Identifies anomalous data points based on deviations from mean |
✔ |
✔ |
✔ |
|
Orders samples based on embeddings |
✔ |
✔ |
✔ |
Algorithm |
Description |
Image Classification |
Object Detection |
Unsupervised |
|---|---|---|---|---|
Computes greatest deviation in metadata features per sample |
✔ |
✔ |
✔ |
|
Measures the most impactful metadata factors correlated with a flagged sample |
✔ |
✔ |
✔ |
Algorithm |
Description |
Image Classification |
Object Detection |
Unsupervised |
|---|---|---|---|---|
Determines data needs for performance standards |
✔ |
✔ |
Algorithm |
Description |
Image Classification |
Object Detection |
Unsupervised |
|---|---|---|---|---|
Generates train, val, and test splits based on information such as labels and metadata |
✔ |
✔ |
✔ |
|
A set of dataset filters that enable rapid development of various datasets |
✔ |
✔ |
✔ |
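To make one of the table entries concrete, the row describing anomalous data points "based on deviations from mean" can be pictured as a z-score check. The following is a minimal NumPy sketch of that general idea only; the function name `flag_outliers_zscore`, the threshold of 3.0, and the sample brightness values are assumptions made for illustration and are not DataEval's implementation or API.

```python
import numpy as np


def flag_outliers_zscore(values: np.ndarray, threshold: float = 3.0) -> np.ndarray:
    """Return indices of values whose z-score magnitude exceeds `threshold`.

    Hypothetical helper for illustration; not part of DataEval.
    """
    values = np.asarray(values, dtype=np.float64)
    std = values.std()
    if std == 0:
        # All values are identical, so nothing deviates from the mean.
        return np.array([], dtype=np.intp)
    z_scores = np.abs(values - values.mean()) / std
    return np.flatnonzero(z_scores > threshold)


# Made-up per-image mean brightness values; the last one is far from the rest.
brightness = np.array(
    [0.50, 0.52, 0.48, 0.51, 0.49, 0.50, 0.53, 0.47, 0.50, 0.51, 0.49, 5.00]
)
outliers = flag_outliers_zscore(brightness)
print(outliers)
```

A single extreme value inflates the standard deviation it is measured against, so with small samples a fixed threshold can miss outliers; robust alternatives such as the median absolute deviation are commonly used for that reason.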
Input Requirements
The following table shows the input parameters used by each of DataEval’s core functionalities.
Note
DataEval imposes no restrictions on image type. It accepts any image modality (RGB, IR, EO, multispectral, greyscale, and others) at any bit depth (8-bit, 16-bit, 32-bit, etc.) and channel count (1+).
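As a rough illustration of what "any modality, bit depth, and channel count" can look like in practice, the sketch below builds three NumPy arrays with different channel counts and dtypes. The channels-first `(C, H, W)` layout and the helper `describe_image` are assumptions made for this example, not a statement of DataEval's required format.

```python
import numpy as np


def describe_image(image: np.ndarray) -> dict:
    """Summarize an image array, assuming a channels-first (C, H, W) layout.

    Hypothetical helper for illustration; not part of DataEval.
    """
    if image.ndim != 3:
        raise ValueError("expected a 3-D array in (channels, height, width) order")
    channels, height, width = image.shape
    return {
        "channels": channels,
        "height": height,
        "width": width,
        "dtype": str(image.dtype),
    }


# Three placeholder images with different modalities and bit depths.
rgb_8bit = np.zeros((3, 32, 32), dtype=np.uint8)              # 3-channel, 8-bit
ir_16bit = np.zeros((1, 32, 32), dtype=np.uint16)             # 1-channel, 16-bit
multispectral_32bit = np.zeros((8, 32, 32), dtype=np.float32)  # 8-channel, 32-bit

summaries = [describe_image(img) for img in (rgb_8bit, ir_16bit, multispectral_32bit)]
for summary in summaries:
    print(summary)
```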
For more information on a specific algorithm, click the name in the table.
Algorithm |
Images |
Labels |
Bounding Boxes |
Metadata |
Scores |
Model/Extractor |
|---|---|---|---|---|---|---|
Required¹ |
Required |
|||||
Required² |
Required |
|||||
Required¹ |
||||||
Required¹ |
||||||
Required¹ |
||||||
Required¹ |
||||||
Required² |
||||||
Required¹ |
Required |
|||||
Required |
||||||
Required |
||||||
Required |
||||||
Required |
Required |
|||||
Required |
Required⁴ |
Algorithm |
Images |
Labels |
Bounding Boxes |
Metadata |
Scores |
Model/Extractor |
|---|---|---|---|---|---|---|
Required |
Required |
|||||
Required |
Required |
|||||
|
Required |
Required (Reconstruction) |
||||
Required² |
Optional |
|||||
|
Required |
Required (Reconstruction) |
||||
Required |
Optional |
|||||
Required |
Optional |
Algorithm |
Images |
Labels |
Bounding Boxes |
Metadata³ |
Scores⁵ |
Model/Extractor |
|---|---|---|---|---|---|---|
Required |
Required |
|||||
Required |
Required |
Algorithm |
Images |
Labels |
Bounding Boxes |
Metadata |
Scores |
Model/Extractor |
|---|---|---|---|---|---|---|
Required |
Required |
OD Only |
Algorithm |
Images |
Labels |
Bounding Boxes |
Metadata |
Scores |
Model/Extractor |
|---|---|---|---|---|---|---|
Optional |
Optional³ |
|||||
Optional |
Optional |
Optional |
Note
1 It is highly recommended to provide embeddings, generated with Embeddings, rather than raw images.
2 Input data must be wrapped together in a Dataset.
3 When using only metadata, it must be wrapped in DataEval’s Metadata class.
4 These scores are the raw outputs of a model.
5 These scores are retrieved by DataEval’s Out Of Distribution (OOD) functions.
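Note 2 asks for input data to be "wrapped together" in a dataset. The sketch below shows one generic way to bundle images, labels, and per-sample metadata behind `__len__` and `__getitem__`. The class `InMemoryDataset`, its `(image, label, metadata)` return shape, and the metadata keys are all hypothetical choices for illustration; consult DataEval's own Dataset and Metadata documentation for the actual expected interface.

```python
from __future__ import annotations

from typing import Any

import numpy as np


class InMemoryDataset:
    """Hypothetical container that keeps images, labels, and metadata aligned.

    Illustration only; this is not DataEval's Dataset type.
    """

    def __init__(
        self,
        images: np.ndarray,
        labels: np.ndarray,
        metadata: list[dict[str, Any]],
    ) -> None:
        if not (len(images) == len(labels) == len(metadata)):
            raise ValueError("images, labels, and metadata must have the same length")
        self._images = images
        self._labels = labels
        self._metadata = metadata

    def __len__(self) -> int:
        return len(self._images)

    def __getitem__(self, index: int) -> tuple[np.ndarray, int, dict[str, Any]]:
        # Return one sample as an (image, label, metadata) tuple.
        return self._images[index], int(self._labels[index]), self._metadata[index]


# Four placeholder 3-channel images with made-up labels and metadata.
images = np.zeros((4, 3, 16, 16), dtype=np.uint8)
labels = np.array([0, 1, 1, 0])
metadata = [{"sensor": "eo"}, {"sensor": "ir"}, {"sensor": "eo"}, {"sensor": "ir"}]

dataset = InMemoryDataset(images, labels, metadata)
image, label, meta = dataset[1]
print(len(dataset), image.shape, label, meta)
```

Validating lengths in the constructor keeps the three inputs aligned, so a sample's label and metadata cannot silently drift from its image.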