Functional Overview

The following tables summarize the recommended use cases and input requirements for the algorithms in the DataEval library. Each algorithm targets a different type of data or problem domain. Click an algorithm's name to open its method-specific page for more detailed information.

Computer Vision Task Compatibility

The following tables show which computer vision tasks each DataEval algorithm supports. The tables are grouped into categories by usage, following the structure of DataEval's public API.

Metrics

Algorithm	Description	Image Classification	Object Detection	Unsupervised
`Balance`	Assesses the mutual information between metadata factors and class labels	✔	✔
`Bayes error rate`	Determines the feasibility of an image classification task by estimating the Bayes error rate	✔
`Box statistics`	Computes statistical summaries of target boxes		✔
`Completeness`	Measures the degree to which images span the learned embedding space	✔	✔	✔
`Coverage`	Measures how well the distribution of images in a dataset covers the input space	✔	✔	✔
`Dimension stats`	Computes statistical summaries of image and target box dimensions	✔	✔	✔
`Divergence`	Measures the difference between dataset distributions	✔	✔	✔
`Diversity`	Measures the distribution of metadata factors in the dataset	✔	✔	✔
`Image statistics`	Computes statistical summaries of images in a dataset	✔	✔	✔
`Label parity`	Assesses equivalence in label frequency between datasets	✔	✔
`Label stats`	Computes statistical summaries of labels in a dataset	✔	✔
`Null model metrics`	Calculates performance metrics of random (null) classifiers from the class distributions of the training and test labels	✔	✔
`Parity`	Detects whether there is a statistically significant relationship between factor values and class labels	✔	✔
`UAP`	Determines the feasibility of an object detection task by estimating an upper bound on average precision		✔
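The `Null model metrics` row can be made concrete with a small calculation. The sketch below assumes two simple label-only baselines: one that always predicts the most frequent training class, and one that predicts each class at random in proportion to its training frequency. It is an illustration of the idea, not DataEval's implementation, and the helper name `null_model_accuracy` is hypothetical.

```python
from collections import Counter


def null_model_accuracy(train_labels, test_labels):
    """Expected test accuracy of two label-only baseline classifiers.

    Hypothetical helper for illustration; not part of DataEval's API.
    """
    train_counts = Counter(train_labels)
    test_counts = Counter(test_labels)
    n_train = len(train_labels)
    n_test = len(test_labels)

    # Majority baseline: always predict the most common training class.
    majority_class = train_counts.most_common(1)[0][0]
    majority_acc = test_counts[majority_class] / n_test

    # Proportional baseline: predict class c with probability p_train(c),
    # so the expected accuracy is sum over c of p_train(c) * p_test(c).
    proportional_acc = sum(
        (train_counts[c] / n_train) * (test_counts[c] / n_test) for c in test_counts
    )
    return {"majority": majority_acc, "proportional": proportional_acc}


print(null_model_accuracy([0, 0, 0, 1], [0, 1, 1, 1]))
# {'majority': 0.25, 'proportional': 0.375}
```

A model that cannot beat these baselines on the test split has learned little beyond the class frequencies, which is the kind of comparison null model metrics are meant to support.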

Detectors

Algorithm	Description	Image Classification	Object Detection	Unsupervised
`Drift`	Detects data distribution shifts from training data	✔	✔	✔
`Duplicate Detection`	Identifies duplicate data entries	✔	✔	✔
`Out-of-Distribution`	Detects data points that fall outside the training distribution	✔	✔	✔
`Outliers`	Identifies anomalous data points based on their deviation from the mean	✔	✔	✔
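The `Outliers` row describes flagging data points by their deviation from the mean. A minimal z-score sketch of that idea follows, assuming a per-image scalar statistic (such as mean brightness) has already been computed. The helper name `zscore_outliers`, the sample values, and the threshold are illustrative assumptions; DataEval's outlier methods and defaults may differ.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return indices whose value lies more than `threshold` population
    standard deviations from the mean. Hypothetical helper, not DataEval's API."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, so nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


# Illustrative per-image mean brightness values; the last image is unusually bright.
brightness = [0.50, 0.52, 0.49, 0.51, 0.48, 0.50, 0.53, 0.47, 0.51, 0.95]
print(zscore_outliers(brightness, threshold=2.5))  # [9]
```

The example uses a threshold of 2.5 because, with n values and the population standard deviation, no z-score can exceed sqrt(n - 1); for 10 values that bound is 3, so a threshold of 3 could never flag anything here.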

Metadata

Algorithm	Description	Image Classification	Object Detection	Unsupervised
`Most Deviated Factors`	Identifies the metadata factors that deviate most for detected out-of-distribution samples	✔	✔	✔
`OOD Predictors`	Measures the most impactful metadata factors for detected out-of-distribution samples	✔	✔	✔
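For the `Most Deviated Factors` row, the sketch below shows one plausible scoring rule, assuming each factor's deviation is the absolute distance of the sample's value from the reference median, divided by the reference interquartile range (IQR). Both the rule and the helper name `most_deviated_factor` are assumptions for illustration; DataEval's implementation may score deviations differently.

```python
import statistics


def most_deviated_factor(reference, sample):
    """Return (factor, score) for the factor of one OOD sample that deviates most.

    reference: dict mapping factor name -> list of numeric reference values
    sample: dict mapping factor name -> numeric value for a single OOD sample
    Hypothetical helper; the median/IQR scoring rule is an assumption.
    """
    scores = {}
    for name, values in reference.items():
        q1, median, q3 = statistics.quantiles(values, n=4)
        iqr = q3 - q1
        scale = iqr if iqr > 0 else 1.0  # avoid dividing by zero for constant factors
        scores[name] = abs(sample[name] - median) / scale
    factor = max(scores, key=scores.get)
    return factor, scores[factor]


reference = {
    "altitude": [100, 110, 120, 130, 140],
    "time_of_day": [10, 11, 12, 13, 14],
}
ood_sample = {"altitude": 125, "time_of_day": 22}
print(most_deviated_factor(reference, ood_sample))
```

Scaling by the IQR keeps factors with very different units (meters versus hours here) comparable, so the ranking reflects how unusual a value is rather than its raw magnitude.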

Workflows

Algorithm	Description	Image Classification	Object Detection	Unsupervised
`Sufficiency`	Determines how much data is needed to meet performance standards	✔	✔
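One common way to estimate data needs, assumed here for illustration, is to measure model error at several training-set sizes, fit a power-law learning curve error ≈ a · n^(-b), and invert it to find the size that reaches a target error. The sketch below fits the curve with ordinary least squares in log-log space; the helper names `fit_power_law` and `samples_needed` and the sample numbers are hypothetical, and DataEval's `Sufficiency` workflow may fit and project its curves differently.

```python
import math


def fit_power_law(sizes, errors):
    """Fit error ~= a * n**(-b) by least squares on log(error) vs. log(n).

    All sizes and errors must be positive. Returns (a, b).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


def samples_needed(a, b, target_error):
    """Invert error = a * n**(-b) to get the training-set size for a target error."""
    return (a / target_error) ** (1.0 / b)


# Hypothetical error rates measured after training on increasing subset sizes.
sizes = [100, 200, 400, 800]
errors = [0.40, 0.28, 0.20, 0.14]
a, b = fit_power_law(sizes, errors)
print(f"a={a:.3f}, b={b:.3f}, samples for 5% error: {samples_needed(a, b, 0.05):.0f}")
```

Extrapolating far beyond the largest measured size is risky, since real learning curves often flatten; the projection is an estimate, not a guarantee.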

Data Selection

Algorithm	Description	Image Classification	Object Detection	Unsupervised
`Dataset Splitter`	Generates train, val, and test splits based on information such as labels and metadata	✔	✔	✔
`Select`	A set of dataset filters that enable rapid creation of different dataset variants	✔	✔	✔
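The `Dataset Splitter` row mentions generating splits based on labels. A minimal sketch of a label-stratified split follows, assuming the goal is to keep each class's proportion roughly equal across train, validation, and test. The helper name `stratified_split` and its parameters are hypothetical and do not reflect DataEval's API, which can also take metadata into account.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_frac=0.1, test_frac=0.1, seed=0):
    """Split sample indices into train/val/test, stratified by label.

    Hypothetical helper for illustration; not part of DataEval's API.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    splits = {"train": [], "val": [], "test": []}
    for indices in by_label.values():
        rng.shuffle(indices)  # randomize which samples of this class go where
        n_test = round(len(indices) * test_frac)
        n_val = round(len(indices) * val_frac)
        splits["test"].extend(indices[:n_test])
        splits["val"].extend(indices[n_test:n_test + n_val])
        splits["train"].extend(indices[n_test + n_val:])
    return splits


labels = [0] * 50 + [1] * 30 + [2] * 20
splits = stratified_split(labels)
print({name: len(idx) for name, idx in splits.items()})
# {'train': 80, 'val': 10, 'test': 10}
```

Splitting each class separately guarantees that rare classes still appear in the validation and test sets, which a single random shuffle of all indices does not.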

Input Requirements

The following tables show the inputs used by each of DataEval's core functionalities. A blank cell means the algorithm does not take that input.

Metrics

For more information on a specific algorithm, click the name in the table.
For an overview, see the metrics page.

Algorithm	Images	Labels	Bounding Boxes	Metadata	Scores
`Balance`		Required		Required
`Bayes error rate`	Required¹	Required
`Box statistics`	Required		Required
`Completeness`	Required¹
`Coverage`	Required¹
`Dimension stats`	Required²
`Divergence`	Required¹
`Diversity`		Required		Required
`Image statistics`	Required²
`Label parity`		Required
`Label stats`		Required
`Null model metrics`		Required
`Parity`		Required		Required
`UAP`		Required			Required⁴

Detectors

For more information on a specific algorithm, click the name in the table.
For an overview, see the detectors page.

Algorithm	Images	Labels	Bounding Boxes	Metadata	Scores
`Drift`	Required
`Duplicate Detection`	Required²
`Out-of-Distribution`	Required
`Outliers`	Required

Metadata

For more information on a specific algorithm, click the name in the table.
For an overview, see the metadata page.

Algorithm	Images	Labels	Bounding Boxes	Metadata³	Scores⁵
`Most Deviated Factors`				Required	Required
`OOD Predictors`				Required	Required

Workflows

For more information on a specific algorithm, click the name in the table.
For an overview, see the workflows page.

Algorithm	Images	Labels	Bounding Boxes	Metadata	Scores	Model
`Sufficiency`²	Required	Required	Object detection only			Task-specific

Data Selection

For more information on a specific algorithm, click the name in the table.
For an overview, see the data page.

Algorithm	Images	Labels	Bounding Boxes	Metadata	Scores	Model
`Dataset Splitter`²		Optional		Optional³
`Select`²	Optional	Optional		Optional		Optional

Note

¹ Passing embeddings instead of raw images is highly recommended; generate them with Embeddings.
² Input data must be wrapped together in a Dataset.
³ When using only metadata, it must be wrapped in DataEval’s Metadata class.
⁴ These scores are the raw outputs of a model.
⁵ These scores are produced by DataEval's out-of-distribution detectors.
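Several algorithms above expect their inputs wrapped together in a Dataset. The sketch below shows a minimal map-style container, assuming the convention of `__len__` plus `__getitem__` returning an `(image, target, metadata)` tuple per index. The class name `SimpleImageDataset` is hypothetical, and the exact protocol and target formats your DataEval version expects should be checked against its Dataset documentation.

```python
from dataclasses import dataclass
from typing import Any, Dict, Sequence, Tuple


@dataclass
class SimpleImageDataset:
    """Hypothetical map-style dataset pairing each image with its target and metadata.

    Assumes an (image, target, metadata) tuple per index; verify this against
    the Dataset protocol of your DataEval version before relying on it.
    """

    images: Sequence[Any]
    targets: Sequence[Any]
    metadata: Sequence[Dict[str, Any]]

    def __post_init__(self) -> None:
        # All three sequences must describe the same samples, index by index.
        if not (len(self.images) == len(self.targets) == len(self.metadata)):
            raise ValueError("images, targets, and metadata must have equal length")

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, index: int) -> Tuple[Any, Any, Dict[str, Any]]:
        return self.images[index], self.targets[index], self.metadata[index]


ds = SimpleImageDataset(
    images=[[[0, 0], [0, 0]], [[255, 255], [255, 255]]],  # two tiny 2x2 grayscale images
    targets=[0, 1],
    metadata=[{"site": "A"}, {"site": "B"}],
)
image, target, meta = ds[1]
print(len(ds), target, meta["site"])  # 2 1 B
```

Keeping images, targets, and metadata in one indexable object ensures that each metadata record stays aligned with the image and label it describes.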