How-to Guides

Warning

The how-to guides are a work in progress and are expected to change substantially.

These guides demonstrate in-depth features and customizations of DataEval for more advanced users.

You can view these notebooks in our documentation or open them in Google Colab to run them interactively.

Detectors

Detectors identify issues within a dataset. The guides below show how to apply them to common machine learning problems.

How to run clustering analysis

Identify outliers and anomalies with clustering algorithms

How to identify duplicates

Identify and remove duplicates from a PyTorch Dataset
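
As a rough illustration of the idea behind this guide, the sketch below groups byte-identical items by hashing them. It is not DataEval's API: the function names `find_exact_duplicates` and `drop_duplicates` are hypothetical, and it assumes each item has already been serialized to `bytes` (for a PyTorch tensor, something like `tensor.numpy().tobytes()` would be one way to do that). It only catches exact duplicates, not near-duplicates.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(items):
    """Return groups of indices whose items are byte-identical.

    `items` is any iterable of bytes objects (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for index, raw in enumerate(items):
        # Identical bytes always produce the same SHA-256 digest.
        digest = hashlib.sha256(raw).hexdigest()
        groups[digest].append(index)
    # Keep only digests seen more than once.
    return [indices for indices in groups.values() if len(indices) > 1]


def drop_duplicates(items):
    """Return the items with later exact copies removed, preserving order."""
    seen = set()
    kept = []
    for raw in items:
        digest = hashlib.sha256(raw).digest()
        if digest not in seen:
            seen.add(digest)
            kept.append(raw)
    return kept
```

For example, `find_exact_duplicates([b"a", b"b", b"a"])` reports that indices 0 and 2 hold the same content, and `drop_duplicates` keeps only the first copy of each item.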

How to visualize linting issues

Find images with a negative impact across multiple backgrounds

Metrics

Metrics are a set of tools that measure and analyze data. The guides below show best practices for solving common ML problems.

How to determine image classification feasibility

Calculate the feasibility of performance requirements on different datasets using the Bayes Error Rate (BER)

How to measure train and test dataset divergence

Display and compare the data distributions of two datasets

How to measure label independence

Compare the label distributions of two datasets
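
One common way to compare two label distributions is a chi-squared goodness-of-fit statistic, sketched below in plain Python. This is an illustrative assumption, not necessarily the method the guide or DataEval uses; the function name `chi_squared_statistic` is hypothetical. It treats the training label proportions as the reference and measures how far the test label counts deviate from them.

```python
from collections import Counter


def chi_squared_statistic(train_labels, test_labels):
    """Chi-squared statistic of test label counts against train proportions.

    A value of 0.0 means the test labels match the training proportions
    exactly; larger values indicate a larger mismatch.
    """
    train_counts = Counter(train_labels)
    test_counts = Counter(test_labels)
    n_train = len(train_labels)
    n_test = len(test_labels)

    statistic = 0.0
    for label in sorted(set(train_counts) | set(test_counts)):
        # Expected test count if test labels followed the train proportions.
        expected = n_test * train_counts[label] / n_train
        if expected == 0:
            raise ValueError(f"label {label!r} never appears in the training set")
        statistic += (test_counts[label] - expected) ** 2 / expected
    return statistic
```

To turn the statistic into a p-value, compare it against a chi-squared distribution with one fewer degree of freedom than the number of classes, for example with a statistics library of your choice.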

How to detect undersampled data subsets

Detect undersampled subsets of datasets

How to add intrinsic factors to Metadata

Add DataEval’s statistical outputs to its Metadata object for bias analysis

Workflows

Workflows are end-to-end processes that detect, measure, and analyze data against requirements. The guides below help you solve common problems found across machine learning tasks.

How to measure dataset sufficiency for image classification

Determine the amount of data needed to meet image classification performance requirements
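
To give a feel for this kind of estimate, the sketch below fits a power-law learning curve to measured error rates and extrapolates the dataset size needed to reach a target error. It is a simplified illustration, not DataEval's API: the helper names are hypothetical, and it assumes the error follows `error = a * n ** (-b)` with no irreducible floor, which real learning curves often violate.

```python
import math


def fit_power_law(sizes, errors):
    """Fit error = a * n ** (-b) by least squares in log-log space.

    Returns the tuple (a, b). Requires at least two distinct sizes and
    strictly positive errors.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    # log(error) = log(a) - b * log(n), so a = exp(intercept) and b = -slope.
    return math.exp(intercept), -slope


def samples_needed(a, b, target_error):
    """Invert the fitted curve to estimate the size that reaches target_error."""
    return math.ceil((a / target_error) ** (1.0 / b))
```

In practice the `sizes` and `errors` would come from training the same model on increasing subsets of the data and recording its held-out error at each size. Extrapolating far beyond the measured sizes should be treated with caution.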

Models

DataEval uses models during all stages of the ML lifecycle. The guides below show specific examples of model usage at different levels of expertise.

How to train an autoencoder for embeddings

Train and evaluate an autoencoder to generate effective image embeddings for downstream tasks
