How-to Guides
Warning
The how-to guides are a work in progress and are expected to change substantially.
These guides demonstrate in-depth DataEval features and customizations for more advanced users.
In addition to viewing them in our documentation, these notebooks can also be opened in Google Colab to be used interactively!
General Usage
These guides provide quick examples of how to configure DataEval for your environment.
Configure global hardware settings used in DataEval
Configure logging with DataEval
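As a rough illustration of the logging guide's topic, the sketch below uses only Python's standard `logging` module. It assumes the library emits records under a logger named `dataeval` (the usual package-name convention, not confirmed here); the helper `enable_library_logging` is hypothetical and is not part of DataEval.

```python
import logging

# Assumption: the library logs under its package name, "dataeval".
LOGGER_NAME = "dataeval"


def enable_library_logging(
    name: str = LOGGER_NAME, level: int = logging.INFO
) -> logging.Logger:
    """Hypothetical helper: route one library's log records to the console."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Attach a console handler only once so repeated calls do not duplicate output.
    if not any(isinstance(h, logging.StreamHandler) for h in logger.handlers):
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


# Show debug-level messages from the assumed "dataeval" logger.
logger = enable_library_logging(level=logging.DEBUG)
```

Configuring a named logger rather than the root logger keeps other libraries' output unchanged. The linked guide remains the reference for whatever logging helpers DataEval itself provides.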
Detectors
These tools identify issues within a dataset. The guides below demonstrate solutions to common problems in machine learning.
Identify outliers and anomalies with clustering algorithms
Identify and remove duplicates from a PyTorch Dataset
Find negatively impactful images in multiple backgrounds
How to specify custom statistics on object detection datasets
Customize calculation of image stats on an object detection dataset
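To make the duplicate-removal idea concrete, here is a minimal sketch that finds exact duplicates by hashing each item's raw bytes. It is a generic technique and not DataEval's detector; the function names `find_exact_duplicates` and `deduplicate` are hypothetical, and near-duplicates (resized or re-encoded images) are out of scope for this approach.

```python
import hashlib
from collections import defaultdict
from typing import Iterable, Sequence


def find_exact_duplicates(items: Iterable[bytes]) -> list[list[int]]:
    """Group indices of items whose raw bytes are identical."""
    groups: dict[str, list[int]] = defaultdict(list)
    for index, raw in enumerate(items):
        digest = hashlib.sha256(raw).hexdigest()
        groups[digest].append(index)
    # Keep only groups with more than one member, in first-seen order.
    return [indices for indices in groups.values() if len(indices) > 1]


def deduplicate(items: Sequence[bytes]) -> list[int]:
    """Return indices to keep: the first occurrence of every distinct item."""
    drop = {i for group in find_exact_duplicates(items) for i in group[1:]}
    return [i for i in range(len(items)) if i not in drop]


# Toy data standing in for serialized images.
samples = [b"cat", b"dog", b"cat", b"bird", b"dog"]
duplicate_groups = find_exact_duplicates(samples)
keep_indices = deduplicate(samples)
```

With a PyTorch dataset, the kept indices could be passed to `torch.utils.data.Subset(dataset, keep_indices)` to build the filtered dataset, assuming each sample can be serialized to bytes (for example with `tensor.numpy().tobytes()`).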
Metrics
Metrics are a set of tools that measure and analyze data. The guides below show best practices when solving common ML problems.
Calculate feasibility of performance requirements on different datasets using Bayes Error Rate (BER)
Display data distributions between 2 datasets
Compare label distributions between 2 datasets
Detect undersampled subsets of datasets
Apply DataEval’s statistical outputs to
DataEval’s
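One simple way to compare label distributions between two datasets is the total variation distance between their class frequencies. The sketch below is a generic illustration in plain Python and is not the metric DataEval uses; `label_frequencies` and `total_variation_distance` are hypothetical names.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence


def label_frequencies(
    labels: Sequence[Hashable], classes: Iterable[Hashable]
) -> dict:
    """Relative frequency of every class in `classes` (0.0 if absent)."""
    if len(labels) == 0:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in classes}


def total_variation_distance(
    labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]
) -> float:
    """Half the L1 distance between two label distributions, in [0, 1]."""
    classes = set(labels_a) | set(labels_b)
    freq_a = label_frequencies(labels_a, classes)
    freq_b = label_frequencies(labels_b, classes)
    return 0.5 * sum(abs(freq_a[c] - freq_b[c]) for c in classes)


# Toy labels: the second dataset over-represents class 1.
train_labels = [0, 0, 1, 1]
test_labels = [0, 1, 1, 1]
shift = total_variation_distance(train_labels, test_labels)
```

A value of 0 means the class proportions match exactly and 1 means the datasets share no classes. Deciding whether a given difference is statistically meaningful needs a proper test, which the linked guide covers.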
Workflows
Workflows are end-to-end processes that detect, measure, and analyze data against requirements. The guides below help you solve common problems found across machine learning tasks.
Determine the amount of data needed to meet image classification performance requirements