How-to Guides
These guides help you accomplish specific tasks with DataEval. Each one addresses a practical problem and walks you through the solution step by step.
Besides reading them in the documentation, you can open these notebooks in Google Colab and run them interactively.
The guides are organized by where they fall in the machine learning life cycle:
Configuration
These guides provide quick examples of how to configure DataEval for your environment.
How to configure global hardware configuration defaults in DataEval | Configure global hardware settings used in DataEval
Configure logging with DataEval
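As a rough illustration of the logging task, the sketch below uses only Python's standard `logging` module. It assumes DataEval emits its records through a standard-library logger named `dataeval`; that name and the `enable_debug_logging` helper are hypothetical, so check the logging guide for the supported approach.

```python
import logging

# Assumption: DataEval logs through a standard-library logger named
# "dataeval". Adjust the name if your installed version differs.
LOGGER_NAME = "dataeval"


def enable_debug_logging(name: str = LOGGER_NAME) -> logging.Logger:
    """Attach a stream handler to the named logger and set it to DEBUG."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler()  # writes to stderr by default
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger


logger = enable_debug_logging()
```

Scoping the handler to one named logger, rather than calling `logging.basicConfig`, keeps output from other libraries at their existing levels.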
Data Engineering
These guides cover tasks related to preparing, cleaning, exploring, and curating datasets for machine learning.
Encode image embeddings with an ONNX model
Identify outliers and anomalies with clustering algorithms
Identify and remove duplicates from a PyTorch Dataset
Find negatively impactful images in multiple backgrounds
How to specify custom statistics on object detection datasets | Customize calculation of image stats on an object detection dataset
Apply DataEval’s statistical outputs to
DataEval’s
Detect undersampled subsets of datasets
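To give a flavor of the duplicate-removal task, here is a minimal sketch of exact-duplicate detection by content hashing. It is not DataEval's implementation: the helper names are hypothetical, it works on raw bytes rather than a PyTorch Dataset, and it only catches byte-identical items, not near duplicates.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(items: list[bytes]) -> list[list[int]]:
    """Group indices whose byte content shares the same SHA-256 digest."""
    groups: dict[str, list[int]] = defaultdict(list)
    for index, raw in enumerate(items):
        groups[hashlib.sha256(raw).hexdigest()].append(index)
    # Only groups with more than one member contain duplicates.
    return [indices for indices in groups.values() if len(indices) > 1]


def indices_to_keep(items: list[bytes]) -> list[int]:
    """Keep the first member of each duplicate group and every unique item."""
    drop = {i for group in find_exact_duplicates(items) for i in group[1:]}
    return [i for i in range(len(items)) if i not in drop]


images = [b"image-a", b"image-b", b"image-a"]
print(find_exact_duplicates(images))  # [[0, 2]]
print(indices_to_keep(images))        # [0, 1]
```

With a PyTorch Dataset, the kept indices could be passed to `torch.utils.data.Subset` to build the deduplicated view.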
Model Development
These guides cover tasks related to assessing data feasibility and sufficiency for model training.
Calculate feasibility of performance requirements on different datasets using Bayes Error Rate (BER)
Determine the amount of data needed to meet image classification performance requirements
Monitoring
These guides cover tasks related to comparing datasets and detecting distribution shifts in deployed systems.
Display data distributions between 2 datasets
Compare label distributions between 2 datasets
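As a simple illustration of comparing label distributions, the sketch below computes per-class frequencies and the total variation distance between two label lists. This is a generic measure chosen for the example, not necessarily the one the DataEval guide uses, and the function names are hypothetical.

```python
from collections import Counter


def label_frequencies(labels):
    """Return each label's share of the total as a dict."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation_distance(labels_a, labels_b):
    """Half the summed absolute difference in class frequencies (0 to 1)."""
    freq_a = label_frequencies(labels_a)
    freq_b = label_frequencies(labels_b)
    classes = set(freq_a) | set(freq_b)
    return 0.5 * sum(
        abs(freq_a.get(c, 0.0) - freq_b.get(c, 0.0)) for c in classes
    )


train_labels = ["cat", "cat", "dog", "dog"]
deployed_labels = ["cat", "cat", "cat", "dog"]
print(total_variation_distance(train_labels, deployed_labels))  # 0.25
```

A value of 0 means the two sets have identical class proportions, and 1 means they share no classes at all.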