DataEval for Machine Learning Engineers

A machine learning (ML) engineer focuses on the practical application of ML models:

  • designing,
  • building,
  • training,
  • testing, and
  • deploying them.

Although ML engineers are deeply involved in model development, they share common ground with data scientists and test and evaluation engineers, because they constantly evaluate and iterate on both their models and the data used to train them.

An ML engineer’s workflow is highly iterative. They are in a continuous loop of data preparation, model training, evaluation, and deployment. DataEval is a powerful toolkit for the ML engineer, especially in the early stages of this loop, where the quality of the data directly impacts the performance of the final model.

[Figure: flowchart of the ML lifecycle linking six stages: Scope and Objectives, Data Engineering, Model Development, Deployment, Monitoring, and Analysis, with Analysis feeding back into Scope and Objectives.]

Key ML engineer tasks and relevant DataEval functions

The following sections pair common ML engineer tasks with the DataEval tools that support them.

Clean and preprocess data for training

Perform necessary data cleaning, normalization, resizing, and transformation steps required by the model. Additionally, apply augmentation techniques during training to improve model generalization.

Data can be cleaned with DataEval’s Outliers and Duplicates classes, and analyzed for biases and correlations with the balance(), diversity(), and parity() functions.

DataEval’s Dataset class supports preprocessing/augmentation libraries such as torchvision, albumentations, and others which perform the normalization, resizing, and transformation steps.
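To make the cleaning step concrete, the following is a minimal NumPy sketch of the two underlying ideas: grouping byte-identical images and flagging images whose mean brightness is a statistical outlier. It does not use or reproduce DataEval’s Outliers or Duplicates classes. The helper names `find_exact_duplicates` and `flag_outliers`, the modified z-score threshold of 3.5, and the synthetic images are all assumptions made for illustration.

```python
import hashlib

import numpy as np


def find_exact_duplicates(images):
    """Group indices of images whose shape, dtype, and pixel bytes are identical."""
    groups = {}
    for idx, img in enumerate(images):
        arr = np.ascontiguousarray(img)
        # Hash shape and dtype together with the raw bytes so that arrays with
        # different layouts but equal byte strings are not grouped together.
        header = repr((arr.shape, arr.dtype.str)).encode()
        key = hashlib.sha256(header + arr.tobytes()).hexdigest()
        groups.setdefault(key, []).append(idx)
    return [idxs for idxs in groups.values() if len(idxs) > 1]


def flag_outliers(values, threshold=3.5):
    """Return indices whose modified z-score (median/MAD based) exceeds threshold."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0:
        # All values (or most of them) are identical; nothing can be ranked.
        return np.array([], dtype=int)
    modified_z = 0.6745 * (values - median) / mad
    return np.flatnonzero(np.abs(modified_z) > threshold)


# Synthetic stand-in data (hypothetical): 20 random RGB images,
# one exact copy, and one fully saturated image.
rng = np.random.default_rng(0)
images = [rng.integers(60, 200, size=(8, 8, 3), dtype=np.uint8) for _ in range(20)]
images.append(images[3].copy())                          # index 20 duplicates index 3
images.append(np.full((8, 8, 3), 255, dtype=np.uint8))   # index 21 is saturated

duplicate_groups = find_exact_duplicates(images)
brightness = [float(img.mean()) for img in images]
outlier_indices = flag_outliers(brightness)

print("duplicate groups:", duplicate_groups)
print("brightness outliers:", outlier_indices.tolist())
```

Hashing only catches exact copies; near duplicates (resized or re-encoded images) need a perceptual hash or an embedding-based comparison, which this sketch leaves out.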

Determine problem feasibility

Analyze the cleaned dataset to determine whether it is adequate for the problem’s requirements and complexity.

DataEval’s ber() and uap() functions calculate the upper performance bound for a specific dataset. These bounds allow different datasets to be compared so that the best one for the problem can be chosen.
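As an illustration of what an upper performance bound means, the sketch below estimates a range for the Bayes error rate from the leave-one-out error of a 1-nearest-neighbor classifier, using the classical Cover-Hart inequality. This is not DataEval’s ber() implementation, and the inequality only holds asymptotically, so on small datasets the numbers are rough. The function names and the two-cluster synthetic data are assumptions.

```python
import numpy as np


def loo_1nn_error(features, labels):
    """Leave-one-out error rate of a 1-nearest-neighbor classifier."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances via the expansion |a|^2 + |b|^2 - 2ab.
    sq_norms = (features ** 2).sum(axis=1)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(dists, np.inf)  # a sample may not be its own neighbor
    nearest = dists.argmin(axis=1)
    return float(np.mean(labels[nearest] != labels))


def ber_lower_bound(nn_error, num_classes):
    """Lower bound on the Bayes error implied by the 1-NN error (Cover-Hart)."""
    scale = num_classes / (num_classes - 1)
    inner = max(0.0, 1.0 - scale * nn_error)
    return float((1.0 - np.sqrt(inner)) / scale)


# Synthetic stand-in data (hypothetical): two overlapping 2-D Gaussian classes.
rng = np.random.default_rng(0)
class0 = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
class1 = rng.normal(loc=2.0, scale=1.0, size=(200, 2))
features = np.vstack([class0, class1])
labels = np.array([0] * 200 + [1] * 200)

nn_error = loo_1nn_error(features, labels)
ber_low = ber_lower_bound(nn_error, num_classes=2)

# The Bayes error lies between ber_low and nn_error (asymptotically),
# so the best achievable accuracy lies between the two values below.
print(f"accuracy ceiling is roughly between {1 - nn_error:.3f} and {1 - ber_low:.3f}")
```

Running the same estimate on embeddings from two candidate datasets gives a quick, model-free way to compare how separable their classes are.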

Create dataset splits

Analyze the dataset to create training, validation, and testing subsets. Ensure that each split adequately represents the target operational environment and that there are no spurious correlations between the splits.

Datasets can be split using DataEval’s split_dataset(), which has options that enable the user to split the data based on metadata. DataEval’s bias functions, balance() and diversity(), can help identify when there may be spurious correlations between the splits.
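The idea behind a label-aware split can be sketched in plain NumPy: shuffle the indices of each class separately and cut them at the same fractions, so every split keeps the class proportions of the full dataset. This is a simplified stand-in, not DataEval’s split_dataset(); the function name `stratified_split`, the 70/15/15 fractions, and the synthetic labels are assumptions.

```python
import numpy as np


def stratified_split(labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split indices into len(fractions) disjoint parts, class by class."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    parts = [[] for _ in fractions]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        # Cut points for this class, taken from the cumulative fractions.
        cuts = np.round(np.cumsum(fractions)[:-1] * len(idx)).astype(int)
        for part, chunk in zip(parts, np.split(idx, cuts)):
            part.extend(chunk.tolist())
    return [np.sort(np.array(part, dtype=int)) for part in parts]


# Synthetic stand-in labels (hypothetical): three imbalanced classes.
labels = np.array([0] * 100 + [1] * 40 + [2] * 60)
train_idx, val_idx, test_idx = stratified_split(labels)

for name, idx in [("train", train_idx), ("val", val_idx), ("test", test_idx)]:
    proportions = np.bincount(labels[idx], minlength=3) / len(idx)
    print(name, len(idx), np.round(proportions, 2).tolist())
```

Stratifying on the class label alone does not guard against leakage through metadata such as capture session or sensor; grouping on those fields needs an additional constraint that this sketch omits.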

Build and evaluate models

Train standard models to establish a performance baseline, then train experimental and more complex models and systematically evaluate their architectures against that baseline.

Although DataEval does not assist in building or training ML models, it does contain Sufficiency, which lets the user compare the performance of multiple models. The comparison covers current model performance, predicted performance at different amounts of data, and the predicted model saturation point.
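Predicting performance at larger data volumes is commonly done by fitting a learning curve. The sketch below fits a plain power law, error = a * n^(-b), to a handful of (training size, error) measurements and extrapolates it. It is an assumption-laden illustration rather than DataEval’s Sufficiency class: the curve has no asymptote term, the function names are hypothetical, and the measurements are synthetic, noise-free values generated inside the example.

```python
import numpy as np


def fit_power_law(sizes, errors):
    """Fit error = a * size**(-b) by linear regression in log-log space."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(errors), deg=1)
    return float(np.exp(intercept)), float(-slope)


def predict_error(size, a, b):
    """Predicted error at a given training-set size."""
    return float(a * np.power(float(size), -b))


def samples_needed(target_error, a, b):
    """Smallest training-set size whose predicted error reaches target_error."""
    return int(np.ceil((a / target_error) ** (1.0 / b)))


# Synthetic, noise-free measurements (hypothetical), generated from a=2.0, b=0.5.
sizes = np.array([100, 200, 400, 800, 1600], dtype=float)
errors = 2.0 * sizes ** -0.5

a, b = fit_power_law(sizes, errors)
predicted = predict_error(6400, a, b)
needed = samples_needed(0.05, a, b)

print(f"fitted a={a:.3f}, b={b:.3f}")
print(f"predicted error at 6400 samples: {predicted:.4f}")
print(f"samples needed for 5% error: about {needed}")
```

Fitting one curve per model on the same size grid makes it possible to see whether a more complex model is expected to overtake the baseline as data grows. A pure power law never flattens, so estimating a saturation point requires adding an asymptote term to the model.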

Analyze and interpret model errors

Go beyond top-line metrics to perform detailed error analysis. Visualize the false positives and false negatives to understand why the model is failing (e.g., it confuses similar objects, fails on small objects, or struggles in low light).

By combining multiple DataEval functions – Select class, imagestats(), labelstats(), cluster(), balance(), and diversity() – model failures can be investigated at the image level.
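One simple form of image-level error analysis is to bin a per-image statistic and compare the error rate across bins. The sketch below does this for mean brightness using quantile bins. It is a generic NumPy illustration, not a DataEval workflow; `error_rate_by_bin`, the choice of four bins, and the synthetic "model fails in low light" data are assumptions.

```python
import numpy as np


def error_rate_by_bin(stat, correct, num_bins=4):
    """Return (low_edge, high_edge, count, error_rate) per quantile bin of stat."""
    stat = np.asarray(stat, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    edges = np.quantile(stat, np.linspace(0.0, 1.0, num_bins + 1))
    # Digitize against the interior edges so bin ids run from 0 to num_bins - 1.
    bin_ids = np.digitize(stat, edges[1:-1], right=True)
    rows = []
    for b in range(num_bins):
        mask = bin_ids == b
        if mask.any():
            error_rate = 1.0 - correct[mask].mean()
            rows.append((float(edges[b]), float(edges[b + 1]), int(mask.sum()), float(error_rate)))
    return rows


# Synthetic stand-in data (hypothetical): a model that is wrong about half
# the time on dark images and rarely wrong otherwise.
rng = np.random.default_rng(0)
n = 2000
brightness = rng.uniform(0.0, 255.0, size=n)
wrong_prob = np.where(brightness < 64, 0.5, 0.05)
correct = rng.random(n) >= wrong_prob

rows = error_rate_by_bin(brightness, correct)
for low, high, count, err in rows:
    print(f"brightness {low:6.1f} to {high:6.1f}: n={count:4d}, error rate={err:.3f}")
```

The same function works for any scalar statistic or metadata field, such as object size or blur, which makes it easy to check several failure hypotheses in turn.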

Monitor model performance

Implement monitoring to track operational metrics (latency, throughput) and to detect data drift. Analyze why a model’s performance is decaying in production by comparing the distribution of image statistics (or embeddings) between the new data and the training data, then propose a retraining or calibration strategy.

DataEval has a set of drift and out-of-distribution (OOD) detection functions, along with divergence() and label_parity(), to identify differences between operational and training distributions of both images and labels.
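To show what comparing operational and training distributions looks like in the simplest case, the sketch below runs a two-sample Kolmogorov-Smirnov check on a single image statistic. It is a hand-rolled NumPy version for illustration and does not use DataEval’s drift detectors. The function names are hypothetical, the critical-value constant 1.358 corresponds to a 5% significance level in the large-sample approximation, and the brightness samples are synthetic.

```python
import numpy as np


def ks_statistic(reference, operational):
    """Maximum gap between the two empirical CDFs."""
    reference = np.sort(np.asarray(reference, dtype=float))
    operational = np.sort(np.asarray(operational, dtype=float))
    # The largest gap between step functions occurs at one of the sample points.
    grid = np.concatenate([reference, operational])
    cdf_ref = np.searchsorted(reference, grid, side="right") / len(reference)
    cdf_op = np.searchsorted(operational, grid, side="right") / len(operational)
    return float(np.max(np.abs(cdf_ref - cdf_op)))


def drift_detected(reference, operational, c_alpha=1.358):
    """Compare the KS statistic with the large-sample critical value."""
    n, m = len(reference), len(operational)
    critical = c_alpha * np.sqrt((n + m) / (n * m))
    return bool(ks_statistic(reference, operational) > critical)


# Synthetic stand-in data (hypothetical): mean brightness per image.
rng = np.random.default_rng(0)
train_brightness = rng.normal(loc=120.0, scale=20.0, size=500)
operational_brightness = rng.normal(loc=135.0, scale=20.0, size=500)  # brighter scenes

stat = ks_statistic(train_brightness, operational_brightness)
print(f"KS statistic: {stat:.3f}")
print("drift detected:", drift_detected(train_brightness, operational_brightness))
```

A univariate test like this is applied per statistic or per embedding dimension, so checking many features at once calls for a multiple-comparison correction on the significance level.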