API Reference#

Detectors#

Detectors determine whether a dataset, or individual images within it, show signs of a specific issue.

Drift#

Drift detectors identify whether the statistical properties of the data have changed.

detectors.drift.DriftCVM(x_ref[, p_val, ...])

Drift detector employing the Cramér-von Mises (CVM) distribution test.

detectors.drift.DriftKS(x_ref[, p_val, ...])

Drift detector employing the Kolmogorov-Smirnov (KS) distribution test.
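As a rough illustration of what a two-sample Kolmogorov-Smirnov test measures, the sketch below computes the KS statistic for one feature: the largest gap between the empirical CDFs of the reference and test samples. It is not DataEval's implementation; `ks_statistic` is a hypothetical helper, and turning the statistic into a p-value (to compare against `p_val`) is left out.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(x_ref: Sequence[float], x_test: Sequence[float]) -> float:
    """Largest absolute gap between the two empirical CDFs (hypothetical helper)."""
    if not x_ref or not x_test:
        raise ValueError("both samples must be non-empty")
    ref = sorted(x_ref)
    test = sorted(x_test)
    d = 0.0
    # Both ECDFs are step functions, so the maximum gap occurs at a data point.
    for v in ref + test:
        f_ref = bisect_right(ref, v) / len(ref)
        f_test = bisect_right(test, v) / len(test)
        d = max(d, abs(f_ref - f_test))
    return d
```

A statistic near 0 suggests the two samples follow similar distributions; a value near 1 means they barely overlap.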

detectors.drift.DriftUncertainty(x_ref, model)

Test for a change in the number of instances falling into regions on which the model is uncertain.

detectors.drift.DriftMMD(x_ref[, p_val, ...])

Maximum Mean Discrepancy (MMD) data drift detector using a permutation test.
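The idea of an MMD permutation test can be sketched in plain Python. This is a minimal illustration under stated assumptions, not DataEval's code: it uses the biased estimate of squared MMD with a Gaussian RBF kernel, and `mmd2`, `mmd_permutation_test`, `n_permutations`, and `seed` are hypothetical names.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _rbf(x: Vector, y: Vector, sigma: float) -> float:
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(xs: Sequence[Vector], ys: Sequence[Vector], sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD between two samples."""
    def mean_k(a: Sequence[Vector], b: Sequence[Vector]) -> float:
        return sum(_rbf(p, q, sigma) for p in a for q in b) / (len(a) * len(b))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


def mmd_permutation_test(
    x_ref: Sequence[Vector],
    x_test: Sequence[Vector],
    sigma: float = 1.0,
    n_permutations: int = 200,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (observed MMD^2, p-value) using random relabelings of the pooled data."""
    rng = random.Random(seed)
    observed = mmd2(x_ref, x_test, sigma)
    pooled: List[Vector] = list(x_ref) + list(x_test)
    n = len(x_ref)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mmd2(pooled[:n], pooled[n:], sigma) >= observed:
            at_least_as_large += 1
    # The +1 terms count the observed split itself, so the p-value is never 0.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value
```

Drift would be flagged when the returned p-value falls below the chosen threshold. The pairwise kernel sums make this quadratic in the sample size, so it is only practical for small samples.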

Kernels#

Kernels are used to map non-linear data to a higher-dimensional space.

detectors.drift.kernels.GaussianRBF([sigma, ...])

Gaussian RBF kernel: k(x,y) = exp(-(1/(2*sigma^2)) * ||x-y||^2).
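The kernel formula can be written out directly for two vectors. The function below is a standalone sketch of the formula only; `gaussian_rbf` is a hypothetical name and does not reflect how the `GaussianRBF` class is implemented or called.

```python
import math
from typing import Sequence


def gaussian_rbf(x: Sequence[float], y: Sequence[float], sigma: float = 1.0) -> float:
    """Return exp(-||x - y||^2 / (2 * sigma^2)) for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```

The value is 1 for identical inputs and decays toward 0 as the points move apart; a larger `sigma` makes the decay slower.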

Updates#

Update strategies determine how the drift detector classes update the reference data when monitoring for drift.

detectors.drift.updates.LastSeenUpdate(n)

Updates the reference dataset for a drift detector using the last-seen method.

detectors.drift.updates.ReservoirSamplingUpdate(n)

Updates the reference dataset for a drift detector using the reservoir sampling method.
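The two strategies can be contrasted with a small sketch. This assumes that "last seen" means keeping the `n` most recent instances and that reservoir sampling follows the classic Algorithm R; the function names and the `n_seen` counter are hypothetical and are not DataEval's interface.

```python
import random
from collections import deque
from typing import Any, Iterable, List, Optional, Tuple


def last_seen_update(x_ref: List[Any], x_new: Iterable[Any], n: int) -> List[Any]:
    """Keep only the n most recently seen instances."""
    window = deque(x_ref, maxlen=n)
    window.extend(x_new)
    return list(window)


def reservoir_update(
    x_ref: List[Any],
    x_new: Iterable[Any],
    n: int,
    n_seen: int,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Any], int]:
    """Algorithm R: each instance seen so far stays in the reservoir with equal probability.

    n_seen is the number of instances observed before this call.
    """
    rng = rng or random.Random()
    reservoir = list(x_ref)
    for item in x_new:
        n_seen += 1
        if len(reservoir) < n:
            reservoir.append(item)
        else:
            j = rng.randrange(n_seen)  # uniform over all instances seen so far
            if j < n:
                reservoir[j] = item
    return reservoir, n_seen
```

A last-seen window tracks recent data closely, so gradual drift can be absorbed into the reference set; a reservoir keeps a uniform sample of the whole history and changes more slowly.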

Linters#

Linters help identify potential issues in training and test data and are an important aspect of data cleaning.

detectors.linters.Clusterer(dataset)

Uses hierarchical clustering to flag dataset properties of interest, such as outliers and duplicates.

detectors.linters.Duplicates([only_exact])

Finds duplicate images in a dataset, using xxhash for exact duplicates and pchash for near duplicates.
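Exact-duplicate detection by hashing can be sketched as follows. To stay within the standard library, the example substitutes `hashlib.sha256` for xxhash and operates on raw bytes; `exact_duplicates` is a hypothetical helper, and near-duplicate detection is not covered.

```python
import hashlib
from collections import defaultdict
from typing import DefaultDict, List, Sequence


def exact_duplicates(images: Sequence[bytes]) -> List[List[int]]:
    """Group indices of byte-identical images (sha256 used as a stand-in for xxhash)."""
    groups: DefaultDict[str, List[int]] = defaultdict(list)
    for idx, raw in enumerate(images):
        groups[hashlib.sha256(raw).hexdigest()].append(idx)
    # Only hashes shared by more than one image indicate duplicates.
    return [idxs for idxs in groups.values() if len(idxs) > 1]
```

Each returned group lists the positions of images that share the same content hash.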

detectors.linters.Outliers([use_dimension, ...])

Calculates statistical outliers of a dataset using various statistical tests applied to each image.
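One common statistical test for outliers is the modified z-score, applied to a single per-image statistic such as mean brightness. The sketch below illustrates that one test only; whether DataEval uses it, and with what threshold, is an assumption, and `modified_zscore_outliers` is a hypothetical name.

```python
import statistics
from typing import List, Sequence


def modified_zscore_outliers(values: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Return indices whose modified z-score (based on median and MAD) exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median, so the score is undefined; flag nothing.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]
```

Because it relies on the median rather than the mean, this score is less affected by the outliers it is trying to find.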

Out-of-Distribution#

Out-of-distribution detectors identify data that is different from the data used to train a particular model.

detectors.ood.OOD_AE(model)

Autoencoder based out-of-distribution detector.

detectors.ood.OOD_AEGMM(model)

AE with Gaussian Mixture Model based outlier detector.

detectors.ood.OOD_LLR(model[, ...])

Likelihood Ratios based outlier detector.

detectors.ood.OOD_VAE(model[, samples])

VAE based outlier detector.

detectors.ood.OOD_VAEGMM(model[, samples])

VAE with Gaussian Mixture Model based outlier detector.

detectors.ood.OODScore(instance_score[, ...])

NamedTuple containing the instance and (optionally) feature score.
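A result container of this kind can be sketched with `typing.NamedTuple`. Only `instance_score` appears in the signature above; the field name `feature_score`, the class name `ScoreSketch`, and the thresholding helper are hypothetical and shown purely for illustration.

```python
from typing import List, NamedTuple, Optional, Sequence


class ScoreSketch(NamedTuple):
    """Illustrative stand-in for a score tuple; not the library's class."""
    instance_score: Sequence[float]
    feature_score: Optional[Sequence[float]] = None  # hypothetical field name


def flag_above(score: ScoreSketch, threshold: float) -> List[bool]:
    """Mark instances whose score exceeds the threshold."""
    return [s > threshold for s in score.instance_score]
```

Keeping the optional field last with a default of `None` lets callers construct the tuple from instance scores alone.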

Metrics#

Metrics measure properties of your models or datasets, which can then be analyzed in the context of a given problem.

Bias#

Bias metrics check for skewed or imbalanced datasets and incomplete feature representation, which may impact model performance.

metrics.bias.balance(class_labels, metadata)

Mutual information (MI) between factors (class label, metadata, label/image properties).
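For two discrete factors, mutual information can be computed from simple counts. The sketch below shows the textbook definition in nats; it is not the estimator used by `balance`, and `mutual_information` is a hypothetical helper.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Mutual information (in nats) between two equal-length discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi
```

A value of 0 means the two factors are independent in the sample; higher values indicate that knowing one factor (for example, a metadata field) reveals more about the other (for example, the class label).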

metrics.bias.coverage(embeddings[, ...])

Class for evaluating coverage and identifying images/samples that are in undercovered regions.

metrics.bias.diversity(class_labels, metadata)

Compute diversity and classwise diversity for discrete/categorical variables and, through standard histogram binning, for continuous variables.

metrics.bias.label_parity(expected_labels, ...)

Calculate the chi-square statistic to assess the parity between expected and observed label distributions.
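The chi-square statistic itself is straightforward to compute from per-class counts. The sketch below assumes the inputs are already counts per class and rescales the expected counts to the observed total; `chi_square_label_parity` is a hypothetical helper and does not compute a p-value.

```python
from typing import Sequence


def chi_square_label_parity(
    expected_counts: Sequence[float], observed_counts: Sequence[float]
) -> float:
    """Sum of (observed - expected)^2 / expected over classes."""
    if len(expected_counts) != len(observed_counts):
        raise ValueError("count vectors must have the same length")
    if any(e <= 0 for e in expected_counts):
        raise ValueError("expected counts must be positive")
    total_expected = sum(expected_counts)
    total_observed = sum(observed_counts)
    stat = 0.0
    for e, o in zip(expected_counts, observed_counts):
        # Rescale so both distributions describe the same number of samples.
        e_scaled = e * total_observed / total_expected
        stat += (o - e_scaled) ** 2 / e_scaled
    return stat
```

A statistic of 0 means the observed label distribution matches the expected one exactly; larger values indicate a larger mismatch.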

metrics.bias.parity(class_labels, data_factors)

Calculate chi-square statistics to assess the relationship between multiple factors and class labels.

Estimators#

Estimators calculate performance bounds and the statistical distance between datasets.

metrics.estimators.ber(images, labels[, k, ...])

An estimator for the multi-class Bayes Error Rate using an FR or KNN test statistic basis.

metrics.estimators.divergence(data_a, data_b)

Calculates the divergence and any errors between the datasets.

metrics.estimators.uap(labels, scores)

FR test statistic based estimate of the empirical mean precision for the upper-bound average precision.

Statistics#

Statistics metrics calculate a variety of image properties and pixel statistics against the image and individual channels of an image.

metrics.stats.boxratiostats(boxstats, imgstats)

Calculates ratio statistics of box outputs over image outputs.

metrics.stats.datasetstats(images[, bboxes, ...])

Calculates various statistics for each image.

metrics.stats.dimensionstats(images[, bboxes])

Calculates dimension statistics for each image.

metrics.stats.hashstats(images[, bboxes])

Calculates hashes for each image.

metrics.stats.labelstats(labels)

Calculates statistics for data labels.

metrics.stats.pixelstats(images[, bboxes, ...])

Calculates pixel statistics for each image.

metrics.stats.visualstats(images[, bboxes, ...])

Calculates visual statistics for each image.

Workflows#

Workflows perform a sequence of actions to analyze the dataset and make predictions.

workflows.Sufficiency(model, train_ds, ...)

Project dataset sufficiency using a given model and evaluation criteria.

Supported Backends#

The models and model trainers provided by DataEval are meant to help users set up architectures that are guaranteed to work with the applicable DataEval metrics. DataEval currently supports both TensorFlow and PyTorch backends.

PyTorch

DataEval uses PyTorch as its main backend for metrics that require neural networks. While these metrics can take in custom models, DataEval provides utility classes to create a seamless integration between custom models and DataEval’s metrics.

Models

torch.models.AriaAutoencoder([channels])

An autoencoder model with a separate encoder and decoder.

torch.models.Decoder(channels)

A simple decoder to be used in an autoencoder model.

torch.models.Encoder([channels])

A simple encoder to be used in an autoencoder model.

Trainer

torch.trainer.AETrainer(model[, device, ...])

A class to train and evaluate an autoencoder model.

TensorFlow

The TensorFlow models provided are tailored for use with the out-of-distribution detection metrics. DataEval provides basic default models through the utility function create_model, as well as constructors that allow customization of the encoder, decoder, and any other applicable layers used by the model.

Models

tensorflow.models.AE(*args, **kwargs)

Combine encoder and decoder in AE.

tensorflow.models.AEGMM(*args, **kwargs)

Deep Autoencoding Gaussian Mixture Model.

tensorflow.models.PixelCNN(image_shape[, ...])

Construct Pixel CNN++ distribution.

tensorflow.models.VAE(*args, **kwargs)

Combine encoder and decoder in VAE.

tensorflow.models.VAEGMM(*args, **kwargs)

Variational Autoencoding Gaussian Mixture Model.

tensorflow.models.create_model(model_type, ...)

Create a default model for the specified model type.

Reconstruction Functions

tensorflow.recon.eucl_cosim_features(x, y[, ...])

Compute features extracted from the reconstructed instance using the relative Euclidean distance and cosine similarity between 2 tensors.
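The two quantities named here can be illustrated for a pair of plain vectors. This sketch assumes "relative Euclidean distance" means the distance between the inputs divided by the norm of the first input, which may differ from the library's exact definition; `eucl_cosim_sketch` is a hypothetical helper that works on Python lists rather than tensors.

```python
import math
from typing import Sequence, Tuple


def eucl_cosim_sketch(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (||x - y|| / ||x||, cosine similarity of x and y)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("inputs must be non-zero vectors")
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    dot = sum(a * b for a, b in zip(x, y))
    return dist / norm_x, dot / (norm_x * norm_y)
```

For an input and its reconstruction, a small relative distance together with a cosine similarity close to 1 indicates that the reconstruction is faithful.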

Loss Function Classes

tensorflow.loss.Elbo([cov_type, x])

Compute ELBO loss.

tensorflow.loss.LossGMM([w_recon, w_energy, ...])

Loss function used for AE and VAE with GMM.