dataeval.core

Core stateless functions for performing dataset, metadata and model evaluation.

Submodules

flags

Module for flag enums that control function behavior.

Functions

balance(class_labels, factor_data[, ...])

Mutual information between factors (class label, metadata, label/image properties).
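Mutual information (MI) measures how much knowing one factor reduces uncertainty about another. Below is a minimal plug-in MI estimate for two paired discrete factors, in nats. It is a sketch under stated assumptions, not dataeval's implementation: `balance` may use a different estimator (for example for continuous factors) or normalize the result. The helper name `mutual_information` is hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(x: list, y: list) -> float:
    """Plug-in estimate of I(X; Y) in nats for two paired discrete factors.

    Hypothetical helper for illustration; not part of dataeval.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and the same length")
    n = len(x)
    px = Counter(x)            # marginal counts of X
    py = Counter(y)            # marginal counts of Y
    pxy = Counter(zip(x, y))   # joint counts of (X, Y)
    mi = 0.0
    for (a, b), count in pxy.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (count / n) * log(count * n / (px[a] * py[b]))
    return mi


labels = [0, 0, 1, 1]
print(mutual_information(labels, ["a", "a", "b", "b"]))  # ln 2, about 0.693: factor determines label
print(mutual_information(labels, ["a", "b", "a", "b"]))  # 0.0: factor independent of label
```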

balance_classwise(class_labels, factor_data[, ...])

Mutual information (MI) between factors (class label, metadata, label/image properties).

ber_knn(data, labels, k)

An estimator for the multi-class Bayes error rate (BER) using a k-nearest neighbor (KNN) test statistic basis.
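One classical way to turn a nearest-neighbor error into BER bounds is the Cover-Hart inequality. For M classes, the asymptotic 1-NN error R_NN and the Bayes error R* satisfy R* <= R_NN <= R*(2 - M R*/(M - 1)). Solving the right-hand inequality for R* gives a lower bound. The sketch below fixes k = 1, uses a brute-force leave-one-out estimate of R_NN, and makes no claim to reproduce `ber_knn`. The helper names `loo_1nn_error` and `ber_bounds_from_nn` are hypothetical.

```python
from math import dist, sqrt


def loo_1nn_error(data: list[list[float]], labels: list[int]) -> float:
    """Leave-one-out 1-NN misclassification rate (brute force, Euclidean)."""
    n = len(data)
    errors = 0
    for i in range(n):
        # nearest neighbor of point i among all *other* points
        nearest = min((k for k in range(n) if k != i), key=lambda k: dist(data[i], data[k]))
        errors += labels[i] != labels[nearest]
    return errors / n


def ber_bounds_from_nn(nn_error: float, num_classes: int) -> tuple[float, float]:
    """(lower, upper) Bayes error bounds from the Cover-Hart inequality."""
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")
    c = num_classes / (num_classes - 1)
    # clip at 0 so that finite-sample noise cannot produce sqrt of a negative
    lower = (1 - sqrt(max(0.0, 1 - c * nn_error))) / c
    return lower, nn_error


points = [[0.0], [0.1], [1.0], [1.1]]
err = loo_1nn_error(points, [0, 1, 0, 1])      # 1.0: every nearest neighbor has the other label
print(ber_bounds_from_nn(err, num_classes=2))  # (0.5, 1.0)
```

With finite samples, the leave-one-out rate only approximates the asymptotic R_NN. The resulting bounds should therefore be read as estimates.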

ber_mst(data, labels)

An estimator for the multi-class Bayes error rate (BER) using the Friedman-Rafsky (FR) test statistic computed on a minimum spanning tree (MST).

calculate(images, boxes[, stats, per_image, per_box, ...])

Compute specified statistics on a set of images.

cluster(data)

Uses hierarchical clustering on the flattened data and returns clustering information.

compute_neighbor_distances(data[, k])

Compute k nearest neighbors for each point in data (self-query, excluding self).

compute_neighbors(data_fit, data_query[, k, algorithm])

For each sample in data_query, compute the k nearest neighbors in data_fit.
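The query described above can be sketched as a brute-force search: for each query point, sort the fit points by Euclidean distance and keep the first k indices. This is an O(len(data_fit) log len(data_fit)) per-query illustration. `compute_neighbors` accepts an `algorithm` argument and likely uses a tree or approximate index instead. The name `k_nearest` is hypothetical.

```python
from math import dist


def k_nearest(data_fit, data_query, k=1):
    """Indices of the k nearest rows of data_fit for each row of data_query."""
    if not 1 <= k <= len(data_fit):
        raise ValueError("k must be between 1 and len(data_fit)")
    result = []
    for q in data_query:
        # rank every fit point by its Euclidean distance to the query point
        order = sorted(range(len(data_fit)), key=lambda i: dist(q, data_fit[i]))
        result.append(order[:k])
    return result


fit = [[0, 0], [1, 0], [5, 5]]
print(k_nearest(fit, [[0.9, 0], [4, 4]], k=2))  # [[1, 0], [2, 1]]
```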

coverage_adaptive(embeddings, num_observations, percent)

Evaluate coverage using an adaptive radius calculation method.
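A common adaptive-radius scheme gives each point a radius equal to the distance to its `num_observations`-th nearest neighbor. Points in sparse regions get large radii, and the `percent` fraction of points with the largest radii are reported as uncovered. The sketch below assumes this scheme and assumes `percent` is a fraction in [0, 1]. Both are assumptions about `coverage_adaptive`, not confirmed by this page. The name `adaptive_coverage` is hypothetical.

```python
from math import dist


def adaptive_coverage(embeddings, num_observations, percent):
    """Return (uncovered_indices, radii) under an adaptive kNN-distance radius.

    Assumed scheme: radius_i = distance from point i to its num_observations-th
    nearest other point; the `percent` fraction with the largest radii are
    flagged as uncovered.
    """
    n = len(embeddings)
    if not 1 <= num_observations < n:
        raise ValueError("num_observations must be in [1, len(embeddings) - 1]")
    radii = []
    for i, p in enumerate(embeddings):
        d = sorted(dist(p, q) for j, q in enumerate(embeddings) if j != i)
        radii.append(d[num_observations - 1])
    num_uncovered = int(percent * n)
    # largest radii first: these points have the sparsest neighborhoods
    order = sorted(range(n), key=lambda i: radii[i], reverse=True)
    return sorted(order[:num_uncovered]), radii


uncovered, _ = adaptive_coverage([[0.0], [0.1], [0.2], [5.0]], num_observations=1, percent=0.25)
print(uncovered)  # [3]: the isolated point
```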

coverage_naive(embeddings, num_observations)

Evaluate coverage using a naive radius calculation method.

divergence_fnn(data, labels)

Counts label disagreements between nearest neighbors in data.
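The count described above can be sketched as follows: for each point, find its nearest other point and check whether the labels differ. How `divergence_fnn` converts this count into a divergence value is not shown on this page and is left out. The name `fnn_disagreements` is hypothetical.

```python
from math import dist


def fnn_disagreements(data, labels):
    """Count points whose nearest neighbor (excluding itself) has a different label."""
    n = len(data)
    count = 0
    for i in range(n):
        nearest = min((k for k in range(n) if k != i), key=lambda k: dist(data[i], data[k]))
        if labels[nearest] != labels[i]:
            count += 1
    return count


points = [[0.0], [0.1], [1.0], [1.1]]
print(fnn_disagreements(points, [0, 0, 1, 1]))  # 0: classes well separated
print(fnn_disagreements(points, [0, 1, 0, 1]))  # 4: labels interleaved
```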

divergence_mst(data, labels)

Counts the number of cross-label edges in the minimum spanning tree of data.
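The cross-label edge count can be illustrated with a Euclidean MST built by Prim's algorithm (O(n^2), fine for small inputs) followed by a count of edges whose endpoints carry different labels. Few cross-label edges suggest well-separated classes. This is a sketch only, and `divergence_mst` may build the tree differently (for example from a k-nearest-neighbor graph). The names `mst_edges` and `mst_cross_label_edges` are hypothetical.

```python
from math import dist


def mst_edges(data):
    """Edges (parent, child) of a Euclidean minimum spanning tree via Prim's algorithm."""
    n = len(data)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known edge weight connecting each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # add the cheapest vertex not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u))
        # relax edges from u to every vertex still outside the tree
        for v in range(n):
            if not in_tree[v]:
                w = dist(data[u], data[v])
                if w < best[v]:
                    best[v] = w
                    parent[v] = u
    return edges


def mst_cross_label_edges(data, labels):
    """Number of MST edges joining points with different labels."""
    return sum(labels[a] != labels[b] for a, b in mst_edges(data))


points = [[0.0], [0.1], [1.0], [1.1]]
print(mst_cross_label_edges(points, [0, 0, 1, 1]))  # 1
print(mst_cross_label_edges(points, [0, 1, 0, 1]))  # 3
```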

feature_distance(continuous_data_1, continuous_data_2)

Measures the feature-wise distance between two continuous distributions and computes a p-value to evaluate its significance.

label_parity(expected_labels, observed_labels, *[, ...])

Calculate the chi-square statistic to assess the parity between expected and observed label distributions.
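The Pearson chi-square statistic behind a label-parity check compares observed class counts with expected counts, sum over classes of (O - E)^2 / E. Below is a minimal sketch that rescales the expected counts to the observed total so that only proportions are compared. It omits `label_parity`'s keyword-only options and the p-value. `scipy.stats.chisquare(f_obs, f_exp)` computes the same statistic together with a p-value. The name `label_parity_statistic` is hypothetical.

```python
from collections import Counter


def label_parity_statistic(expected_labels, observed_labels):
    """Pearson chi-square statistic comparing observed label counts to expected ones."""
    exp_counts = Counter(expected_labels)
    obs_counts = Counter(observed_labels)
    classes = sorted(set(exp_counts) | set(obs_counts))
    n_exp = sum(exp_counts.values())
    n_obs = sum(obs_counts.values())
    stat = 0.0
    for c in classes:
        # expected count for class c, rescaled to the observed sample size
        expected = exp_counts[c] * n_obs / n_exp
        if expected == 0:
            raise ValueError(f"class {c!r} never appears in expected_labels")
        stat += (obs_counts[c] - expected) ** 2 / expected
    return stat


print(label_parity_statistic([0, 0, 1, 1], [0, 0, 0, 1, 1, 1]))  # 0.0: same proportions
print(label_parity_statistic([0, 0, 1, 1], [0, 0, 0, 0, 1, 1]))  # 2/3
```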

minimum_spanning_tree(data[, k])

Compute the minimum spanning tree of a dataset.

nullmodel_accuracy(class_prob, model_prob, *[, multiclass])

Calculates accuracy from binary classification results.

nullmodel_fpr(class_prob, model_prob)

Calculates FPR (False Positive Rate) from binary classification results.

nullmodel_precision(class_prob, model_prob)

Calculates precision from binary classification results.

nullmodel_recall(class_prob, model_prob)

Calculates recall (True Positive Rate) from binary classification results.
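One standard null model is a random classifier that predicts "positive" with probability q, independently of the true class, when the positive class has prevalence p. Its expected confusion-matrix rates give accuracy = pq + (1 - p)(1 - q), precision = p, recall = q and FPR = q. The sketch below derives these from the confusion-matrix cells. Reading `class_prob` as p and `model_prob` as q is an assumption, and dataeval's functions (for example `nullmodel_accuracy` with `multiclass`) may be defined differently. The name `random_classifier_metrics` is hypothetical.

```python
def random_classifier_metrics(class_prob: float, model_prob: float) -> dict[str, float]:
    """Expected metrics of a random classifier (assumed null model).

    class_prob: prevalence p of the positive class (assumption).
    model_prob: probability q that the model predicts positive,
                independently of the true class (assumption).
    """
    p, q = class_prob, model_prob
    if not (0.0 < p < 1.0 and 0.0 < q < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    tp = p * q              # truly positive, predicted positive
    fp = (1 - p) * q        # truly negative, predicted positive
    fn = p * (1 - q)        # truly positive, predicted negative
    tn = (1 - p) * (1 - q)  # truly negative, predicted negative
    return {
        "accuracy": tp + tn,
        "precision": tp / (tp + fp),  # simplifies to p
        "recall": tp / (tp + fn),     # simplifies to q
        "fpr": fp / (fp + tn),        # simplifies to q
    }


print(random_classifier_metrics(0.25, 0.5))
# {'accuracy': 0.5, 'precision': 0.25, 'recall': 0.5, 'fpr': 0.5}
```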

parity(…)

Calculate chi-square statistics to assess the relationship between multiple factors and class labels.
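For a single factor, a chi-square test of independence builds the factor-by-label contingency table and compares each cell with the count expected under independence, (row total x column total) / n. The sketch below computes that statistic for one discrete factor. Repeating it per factor is one way to cover "multiple factors", which is an assumption about what `parity` does. `scipy.stats.chi2_contingency` returns the same statistic plus a p-value. The name `chi2_independence_statistic` is hypothetical.

```python
from collections import Counter


def chi2_independence_statistic(factor, labels):
    """Pearson chi-square statistic for independence of two paired discrete variables."""
    if len(factor) != len(labels) or not factor:
        raise ValueError("factor and labels must be non-empty and the same length")
    n = len(factor)
    f_counts = Counter(factor)            # row totals
    lab_counts = Counter(labels)          # column totals
    joint = Counter(zip(factor, labels))  # contingency-table cells
    stat = 0.0
    for f, fc in f_counts.items():
        for lab, lc in lab_counts.items():
            expected = fc * lc / n  # cell count expected under independence
            stat += (joint[(f, lab)] - expected) ** 2 / expected
    return stat


print(chi2_independence_statistic(["a", "a", "b", "b"], [0, 0, 1, 1]))  # 4.0: factor tracks label
print(chi2_independence_statistic(["a", "b", "a", "b"], [0, 0, 1, 1]))  # 0.0: no association
```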

pchash(image)

Performs a perceptual hash on an image, starting by resizing it to a square NxN image.

xxhash(image)

Performs a fast, non-cryptographic hash using the xxHash algorithm.