dataeval.core

Core stateless functions for performing dataset, metadata and model evaluation.

Classes

BERResult

Type definition for Bayes Error Rate bounds output.

ClusterResult

Type definition for cluster output.

ClusterStats

Pre-calculated statistics for adaptive outlier detection.

CompletenessResult

Type definition for completeness output.

CoverageResult

Type definition for coverage output.

DivergenceResult

Type definition for divergence output.

FeatureDistanceResult

Type definition for feature distance output.

LabelAlignmentResult

Result of aligning a source vocabulary against a target ontology.

LabelCoverageResult

Observed distribution of a dataset’s label mass over an Ontology.

LabelErrorResult

Type definition for label error output.

LabelParityResult

Type definition for label parity output.

LabelReconciliationResult

Result of reconciling class labels against an ontology.

LabelStatsResult

Type definition for label statistics output.

MSTResult

Type definition for minimum spanning tree output.

MutualInfoResult

Type definition for normalized mutual information output.

NullModelMetrics

Per-model results for null-model metrics.

NullModelMetricsResult

Type definition for null model metrics output.

OntologyValidationResult

Structural and naming facts about an Ontology artifact.

ParityResult

Type definition for parity output.

RankResult

Type definition for rank output.

StatsResult

Type definition for statistics calculation output.

TrackStatsResult

Type definition for per-track statistics of one video sequence.

Functions

ber_knn

Estimate the multi-class Bayes error rate using k-nearest neighbors (KNN).
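As an illustration of the kind of estimate ber_knn produces, here is a minimal sketch that assumes the classic Cover-Hart bounds applied to a leave-one-out 1-nearest-neighbor error rate with Euclidean distance. The helper names loo_1nn_error and ber_bounds_from_nn_error are hypothetical and are not the dataeval API; the library's own implementation (including its choice of k and distance) may differ.

```python
import math
from typing import Sequence


def loo_1nn_error(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Leave-one-out 1-NN error rate (hypothetical helper, Euclidean distance)."""
    n = len(points)
    errors = 0
    for i in range(n):
        # Nearest neighbor of sample i among all other samples.
        j = min((j for j in range(n) if j != i),
                key=lambda j: math.dist(points[i], points[j]))
        if labels[j] != labels[i]:
            errors += 1
    return errors / n


def ber_bounds_from_nn_error(nn_error: float, num_classes: int) -> tuple[float, float]:
    """(upper, lower) Bayes error bounds from a 1-NN error rate (Cover-Hart); needs num_classes >= 2."""
    m = num_classes
    upper = nn_error
    # Lower bound: (M-1)/M * (1 - sqrt(1 - M/(M-1) * err)), clipped at 0 under the root.
    inner = max(0.0, 1.0 - (m / (m - 1)) * nn_error)
    lower = ((m - 1) / m) * (1.0 - math.sqrt(inner))
    return upper, lower
```

The lower bound comes from inverting the asymptotic inequality R_NN <= R*(2 - M/(M-1) R*), so it is only as reliable as the 1-NN error estimate it is fed.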

ber_mst

Estimate the multi-class Bayes error rate using a minimum spanning tree.

cluster

Use hierarchical clustering on the flattened data and return clustering information.

combine_stats_results

Combine one or more StatsResults into unified stats, source_index, and dataset_steps.

completeness

Measure the dimensional utilization of embeddings.

compute_cluster_stats

Compute cluster centers and distance statistics for adaptive outlier detection.

compute_neighbors

For each sample in data_query, compute the k nearest neighbors in data_fit.

compute_ratios

Compute box-to-image ratios from compute_stats() output.

compute_stats (deprecated)

Compute specified statistics on a set of images, optionally within bounding boxes.

coverage_adaptive

Evaluate coverage using an adaptive radius calculation method.
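A minimal sketch of the adaptive idea, assuming that each sample's coverage radius is its distance to its k-th nearest neighbor and that the `percent` fraction of samples with the largest radii are flagged as uncovered. The function names kth_neighbor_distance and adaptive_uncovered and their parameters are hypothetical; consult the coverage_adaptive signature for the real inputs and outputs.

```python
import math
from typing import Sequence


def kth_neighbor_distance(points: Sequence[Sequence[float]], k: int) -> list[float]:
    """Distance from each point to its k-th nearest other point (requires k < len(points))."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def adaptive_uncovered(points: Sequence[Sequence[float]], k: int = 1,
                       percent: float = 0.1) -> list[int]:
    """Indices of the `percent` fraction of samples with the largest k-NN radius."""
    radii = kth_neighbor_distance(points, k)
    num_flagged = math.ceil(percent * len(points))
    # Sort indices by radius, largest first, and keep the top fraction.
    order = sorted(range(len(points)), key=lambda i: radii[i], reverse=True)
    return sorted(order[:num_flagged])
```

Because the threshold is a quantile of the observed radii rather than a fixed distance, the number of flagged samples is set by `percent` regardless of the data's scale.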

coverage_naive

Evaluate coverage using a naive radius calculation method.

dhash

Compute a difference hash (dHash) for an image.
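The usual dHash recipe shrinks a grayscale image to a small grid one column wider than it is tall and records, for each pixel, whether it is brighter than its right-hand neighbor. The sketch below assumes the image has already been resized to 8 rows by 9 columns (resizing is omitted to stay dependency-free); dhash_bits and hamming are hypothetical names, and dataeval's dhash may differ in grid size, comparison direction, and output format.

```python
def dhash_bits(gray: list[list[float]]) -> int:
    """Difference hash of a pre-resized grayscale grid with rows x (cols + 1) pixels."""
    value = 0
    for row in gray:
        # One bit per horizontally adjacent pair: 1 if the left pixel is brighter.
        for left, right in zip(row, row[1:]):
            value = (value << 1) | (1 if left > right else 0)
    return value


def hamming(a: int, b: int) -> int:
    """Number of differing bits; small values suggest near-duplicate images."""
    return bin(a ^ b).count("1")
```

Comparing neighboring pixels rather than absolute intensities makes the hash insensitive to uniform brightness changes.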

dhash_d4

Compute an orientation-invariant difference hash using gradients.

divergence_fnn

Compute the divergence by counting label disagreements between nearest neighbors.
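A minimal sketch of a nearest-neighbor divergence estimate, assuming the Henze-Penrose-style formula max(0, 1 - (m + n) / (2mn) * C), where m and n are the two sample sizes and C counts samples whose nearest neighbor comes from the other dataset. The formula, the clipping at zero, and the name fnn_divergence are assumptions for illustration, not dataeval's documented implementation.

```python
import math
from typing import Sequence


def fnn_divergence(data_a: Sequence[Sequence[float]],
                   data_b: Sequence[Sequence[float]]) -> float:
    """Divergence estimate from 1-NN dataset-membership disagreements."""
    points = list(data_a) + list(data_b)
    labels = [0] * len(data_a) + [1] * len(data_b)
    m, n = len(data_a), len(data_b)
    disagreements = 0
    for i, p in enumerate(points):
        # Nearest other point, excluding the sample itself.
        j = min((j for j in range(len(points)) if j != i),
                key=lambda j: math.dist(p, points[j]))
        if labels[j] != labels[i]:
            disagreements += 1
    return max(0.0, 1.0 - (m + n) / (2 * m * n) * disagreements)
```

Well-separated datasets rarely have cross-dataset nearest neighbors, which drives the estimate toward 1; thoroughly mixed datasets drive it toward 0.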

divergence_mst

Compute the divergence by counting “between dataset” edges in the minimum spanning tree.
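A minimal sketch of the MST variant: build a Euclidean minimum spanning tree over the pooled samples with Prim's algorithm, count the edges that join the two datasets (the Friedman-Rafsky statistic), and plug that count into the same assumed formula max(0, 1 - (m + n) / (2mn) * C). The names mst_edges and mst_divergence are hypothetical, and dataeval's minimum_spanning_tree and divergence_mst may construct the tree differently (for example, from a nearest-neighbor graph).

```python
import math
from typing import Sequence


def mst_edges(points: Sequence[Sequence[float]]) -> list[tuple[int, int]]:
    """Prim's algorithm on the complete Euclidean graph; returns n - 1 edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight connecting each vertex to the tree
    parent = [-1] * n      # tree endpoint of that cheapest edge
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u))
        # Relax edges from u to every vertex still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_divergence(data_a: Sequence[Sequence[float]],
                   data_b: Sequence[Sequence[float]]) -> float:
    points = list(data_a) + list(data_b)
    labels = [0] * len(data_a) + [1] * len(data_b)
    m, n = len(data_a), len(data_b)
    # Friedman-Rafsky statistic: number of MST edges joining the two datasets.
    cross = sum(1 for u, v in mst_edges(points) if labels[u] != labels[v])
    return max(0.0, 1.0 - (m + n) / (2 * m * n) * cross)
```

Since a spanning tree must connect the two datasets, at least one cross edge always exists, so with very small samples the estimate cannot reach 1 even for perfectly separated data.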

factor_deviation

Determine the greatest deviation in metadata features for each sample.

factor_predictors

Compute a measure of mutual information between metadata factors and flagged sample indices.

feature_distance

Measure the feature-wise distance between two continuous distributions.

label_alignment

Align a source label vocabulary against a target ontology.

label_coverage

Report how a dataset’s label mass is distributed over an ontology.

label_errors

Identify potential label errors in a dataset using embedding geometry.

label_parity

Compute the chi-square statistic to assess label distribution parity.
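A minimal sketch of a chi-square parity check, assuming the expected counts come from rescaling a reference label distribution to the size of the observed label set. Labels that appear only in the observed set are ignored here, and label_chi_square is a hypothetical name; dataeval's label_parity may handle such edge cases, and what it returns, differently.

```python
from collections import Counter
from typing import Sequence


def label_chi_square(expected_labels: Sequence[int],
                     observed_labels: Sequence[int]) -> float:
    """Pearson chi-square statistic of observed label counts vs. a reference distribution."""
    exp_counts = Counter(expected_labels)
    obs_counts = Counter(observed_labels)
    n_exp, n_obs = len(expected_labels), len(observed_labels)
    chi2 = 0.0
    for label, count in exp_counts.items():
        # Scale the reference frequency to the observed sample size.
        expected = count / n_exp * n_obs
        observed = obs_counts.get(label, 0)
        chi2 += (observed - expected) ** 2 / expected
    return chi2
```

A value of 0 means the observed class proportions match the reference exactly; larger values indicate greater disparity.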

label_reconciliation

Reconcile class labels against an ontology and recover their hierarchy.

label_stats

Compute statistics for data labels.

minimum_spanning_tree

Compute the minimum spanning tree of a dataset.

mutual_info

Compute normalized mutual information between factors, transformed to lie in [0, 1].
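For two discrete factors, normalized mutual information can be computed from their marginal and joint frequencies. This sketch assumes the arithmetic-mean normalization NMI = 2 * I(X;Y) / (H(X) + H(Y)), which lies in [0, 1]; entropy and normalized_mutual_info are hypothetical names, and dataeval's mutual_info may use a different estimator (for example, for continuous factors) or a different normalization.

```python
import math
from collections import Counter
from collections.abc import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in nats) of the empirical distribution of `values`."""
    n = len(values)
    return -sum(c / n * math.log(c / n) for c in Counter(values).values())


def normalized_mutual_info(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Arithmetic-mean NMI between two equal-length discrete sequences."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0.0:
        return 1.0  # both factors constant: treated as fully dependent (a convention)
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    mi = h_x + h_y - entropy(list(zip(x, y)))
    return 2.0 * mi / (h_x + h_y)
```

A factor that determines another exactly gives a value of 1, and independent factors give a value near 0.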

mutual_info_classwise

Compute normalized mutual information (NMI) between factors for each class.

nullmodel_accuracy

Compute accuracy from binary classification results.

nullmodel_fpr

Compute the false positive rate (FPR) from binary classification results.

nullmodel_metrics

Compute null model (dummy classifier) metrics for the given class distributions.

nullmodel_precision

Compute precision from binary classification results.

nullmodel_recall

Compute recall (true positive rate) from binary classification results.
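The binary-classification metrics listed above (accuracy, FPR, precision, recall) follow the standard confusion-matrix definitions. This sketch assumes counts of true positives (tp), false positives (fp), true negatives (tn), and false negatives (fn) as inputs and returns 0.0 when a denominator is zero; these function names, signatures, and the zero-division convention are assumptions, not the signatures of the nullmodel_* functions.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    # Fraction of all predictions that are correct.
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def precision(tp: int, fp: int) -> float:
    # Fraction of predicted positives that are actually positive.
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    # True positive rate: fraction of actual positives predicted positive.
    return tp / (tp + fn) if tp + fn else 0.0


def false_positive_rate(fp: int, tn: int) -> float:
    # Fraction of actual negatives wrongly predicted positive.
    return fp / (fp + tn) if fp + tn else 0.0


# An "always predict positive" null model on 30 positives and 70 negatives.
tp, fp, tn, fn = 30, 70, 0, 0
print(accuracy(tp, fp, tn, fn), precision(tp, fp), recall(tp, fn), false_positive_rate(fp, tn))
# prints: 0.3 0.3 1.0 1.0
```

Such trivial baselines set the floor a real model must beat: here perfect recall comes only at the cost of a 100% false positive rate.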

ontology_validation

Validate an ontology artifact and report its structural and naming facts.

parity

Compute statistical parity using bias-corrected Cramér’s V.
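A minimal sketch of the statistic named above for a single r x k contingency table, using Bergsma's bias correction of phi squared and of the table dimensions. It assumes every row and column has a nonzero total, and bias_corrected_cramers_v is a hypothetical name; how dataeval's parity builds its contingency tables and aggregates results is not shown here.

```python
import math
from typing import Sequence


def bias_corrected_cramers_v(table: Sequence[Sequence[float]]) -> float:
    """Bias-corrected Cramér's V for an r x k contingency table with no empty rows or columns."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(table[i][j] for i in range(r)) for j in range(k)]
    # Pearson chi-square against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    phi2 = chi2 / n
    # Bias corrections for phi^2 and for the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0
```

The correction keeps small or sparse tables from reporting spurious association: an independent table yields exactly 0 instead of a small positive value.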

phash

Compute a perceptual hash using the discrete cosine transform (DCT).

phash_d4

Compute an orientation-invariant perceptual hash using the DCT.

rank_hdbscan_complexity

Rank samples using HDBSCAN cluster complexity weighting.

rank_hdbscan_distance

Rank samples using distance to HDBSCAN cluster centers.

rank_kmeans_complexity

Rank samples using cluster complexity weighting.

rank_kmeans_distance

Rank samples using distance to cluster centers.

rank_knn

Rank samples using k-nearest neighbors distance.
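A minimal sketch of k-nearest-neighbor ranking, assuming each sample is scored by its mean Euclidean distance to its k nearest neighbors and samples are ordered from most isolated to most typical. The scoring rule (mean versus k-th distance), the ordering direction, and the name rank_by_knn are assumptions, not dataeval's API.

```python
import math
from typing import Sequence


def rank_by_knn(points: Sequence[Sequence[float]], k: int = 2) -> list[int]:
    """Indices sorted from largest to smallest mean distance to the k nearest neighbors."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    # Most isolated (highest score) first; ties keep their original order.
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
```

Reversing the returned list puts the most typical samples first instead, which suits prioritizing prototypical rather than unusual examples.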

rank_result_class_balanced

Transform RankResult indices using class-balanced selection.

rank_result_stratified

Transform RankResult indices using stratified sampling.

track_stats

Compute per-track statistics for a single video sequence.

uap

Estimate the empirical mean precision for the upper-bound average precision.

xxhash

Compute a fast non-cryptographic hash using the xxHash algorithm.