dataeval.core

Core stateless functions for performing dataset, metadata and model evaluation.

Classes

`BERResult`	Type definition for Bayes Error Rate bounds output.
`ClusterResult`	Type definition for cluster output.
`ClusterStats`	Pre-calculated statistics for adaptive outlier detection.
`CompletenessResult`	Type definition for completeness output.
`CoverageResult`	Type definition for coverage output.
`DivergenceResult`	Type definition for divergence output.
`FeatureDistanceResult`	Type definition for feature distance output.
`LabelAlignmentResult`	Result of aligning a source vocabulary against a target ontology.
`LabelCoverageResult`	Observed distribution of a dataset’s label mass over an `Ontology`.
`LabelErrorResult`	Type definition for label error output.
`LabelParityResult`	Type definition for label parity output.
`LabelReconciliationResult`	Result of reconciling class labels against an ontology.
`LabelStatsResult`	Type definition for label statistics output.
`MSTResult`	Type definition for minimum spanning tree output.
`MutualInfoResult`	Type definition for normalized mutual information output.
`NullModelMetrics`	Per-model results for null-model metrics.
`NullModelMetricsResult`	Type definition for null model metrics output.
`OntologyValidationResult`	Structural and naming facts about an `Ontology` artifact.
`ParityResult`	Type definition for parity output.
`RankResult`	Type definition for rank output.
`StatsResult`	Type definition for calculation output.
`TrackStatsResult`	Per-track statistics for one video sequence.

Functions

`ber_knn`	Estimate the multi-class Bayes error rate using KNN.
`ber_mst`	Estimate the multi-class Bayes error rate using a minimum spanning tree.
`cluster`	Use hierarchical clustering on the flattened data and return clustering information.
`combine_stats_results`	Combine one or more StatsResults into unified stats, source_index, and dataset_steps.
`completeness`	Measure the dimensional utilization of embeddings.
`compute_cluster_stats`	Compute cluster centers and distance statistics for adaptive outlier detection.
`compute_neighbors`	For each sample in data_query, compute the k nearest neighbors in data_fit.
`compute_ratios`	Compute box-to-image ratios from compute_stats() output.
`compute_stats`	(Deprecated) Compute specified statistics on a set of images, optionally within bounding boxes.
`coverage_adaptive`	Evaluate coverage using an adaptive radius calculation method.
`coverage_naive`	Evaluate coverage using a naive radius calculation method.
`dhash`	Compute difference hash (dHash) for an image.
`dhash_d4`	Compute orientation-invariant difference hash using gradients.
`divergence_fnn`	Compute the divergence by counting label disagreements between nearest neighbors.
`divergence_mst`	Compute the divergence by counting “between dataset” edges in the minimum spanning tree.
`factor_deviation`	Determine greatest deviation in metadata features per sample.
`factor_predictors`	Compute a measure of mutual information between metadata factors and flagged sample indices.
`feature_distance`	Measure the feature-wise distance between two continuous distributions.
`label_alignment`	Align a source label vocabulary against a target ontology.
`label_coverage`	Report how a dataset’s label mass is distributed over an ontology.
`label_errors`	Identify potential label errors in a dataset using embedding geometry.
`label_parity`	Compute the chi-square statistic to assess label distribution parity.
`label_reconciliation`	Reconcile class labels against an ontology and recover their hierarchy.
`label_stats`	Compute statistics for data labels.
`minimum_spanning_tree`	Compute the minimum spanning tree of a dataset.
`mutual_info`	Compute normalized mutual information between factors, transformed to lie in [0, 1].
`mutual_info_classwise`	Compute normalized mutual information (NMI) between factors.
`nullmodel_accuracy`	Compute accuracy from binary classification results.
`nullmodel_fpr`	Compute FPR (False Positive Rate) from binary classification results.
`nullmodel_metrics`	Compute null model metrics (dummy classifier metrics) for given class distributions.
`nullmodel_precision`	Compute precision from binary classification results.
`nullmodel_recall`	Compute recall (True Positive Rate) from binary classification results.
`ontology_validation`	Validate an ontology artifact and report its structural and naming facts.
`parity`	Compute statistical parity using Bias-Corrected Cramér’s V.
`phash`	Compute perceptual hash using Discrete Cosine Transform (DCT).
`phash_d4`	Compute orientation-invariant perceptual hash using DCT.
`rank_hdbscan_complexity`	Rank samples using HDBSCAN cluster complexity weighting.
`rank_hdbscan_distance`	Rank samples using distance to HDBSCAN cluster centers.
`rank_kmeans_complexity`	Rank samples using cluster complexity weighting.
`rank_kmeans_distance`	Rank samples using distance to cluster centers.
`rank_knn`	Rank samples using k-nearest neighbors distance.
`rank_result_class_balanced`	Transform RankResult indices using class-balanced selection.
`rank_result_stratified`	Transform RankResult indices using stratified sampling.
`track_stats`	Compute per-track statistics for a single video sequence.
`uap`	Estimate the empirical mean precision for the upper-bound average precision.
`xxhash`	Compute fast non-cryptographic hash using xxHash algorithm.
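
To make one of the one-line summaries above concrete, here is a minimal NumPy sketch of the general difference-hash (dHash) technique that `dhash` is named after. It does not call `dataeval` and is not the library's implementation. The function name `dhash_sketch`, its parameters, the 2-D grayscale input, the nearest-neighbour downsampling, and the hex output format are all assumptions made for illustration.

```python
import numpy as np


def dhash_sketch(image: np.ndarray, hash_size: int = 8) -> str:
    """Hypothetical helper: hex difference hash of a 2-D grayscale image."""
    gray = np.asarray(image, dtype=np.float64)
    if gray.ndim != 2:
        raise ValueError("this sketch expects a 2-D grayscale array")
    height, width = gray.shape

    # Assumption: nearest-neighbour downsample to hash_size rows by
    # hash_size + 1 columns (real implementations often use smoother resizing).
    rows = np.linspace(0, height - 1, hash_size).round().astype(int)
    cols = np.linspace(0, width - 1, hash_size + 1).round().astype(int)
    small = gray[np.ix_(rows, cols)]

    # One bit per horizontally adjacent pair: 1 where brightness increases
    # from left to right, 0 otherwise.
    bits = (small[:, 1:] > small[:, :-1]).flatten()

    # Pack the bits into an integer and format as zero-padded hexadecimal.
    value = int("".join("1" if bit else "0" for bit in bits), 2)
    return f"{value:0{hash_size * hash_size // 4}x}"


# A left-to-right brightness ramp sets every bit; a flat image sets none.
ramp = np.tile(np.arange(32), (32, 1))
flat = np.full((32, 32), 7)
ramp_hash = dhash_sketch(ramp)
flat_hash = dhash_sketch(flat)
```

Because each bit encodes only the sign of a local brightness gradient, hashes built this way tend to stay stable under uniform brightness changes, which is why they are commonly used for near-duplicate detection.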