dataeval.core.label_errors
dataeval.core.label_errors(embeddings, labels, k=50)
Identify potential label errors in a dataset using embedding geometry.
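Computes an “Intra/Extra Class Distance Ratio” for every sample. A sample is flagged as a potential label error when its score is >= 1.0, that is, when its intra-class distance is at least as large as its extra-class distance, so it lies at least as close to samples of a different class as to samples of its own class.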
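The ratio can be illustrated with a small NumPy sketch. This is not DataEval's implementation: the Euclidean metric, the definition of "intra" and "extra" as the mean distance to the k nearest same-class and other-class neighbors, the handling of classes with a single sample, and the name `intra_extra_ratio_sketch` are all assumptions made for illustration.

```python
import numpy as np


def intra_extra_ratio_sketch(embeddings, labels, k=50):
    """Illustrative intra/extra class distance ratio (hypothetical, not DataEval's code).

    Assumptions: Euclidean distance; "intra" is the mean distance to the k
    nearest same-class neighbors and "extra" is the mean distance to the k
    nearest other-class neighbors; score = intra / extra.
    """
    x = np.asarray(embeddings, dtype=np.float64)
    y = np.asarray(labels)
    if x.ndim != 2 or y.shape != (x.shape[0],):
        raise ValueError("expected embeddings (n_samples, n_features) and labels (n_samples,)")
    if np.unique(y).size < 2:
        raise ValueError("need at least two classes")

    # Dense pairwise distances: O(n^2) memory, acceptable only for small inputs.
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    scores = np.empty(x.shape[0])
    for i in range(x.shape[0]):
        same = y == y[i]
        same[i] = False  # a sample is not its own neighbor
        other = y != y[i]
        d_intra = np.sort(dist[i, same])[:k]
        d_extra = np.sort(dist[i, other])[:k]
        # Assumption: a sample with no same-class neighbor gets an infinite score.
        intra = d_intra.mean() if d_intra.size else np.inf
        extra = d_extra.mean()
        scores[i] = intra / extra if extra > 0 else np.inf
    return scores


# Two tight clusters; the last point sits in cluster 1 but is labeled 0.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
lab = np.array([0, 0, 0, 1, 1, 0])
scores = intra_extra_ratio_sketch(emb, lab, k=2)
print(np.flatnonzero(scores >= 1.0))  # prints [5]
```

Because the sketch materializes the full distance matrix, it only suits small inputs; a nearest-neighbor index would be the usual choice for real datasets.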
- Parameters:
- embeddings : NDArray
Input feature embeddings (e.g., from DINO or ResNet) with shape (n_samples, n_features).
- labels : NDArray[np.int64]
Ground-truth labels corresponding to the embeddings, with shape (n_samples,).
- k : int, optional
Number of neighbors to use for local density estimation. Default is 50.
- Returns:
A dictionary containing:
'errors': Dict mapping sample indices to tuples of (original_label, [suggested_labels]). Contains only samples with a score >= 1.0.
'error_rank': Array of sample indices sorted by likelihood of error (descending score).
'scores': Array of raw distance ratio scores for all samples.
- Return type:
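How the returned fields relate to one another can be sketched from the documented semantics alone. The helper `rank_and_flag_sketch` below is hypothetical and not part of DataEval; it derives an 'error_rank'-style ordering and the set of indices that 'errors' would contain from a scores array, and it does not compute suggested labels, whose derivation is not described here. Placing tied scores in index order is also an assumption.

```python
import numpy as np


def rank_and_flag_sketch(scores, threshold=1.0):
    """Hypothetical helper, not part of DataEval.

    Mirrors the documented semantics: 'error_rank' orders sample indices
    by descending score, and 'errors' keeps only samples whose score is
    >= 1.0. Breaking ties by index (stable sort) is an assumption.
    """
    scores = np.asarray(scores, dtype=np.float64)
    # Negating turns an ascending stable sort into a descending one.
    error_rank = np.argsort(-scores, kind="stable")
    flagged = [int(i) for i in error_rank if scores[i] >= threshold]
    return error_rank, flagged


rank, flagged = rank_and_flag_sketch([0.2, 1.5, 0.9, 3.0])
print(rank.tolist())  # prints [3, 1, 2, 0]
print(flagged)        # prints [3, 1]
```

With the real function, the corresponding fields are read directly from the returned dictionary; for example, with `result = dataeval.core.label_errors(embeddings, labels)`, `result["error_rank"][:10]` gives the ten most suspicious sample indices.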