dataeval.core.ber_knn

dataeval.core.ber_knn(embeddings, class_labels, k)

Estimate the multi-class Bayes error rate (BER) using k-nearest neighbors (KNN).

The BER is the irreducible classification error given the current feature representation, i.e. the error attributable to class overlap in embedding space. This function returns lower and upper bounds on the BER computed from KNN test statistics. The estimator's behavior depends on the value of k:

- k=1: Uses 1-NN for the lower bound and 2-NN for the upper bound.
- k=2: Uses 2-NN for the lower bound and 3-NN for the upper bound.
- 2<k<=5: Uses k-NN for the lower bound and (k+1)-NN for the upper bound.
- k>5: Only available for binary classification; uses k-NN for both bounds with specialized asymptotic weights.
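Nearest-neighbor error rates are what make these bounds computable. The NumPy sketch below shows the classical k=1 construction due to Cover and Hart, for illustration only. The leave-one-out 1-NN error serves as the upper bound, and inverting the Cover-Hart inequality gives the lower bound. The helper name `cover_hart_bounds` is hypothetical and the construction is an assumption. It does not reproduce the neighbor pairings or the k>5 asymptotic weights listed above, and dataeval's internals may differ.

```python
import numpy as np


def cover_hart_bounds(embeddings, labels):
    """Hypothetical helper: 1-NN Bayes error bounds via the Cover-Hart inequality.

    Assumption: classical textbook construction, not necessarily the exact
    computation performed inside dataeval.core.ber_knn.
    """
    y = np.asarray(labels)
    # Flatten each N-dimensional embedding into a feature vector: shape (n, d).
    x = np.asarray(embeddings, dtype=float).reshape(len(y), -1)
    m = len(np.unique(y))  # number of classes M
    if m < 2:
        raise ValueError("need at least two classes")

    # Leave-one-out 1-NN: squared Euclidean distances, excluding self-matches.
    diff = x[:, None, :] - x[None, :, :]
    dist = (diff**2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)
    nn = dist.argmin(axis=1)

    # Empirical 1-NN error; in the large-sample limit it upper-bounds the BER.
    err = float(np.mean(y[nn] != y))

    # Cover-Hart: err <= R* (2 - M/(M-1) R*). Solving for R* gives a lower bound.
    lower = (m - 1) / m * (1 - np.sqrt(max(0.0, 1 - m / (m - 1) * err)))
    return {"upper_bound": err, "lower_bound": float(lower)}
```

Consider a two-class set of four well-separated pairs in which exactly one pair is cross-labeled. On that set the sketch gives an upper bound of 0.25 and a lower bound of about 0.146. The brute-force distance matrix uses O(n²) memory, which keeps the sketch short but only suits small inputs. A tree-based neighbor search would scale better.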

Parameters:

embeddings : ArrayND[float]
    Array of image embeddings. Can be an N-dimensional list or array-like object.
class_labels : Array1D[int]
    Array of class labels, one per image. Can be a 1D list or array-like object.
k : int
    Number of nearest neighbors for the KNN estimator. Should be between 1 and the number of samples.

Returns:

Mapping with keys:

upper_bound: float - The upper bound of the Bayes error rate
lower_bound: float - The lower bound of the Bayes error rate

Return type:

BERResult

References

[1] Learning to Bound the Multi-class Bayes Error (Th. 3 and Th. 4)

Examples

>>> import sklearn.datasets as dsets
>>> from dataeval.core import ber_knn

>>> images, labels = dsets.make_blobs(n_samples=50, centers=2, n_features=2, random_state=0)
>>> ber_knn(images, labels, 1)
{'upper_bound': 0.04, 'lower_bound': 0.020416847668728033}
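The two numbers in this output can be checked against each other. Suppose the lower bound comes from inverting the Cover-Hart inequality with M = 2 classes. This is an assumption inferred from the numbers, not stated in the source. Plugging the reported upper bound of 0.04 into that inversion then reproduces the reported lower bound:

$$
\underline{R} = \frac{M-1}{M}\left(1 - \sqrt{1 - \frac{M}{M-1}\,\overline{R}}\right)
= \frac{1}{2}\left(1 - \sqrt{1 - 2 \cdot 0.04}\right)
= \frac{1}{2}\left(1 - \sqrt{0.92}\right)
\approx \frac{1}{2}\,(1 - 0.9591663)
\approx 0.0204168
$$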