dataeval.detectors.ood.OOD_KNN
class dataeval.detectors.ood.OOD_KNN(k=10, distance_metric='cosine')
K-Nearest Neighbors Out-of-Distribution detector.
Scores each sample by the average distance (cosine, by default) between its embedding and its k nearest neighbors in the reference (in-distribution) embedding set. Samples with larger average distances are considered more likely to be OOD.
Based on the methodology from: “Back to the Basics: Revisiting Out-of-Distribution Detection Baselines” (Kuan & Mueller, 2022)
As referenced in: “Safe AI for coral reefs: Benchmarking out-of-distribution detection algorithms for coral reef image surveys”
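The scoring rule described above can be sketched in plain NumPy. This is a minimal illustration, not dataeval's implementation: it assumes embeddings are plain 2-D arrays, uses brute-force search instead of a k-NN index, and the helper name knn_ood_scores is hypothetical.

```python
import numpy as np


def knn_ood_scores(reference: np.ndarray, queries: np.ndarray, k: int = 10) -> np.ndarray:
    """Hypothetical helper: mean cosine distance from each query to its k nearest reference rows."""
    # L2-normalize rows so that a dot product equals cosine similarity
    ref = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    qry = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    # Cosine distance = 1 - cosine similarity; shape (n_queries, n_reference)
    dist = 1.0 - qry @ ref.T
    # np.partition moves the k smallest distances of each row into the first k slots
    nearest = np.partition(dist, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)


rng = np.random.default_rng(0)
reference = rng.normal(loc=5.0, size=(500, 16))  # synthetic in-distribution embeddings
in_dist = rng.normal(loc=5.0, size=(5, 16))      # new samples from the same distribution
shifted = rng.normal(loc=-5.0, size=(5, 16))     # samples pointing the opposite way

print(knn_ood_scores(reference, in_dist).mean() < knn_ood_scores(reference, shifted).mean())  # True
```

Brute force keeps the sketch short; for large reference sets an index structure, as the detector's fit step builds, avoids computing the full distance matrix.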
fit_embeddings(embeddings, threshold_perc=95.0)
Fit the detector using reference (in-distribution) embeddings.
Builds a k-NN index for efficient nearest-neighbor search and computes OOD scores for the reference embeddings, which are used to set the detection threshold automatically.
- Parameters:
  - embeddings : dataeval.data.Embeddings
    Reference embeddings from in-distribution data.
  - threshold_perc : float
    Percentage of the reference data to be considered normal (in-distribution) when setting the threshold.
- Return type:
  None
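One plausible reading of threshold_perc is that the threshold sits at that percentile of the reference scores, so roughly threshold_perc percent of the reference data scores at or below it. The sketch below illustrates this under stated assumptions: brute-force search stands in for the k-NN index, each reference point's distance to itself is excluded, and fit_threshold is a hypothetical name, not part of dataeval.

```python
import numpy as np


def fit_threshold(reference: np.ndarray, k: int = 10, threshold_perc: float = 95.0):
    """Hypothetical helper: score the reference set against itself and pick a percentile threshold."""
    ref = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    dist = 1.0 - ref @ ref.T
    # Assumption: ignore each point's zero distance to itself
    np.fill_diagonal(dist, np.inf)
    # Mean cosine distance to the k nearest *other* reference points
    ref_scores = np.partition(dist, k - 1, axis=1)[:, :k].mean(axis=1)
    # About threshold_perc percent of the reference scores are <= this value
    threshold = float(np.percentile(ref_scores, threshold_perc))
    return threshold, ref_scores


rng = np.random.default_rng(0)
reference = rng.normal(loc=5.0, size=(500, 16))
threshold, ref_scores = fit_threshold(reference, k=10, threshold_perc=95.0)
print(f"threshold={threshold:.4f}, fraction at or below={np.mean(ref_scores <= threshold):.2f}")
```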
predict(X, batch_size=int(10000000000.0), ood_type='instance')
Predict whether each instance in X is out of distribution.
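Assuming predict flags an instance as OOD when its score exceeds the threshold learned during fitting, the decision rule reduces to a comparison. The sketch below is hypothetical: the ToyKNNDetector class and its fit, score, and predict methods only mirror the documented workflow with brute-force cosine search; they do not reproduce dataeval's return types or the ood_type option.

```python
import numpy as np


class ToyKNNDetector:
    """Hypothetical stand-in for a k-NN OOD detector; not the dataeval class."""

    def __init__(self, k: int = 10) -> None:
        self.k = k

    @staticmethod
    def _normalize(x: np.ndarray) -> np.ndarray:
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    def fit(self, reference: np.ndarray, threshold_perc: float = 95.0) -> None:
        self.ref = self._normalize(reference)
        dist = 1.0 - self.ref @ self.ref.T
        np.fill_diagonal(dist, np.inf)  # assumption: skip self-matches
        ref_scores = np.partition(dist, self.k - 1, axis=1)[:, : self.k].mean(axis=1)
        self.threshold = float(np.percentile(ref_scores, threshold_perc))

    def score(self, x: np.ndarray) -> np.ndarray:
        # Mean cosine distance to the k nearest reference embeddings
        dist = 1.0 - self._normalize(x) @ self.ref.T
        return np.partition(dist, self.k - 1, axis=1)[:, : self.k].mean(axis=1)

    def predict(self, x: np.ndarray) -> np.ndarray:
        # True marks an instance whose score exceeds the fitted threshold
        return self.score(x) > self.threshold


rng = np.random.default_rng(1)
detector = ToyKNNDetector(k=10)
detector.fit(rng.normal(loc=5.0, size=(500, 16)))
batch = np.vstack([rng.normal(loc=5.0, size=(3, 16)), rng.normal(loc=-5.0, size=(3, 16))])
print(detector.predict(batch))
```

Because the threshold is a percentile of in-distribution scores, a small fraction of genuinely in-distribution samples (about 5% with the default threshold_perc) is expected to be flagged.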
score(X, batch_size=int(10000000000.0))
Compute out-of-distribution scores for the dataset X.
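The batch_size argument suggests that scores are computed in chunks to bound memory use; with the default of int(10000000000.0) (ten billion), any realistically sized X would be processed in a single batch. The minimal sketch below assumes a per-batch scorer, the hypothetical cosine_knn_score, and shows that chunking changes peak memory but not the resulting scores.

```python
import numpy as np


def cosine_knn_score(reference: np.ndarray, queries: np.ndarray, k: int = 10) -> np.ndarray:
    """Hypothetical per-batch scorer: mean cosine distance to the k nearest reference rows."""
    ref = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    qry = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    dist = 1.0 - qry @ ref.T
    return np.partition(dist, k - 1, axis=1)[:, :k].mean(axis=1)


def score_in_batches(reference: np.ndarray, x: np.ndarray, batch_size: int = int(1e10), k: int = 10) -> np.ndarray:
    # Score at most batch_size rows at a time so each distance matrix stays small
    parts = [cosine_knn_score(reference, x[i : i + batch_size], k) for i in range(0, len(x), batch_size)]
    return np.concatenate(parts)


rng = np.random.default_rng(2)
reference = rng.normal(size=(300, 8))
x = rng.normal(size=(25, 8))
batched = score_in_batches(reference, x, batch_size=7)
print(np.allclose(batched, cosine_knn_score(reference, x)))  # True
```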