dataeval.shift.OODKNeighbors
class dataeval.shift.OODKNeighbors(k=None, distance_metric=None, config=None)
K-Nearest Neighbors Out-of-Distribution detector.
Uses average distance to k nearest neighbors in embedding space to detect OOD samples. Samples with larger average distances to their k nearest neighbors in the reference (in-distribution) set are considered more likely to be OOD.
Based on the methodology from: “Back to the Basics: Revisiting Out-of-Distribution Detection Baselines” (Kuan & Mueller, 2022)
As referenced in: “Safe AI for coral reefs: Benchmarking out-of-distribution detection algorithms for coral reef image surveys”
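The scoring idea described above (average distance to the k nearest reference embeddings) can be sketched in plain NumPy. This is an illustrative sketch only, not the dataeval implementation: the function name `knn_ood_scores` is hypothetical, it uses brute-force pairwise distances rather than a k-NN index, and it assumes `k` is no larger than the number of reference samples.

```python
import numpy as np


def knn_ood_scores(reference, queries, k=10, distance_metric="cosine"):
    """Average distance from each query to its k nearest reference embeddings.

    Hypothetical helper for illustration; not part of dataeval.
    """
    reference = np.asarray(reference, dtype=np.float64)
    queries = np.asarray(queries, dtype=np.float64)

    if distance_metric == "cosine":
        # Cosine distance = 1 - cosine similarity of L2-normalised vectors.
        ref_n = reference / np.linalg.norm(reference, axis=1, keepdims=True)
        qry_n = queries / np.linalg.norm(queries, axis=1, keepdims=True)
        dists = 1.0 - qry_n @ ref_n.T
    elif distance_metric == "euclidean":
        # Brute-force pairwise Euclidean distances, shape (n_queries, n_reference).
        diff = queries[:, None, :] - reference[None, :, :]
        dists = np.linalg.norm(diff, axis=2)
    else:
        raise ValueError(f"Unsupported distance_metric: {distance_metric!r}")

    # Select the k smallest distances per query (order within them is irrelevant).
    nearest = np.partition(dists, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)


# Usage: samples shifted away from the reference set receive larger scores.
rng = np.random.default_rng(0)
ref = rng.normal(size=(100, 16))
near = rng.normal(size=(5, 16))
far = rng.normal(size=(5, 16)) + 10.0
near_scores = knn_ood_scores(ref, near, k=10, distance_metric="euclidean")
far_scores = knn_ood_scores(ref, far, k=10, distance_metric="euclidean")
```

A larger score means the sample sits farther from its closest reference neighbors, which is the signal the detector treats as evidence of being OOD.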
- Parameters:
- k : int, default 10
Number of nearest neighbors to consider.
- distance_metric : "cosine" | "euclidean", default "cosine"
Distance metric to use.
- config : OODKNeighbors.Config or None, default None
Optional configuration object with default parameters. Parameters specified directly in __init__ will override config defaults.
Examples
>>> from dataeval.shift import OODKNeighbors
>>> import numpy as np
>>>
>>> # Create reference embeddings (in-distribution)
>>> ref_embeddings = np.random.randn(100, 128).astype(np.float32)
>>>
>>> # Fit the detector
>>> detector = OODKNeighbors(k=10, distance_metric="cosine")
>>> detector.fit(ref_embeddings, threshold_perc=95.0)
>>>
>>> # Score new samples
>>> test_embeddings = np.random.randn(20, 128).astype(np.float32)
>>> scores = detector.score(test_embeddings)
>>> predictions = detector.predict(test_embeddings)

Using configuration:

>>> config = OODKNeighbors.Config(k=15, distance_metric="euclidean", threshold_perc=99.0)
>>> detector = OODKNeighbors(config=config)
>>> detector.fit(ref_embeddings)  # Uses config.threshold_perc
fit(embeddings, threshold_perc=None)
Fit the detector using reference (in-distribution) embeddings.
Builds a k-NN index for efficient nearest neighbor search and computes reference scores for automatic thresholding.
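How the reference scores become a threshold is not spelled out here. One plausible reading, given the `threshold_perc` parameter, is that the threshold is the `threshold_perc`-th percentile of the reference scores and that samples scoring above it are flagged. The sketch below works under that assumption; the helper names `percentile_threshold` and `flag_ood`, and the strict `>` comparison, are hypothetical and not taken from dataeval.

```python
import numpy as np


def percentile_threshold(ref_scores, threshold_perc=95.0):
    """Assumed rule: threshold = given percentile of the reference scores."""
    return float(np.percentile(np.asarray(ref_scores, dtype=np.float64), threshold_perc))


def flag_ood(scores, threshold):
    """Assumed rule: a sample is flagged OOD when its score exceeds the threshold."""
    return np.asarray(scores, dtype=np.float64) > threshold


# Usage: with threshold_perc=95.0, roughly 5% of the reference scores
# lie above the resulting threshold.
ref_scores = np.arange(101, dtype=np.float64)
threshold = percentile_threshold(ref_scores, threshold_perc=95.0)
is_ood = flag_ood([94.0, 95.0, 96.0], threshold)
```

Under this reading, raising `threshold_perc` makes the detector more conservative, since fewer reference-like samples exceed the threshold.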
predict(x, batch_size=int(10000000000.0), ood_type='instance')
Predict whether instances are out of distribution or not.
- Parameters:
- x : ArrayLike
Input embeddings for out-of-distribution prediction.
- batch_size : int, default 1e10
Batch size parameter for API consistency. Not used by this detector.
- ood_type : "feature" | "instance", default "instance"
OOD type parameter for API consistency. This detector only supports "instance" level.
- Returns:
Dictionary containing:
  - is_ood: Boolean array indicating which samples are OOD
  - instance_score: OOD scores for all samples
  - feature_score: None (not supported by this detector)
- Return type:
score(x, batch_size=int(10000000000.0))
Compute the out of distribution scores for a given dataset.
Classes
- OODKNeighbors.Config: Configuration for OODKNeighbors detector.