dataeval.shift.OODKNeighbors
- class dataeval.shift.OODKNeighbors(k=None, distance_metric=None, threshold_perc=None, extractor=None, config=None)
K-Nearest Neighbors Out-of-Distribution detector.
Uses average distance to k nearest neighbors in embedding space to detect OOD samples. Samples with larger average distances to their k nearest neighbors in the reference (in-distribution) set are considered more likely to be OOD.
Based on the methodology from: “Back to the Basics: Revisiting Out-of-Distribution Detection Baselines” (Kuan & Mueller, 2022)
As referenced in: “Safe AI for coral reefs: Benchmarking out-of-distribution detection algorithms for coral reef image surveys”
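The scoring idea described above can be sketched in plain Python. This is an illustrative sketch only, not the library's implementation: the helper names `_distance` and `knn_ood_scores` are hypothetical, the search is brute force rather than index-based, and cosine distance is assumed to mean one minus cosine similarity on non-zero vectors.

```python
import math


def _distance(a, b, metric):
    """Distance between two equal-length vectors (hypothetical helper)."""
    if metric == "euclidean":
        return math.dist(a, b)
    if metric == "cosine":
        # Assumes neither vector is all zeros.
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.hypot(*a) * math.hypot(*b)
        return 1.0 - dot / norm
    raise ValueError(f"unsupported metric: {metric!r}")


def knn_ood_scores(reference, queries, k=10, metric="cosine"):
    """Mean distance from each query to its k nearest reference vectors."""
    scores = []
    for q in queries:
        # Brute-force search: sort all distances, keep the k smallest.
        dists = sorted(_distance(q, r, metric) for r in reference)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores


# Toy 2-D "embeddings": the reference set clusters near (1, 0).
reference = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.2], [0.8, 0.0]]
queries = [[1.0, 0.1], [-1.0, 0.5]]  # first is close, second is far away
scores = knn_ood_scores(reference, queries, k=2, metric="euclidean")
```

In this toy setup the second query lies far from every reference vector, so it receives the larger score.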
- Parameters:
- k : int, default 10
Number of nearest neighbors to consider.
- distance_metric : "cosine" | "euclidean", default "cosine"
Distance metric to use.
- threshold_perc : float or None, default None
Percentage of reference data considered normal (0-100). Higher values give a more permissive threshold, so fewer samples are flagged as OOD. If None, uses config.threshold_perc (default 95.0).
- extractor : FeatureExtractor or None, default None
Feature extractor for transforming input data before scoring. When provided, raw data is passed through the extractor in both fit() and score()/predict(). When None, data is used as-is (must be array-like embeddings).
- config : OODKNeighbors.Config or None, default None
Optional configuration object with default parameters. Parameters specified directly in __init__ will override config defaults.
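The override rule for config defaults can be illustrated with a small stand-in. The `KNNConfig` dataclass and `resolve_params` function below are hypothetical and exist only for this sketch. They mirror the documented defaults (k=10, "cosine", 95.0) and assume that an argument left as None falls back to the config value.

```python
from dataclasses import dataclass


@dataclass
class KNNConfig:
    """Hypothetical stand-in for OODKNeighbors.Config."""
    k: int = 10
    distance_metric: str = "cosine"
    threshold_perc: float = 95.0


def resolve_params(k=None, distance_metric=None, threshold_perc=None, config=None):
    """Explicit arguments win; anything left as None falls back to the config."""
    base = config if config is not None else KNNConfig()
    return KNNConfig(
        k=k if k is not None else base.k,
        distance_metric=(
            distance_metric if distance_metric is not None else base.distance_metric
        ),
        threshold_perc=(
            threshold_perc if threshold_perc is not None else base.threshold_perc
        ),
    )


config = KNNConfig(k=15, distance_metric="euclidean", threshold_perc=99.0)
# k is given explicitly, so it overrides the config; the rest come from config.
resolved = resolve_params(k=5, config=config)
```

Using None as the "not specified" sentinel is consistent with the class signature, where every argument defaults to None.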
Examples
>>> from dataeval.shift import OODKNeighbors
>>> import numpy as np
>>>
>>> # Create reference embeddings (in-distribution)
>>> ref_embeddings = np.random.randn(100, 128).astype(np.float32)
>>>
>>> # Fit the detector
>>> detector = OODKNeighbors(k=10, distance_metric="cosine", threshold_perc=95.0)
>>> detector.fit(ref_embeddings)
OODKNeighbors(k=10, distance_metric='cosine', threshold_perc=95.0, extractor=None, fitted=True)
>>>
>>> # Score new samples
>>> test_embeddings = np.random.randn(20, 128).astype(np.float32)
>>> scores = detector.score(test_embeddings)
>>> predictions = detector.predict(test_embeddings)
Using configuration:
>>> config = OODKNeighbors.Config(k=15, distance_metric="euclidean", threshold_perc=99.0)
>>> detector = OODKNeighbors(config=config)
>>> detector.fit(ref_embeddings)
OODKNeighbors(k=15, distance_metric='euclidean', threshold_perc=99.0, extractor=None, fitted=True)
- fit(reference_data)
Fit the detector using reference (in-distribution) data.
Builds a k-NN index for efficient nearest neighbor search and computes reference scores for automatic thresholding.
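Percentile-based thresholding of this kind can be sketched as follows. This is an assumption-laden illustration rather than the detector's actual code: the helper names `percentile` and `flag_ood` are hypothetical, the percentile uses linear interpolation, and a sample is assumed to be flagged when its score is strictly greater than the threshold. How the library derives the reference scores themselves is not shown here.

```python
def percentile(values, perc):
    """Percentile (0 <= perc <= 100) with linear interpolation between ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * perc / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def flag_ood(reference_scores, test_scores, threshold_perc=95.0):
    """Derive a threshold from reference scores and flag test scores above it."""
    threshold = percentile(reference_scores, threshold_perc)
    return threshold, [s > threshold for s in test_scores]


# Made-up scores for illustration only.
reference_scores = [0.10, 0.12, 0.11, 0.15, 0.13, 0.14, 0.16, 0.18, 0.17, 0.30]
test_scores = [0.12, 0.45]
threshold, is_ood = flag_ood(reference_scores, test_scores, threshold_perc=90.0)
```

Raising `threshold_perc` moves the threshold toward the largest reference score, which is why higher values flag fewer samples.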
- predict(data, batch_size=None, ood_type='instance')
Predict whether instances are out of distribution.
- Parameters:
- data : ArrayLike
Input data for OOD prediction.
- batch_size : int or None, default None
Number of instances to process per batch (only used by some detectors). When None, uses the global batch size from get_batch_size().
- ood_type : "feature" | "instance", default "instance"
Predict OOD at the "feature" or "instance" level.
- Returns:
Predictions including is_ood boolean array and OOD scores.
- Return type:
- score(data, batch_size=None)
Compute out of distribution scores for a given dataset.
- Parameters:
- data : ArrayLike
Input data to score.
- batch_size : int or None, default None
Number of instances to process per batch (only used by some detectors). When None, uses the global batch size from get_batch_size().
- Returns:
Instance-level (and optionally feature-level) OOD scores. Higher scores indicate samples more likely to be OOD.
- Return type:
- property reference_embeddings : numpy.typing.NDArray[numpy.float32]
Reference embeddings stored by the scorer.
Classes
Configuration for OODKNeighbors detector.