dataeval.shift.DriftKNeighbors¶
-
class dataeval.shift.DriftKNeighbors(k=None, distance_metric=None, p_val=None, extractor=None, update_strategy=None, config=None)¶
K-Nearest Neighbors based drift detector.
Detects drift by comparing k-NN distances of test samples against the reference set. If test samples are farther from their k nearest neighbors in the reference set than expected, drift is detected.
Uses a fit/predict lifecycle: construct with hyperparameters, call fit() with reference data, then call predict() with test data. Use chunked() to create a chunked wrapper for time-series monitoring.
Supports two modes:
- Non-chunked (default): Computes per-sample k-NN distances for the test set and uses a Mann-Whitney U test against the reference baseline to produce a p-value. Drift is flagged when p_val < p_val_threshold.
- Chunked (via chunked()): Splits data into chunks, computes mean k-NN distance per chunk, and uses threshold bounds to flag drift per chunk.
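The non-chunked mode can be illustrated with a minimal standard-library sketch. It is not DataEval's implementation: the helper names (`knn_mean_distance`, `mann_whitney_greater`, `gaussian_rows`), the leave-one-out reference baseline, the one-sided alternative, and the normal approximation to the Mann-Whitney U statistic are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def knn_mean_distance(point, pool, k, skip_index=None):
    """Mean Euclidean distance from `point` to its k nearest rows in `pool`."""
    dists = sorted(
        math.dist(point, other)
        for i, other in enumerate(pool)
        if i != skip_index  # lets a reference row ignore itself
    )
    return sum(dists[:k]) / k


def mann_whitney_greater(x, y):
    """One-sided p-value that x tends to exceed y (normal approximation)."""
    # U counts the (x, y) pairs where x is larger; ties count one half.
    u = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    n, m = len(x), len(y)
    mu = n * m / 2
    sigma = math.sqrt(n * m * (n + m + 1) / 12)
    z = (u - mu - 0.5) / sigma  # continuity correction, no tie correction
    return 1.0 - NormalDist().cdf(z)


def gaussian_rows(rng, count, dim, shift=0.0):
    """Hypothetical data generator: `count` rows of `dim` Gaussian features."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
k, threshold = 5, 0.05
ref = gaussian_rows(rng, 60, 4)

# Assumed baseline: each reference row's k-NN distance to the *other* rows.
baseline = [knn_mean_distance(p, ref, k, skip_index=i) for i, p in enumerate(ref)]


def p_value(test):
    scores = [knn_mean_distance(p, ref, k) for p in test]
    return mann_whitney_greater(scores, baseline)


p_same = p_value(gaussian_rows(rng, 30, 4))
p_shifted = p_value(gaussian_rows(rng, 30, 4, shift=5.0))
print(f"same distribution: p={p_same:.3g}, drifted={p_same < threshold}")
print(f"shifted by +5:     p={p_shifted:.3g}, drifted={p_shifted < threshold}")
```

The shifted sample sits far from every reference row, so its k-NN distances rank above the baseline and the p-value falls well below the threshold.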
- Parameters:¶
- k : int, default 10¶
Number of nearest neighbors.
- distance_metric : {"cosine", "euclidean"}, default "euclidean"¶
Distance metric for neighbor search.
- p_val : float, default 0.05¶
Significance threshold for non-chunked mode.
- extractor : FeatureExtractor or None, default None¶
Feature extractor for transforming input data before drift detection. When provided, raw data is passed through the extractor before flattening and comparison. When None, data is used as-is.
- update_strategy : UpdateStrategy or None, default None¶
Strategy for updating reference data when new data arrives. When None, reference data remains fixed throughout detection.
- config : DriftKNeighbors.Config or None, default None¶
Optional configuration object.
See also
- DriftKNeighbors.Stats
Per-prediction statistics returned in DriftOutput.details.
Examples
Non-chunked mode:
>>> import numpy as np
>>> from dataeval.shift import DriftKNeighbors
>>> ref = np.random.randn(200, 32).astype(np.float32)
>>> test = np.random.randn(100, 32).astype(np.float32) + 5  # shifted
>>> detector = DriftKNeighbors(k=5).fit(ref)
>>> result = detector.predict(test)
>>> print(f"Drift: {result.drifted}")
Drift: ...
Chunked mode:
>>> chunked = DriftKNeighbors(k=5).chunked(chunk_size=50)
>>> chunked.fit(ref)
ChunkedDrift(DriftKNeighbors(k=5, distance_metric='euclidean', p_val=0.05, extractor=None, update_strategy=None), chunker=SizeChunker(chunk_size=50, incomplete='keep'), fitted=True)
>>> result = chunked.predict(test)
-
chunked(chunker=None, chunk_size=None, chunk_count=None, threshold=None)¶
Create a chunked wrapper around this drift detector.
Returns a ChunkedDrift that splits data into chunks during fit and predict, computing per-chunk metrics and comparing against baseline thresholds.
- Parameters:¶
- chunker : BaseChunker or None, default None¶
Explicit chunker instance.
- chunk_size : int or None, default None¶
Create fixed-size chunks of this many samples.
- chunk_count : int or None, default None¶
Split into this many equal chunks.
- threshold : Threshold or None, default None¶
Threshold strategy for determining drift bounds from baseline. When None, uses the detector’s default threshold.
- Returns:¶
A chunked drift wrapper around this detector.
- Return type:¶
ChunkedDrift[TDetails]
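The chunked idea (mean k-NN distance per chunk, compared against bounds derived from the baseline) can be sketched in plain Python. This is an illustration under stated assumptions, not the ChunkedDrift implementation: the helper names are hypothetical, and the bounds of baseline mean plus or minus three standard deviations stand in for whatever Threshold strategy the detector actually uses.

```python
import math
import random
from statistics import mean, stdev


def knn_mean_distance(point, pool, k, skip_index=None):
    """Mean Euclidean distance from `point` to its k nearest rows in `pool`."""
    dists = sorted(
        math.dist(point, other)
        for i, other in enumerate(pool)
        if i != skip_index
    )
    return sum(dists[:k]) / k


def chunk_means(values, chunk_size):
    """Mean of each consecutive chunk; a short final chunk is kept."""
    return [
        mean(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]


def gaussian_rows(rng, count, dim, shift=0.0):
    """Hypothetical data generator: `count` rows of `dim` Gaussian features."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(1)
k, chunk_size = 5, 25
ref = gaussian_rows(rng, 100, 4)

# Fit: per-chunk mean k-NN distance on the reference data (leave-one-out).
ref_scores = [knn_mean_distance(p, ref, k, skip_index=i) for i, p in enumerate(ref)]
baseline = chunk_means(ref_scores, chunk_size)

# Assumed threshold strategy: baseline mean +/- 3 standard deviations.
center, spread = mean(baseline), stdev(baseline)
lower, upper = center - 3 * spread, center + 3 * spread

# Predict: one in-distribution chunk followed by one shifted chunk.
stream = gaussian_rows(rng, 25, 4) + gaussian_rows(rng, 25, 4, shift=5.0)
stream_scores = [knn_mean_distance(p, ref, k) for p in stream]
stream_means = chunk_means(stream_scores, chunk_size)
flags = [not (lower <= m <= upper) for m in stream_means]

for index, (m, flag) in enumerate(zip(stream_means, flags)):
    print(f"chunk {index}: mean k-NN distance={m:.2f}, drifted={flag}")
```

Each chunk gets its own flag, which is what makes this mode suited to monitoring data that arrives over time.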
- fit(reference_data)¶
Fit the k-NN drift detector on reference data.
- predict(data)¶
Predict whether test data has drifted from reference data.
- property reference_data : numpy.typing.NDArray[numpy.float32]¶
Reference data, lazily encoded on first access.
Overrides BaseDrift.reference_data via MRO when this mixin appears before BaseDrift in the inheritance list.
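The MRO-based override can be shown with a small standalone sketch. The classes below (`BaseDetector`, `LazyEncodingMixin`, `Detector`) are hypothetical stand-ins, not DataEval classes, and the float conversion is only a placeholder for real encoding.

```python
class BaseDetector:
    """Hypothetical stand-in for a base class exposing reference data."""

    def __init__(self, raw):
        self._raw = raw

    @property
    def reference_data(self):
        return self._raw


class LazyEncodingMixin:
    """Hypothetical mixin that encodes the reference data on first access."""

    _encoded = None
    encode_calls = 0

    @property
    def reference_data(self):
        if self._encoded is None:
            self.encode_calls += 1
            # Placeholder "encoding": convert every value to float.
            self._encoded = [float(v) for v in self._raw]
        return self._encoded


class Detector(LazyEncodingMixin, BaseDetector):
    """The mixin is listed first, so its property wins in the MRO."""


detector = Detector([1, 2, 3])
first = detector.reference_data   # triggers the encoding
second = detector.reference_data  # served from the cached result
print([cls.__name__ for cls in Detector.__mro__])
print(first, detector.encode_calls)
```

Swapping the base order to `Detector(BaseDetector, LazyEncodingMixin)` would make the base property win instead, which is why the position of the mixin in the inheritance list matters.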
Classes¶
| DriftKNeighbors.Config | Configuration for DriftKNeighbors detector. |
| DriftKNeighbors.Stats | Statistics from K-Nearest Neighbors drift detection. |