dataeval.shift.DriftKNeighbors
- class dataeval.shift.DriftKNeighbors(k=None, distance_metric=None, p_val=None, config=None)
K-Nearest Neighbors based drift detector.
Detects drift by comparing k-NN distances of test samples against the reference set. If test samples are farther from their k nearest neighbors in the reference set than expected, drift is detected.
Uses a fit/predict lifecycle: construct with hyperparameters, call fit() with reference data, then call predict() with test data.
Supports two modes:
- Non-chunked (default): Computes per-sample k-NN distances for the test set and uses a Mann-Whitney U test against the reference baseline to produce a p-value. Drift is flagged when p_val < p_val_threshold.
- Chunked: Splits data into chunks, computes mean k-NN distance per chunk, and uses threshold bounds to flag drift per chunk.
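The non-chunked mode can be illustrated with a minimal NumPy/SciPy sketch. This is not the library's implementation. It assumes that the reference baseline is each reference sample's mean distance to its k nearest other reference samples, that the test score is each test sample's mean distance to its k nearest reference samples, and that the Mann-Whitney U test is one-sided ("greater"). The helper names `knn_mean_distance` and `p_value` are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def knn_mean_distance(queries, reference, k, exclude_self=False):
    """Mean Euclidean distance from each query row to its k nearest reference rows."""
    # Pairwise distances, shape (n_queries, n_reference).
    diffs = queries[:, None, :] - reference[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists.sort(axis=1)
    # When the queries are the reference set itself, column 0 holds the
    # zero self-distance, so skip it.
    start = 1 if exclude_self else 0
    return dists[:, start:start + k].mean(axis=1)


rng = np.random.default_rng(0)
ref = rng.standard_normal((200, 32)).astype(np.float32)
same = rng.standard_normal((100, 32)).astype(np.float32)
shifted = same + 5  # every feature moved by 5

k, p_val_threshold = 5, 0.05

# Assumed baseline: how far reference samples sit from each other.
baseline = knn_mean_distance(ref, ref, k, exclude_self=True)


def p_value(test):
    """One-sided test: are test k-NN distances larger than the baseline?"""
    scores = knn_mean_distance(test, ref, k)
    return mannwhitneyu(scores, baseline, alternative="greater").pvalue


drifted_shifted = p_value(shifted) < p_val_threshold
print(f"Drift on shifted data: {drifted_shifted}")
```

The brute-force pairwise distance matrix keeps the sketch short; for large reference sets a neighbor index would be the more practical choice.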
- Parameters:
- k : int, default 10
Number of nearest neighbors.
- distance_metric : {"cosine", "euclidean"}, default "euclidean"
Distance metric for neighbor search.
- p_val : float, default 0.05
Significance threshold for non-chunked mode.
- config : DriftKNeighbors.Config or None, default None
Optional configuration object.
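To make the distance_metric choice concrete, the sketch below contrasts the two metrics on a pair of vectors that point in the same direction but differ in magnitude. It assumes cosine distance is defined as one minus cosine similarity, which is the common convention but is not confirmed by this page. The function names are hypothetical.

```python
import numpy as np


def euclidean_distance(a, b):
    """Straight-line distance; sensitive to vector magnitude."""
    return float(np.linalg.norm(a - b))


def cosine_distance(a, b):
    """Assumed definition: 1 - cosine similarity; ignores vector magnitude."""
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


a = np.array([1.0, 2.0, 3.0])
b = 10.0 * a  # same direction, ten times the length

print(euclidean_distance(a, b))  # large: the vectors are far apart in space
print(cosine_distance(a, b))     # approximately zero: the direction is identical
```

Under this assumption, a pure rescaling of the features would register as a large shift with "euclidean" and as almost none with "cosine".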
Examples
Non-chunked mode:
>>> import numpy as np
>>> from dataeval.shift import DriftKNeighbors
>>> ref = np.random.randn(200, 32).astype(np.float32)
>>> test = np.random.randn(100, 32).astype(np.float32) + 5  # shifted
>>> detector = DriftKNeighbors(k=5).fit(ref)
>>> result = detector.predict(test)
>>> print(f"Drift: {result.drifted}")
Drift: ...

Chunked mode:
>>> detector = DriftKNeighbors(k=5).fit(ref, chunk_size=50)
>>> result = detector.predict(test)

- fit(x_ref, chunker=None, chunk_size=None, chunk_count=None, chunks=None, chunk_indices=None, threshold=None)
Fit the k-NN drift detector on reference data.
- Parameters:
- x_ref : ArrayLike
Reference data with shape [n_samples, n_features].
- chunker : BaseChunker or None, default None
Explicit chunker instance for chunked mode.
- chunk_size : int or None, default None
Create fixed-size chunks.
- chunk_count : int or None, default None
Split into this many equal chunks.
- chunks : list[ArrayLike] or None, default None
Pre-split reference data for chunked mode.
- chunk_indices : list[list[int]] or None, default None
Index groupings for chunking reference data.
- threshold : Threshold or None, default None
Threshold strategy for chunked mode. Defaults to ZScoreThreshold.
- Return type:
Self
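The chunked workflow configured by chunk_size and threshold can be sketched in plain NumPy. This is an illustration under stated assumptions, not the library's code: it assumes a z-score threshold means bounds at the mean plus or minus a multiple of the standard deviation of the reference chunk metrics, with a hypothetical multiplier of 3. Synthetic values stand in for per-sample k-NN distances, and the helpers `chunk_means` and `zscore_bounds` are hypothetical.

```python
import numpy as np


def chunk_means(values, chunk_size):
    """Mean of each consecutive fixed-size chunk (a trailing partial chunk is dropped)."""
    n_chunks = len(values) // chunk_size
    return values[: n_chunks * chunk_size].reshape(n_chunks, chunk_size).mean(axis=1)


def zscore_bounds(reference_chunk_means, multiplier=3.0):
    """Assumed threshold rule: center +/- multiplier * spread of reference chunk means."""
    center = reference_chunk_means.mean()
    spread = reference_chunk_means.std()
    return center - multiplier * spread, center + multiplier * spread


rng = np.random.default_rng(0)
# Synthetic stand-ins for per-sample k-NN distances.
ref_scores = rng.normal(loc=6.0, scale=0.5, size=200)
test_scores = rng.normal(loc=9.0, scale=0.5, size=100)  # clearly larger distances

# Fit step: derive bounds from the reference chunks.
lower, upper = zscore_bounds(chunk_means(ref_scores, 50))

# Predict step: flag each test chunk whose mean falls outside the bounds.
test_means = chunk_means(test_scores, 50)
flags = (test_means < lower) | (test_means > upper)
print(flags)
```

Each entry of `flags` corresponds to one test chunk, which mirrors the per-chunk drift decision described for chunked mode.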
- predict(x=None, chunks=None, chunk_indices=None)
Predict whether test data has drifted from reference data.
- property is_chunked : bool
Whether the detector is operating in chunked mode.
Classes
Config | Configuration for DriftKNeighbors detector.