dataeval.shift.DriftKNeighbors

class dataeval.shift.DriftKNeighbors(k=None, distance_metric=None, p_val=None, config=None)

K-Nearest Neighbors based drift detector.

Detects drift by comparing k-NN distances of test samples against the reference set. If test samples are farther from their k nearest neighbors in the reference set than expected, drift is detected.

Uses a fit/predict lifecycle: construct with hyperparameters, call fit() with reference data, then call predict() with test data.

Supports two modes:

  • Non-chunked (default): Computes per-sample k-NN distances for the test set and uses a Mann-Whitney U test against the reference baseline to produce a p-value. Drift is flagged when that p-value is below the p_val threshold.

  • Chunked: Splits data into chunks, computes the mean k-NN distance per chunk, and flags each chunk whose mean falls outside the threshold bounds.
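The non-chunked mode described above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the library's implementation: the per-sample score is taken to be the mean distance to the k nearest reference points, the reference baseline is assumed to be leave-one-out distances within the reference set, and the Mann-Whitney U test is a one-sided normal approximation without tie correction. The helper names `knn_mean_distance` and `mann_whitney_greater` are hypothetical.

```python
import math

import numpy as np


def knn_mean_distance(queries, reference, k, exclude_self=False):
    """Mean Euclidean distance from each query row to its k nearest reference rows."""
    # Pairwise distances via broadcasting: shape (n_queries, n_reference).
    diffs = queries[:, None, :] - reference[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists.sort(axis=1)
    # When the queries are the reference set itself, column 0 holds the
    # zero self-distance, so skip it (assumed leave-one-out baseline).
    start = 1 if exclude_self else 0
    return dists[:, start:start + k].mean(axis=1)


def mann_whitney_greater(x, y):
    """One-sided p-value that x tends to be larger than y (normal approximation)."""
    n_x, n_y = len(x), len(y)
    combined = np.concatenate([x, y])
    ranks = np.empty(len(combined))
    ranks[np.argsort(combined)] = np.arange(1, len(combined) + 1)
    # U statistic for x: rank sum minus its minimum possible value.
    u_x = ranks[:n_x].sum() - n_x * (n_x + 1) / 2
    mean_u = n_x * n_y / 2
    std_u = math.sqrt(n_x * n_y * (n_x + n_y + 1) / 12)
    z = (u_x - mean_u) / std_u
    return 0.5 * math.erfc(z / math.sqrt(2))


rng = np.random.default_rng(0)
ref = rng.standard_normal((200, 32))
shifted = rng.standard_normal((100, 32)) + 5  # strongly shifted test set

baseline = knn_mean_distance(ref, ref, k=5, exclude_self=True)
test_dists = knn_mean_distance(shifted, ref, k=5)
p_value = mann_whitney_greater(test_dists, baseline)
drifted = p_value < 0.05  # 0.05 mirrors the documented p_val default
```

Because the shifted samples sit far from every reference point, their k-NN distances rank above the baseline distances and the p-value falls well below the threshold.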

Parameters:
k : int, default 10

Number of nearest neighbors.

distance_metric : {"cosine", "euclidean"}, default "euclidean"

Distance metric for neighbor search.

p_val : float, default 0.05

Significance threshold for non-chunked mode.

config : DriftKNeighbors.Config or None, default None

Optional configuration object.

Examples

Non-chunked mode:

>>> import numpy as np
>>> from dataeval.shift import DriftKNeighbors
>>> ref = np.random.randn(200, 32).astype(np.float32)
>>> test = np.random.randn(100, 32).astype(np.float32) + 5  # shifted
>>> detector = DriftKNeighbors(k=5).fit(ref)
>>> result = detector.predict(test)
>>> print(f"Drift: {result.drifted}")
Drift: ...

Chunked mode:

>>> detector = DriftKNeighbors(k=5).fit(ref, chunk_size=50)
>>> result = detector.predict(test)

fit(x_ref, chunker=None, chunk_size=None, chunk_count=None, chunks=None, chunk_indices=None, threshold=None)

Fit the k-NN drift detector on reference data.

Parameters:
x_ref : ArrayLike

Reference data with shape (n_samples, n_features).

chunker : BaseChunker or None, default None

Explicit chunker instance for chunked mode.

chunk_size : int or None, default None

Create fixed-size chunks.

chunk_count : int or None, default None

Split into this many equal chunks.

chunks : list[ArrayLike] or None, default None

Pre-split reference data for chunked mode.

chunk_indices : list[list[int]] or None, default None

Index groupings for chunking reference data.

threshold : Threshold or None, default None

Threshold strategy for chunked mode. Defaults to ZScoreThreshold.

Return type:

Self
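To make the chunking and threshold parameters of fit() concrete, the sketch below builds fixed-size index groups and derives z-score style bounds from per-chunk means of the reference scores. It is an illustration only: the helpers `fixed_size_chunks` and `zscore_bounds`, the 3-standard-deviation multiplier, and the handling of a shorter trailing chunk are assumptions, and the synthetic scores merely stand in for per-sample k-NN distances.

```python
import numpy as np


def fixed_size_chunks(n_samples, chunk_size):
    """Index groups of consecutive samples; the last group may be shorter."""
    return [
        list(range(start, min(start + chunk_size, n_samples)))
        for start in range(0, n_samples, chunk_size)
    ]


def zscore_bounds(values, n_std=3.0):
    """Lower and upper bounds at mean +/- n_std standard deviations."""
    center = float(np.mean(values))
    spread = float(np.std(values))
    return center - n_std * spread, center + n_std * spread


rng = np.random.default_rng(0)
# Synthetic stand-ins for per-sample k-NN distances.
ref_scores = rng.normal(8.0, 0.5, size=200)
test_scores = rng.normal(29.0, 0.5, size=100)

# Bounds come from the reference chunks only.
ref_chunk_means = [ref_scores[idx].mean() for idx in fixed_size_chunks(200, 50)]
lower, upper = zscore_bounds(ref_chunk_means)

# Each test chunk is flagged when its mean leaves the reference bounds.
test_chunk_means = [test_scores[idx].mean() for idx in fixed_size_chunks(100, 50)]
chunk_drifted = [not (lower <= m <= upper) for m in test_chunk_means]
```

Here both test chunks are flagged, since their mean scores lie far above the upper bound derived from the reference chunks.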

predict(x=None, chunks=None, chunk_indices=None)

Predict whether test data has drifted from reference data.

Parameters:
x : ArrayLike or None, default None

Test data.

chunks : list[ArrayLike] or None, default None

Pre-built test data chunks.

chunk_indices : list[list[int]] or None, default None

Index groupings for chunking test data.

Returns:

A DriftOutput whose details field depends on the mode. In non-chunked mode, details is a DriftKNeighborsStats TypedDict; in chunked mode, it is a polars.DataFrame with per-chunk results.

Return type:

DriftOutput
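Since details has a different type in each mode, calling code may want to branch on it. The helper below is a hypothetical sketch, assuming only that the output exposes drifted and details attributes as described above. It relies on two general facts: a TypedDict is a plain dict at runtime, and len() on a polars.DataFrame returns its row count. The stand-in objects and the "p_val" key are invented for illustration and are not the library's actual field names.

```python
from types import SimpleNamespace


def describe_output(result):
    """Return a one-line summary of a DriftOutput-like object."""
    details = result.details
    if isinstance(details, dict):
        # Non-chunked mode: details behaves like a plain dict.
        return f"non-chunked: drifted={result.drifted}, fields={sorted(details)}"
    # Chunked mode: one row per chunk.
    return f"chunked: drifted={result.drifted}, chunks={len(details)}"


# Stand-ins for real outputs; a list of rows plays the role of the DataFrame.
fake_non_chunked = SimpleNamespace(drifted=True, details={"p_val": 0.001})
fake_chunked = SimpleNamespace(drifted=False, details=[{"chunk": 0}, {"chunk": 1}])

print(describe_output(fake_non_chunked))
print(describe_output(fake_chunked))
```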

property is_chunked : bool

Whether the detector is operating in chunked mode.

property x_ref : numpy.typing.NDArray[numpy.float32]

Reference data for drift detection.

Returns:

Reference data array.

Return type:

NDArray[np.float32]

Raises:

RuntimeError – If called before fit().
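The fit-before-access contract of x_ref can be illustrated with a toy class. `ToyDetector` is hypothetical and is not part of dataeval; it only mirrors the documented behavior: fit() returns the instance, the stored reference data is float32, and reading x_ref before fit() raises RuntimeError.

```python
import numpy as np


class ToyDetector:
    """Minimal stand-in showing the documented x_ref contract."""

    def __init__(self):
        self._x_ref = None

    def fit(self, x_ref):
        # Store the reference data as float32 and return self for chaining.
        self._x_ref = np.asarray(x_ref, dtype=np.float32)
        return self

    @property
    def x_ref(self):
        if self._x_ref is None:
            raise RuntimeError("Call fit() before accessing x_ref.")
        return self._x_ref


# Accessing the property on an unfitted detector raises.
try:
    ToyDetector().x_ref
    raised = False
except RuntimeError:
    raised = True

fitted = ToyDetector().fit([[0.0, 1.0], [2.0, 3.0]])
```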

Classes

Config

Configuration for the DriftKNeighbors detector.