dataeval.shift.DriftKNeighbors

class dataeval.shift.DriftKNeighbors(k=None, distance_metric=None, p_val=None, extractor=None, update_strategy=None, config=None)

K-Nearest Neighbors based drift detector.

Detects drift by comparing k-NN distances of test samples against the reference set. If test samples are farther from their k nearest neighbors in the reference set than expected, drift is detected.

Uses a fit/predict lifecycle: construct with hyperparameters, call fit() with reference data, then call predict() with test data. Use chunked() to create a chunked wrapper for time-series monitoring.

Supports two modes:

  • Non-chunked (default): Computes per-sample k-NN distances for the test set and uses a Mann-Whitney U test against the reference baseline to produce a p-value. Drift is flagged when the computed p-value is below the p_val threshold.

  • Chunked (via chunked()): Splits data into chunks, computes mean k-NN distance per chunk, and uses threshold bounds to flag drift per chunk.
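The non-chunked mode described above can be sketched in plain NumPy. This is an illustration under stated assumptions, not the library's implementation: it assumes the per-sample score is the mean Euclidean distance to the k nearest reference points, that the reference baseline is computed leave-one-out, and that the Mann-Whitney U test is one-sided and uses a normal approximation without tie correction. The helper names knn_scores and mann_whitney_greater are hypothetical.

```python
import math

import numpy as np


def knn_scores(queries, reference, k, exclude_self=False):
    """Mean Euclidean distance from each query to its k nearest reference points."""
    diffs = queries[:, None, :] - reference[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists.sort(axis=1)
    # When the queries are the reference set itself, column 0 holds the
    # zero self-distance, so skip it (assumed leave-one-out baseline).
    start = 1 if exclude_self else 0
    return dists[:, start:start + k].mean(axis=1)


def mann_whitney_greater(x, y):
    """One-sided p-value that x tends to exceed y (normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = np.empty(n1 + n2)
    ranks[np.argsort(np.concatenate([x, y]))] = np.arange(1, n1 + n2 + 1)
    u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 0.5 * math.erfc(z / math.sqrt(2))


rng = np.random.default_rng(0)
ref = rng.standard_normal((200, 32))
shifted = rng.standard_normal((100, 32)) + 5.0  # shifted test set

baseline = knn_scores(ref, ref, k=5, exclude_self=True)
scores = knn_scores(shifted, ref, k=5)
p_value = mann_whitney_greater(scores, baseline)
drifted = p_value < 0.05  # 0.05 plays the role of the p_val threshold
```

Because the shifted samples sit far from every reference point, their scores rank above the baseline scores and the p-value falls well below the threshold.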

Parameters:
k : int, default 10

Number of nearest neighbors.

distance_metric : {"cosine", "euclidean"}, default "euclidean"

Distance metric for neighbor search.

p_val : float, default 0.05

Significance threshold for non-chunked mode.

extractor : FeatureExtractor or None, default None

Feature extractor for transforming input data before drift detection. When provided, raw data is passed through the extractor before flattening and comparison. When None, data is used as-is.

update_strategy : UpdateStrategy or None, default None

Strategy for updating reference data when new data arrives. When None, reference data remains fixed throughout detection.

config : DriftKNeighbors.Config or None, default None

Optional configuration object.
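For the distance_metric parameter, the practical difference between the two options can be shown with a small NumPy sketch. It assumes the standard definitions (Euclidean norm of the difference, and one minus cosine similarity); how the detector computes these internally is not stated here, and the function names are hypothetical.

```python
import numpy as np


def euclidean_distance(a, b):
    """Length of the difference vector; sensitive to magnitude."""
    return float(np.linalg.norm(a - b))


def cosine_distance(a, b):
    """One minus cosine similarity; depends only on the angle between a and b."""
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


v = np.array([1.0, 2.0, 3.0])
scaled = 10.0 * v  # same direction, larger magnitude

d_euclid = euclidean_distance(v, scaled)
d_cosine = cosine_distance(v, scaled)
```

Here the Euclidean distance is large while the cosine distance is effectively zero, so "cosine" may suit embeddings where direction matters more than scale.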

See also

DriftKNeighbors.Stats

Per-prediction statistics returned in DriftOutput.details.

Examples

Non-chunked mode:

>>> import numpy as np
>>> from dataeval.shift import DriftKNeighbors
>>> ref = np.random.randn(200, 32).astype(np.float32)
>>> test = np.random.randn(100, 32).astype(np.float32) + 5  # shifted
>>> detector = DriftKNeighbors(k=5).fit(ref)
>>> result = detector.predict(test)
>>> print(f"Drift: {result.drifted}")
Drift: ...

Chunked mode:

>>> chunked = DriftKNeighbors(k=5).chunked(chunk_size=50)
>>> chunked.fit(ref)
ChunkedDrift(DriftKNeighbors(k=5, distance_metric='euclidean', p_val=0.05, extractor=None, update_strategy=None), chunker=SizeChunker(chunk_size=50, incomplete='keep'), fitted=True)
>>> result = chunked.predict(test)

chunked(chunker=None, chunk_size=None, chunk_count=None, threshold=None)

Create a chunked wrapper around this drift detector.

Returns a ChunkedDrift that splits data into chunks during fit and predict, computing per-chunk metrics and comparing against baseline thresholds.

Parameters:
chunker : BaseChunker or None, default None

Explicit chunker instance.

chunk_size : int or None, default None

Create fixed-size chunks of this many samples.

chunk_count : int or None, default None

Split into this many equal chunks.

threshold : Threshold or None, default None

Threshold strategy for determining drift bounds from baseline. When None, uses the detector’s default threshold.

Returns:

A chunked drift wrapper around this detector.

Return type:

ChunkedDrift[TDetails]
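The per-chunk comparison that chunked() sets up can be sketched as follows. This is a minimal illustration, not the ChunkedDrift implementation: the bounds of mean plus or minus three standard deviations of the reference chunk means are an assumed stand-in for the detector's default threshold, and the helpers knn_scores and chunk_means are hypothetical.

```python
import numpy as np


def knn_scores(queries, reference, k, exclude_self=False):
    """Mean Euclidean distance from each query to its k nearest reference points."""
    diffs = queries[:, None, :] - reference[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists.sort(axis=1)
    start = 1 if exclude_self else 0  # skip the zero self-distance
    return dists[:, start:start + k].mean(axis=1)


def chunk_means(values, chunk_size):
    """Mean of each consecutive chunk; the last chunk may be shorter."""
    return np.array(
        [values[i:i + chunk_size].mean() for i in range(0, len(values), chunk_size)]
    )


rng = np.random.default_rng(0)
ref = rng.standard_normal((200, 32))
test = np.concatenate(
    [rng.standard_normal((50, 32)), rng.standard_normal((50, 32)) + 5.0]
)

# "Fit": baseline chunk means from the reference set, then assumed 3-sigma bounds.
ref_chunk_means = chunk_means(knn_scores(ref, ref, k=5, exclude_self=True), 50)
center, spread = ref_chunk_means.mean(), ref_chunk_means.std()
lower, upper = center - 3 * spread, center + 3 * spread

# "Predict": one mean k-NN distance per test chunk, compared against the bounds.
test_chunk_means = chunk_means(knn_scores(test, ref, k=5), 50)
chunk_drifted = (test_chunk_means < lower) | (test_chunk_means > upper)
```

The second test chunk is the shifted one, so its mean k-NN distance lands above the upper bound and is flagged.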

fit(reference_data)

Fit the k-NN drift detector on reference data.

Parameters:
reference_data : Any

Reference data. When an extractor is configured, this can be any data type accepted by the extractor (e.g., a dataset or raw images). Otherwise, must be array-like with shape (n_samples, n_features).

Return type:

Self

predict(data)

Predict whether test data has drifted from reference data.

Parameters:
data : Any

Test data. When an extractor is configured, this can be any data type accepted by the extractor. Otherwise, must be array-like.

Returns:

Drift prediction with k-NN statistics.

Return type:

DriftOutput[DriftKNeighbors.Stats]

property reference_data : numpy.typing.NDArray[numpy.float32]

Reference data, lazily encoded on first access.

Overrides BaseDrift.reference_data via MRO when this mixin appears before BaseDrift in the inheritance list.
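The "lazily encoded on first access" behavior can be pictured with a generic caching pattern. The sketch below uses functools.cached_property from the standard library; the LazyReference class and its encode callable are hypothetical and are not taken from the library's source.

```python
from functools import cached_property


class LazyReference:
    """Hypothetical holder that encodes raw reference data only when first needed."""

    def __init__(self, raw, encode):
        self._raw = raw
        self._encode = encode
        self.encode_calls = 0  # counts how often encoding actually runs

    @cached_property
    def reference_data(self):
        # Runs once; later accesses return the cached result.
        self.encode_calls += 1
        return self._encode(self._raw)


holder = LazyReference([1, 2, 3], encode=lambda xs: [float(x) for x in xs])
calls_before = holder.encode_calls
first = holder.reference_data
second = holder.reference_data
```

Encoding is deferred until the property is read, and repeated reads reuse the same encoded object.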

Classes

Config

Configuration for DriftKNeighbors detector.

Stats

Statistics from K-Nearest Neighbors drift detection.