dataeval.shift.DriftReconstruction

class dataeval.shift.DriftReconstruction(model, device=None, model_type='auto', use_gmm=None, p_val=None, config=None)

Reconstruction-based drift detector using autoencoders.

Detects drift by comparing reconstruction errors: if the model (trained on reference data) produces higher reconstruction errors on test data, the test distribution has likely shifted.
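The comparison above rests on a per-sample reconstruction error. The sketch below computes one in plain Python. It assumes mean squared error as the metric, and it substitutes a hypothetical `toy_reconstruct` function for a trained autoencoder: values are snapped to 0 or 1, as if the model had learned that reference features are nearly binary. It is not DataEval's implementation.

```python
from statistics import fmean


def reconstruction_errors(samples, reconstruct):
    """Return the mean squared reconstruction error of each sample (assumed metric)."""
    errors = []
    for x in samples:
        x_hat = reconstruct(x)
        errors.append(fmean((a - b) ** 2 for a, b in zip(x, x_hat)))
    return errors


# Hypothetical stand-in for a trained autoencoder: it "learned" that every
# feature in the reference data is close to 0 or 1, so it snaps to the nearest one.
def toy_reconstruct(x):
    return [float(round(v)) for v in x]


reference = [[0.0, 1.0, 1.0], [1.0, 0.0, 0.1], [0.1, 0.9, 0.0]]
test = [[0.5, 0.4, 0.6], [0.45, 0.55, 0.5]]  # values unlike anything in the reference

ref_err = reconstruction_errors(reference, toy_reconstruct)
test_err = reconstruction_errors(test, toy_reconstruct)
print(fmean(ref_err), fmean(test_err))  # the test mean is much larger
```

Because the model only reconstructs inputs that resemble its training data well, a jump in mean error signals that the test distribution has moved.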

Follows a fit/predict lifecycle: construct the detector with a model and hyperparameters, call fit() with reference data (which trains the model), then call predict() with test data. Use chunked() to create a chunked wrapper for time-series monitoring.

Supports two modes:

  • Non-chunked (default): Computes mean reconstruction error for the test set and uses a z-test against the reference baseline.

  • Chunked (via chunked()): Splits data into chunks, computes mean reconstruction error per chunk, and uses threshold bounds to flag drift.
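The non-chunked z-test can be sketched as follows. This is an assumed formulation, not the library's code: it takes the reference errors as the baseline, scales their sample standard deviation by the square root of the test-set size, and uses a one-sided (upper-tail) p-value because drift is expected to raise reconstruction error. The function name `z_test_drift` is hypothetical.

```python
import math
from statistics import fmean, stdev


def z_test_drift(ref_errors, test_errors, p_val=0.05):
    """Hypothetical one-sided z-test: is the mean test error above the reference mean?"""
    mu_ref = fmean(ref_errors)
    sigma_ref = stdev(ref_errors)  # sample standard deviation of reference errors
    n = len(test_errors)
    z = (fmean(test_errors) - mu_ref) / (sigma_ref / math.sqrt(n))
    # Upper-tail probability of the standard normal distribution: P(Z >= z).
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p, p < p_val


ref_errors = [0.010, 0.012, 0.009, 0.011, 0.010, 0.013, 0.008, 0.011]
same = [0.011, 0.010, 0.009, 0.012]      # same mean as the reference
shifted = [0.030, 0.028, 0.035, 0.031]   # clearly higher errors

print(z_test_drift(ref_errors, same))
print(z_test_drift(ref_errors, shifted))
```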

Parameters:
model : torch.nn.Module

Autoencoder (AE) or variational autoencoder (VAE) model.

device : DeviceLike or None, default None

Hardware device used to train and run the model.

model_type : {"ae", "vae", "auto"} or None, default "auto"

Model type: "ae" for an autoencoder, "vae" for a variational autoencoder, or "auto" to detect the type automatically.

use_gmm : bool or None, default None

Whether to use a Gaussian mixture model (GMM) in the latent space.

p_val : float or None, default None

Significance threshold for the z-test in non-chunked mode; 0.05 is used when no value is given.

config : DriftReconstruction.Config or None, default None

Optional configuration object, including the training parameters (loss_fn, optimizer, epochs, batch_size) used by fit().

See also

DriftReconstruction.Stats

Per-prediction statistics returned in DriftOutput.details.

Examples

>>> from dataeval.shift import DriftReconstruction
>>> from dataeval.utils.models import AE
>>> import torch
>>> model = AE(input_shape=(1, 28, 28))
>>> ref = torch.rand(100, 1, 28, 28).numpy()
>>> detector = DriftReconstruction(model).fit(ref)
>>> test = torch.rand(50, 1, 28, 28).numpy()
>>> result = detector.predict(test)

chunked(chunker=None, chunk_size=None, chunk_count=None, threshold=None)

Create a chunked wrapper around this drift detector.

Returns a ChunkedDrift that splits data into chunks during fit and predict, computing per-chunk metrics and comparing against baseline thresholds.
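The idea of comparing per-chunk metrics against baseline thresholds is sketched below. The threshold rule here (baseline mean plus or minus three standard deviations of the reference chunk metrics) is an assumption chosen for illustration; the actual bounds depend on the Threshold strategy in use. The helpers `baseline_bounds` and `flag_chunks` are hypothetical.

```python
from statistics import fmean, stdev


def baseline_bounds(ref_chunk_values, k=3.0):
    """Hypothetical threshold: mean +/- k standard deviations of baseline chunk metrics."""
    mu = fmean(ref_chunk_values)
    sd = stdev(ref_chunk_values)
    return mu - k * sd, mu + k * sd


def flag_chunks(test_chunk_values, bounds):
    """Flag a chunk as drifted when its metric falls outside the bounds."""
    lower, upper = bounds
    return [not (lower <= v <= upper) for v in test_chunk_values]


# Mean reconstruction error per chunk (illustrative numbers).
ref_chunk_means = [0.010, 0.011, 0.009, 0.010, 0.012, 0.010]
test_chunk_means = [0.010, 0.011, 0.025]

bounds = baseline_bounds(ref_chunk_means)
print(bounds, flag_chunks(test_chunk_means, bounds))  # only the last chunk is flagged
```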

Parameters:
chunker : BaseChunker or None, default None

Explicit chunker instance used to split the data.

chunk_size : int or None, default None

Create fixed-size chunks of this many samples.

chunk_count : int or None, default None

Split the data into this many equally sized chunks.

threshold : Threshold or None, default None

Threshold strategy for determining drift bounds from baseline. When None, uses the detector’s default threshold.
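The difference between chunk_size and chunk_count can be illustrated with index ranges. This sketch assumes that a fixed-size split leaves a shorter final chunk and that a count-based split spreads the remainder so chunk sizes differ by at most one; how DataEval's chunkers actually handle remainders may differ. Both helper functions are hypothetical.

```python
def split_by_size(n_samples, chunk_size):
    """Consecutive chunks of chunk_size samples; the last one may be shorter."""
    return [range(i, min(i + chunk_size, n_samples)) for i in range(0, n_samples, chunk_size)]


def split_by_count(n_samples, chunk_count):
    """chunk_count consecutive chunks whose sizes differ by at most one."""
    base, extra = divmod(n_samples, chunk_count)
    ranges, start = [], 0
    for c in range(chunk_count):
        stop = start + base + (1 if c < extra else 0)
        ranges.append(range(start, stop))
        start = stop
    return ranges


print([len(r) for r in split_by_size(10, 4)])   # [4, 4, 2]
print([len(r) for r in split_by_count(10, 4)])  # [3, 3, 2, 2]
```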

Returns:

A chunked drift wrapper around this detector.

Return type:

ChunkedDrift[TDetails]

fit(reference_data)

Fit the reconstruction drift detector.

Trains the autoencoder on reference data using parameters from Config (loss_fn, optimizer, epochs, batch_size).

Parameters:
reference_data : ArrayLike

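Reference data used to train the autoencoder and establish the baseline reconstruction error.

Returns:

The fitted detector, so calls can be chained as in DriftReconstruction(model).fit(ref).

Return type:

Self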

predict(data)

Predict whether test data has drifted from reference data.

Parameters:
data : ArrayLike

Test data to compare against the reference data.

Returns:

Drift prediction with reconstruction error statistics.

Return type:

DriftOutput[DriftReconstruction.Stats]

property reference_data : numpy.typing.NDArray[numpy.float32]

Reference data for drift detection.

Returns:

Reference data array.

Return type:

NDArray[np.float32]

Raises:

NotFittedError – If accessed before fit() has been called.
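The fit-before-access contract of this property follows a common Python pattern, sketched below with a hypothetical ToyDetector class. NotFittedError is defined locally as a stand-in for the library's exception, and the class only mirrors the documented behaviour (fit() returns the detector, reference_data raises before fitting); it is not DataEval's code.

```python
class NotFittedError(RuntimeError):
    """Local stand-in for the library's NotFittedError (assumption)."""


class ToyDetector:
    """Hypothetical detector illustrating the fit-before-use contract."""

    def __init__(self):
        self._reference = None

    def fit(self, reference_data):
        # Store a float copy of the reference data.
        self._reference = [[float(v) for v in x] for x in reference_data]
        return self  # returning self enables ToyDetector().fit(ref) chaining

    @property
    def reference_data(self):
        if self._reference is None:
            raise NotFittedError("call fit() before accessing reference_data")
        return self._reference


try:
    ToyDetector().reference_data
except NotFittedError as exc:
    print("not fitted:", exc)

fitted = ToyDetector().fit([[0, 1], [1, 0]])
print(fitted.reference_data)
```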

Classes

Config

Configuration for DriftReconstruction detector.

Stats

Statistics from reconstruction-based drift detection.