dataeval.shift.DriftReconstruction
class dataeval.shift.DriftReconstruction(model, device=None, model_type='auto', use_gmm=None, p_val=None, config=None)
Reconstruction-based drift detector using autoencoders.
Detects drift by comparing reconstruction errors: if the model (trained on reference data) produces higher reconstruction errors on test data, the test distribution has likely shifted.
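The comparison described above can be illustrated with a small, self-contained sketch. It does not use DataEval: the `reconstruct` function is a hypothetical stand-in for a trained autoencoder (it returns the per-feature mean of the reference data for every input), and mean squared error is assumed as the reconstruction error.

```python
import random
from statistics import mean


def mse(x, x_hat):
    """Mean squared error between an input vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


random.seed(0)
dim = 8

# Reference data: 200 samples drawn from N(0, 1) in each feature.
ref = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(200)]

# Hypothetical stand-in for a trained autoencoder: it "reconstructs" every
# input as the per-feature mean of the reference data.
center = [sum(col) / len(ref) for col in zip(*ref)]


def reconstruct(x):
    return center


# Per-sample reconstruction errors on the reference data (the baseline).
ref_err = [mse(x, reconstruct(x)) for x in ref]

# Test data drawn from a shifted distribution, N(2, 1) in each feature.
shifted = [[random.gauss(2.0, 1.0) for _ in range(dim)] for _ in range(50)]
test_err = [mse(x, reconstruct(x)) for x in shifted]

# The shifted data reconstructs worse than the reference data.
print(mean(ref_err) < mean(test_err))
```

Because the stand-in model only captures the reference data, inputs from the shifted distribution sit farther from their reconstructions, which is the signal the detector relies on.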
Uses a fit/predict lifecycle: construct with model and hyperparameters, call fit() with reference data (trains the model), then call predict() with test data.
Supports two modes:
Non-chunked (default): Computes mean reconstruction error for the test set and uses a z-test against the reference baseline.
Chunked: Splits data into chunks, computes mean reconstruction error per chunk, and uses threshold bounds to flag drift.
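The non-chunked mode can be sketched as a z-test on the mean reconstruction error. This is a minimal illustration and not DataEval's implementation: the function name `z_test_drift`, the one-sided alternative (test error higher than reference), and the use of the reference sample standard deviation are all assumptions.

```python
import math
from statistics import NormalDist, mean, stdev


def z_test_drift(ref_errors, test_errors, p_val=0.05):
    """One-sided z-test: is the mean test error higher than the reference mean?

    Assumption: the baseline is summarized by the mean and sample standard
    deviation of the per-sample reference reconstruction errors.
    """
    mu = mean(ref_errors)
    sigma = stdev(ref_errors)
    # Standard error of the mean for a test set of this size.
    se = sigma / math.sqrt(len(test_errors))
    z = (mean(test_errors) - mu) / se
    # Upper-tail probability under the standard normal distribution.
    p = 1.0 - NormalDist().cdf(z)
    return p < p_val, z, p


# Illustrative per-sample reconstruction errors (made-up numbers).
ref_errors = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05]
shifted_errors = [1.5, 1.6, 1.4, 1.55]

drifted, z, p = z_test_drift(ref_errors, shifted_errors)
same, z_same, p_same = z_test_drift(ref_errors, ref_errors)
print(drifted, same)
```

Here the shifted errors are flagged as drift, while comparing the reference errors against themselves gives z = 0 and is not flagged.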
- Parameters:
- model : torch.nn.Module
Autoencoder or VAE model.
- device : DeviceLike or None, default None
Hardware device.
- model_type : {"ae", "vae", "auto"} or None, default "auto"
Model type. "auto" auto-detects the type.
- use_gmm : bool or None, default None
Whether to use GMM in latent space.
- p_val : float, default 0.05
Significance threshold for non-chunked mode.
- config : DriftReconstruction.Config or None, default None
Optional configuration object.
Examples
>>> from dataeval.shift import DriftReconstruction
>>> from dataeval.utils.models import AE
>>> import torch
>>> model = AE(input_shape=(1, 28, 28))
>>> ref = torch.rand(100, 1, 28, 28).numpy()
>>> detector = DriftReconstruction(model).fit(ref)
>>> test = torch.rand(50, 1, 28, 28).numpy()
>>> result = detector.predict(test)
fit(x_ref, loss_fn=None, optimizer=None, epochs=None, batch_size=None, chunker=None, chunk_size=None, chunk_count=None, chunks=None, chunk_indices=None, threshold=None)
Fit the reconstruction drift detector.
Trains the autoencoder on reference data, then optionally sets up a chunked baseline.
- Parameters:
- x_ref : ArrayLike
Reference data.
- loss_fn : Callable or None, default None
Loss function for training.
- optimizer : torch.optim.Optimizer or None, default None
Optimizer for training.
- epochs : int or None, default None
Number of training epochs.
- batch_size : int or None, default None
Batch size for training.
- chunker : BaseChunker or None, default None
Explicit chunker instance for chunked mode.
- chunk_size : int or None, default None
Create fixed-size chunks.
- chunk_count : int or None, default None
Split into this many equal chunks.
- chunks : list[ArrayLike] or None, default None
Pre-split reference data for chunked mode.
- chunk_indices : list[list[int]] or None, default None
Index groupings for chunking reference data.
- threshold : Threshold or None, default None
Threshold strategy for chunked mode.
- Return type:
Self
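The chunking options above can be pictured with a plain-Python sketch. None of these helpers come from DataEval: `chunk_indices_by_size`, `chunk_indices_by_count`, and `std_bounds` are hypothetical names, the handling of a short trailing chunk is an assumption, and the "mean ± k standard deviations" bounds are only one possible threshold strategy, chosen here for illustration.

```python
from statistics import mean, stdev


def chunk_indices_by_size(n, chunk_size):
    """Consecutive index groups of at most chunk_size items (last may be shorter)."""
    return [list(range(i, min(i + chunk_size, n))) for i in range(0, n, chunk_size)]


def chunk_indices_by_count(n, chunk_count):
    """Split range(n) into chunk_count consecutive groups of nearly equal size."""
    base, extra = divmod(n, chunk_count)
    groups, start = [], 0
    for k in range(chunk_count):
        size = base + (1 if k < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def chunk_means(errors, groups):
    """Mean reconstruction error within each index group."""
    return [mean([errors[i] for i in g]) for g in groups]


def std_bounds(ref_chunk_means, k=3.0):
    """Assumed threshold strategy: reference chunk mean +/- k standard deviations."""
    mu, sigma = mean(ref_chunk_means), stdev(ref_chunk_means)
    return mu - k * sigma, mu + k * sigma


# Illustrative per-sample reconstruction errors (made-up numbers).
ref_errors = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.05, 0.95]
test_errors = [1.0, 1.1, 1.6, 1.8]

ref_means = chunk_means(ref_errors, chunk_indices_by_size(len(ref_errors), 2))
lower, upper = std_bounds(ref_means)

test_means = chunk_means(test_errors, chunk_indices_by_size(len(test_errors), 2))
# A chunk is flagged when its mean error falls outside the reference bounds.
flags = [not (lower <= m <= upper) for m in test_means]
print(flags)
```

In this sketch the first test chunk stays inside the bounds derived from the reference chunks, while the second chunk, with noticeably higher errors, is flagged.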
predict(x=None, chunks=None, chunk_indices=None)
Predict whether test data has drifted from reference data.
- property is_chunked : bool
Whether the detector is operating in chunked mode.
Classes
DriftReconstruction.Config: Configuration for DriftReconstruction detector.