dataeval.detectors.drift.DriftMMD
class dataeval.detectors.drift.DriftMMD(data, p_val=0.05, update_strategy=None, sigma=None, n_permutations=100, device=None)
Maximum Mean Discrepancy (MMD) Drift Detection algorithm using a permutation test.
- Parameters:
- data : Embeddings or Array
Data used as reference distribution.
- p_val : float or None, default 0.05
P-value used for significance of the statistical test for each feature. If the FDR correction method is used, this corresponds to the acceptable q-value.
- update_strategy : UpdateStrategy or None, default None
Reference data can optionally be updated using an UpdateStrategy class. Update using the last n instances seen by the detector with LastSeenUpdateStrategy or via reservoir sampling with ReservoirSamplingUpdateStrategy.
- sigma : Array or None, default None
Optionally set the internal GaussianRBF kernel bandwidth. Can also pass multiple bandwidth values as an array. The kernel evaluation is then averaged over those bandwidths.
- n_permutations : int, default 100
Number of permutations used in the permutation test.
- device : DeviceLike or None, default None
The hardware device to use. If not specified, the DataEval default or the torch default is used.
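To make the roles of sigma, n_permutations and p_val concrete, here is a minimal NumPy sketch of an MMD permutation test with a Gaussian RBF kernel averaged over several bandwidths. It is an illustration under stated assumptions, not DataEval's implementation: the function names (rbf_kernel, mmd2, mmd_permutation_test), the use of the unbiased MMD estimate, and the fixed default bandwidth are all hypothetical choices made for this example.

```python
import numpy as np


def rbf_kernel(x, y, sigmas):
    """Gaussian RBF kernel, averaged over one or more bandwidths (hypothetical helper)."""
    # Squared Euclidean distance between every row of x and every row of y.
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    sigmas = np.atleast_1d(np.asarray(sigmas, dtype=float))
    # One kernel matrix per bandwidth, then the element-wise average.
    return np.mean([np.exp(-sq_dists / (2.0 * s**2)) for s in sigmas], axis=0)


def mmd2(kernel, ref_idx, test_idx):
    """Unbiased squared-MMD estimate from a precomputed pooled kernel matrix."""
    n, m = len(ref_idx), len(test_idx)
    k_xx = kernel[np.ix_(ref_idx, ref_idx)]
    k_yy = kernel[np.ix_(test_idx, test_idx)]
    k_xy = kernel[np.ix_(ref_idx, test_idx)]
    # Drop the diagonal (self-similarity) terms within each sample.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    return term_xx + term_yy - 2.0 * k_xy.mean()


def mmd_permutation_test(x_ref, x_test, sigmas=(1.0,), n_permutations=100,
                         p_val=0.05, seed=0):
    """Return (drifted, p_value, distance) for a reference and a test batch."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x_ref, x_test])
    kernel = rbf_kernel(pooled, pooled, sigmas)  # computed once, reused per permutation
    n = len(x_ref)
    idx = np.arange(len(pooled))
    observed = mmd2(kernel, idx[:n], idx[n:])
    permuted = np.empty(n_permutations)
    for i in range(n_permutations):
        perm = rng.permutation(idx)  # shuffle which rows count as "reference"
        permuted[i] = mmd2(kernel, perm[:n], perm[n:])
    # p-value: share of permuted statistics at least as large as the observed one.
    p_value = float(np.mean(permuted >= observed))
    return p_value < p_val, p_value, float(observed)
```

Calling mmd_permutation_test(ref, batch, sigmas=(2.0, 4.0)) on two 2-D arrays of feature vectors returns the drift decision, the permutation p-value and the MMD² estimate. Raising n_permutations gives a finer-grained p-value at proportionally higher cost, since each permutation re-evaluates the statistic on the cached kernel matrix.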
Example
>>> from dataeval.data import Embeddings
>>> from dataeval.detectors.drift import DriftMMD

Use Embeddings to encode images before testing for drift

>>> train_emb = Embeddings(train_images, model=encoder, batch_size=64)
>>> drift = DriftMMD(train_emb)

Test incoming images for drift

>>> drift.predict(test_images).drifted
True

- predict(data)
Predict whether a batch of data has drifted from the reference data, then update the reference data using the specified strategy.
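The two reference-update ideas named under update_strategy can be sketched in a few lines of NumPy. This is only an illustration of the general techniques (a sliding window of the last n instances, and reservoir sampling via Algorithm R); the functions last_seen_update and reservoir_update are hypothetical and are not the LastSeenUpdateStrategy or ReservoirSamplingUpdateStrategy classes themselves. The sketch also assumes the reference set never holds more than n instances.

```python
import numpy as np


def last_seen_update(x_ref, x_new, n):
    """Keep only the n most recent instances (hypothetical helper)."""
    return np.concatenate([x_ref, x_new])[-n:]


def reservoir_update(x_ref, x_new, n, count, rng):
    """Reservoir sampling (Algorithm R), hypothetical helper.

    Every instance seen so far has an equal chance of being in the
    size-n reference set. `count` is the number of instances seen
    before this batch; the updated count is returned with the data.
    """
    reservoir = list(x_ref)
    for item in x_new:
        count += 1
        if len(reservoir) < n:
            # Reservoir not yet full: always keep the new instance.
            reservoir.append(item)
        else:
            # Pick a slot uniformly from [0, count); replace only if it
            # falls inside the reservoir, i.e. with probability n / count.
            j = rng.integers(0, count)
            if j < n:
                reservoir[j] = item
    return np.asarray(reservoir), count
```

The sliding window tracks recent data closely, so gradual drift can be absorbed into the reference set, while the reservoir keeps a uniform sample of everything seen and changes more slowly.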