dataeval.detectors.drift.DriftMMD

class dataeval.detectors.drift.DriftMMD(x_ref, p_val=0.05, x_ref_preprocessed=False, update_x_ref=None, preprocess_fn=None, sigma=None, configure_kernel_from_x_ref=True, n_permutations=100, device=None)

Maximum Mean Discrepancy (MMD) drift detection algorithm using a permutation test.
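To illustrate the underlying test, the NumPy sketch below computes a biased estimate of the squared MMD between two samples under a Gaussian RBF kernel. It then estimates a p-value by repeatedly shuffling the pooled samples into new reference/test splits. This is a minimal sketch built on assumed choices: the biased estimator, a fixed bandwidth sigma=1.0, and the hypothetical helpers gaussian_kernel, mmd2 and permutation_test. It is not DataEval's implementation.

```python
import numpy as np


def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise squared Euclidean distances between the rows of x and y
    d2 = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2.0 * x @ y.T
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by rounding
    return np.exp(-d2 / (2.0 * sigma**2))


def mmd2(x, y, sigma=1.0):
    # Biased squared-MMD estimate: mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)
    return (
        gaussian_kernel(x, x, sigma).mean()
        + gaussian_kernel(y, y, sigma).mean()
        - 2.0 * gaussian_kernel(x, y, sigma).mean()
    )


def permutation_test(x_ref, x, sigma=1.0, n_permutations=100, seed=0):
    rng = np.random.default_rng(seed)
    observed = mmd2(x_ref, x, sigma)
    pooled = np.concatenate([x_ref, x])
    n_ref = len(x_ref)
    null_stats = np.empty(n_permutations)
    for i in range(n_permutations):
        # Randomly reassign the pooled samples to "reference" and "test" groups
        idx = rng.permutation(len(pooled))
        null_stats[i] = mmd2(pooled[idx[:n_ref]], pooled[idx[n_ref:]], sigma)
    # p-value: share of permuted statistics at least as large as the observed one
    p_value = float((null_stats >= observed).mean())
    return observed, p_value


# Usage on synthetic 2-D data
rng = np.random.default_rng(42)
x_ref = rng.normal(0.0, 1.0, size=(200, 2))
x_same = rng.normal(0.0, 1.0, size=(200, 2))
x_shift = rng.normal(1.5, 1.0, size=(200, 2))
_, p_same = permutation_test(x_ref, x_same)
_, p_shift = permutation_test(x_ref, x_shift)
print(f"p_same={p_same:.2f}, p_shift={p_shift:.2f}, drifted={p_shift < 0.05}")
```

In this sketch, drift would be flagged when the p-value falls below the chosen p_val. The mean-shifted sample produces a p-value near zero because no shuffled split comes close to the observed statistic.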
Parameters:
- x_ref : ArrayLike
Data used as the reference distribution.
- p_val : float | None, default 0.05
P-value used to determine the significance of the statistical test for each feature. If the FDR correction method is used, this corresponds to the acceptable q-value.
- x_ref_preprocessed : bool, default False
Whether the given reference data x_ref has already been preprocessed. If True, only the test data x will be preprocessed at prediction time. If False, the reference data will also be preprocessed.
- update_x_ref : UpdateStrategy | None, default None
Reference data can optionally be updated using an UpdateStrategy class: LastSeenUpdateStrategy keeps the last n instances seen by the detector, while ReservoirSamplingUpdateStrategy updates the reference data via reservoir sampling.
- preprocess_fn : Callable | None, default None
Function to preprocess the data before computing the data drift metrics. Typically a dimensionality reduction technique.
- sigma : ArrayLike | None, default None
Optionally sets the bandwidth of the internal GaussianRBF kernel. Multiple bandwidth values can also be passed as an array, in which case the kernel evaluation is averaged over those bandwidths.
- configure_kernel_from_x_ref : bool, default True
Whether to configure the kernel bandwidth from the reference data up front, when the detector is initialized.
- n_permutations : int, default 100
Number of permutations used in the permutation test.
- device : str | None, default None
Device to use. The default, None, uses the GPU if available and falls back to the CPU. A device can be specified explicitly by passing ‘cuda’, ‘gpu’ or ‘cpu’.
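The sigma and configure_kernel_from_x_ref options can be pictured with the NumPy sketch below: a Gaussian RBF kernel that averages its evaluation over an array of bandwidths, plus one way of deriving a bandwidth from reference data. The median heuristic used here (square root of the median pairwise squared distance) is an assumption chosen for illustration, not necessarily the rule DataEval applies. The helper names pairwise_sq_dists, multi_bandwidth_kernel and median_heuristic_sigma are hypothetical.

```python
import numpy as np


def pairwise_sq_dists(x, y):
    # Squared Euclidean distance between every row of x and every row of y
    d2 = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2.0 * x @ y.T
    return np.maximum(d2, 0.0)


def multi_bandwidth_kernel(x, y, sigma):
    # sigma may be a scalar or an array of bandwidths
    sigmas = np.atleast_1d(np.asarray(sigma, dtype=float))
    d2 = pairwise_sq_dists(x, y)
    # One RBF evaluation per bandwidth (leading axis), then average over bandwidths
    k = np.exp(-d2[None, :, :] / (2.0 * sigmas[:, None, None] ** 2))
    return k.mean(axis=0)


def median_heuristic_sigma(x_ref):
    # Hypothetical rule: sqrt of the median off-diagonal pairwise squared distance
    d2 = pairwise_sq_dists(x_ref, x_ref)
    off_diag = d2[~np.eye(len(x_ref), dtype=bool)]
    return np.array([np.sqrt(np.median(off_diag))])


x_demo = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
sigma = median_heuristic_sigma(x_demo)  # pairwise distances are 5, 5 and 10 -> array([5.])
k = multi_bandwidth_kernel(x_demo, x_demo, [1.0, sigma[0]])
```

Averaging over several bandwidths makes the kernel sensitive to differences at more than one length scale, which reduces the risk of a single poorly chosen sigma hiding drift.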
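Since preprocess_fn is described as typically a dimensionality reduction technique, a simple illustration is a PCA projection fitted on the reference data. The factory below, make_pca_preprocess, is a hypothetical NumPy helper. It returns a callable that maps a batch (for example, images flattened to vectors) to its leading principal components. Whether DataEval accepts a plain NumPy-returning callable for this parameter is an assumption not confirmed by this page.

```python
import numpy as np


def make_pca_preprocess(x_ref, n_components):
    """Return a callable projecting data onto the top principal components of x_ref."""
    x_ref = np.asarray(x_ref, dtype=float).reshape(len(x_ref), -1)  # flatten each sample
    mean = x_ref.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(x_ref - mean, full_matrices=False)
    components = vt[:n_components]

    def preprocess(x):
        x = np.asarray(x, dtype=float).reshape(len(x), -1)  # e.g. images -> vectors
        return (x - mean) @ components.T

    return preprocess


rng = np.random.default_rng(0)
x_ref = rng.normal(size=(100, 8, 8))  # 100 tiny 8x8 "images"
preprocess = make_pca_preprocess(x_ref, n_components=4)
z = preprocess(x_ref)  # shape (100, 4)
```

Reducing dimensionality before the MMD test keeps the kernel computations cheap and concentrates the test on the directions of largest variance in the reference data.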
Example

>>> from functools import partial
>>> from dataeval.detectors.drift import DriftMMD, preprocess_drift

Use a preprocess function to encode images before testing for drift (encoder, train_images and test_images are assumed to be defined beforehand):

>>> preprocess_fn = partial(preprocess_drift, model=encoder, batch_size=64)
>>> drift = DriftMMD(train_images, preprocess_fn=preprocess_fn)

Test incoming images for drift:

>>> drift.predict(test_images).drifted
True

- predict(x)
Predict whether a batch of data has drifted from the reference data, then update the reference data using the specified strategy.
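Because predict(x) updates the reference data after each call, it may help to see what the two update strategies named under update_x_ref do. The NumPy sketch below uses hypothetical classes, LastSeenSketch and ReservoirSketch. The first keeps the most recent n instances; the second keeps a uniform reservoir sample (Algorithm R) of everything seen so far. Their call signature (x_ref, x, count) is an assumption and does not reproduce the interface of DataEval's LastSeenUpdateStrategy or ReservoirSamplingUpdateStrategy.

```python
import numpy as np


class LastSeenSketch:
    """Keep only the most recent n instances (hypothetical illustration)."""

    def __init__(self, n):
        self.n = n

    def __call__(self, x_ref, x, count):
        # Append the new batch, then keep the last n rows
        return np.concatenate([x_ref, x])[-self.n:]


class ReservoirSketch:
    """Uniform sample of size n over all instances seen (hypothetical illustration)."""

    def __init__(self, n, seed=0):
        self.n = n
        self.rng = np.random.default_rng(seed)

    def __call__(self, x_ref, x, count):
        # count = number of instances seen before this batch
        x_ref = x_ref.copy()
        for item in x:
            if len(x_ref) < self.n:
                x_ref = np.concatenate([x_ref, item[None]])
            else:
                j = self.rng.integers(0, count + 1)  # uniform over 0..count inclusive
                if j < self.n:
                    x_ref[j] = item  # replace a random slot
            count += 1
        return x_ref


x_ref_demo = np.arange(10, dtype=float).reshape(5, 2)  # 5 reference rows
batch = np.arange(10, 16, dtype=float).reshape(3, 2)  # 3 new rows
last = LastSeenSketch(n=5)(x_ref_demo, batch, count=5)  # rows [6, 7] through [14, 15]
reservoir = ReservoirSketch(n=5)(x_ref_demo, batch, count=5)
```

The design trade-off: last-seen updates adapt quickly to recent data, while reservoir sampling keeps the reference representative of the whole stream seen so far.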