dataeval.detectors.drift.DriftMMD
- class dataeval.detectors.drift.DriftMMD(x_ref, p_val=0.05, x_ref_preprocessed=False, update_x_ref=None, preprocess_fn=None, sigma=None, configure_kernel_from_x_ref=True, n_permutations=100, device=None)
Maximum Mean Discrepancy (MMD) Drift Detection algorithm using a permutation test.
- Parameters:
x_ref (ArrayLike) – Data used as reference distribution.
p_val (float | None, default 0.05) – P-value used as the significance level of the permutation test.
x_ref_preprocessed (bool, default False) – Whether the given reference data x_ref has already been preprocessed. If True, only the test data x will be preprocessed at prediction time. If False, the reference data will also be preprocessed.
update_x_ref (UpdateStrategy | None, default None) – Reference data can optionally be updated using an UpdateStrategy class: LastSeenUpdateStrategy keeps the last n instances seen by the detector, and ReservoirSamplingUpdateStrategy updates the reference data via reservoir sampling.
preprocess_fn (Callable | None, default None) – Function to preprocess the data before computing the data drift metrics. Typically a dimensionality reduction technique.
sigma (ArrayLike | None, default None) – Optional bandwidth of the internal GaussianRBF kernel. Multiple bandwidths can be passed as an array, in which case the kernel evaluation is averaged over them.
configure_kernel_from_x_ref (bool, default True) – Whether to configure the kernel bandwidth from the reference data up front, at initialization.
n_permutations (int, default 100) – Number of permutations used in the permutation test.
device (str | None, default None) – Device type used. The default None uses the GPU if one is available and falls back to the CPU. Can be set explicitly by passing ‘cuda’, ‘gpu’ or ‘cpu’.
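The sigma and configure_kernel_from_x_ref parameters can be illustrated with a small NumPy sketch of a Gaussian RBF kernel averaged over one or more bandwidths, together with a bandwidth inferred from the reference data. The median heuristic used here (sigma = sqrt(median squared pairwise distance / 2)) and the helper names `squared_distances`, `gaussian_rbf` and `median_heuristic_sigma` are assumptions for illustration, not DataEval's internal API.

```python
import numpy as np


def squared_distances(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Pairwise squared Euclidean distances between rows of x and rows of y."""
    diff = x[:, None, :] - y[None, :, :]
    return np.sum(diff**2, axis=-1)


def gaussian_rbf(x: np.ndarray, y: np.ndarray, sigma) -> np.ndarray:
    """Gaussian RBF kernel matrix, averaged over every bandwidth in sigma."""
    d2 = squared_distances(x, y)
    sigma = np.atleast_1d(np.asarray(sigma, dtype=float))
    # One kernel matrix per bandwidth: exp(-d^2 / (2 * sigma^2)) ...
    k = np.exp(-d2[None, :, :] / (2.0 * sigma[:, None, None] ** 2))
    # ... then average them into a single kernel matrix.
    return k.mean(axis=0)


def median_heuristic_sigma(x_ref: np.ndarray) -> np.ndarray:
    """Bandwidth inferred from the reference data.

    Assumed rule (median heuristic); not necessarily what DataEval uses.
    """
    d2 = squared_distances(x_ref, x_ref)
    # Only distinct pairs (i < j), so zero self-distances do not drag the median down.
    iu = np.triu_indices(len(x_ref), k=1)
    return np.array([np.sqrt(0.5 * np.median(d2[iu]))])
```

For example, `gaussian_rbf(x_ref, x_ref, median_heuristic_sigma(x_ref))` builds the kernel from a single inferred bandwidth, while `gaussian_rbf(x_ref, x_ref, [0.5, 1.0, 2.0])` averages three kernels, mirroring the multi-bandwidth behaviour described for sigma.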
- predict(x)
Predict whether a batch of data has drifted from the reference data, then update the reference data using the specified strategy.
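The reference-update step can be sketched as follows, assuming the two behaviours described for update_x_ref: keep the last n instances seen, or keep a uniform reservoir sample of size n over everything seen so far (classic Algorithm R). The functions `update_last_seen` and `update_reservoir` are hypothetical stand-ins, not the API of LastSeenUpdateStrategy or ReservoirSamplingUpdateStrategy, and how DataEval counts previously seen instances is an assumption here.

```python
import numpy as np


def update_last_seen(x_ref: np.ndarray, x: np.ndarray, n: int) -> np.ndarray:
    """Hypothetical helper: keep the n most recent instances of old reference + new batch."""
    return np.concatenate([x_ref, x], axis=0)[-n:]


def update_reservoir(
    x_ref: np.ndarray, x: np.ndarray, n: int, n_seen: int, rng: np.random.Generator
) -> tuple[np.ndarray, int]:
    """Hypothetical helper: reservoir sampling (Algorithm R) with capacity n.

    n_seen is the number of instances observed before this batch; the
    updated count is returned alongside the new reference set.
    """
    reservoir = list(x_ref)
    for item in x:
        if len(reservoir) < n:
            # Reservoir not full yet: always keep the new instance.
            reservoir.append(item)
        else:
            # Keep the new instance with probability n / (n_seen + 1),
            # overwriting a uniformly chosen slot.
            j = rng.integers(0, n_seen + 1)
            if j < n:
                reservoir[j] = item
        n_seen += 1
    return np.stack(reservoir), n_seen
```

The last-seen rule adapts quickly to recent data, whereas the reservoir keeps every instance seen so far equally likely to remain in the reference set.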
- score(x)
Compute the p-value resulting from a permutation test using the maximum mean discrepancy as a distance measure between the reference data and the data to be tested.
- Parameters:
x (ArrayLike) – Batch of instances.
- Returns:
The p-value obtained from the permutation test, the MMD² between the reference and test sets, and the MMD² threshold above which drift is flagged.
- Return type:
tuple(float, float, float)
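The three values returned by score can be illustrated with a self-contained NumPy permutation test. This is a sketch under stated assumptions, not DataEval's implementation: it uses a single fixed Gaussian RBF bandwidth, the unbiased MMD² estimator, a p-value equal to the fraction of permuted statistics at least as large as the observed one, and a threshold taken as the (1 - p_val) quantile of the permuted statistics. The function names are hypothetical.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian RBF kernel matrix with a single (assumed fixed) bandwidth."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma**2))


def mmd2_from_kernel(k: np.ndarray, n_ref: int) -> float:
    """Unbiased MMD^2 from the joint kernel matrix of [x_ref; x] (needs >= 2 rows each)."""
    m = k.shape[0] - n_ref
    k_xx, k_yy, k_xy = k[:n_ref, :n_ref], k[n_ref:, n_ref:], k[:n_ref, n_ref:]
    # Drop the diagonal (self-similarity) terms for the within-sample averages.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (n_ref * (n_ref - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    return float(term_xx + term_yy - 2.0 * k_xy.mean())


def mmd_permutation_score(x_ref, x, sigma=1.0, n_permutations=100, p_val=0.05, seed=0):
    """Hypothetical helper returning (p-value, MMD^2, MMD^2 threshold)."""
    rng = np.random.default_rng(seed)
    z = np.concatenate([x_ref, x], axis=0)
    k = rbf_kernel(z, z, sigma)  # evaluate the kernel once for all permutations
    n = len(x_ref)
    mmd2 = mmd2_from_kernel(k, n)
    perm_stats = np.empty(n_permutations)
    for i in range(n_permutations):
        # Shuffle which rows count as "reference" and which as "test".
        idx = rng.permutation(len(z))
        perm_stats[i] = mmd2_from_kernel(k[np.ix_(idx, idx)], n)
    p = float((perm_stats >= mmd2).mean())
    threshold = float(np.quantile(perm_stats, 1.0 - p_val))
    return p, mmd2, threshold
```

Because the permuted statistics are computed by reindexing one joint kernel matrix, the kernel is not re-evaluated for each of the n_permutations shuffles. Drift would be flagged when the returned p-value is below p_val, which in this sketch roughly coincides with the MMD² exceeding the threshold.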