DriftMMD

class dataeval.detectors.drift.DriftMMD(x_ref: ArrayLike, p_val: float = 0.05, x_ref_preprocessed: bool = False, update_x_ref: UpdateStrategy | None = None, preprocess_fn: Callable[[ArrayLike], ArrayLike] | None = None, kernel: Callable = <class 'dataeval._internal.detectors.drift.torch.GaussianRBF'>, sigma: ArrayLike | None = None, configure_kernel_from_x_ref: bool = True, n_permutations: int = 100, device: str | None = None)

Maximum Mean Discrepancy (MMD) Drift Detection algorithm using a permutation test.
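The statistic behind this detector can be illustrated with a small NumPy sketch. It computes the standard unbiased estimate of the squared MMD between two samples using a Gaussian RBF kernel. The function names `gaussian_rbf` and `mmd2_unbiased` are hypothetical; this is not DataEval's implementation, which appears to be PyTorch-based (the default kernel lives in a `torch` module) and may differ in details.

```python
import numpy as np


def gaussian_rbf(x: np.ndarray, y: np.ndarray, sigma: float) -> np.ndarray:
    """Kernel matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 * sigma^2))."""
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma**2))


def mmd2_unbiased(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Unbiased estimate of MMD^2 between samples x (n, d) and y (m, d)."""
    n, m = len(x), len(y)
    k_xx = gaussian_rbf(x, x, sigma)
    k_yy = gaussian_rbf(y, y, sigma)
    k_xy = gaussian_rbf(x, y, sigma)
    # Drop the diagonal (each point compared with itself) from the within-sample means
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    return float(term_xx + term_yy - 2.0 * k_xy.mean())


rng = np.random.default_rng(0)
x_ref = rng.normal(size=(200, 3))
x_same = rng.normal(size=(200, 3))            # same distribution as x_ref
x_shift = rng.normal(loc=1.0, size=(200, 3))  # mean-shifted distribution

print(mmd2_unbiased(x_ref, x_same))   # close to 0
print(mmd2_unbiased(x_ref, x_shift))  # clearly positive
```

The estimate is near zero when both samples come from the same distribution and grows as the distributions move apart, which is what the permutation test then checks for significance.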

Parameters:
  • x_ref (ArrayLike) – Data used as reference distribution.

  • p_val (float, default 0.05) – P-value threshold used to determine the significance of the permutation test.

  • x_ref_preprocessed (bool, default False) – Whether the given reference data x_ref has already been preprocessed. If True, only the test data x will be preprocessed at prediction time. If False, the reference data will also be preprocessed.

  • update_x_ref (UpdateStrategy | None, default None) – Optional strategy for updating the reference data after each prediction. Use LastSeenUpdateStrategy to keep the last n instances seen by the detector, or ReservoirSamplingUpdateStrategy to update the reference data via reservoir sampling.

  • preprocess_fn (Callable | None, default None) – Function to preprocess the data before computing the data drift metrics. Typically a dimensionality reduction technique.

  • kernel (Callable, default GaussianRBF) – Kernel used for the MMD computation.

  • sigma (ArrayLike | None, default None) – Optional bandwidth for the GaussianRBF kernel. Multiple bandwidth values can also be passed as an array, in which case the kernel evaluation is averaged over those bandwidths.

  • configure_kernel_from_x_ref (bool, default True) – Whether to configure the kernel bandwidth from the reference data when the detector is initialized.

  • n_permutations (int, default 100) – Number of permutations used in the permutation test.

  • device (str | None, default None) – Device type used. The default, None, uses the GPU if one is available and otherwise falls back to the CPU. A device can be specified explicitly by passing ‘cuda’, ‘gpu’ or ‘cpu’.
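As a rough illustration of the sigma and configure_kernel_from_x_ref parameters, the sketch below averages a Gaussian RBF kernel over several bandwidths and derives a single bandwidth from the reference data with a median heuristic. The median heuristic is an assumption made for illustration; the exact rule the library applies when sigma is None is not stated here. The helper names are hypothetical.

```python
import numpy as np


def pairwise_sq_dists(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Squared Euclidean distances between every row of x and every row of y."""
    return ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)


def median_heuristic_sigma(x_ref: np.ndarray) -> np.ndarray:
    """Assumed bandwidth rule: sigma = sqrt(median(||x_i - x_j||^2) / 2) over pairs i != j."""
    d2 = pairwise_sq_dists(x_ref, x_ref)
    off_diagonal = d2[~np.eye(len(x_ref), dtype=bool)]
    return np.array([np.sqrt(0.5 * np.median(off_diagonal))])


def multi_bandwidth_rbf(x: np.ndarray, y: np.ndarray, sigma) -> np.ndarray:
    """Gaussian RBF kernel evaluated at each bandwidth in sigma, then averaged."""
    sigma = np.atleast_1d(np.asarray(sigma, dtype=float))
    d2 = pairwise_sq_dists(x, y)
    # One (n, m) kernel matrix per bandwidth, stacked to shape (n_sigma, n, m)
    k = np.exp(-d2[None, :, :] / (2.0 * sigma[:, None, None] ** 2))
    return k.mean(axis=0)


rng = np.random.default_rng(1)
x_ref = rng.normal(size=(100, 2))

sigma_auto = median_heuristic_sigma(x_ref)  # bandwidth configured from x_ref
k_single = multi_bandwidth_rbf(x_ref, x_ref, sigma_auto)
k_multi = multi_bandwidth_rbf(x_ref, x_ref, [0.5, 1.0, 2.0])
```

Averaging over several bandwidths makes the statistic less sensitive to a single poorly chosen scale, at the cost of one kernel evaluation per bandwidth.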

predict(x: ArrayLike) → DriftMMDOutput

Predict whether a batch of data has drifted from the reference data, then update the reference data using the specified update strategy.

Parameters:

x (ArrayLike) – Batch of instances.

Returns:

Output class containing the drift prediction, p-value, threshold and MMD metric.

Return type:

DriftMMDOutput
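The predict-then-update flow can be sketched as follows, assuming drift is flagged when the permutation-test p-value falls below p_val. DriftResult, predict_sketch, last_seen_update and reservoir_update are hypothetical stand-ins, not DataEval's DriftMMDOutput, LastSeenUpdateStrategy or ReservoirSamplingUpdateStrategy; the reservoir variant follows the classic Algorithm R.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np

UpdateFn = Callable[[np.ndarray, np.ndarray], np.ndarray]


@dataclass
class DriftResult:
    """Hypothetical container; DriftMMDOutput's real field names may differ."""
    is_drift: bool
    p_val: float
    distance: float
    threshold: float


def last_seen_update(n: int) -> UpdateFn:
    """Keep only the n most recent instances (reference followed by new batch)."""
    def update(x_ref: np.ndarray, x: np.ndarray) -> np.ndarray:
        return np.concatenate([x_ref, x])[-n:]
    return update


def reservoir_update(n: int, seed: int = 0) -> UpdateFn:
    """Keep a uniform random sample of size n over all instances seen (assumes len(x_ref) <= n)."""
    rng = np.random.default_rng(seed)
    n_seen = None  # set on the first call

    def update(x_ref: np.ndarray, x: np.ndarray) -> np.ndarray:
        nonlocal n_seen
        if n_seen is None:
            n_seen = len(x_ref)  # treat the initial reference set as already seen
        reservoir = list(x_ref)
        for item in x:
            n_seen += 1
            if len(reservoir) < n:
                reservoir.append(item)
            else:
                j = rng.integers(0, n_seen)  # uniform index over all instances seen so far
                if j < n:
                    reservoir[j] = item
        return np.asarray(reservoir)
    return update


def predict_sketch(score_fn, x_ref, x, p_val=0.05, update_fn=None):
    """Score the batch, flag drift when the p-value is below p_val, then update x_ref."""
    p, distance, threshold = score_fn(x_ref, x)
    result = DriftResult(is_drift=p < p_val, p_val=p, distance=distance, threshold=threshold)
    if update_fn is not None:
        x_ref = update_fn(x_ref, x)
    return result, x_ref


rng = np.random.default_rng(2)
x_ref = rng.normal(size=(50, 2))
x_new = rng.normal(size=(30, 2))


def fixed_score(ref, x):  # hypothetical stand-in for DriftMMD.score
    return 0.01, 0.2, 0.1


result, x_ref_next = predict_sketch(fixed_score, x_ref, x_new, update_fn=last_seen_update(60))
```

Passing the update as a plain callable keeps the sketch independent of any strategy class; the documented detector takes an UpdateStrategy object instead.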

score(x: ArrayLike) → tuple[float, float, float]

Compute the p-value resulting from a permutation test using the maximum mean discrepancy as a distance measure between the reference data and the data to be tested.

Parameters:

x (ArrayLike) – Batch of instances.

Returns:

The p-value obtained from the permutation test, the MMD^2 between the reference and test sets, and the MMD^2 threshold above which drift is flagged.

Return type:

tuple[float, float, float]
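The three values returned by score can be reproduced in spirit with a NumPy permutation test. This sketch pools both samples, computes the kernel matrix once, and re-splits it under random permutations to build a null distribution of MMD^2. The p-value is the fraction of permuted statistics at least as large as the observed one. The threshold is approximated here by the (1 - p_val) quantile of the null distribution, which is an assumption about how the library defines it; mmd_permutation_score and mmd2_from_kernel are hypothetical names.

```python
import numpy as np


def mmd2_from_kernel(k: np.ndarray, n_ref: int) -> float:
    """Unbiased MMD^2 from the kernel matrix of the pooled sample [x_ref; x]."""
    k_xx, k_yy, k_xy = k[:n_ref, :n_ref], k[n_ref:, n_ref:], k[:n_ref, n_ref:]
    n, m = k_xx.shape[0], k_yy.shape[0]
    return float(
        (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
        + (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
        - 2.0 * k_xy.mean()
    )


def mmd_permutation_score(x_ref, x, sigma=1.0, n_permutations=100, p_val=0.05, seed=0):
    """Return (p-value, observed MMD^2, MMD^2 threshold) from a permutation test."""
    rng = np.random.default_rng(seed)
    z = np.concatenate([x_ref, x])
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    k = np.exp(-d2 / (2.0 * sigma**2))  # Gaussian RBF kernel on the pooled data
    n_ref = len(x_ref)
    observed = mmd2_from_kernel(k, n_ref)

    null = np.empty(n_permutations)
    for i in range(n_permutations):
        idx = rng.permutation(len(z))  # random re-split into "reference" and "test"
        null[i] = mmd2_from_kernel(k[np.ix_(idx, idx)], n_ref)

    p = float((null >= observed).mean())
    threshold = float(np.quantile(null, 1.0 - p_val))
    return p, observed, threshold


rng = np.random.default_rng(3)
x_ref = rng.normal(size=(100, 2))
x_shift = rng.normal(loc=1.0, size=(100, 2))
p, mmd2, threshold = mmd_permutation_score(x_ref, x_shift)
```

Computing the pooled kernel matrix once and permuting its rows and columns avoids recomputing distances for every permutation, which is why the cost is dominated by the initial kernel evaluation.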

property x_ref: ndarray[Any, dtype[_ScalarType_co]]

Retrieve the reference data, applying preprocessing if not already done.

Returns:

The reference dataset (x_ref), preprocessed if needed.

Return type:

NDArray
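The lazy preprocessing described for x_ref can be illustrated with a small class whose property runs preprocess_fn at most once and caches the result. LazyReference is hypothetical and only mirrors the documented behavior; it is not DataEval's code.

```python
from typing import Callable, Optional

import numpy as np


class LazyReference:
    """Hypothetical illustration of a lazily preprocessed reference set."""

    def __init__(
        self,
        x_ref,
        preprocess_fn: Optional[Callable[[np.ndarray], np.ndarray]] = None,
        x_ref_preprocessed: bool = False,
    ):
        self._x_ref = np.asarray(x_ref)
        self._preprocess_fn = preprocess_fn
        # Nothing to do if the data is already preprocessed or no function is given
        self._preprocessed = x_ref_preprocessed or preprocess_fn is None
        self.n_calls = 0  # counts how often preprocess_fn actually ran

    @property
    def x_ref(self) -> np.ndarray:
        if not self._preprocessed:
            self._x_ref = np.asarray(self._preprocess_fn(self._x_ref))
            self._preprocessed = True
            self.n_calls += 1
        return self._x_ref


ref = LazyReference(np.ones((4, 3)), preprocess_fn=lambda a: a[:, :2])
print(ref.x_ref.shape)  # (4, 2)
```

Deferring the work to first access means an expensive preprocess_fn (for example a dimensionality-reduction model) is not run until the reference data is actually needed.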