dataeval.detectors.drift.DriftCVM

class dataeval.detectors.drift.DriftCVM(x_ref, p_val=0.05, x_ref_preprocessed=False, update_x_ref=None, preprocess_fn=None, correction='bonferroni', n_features=None)

Drift detector that uses the two-sample Cramér-von Mises (CVM) test.

The CVM test detects changes in the distribution of continuous univariate data. For multivariate data, a separate CVM test is applied to each feature, and the obtained p-values are aggregated via the Bonferroni or False Discovery Rate (FDR) corrections.
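The aggregation step can be sketched as follows. The hypothetical helper aggregate_p_values (not part of DataEval) takes one p-value per feature and decides whether drift is flagged. It assumes that the "fdr" option follows the standard Benjamini-Hochberg procedure and that drift is reported as soon as any feature is rejected. DataEval's exact implementation may differ.

```python
import numpy as np


def aggregate_p_values(p_vals, p_val=0.05, correction="bonferroni"):
    """Return True if drift is flagged, given per-feature p-values.

    Hypothetical helper: a sketch of the two corrections, not DataEval's code.
    """
    p_vals = np.asarray(p_vals, dtype=float)
    n = p_vals.size
    if correction == "bonferroni":
        # Bonferroni: compare every p-value against p_val / n_features.
        return bool(np.any(p_vals < p_val / n))
    if correction == "fdr":
        # Benjamini-Hochberg (assumed): sort p-values ascending and compare
        # the k-th smallest against k * q / n, where q = p_val is the q-value.
        p_sorted = np.sort(p_vals)
        thresholds = np.arange(1, n + 1) * p_val / n
        return bool(np.any(p_sorted < thresholds))
    raise ValueError(f"unknown correction: {correction!r}")


p = [0.015, 0.02, 0.9, 0.9]
print(aggregate_p_values(p, correction="bonferroni"))  # False
print(aggregate_p_values(p, correction="fdr"))  # True
```

In this example the Bonferroni threshold is 0.05 / 4 = 0.0125, so no feature is rejected. Under the FDR rule, however, the second-smallest p-value (0.02) falls below 2 × 0.05 / 4 = 0.025. This illustrates why FDR is typically less conservative when many features are tested.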

Parameters:
  • x_ref (ArrayLike) – Data used as reference distribution.

  • p_val (float | None, default 0.05) – Significance threshold (p-value) for the statistical test on each feature. If the FDR correction is used, this is instead the acceptable q-value.

  • x_ref_preprocessed (bool, default False) – Whether the reference data x_ref has already been preprocessed. If True, only the test data x is preprocessed at prediction time. If False, the reference data is preprocessed as well.

  • update_x_ref (UpdateStrategy | None, default None) – Optional strategy for updating the reference data. Use LastSeenUpdateStrategy to keep the last n instances seen by the detector, or ReservoirSamplingUpdateStrategy to update the reference data via reservoir sampling.

  • preprocess_fn (Callable | None, default None) – Function to preprocess the data before computing the data drift metrics. Typically a dimensionality reduction technique.

  • correction ("bonferroni" | "fdr", default "bonferroni") – Correction type for multivariate data. Either "bonferroni" or "fdr" (False Discovery Rate).

  • n_features (int | None, default None) – Number of features used in the statistical test. Not needed if no preprocessing takes place. If a preprocessing step is used, it can be inferred automatically, but inferring it may be more expensive to compute.
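The sketch below builds a dimensionality-reduction preprocess_fn. It assumes the detector calls the function on a batch array and expects a 2-D array of shape (n_samples, n_features) in return. make_pca_preprocess is a hypothetical helper, not part of DataEval. It fits PCA on the reference data using NumPy's SVD and projects any batch onto the top components.

```python
import numpy as np


def make_pca_preprocess(x_ref, n_components):
    """Hypothetical helper: return a preprocess_fn projecting onto the top
    principal components of x_ref."""
    x_ref = np.asarray(x_ref, dtype=float).reshape(len(x_ref), -1)
    mean = x_ref.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(x_ref - mean, full_matrices=False)
    components = vt[:n_components]

    def preprocess_fn(x):
        # Flatten each instance, center with the reference mean, then project.
        x = np.asarray(x, dtype=float).reshape(len(x), -1)
        return (x - mean) @ components.T

    return preprocess_fn


rng = np.random.default_rng(0)
x_ref = rng.normal(size=(200, 10))
preprocess = make_pca_preprocess(x_ref, n_components=3)
print(preprocess(x_ref).shape)  # (200, 3)
```

Because the output dimension is known here, n_features=3 could be passed explicitly alongside preprocess_fn=preprocess. This spares the detector from inferring it. With the default x_ref_preprocessed=False, the raw x_ref would be passed in and preprocessed by the detector.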

score(x)

Performs the two-sample Cramér-von Mises test(s), computing the p-value and test statistic per feature.

Parameters:
  • x (ArrayLike) – Batch of instances.

Returns:

Feature-level p-values and CVM statistics.

Return type:

tuple[NDArray, NDArray]
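The per-feature computation that score(x) performs can be approximated with SciPy's scipy.stats.cramervonmises_2samp. cvm_score is a hypothetical stand-in, not DataEval's implementation. Unlike score(x), it takes the reference data as an explicit argument. It assumes inputs can be flattened to (samples, features) and returns the two arrays in the same order as the documented return value.

```python
import numpy as np
from scipy.stats import cramervonmises_2samp


def cvm_score(x_ref, x):
    """Hypothetical stand-in for score(x): per-feature two-sample CVM tests."""
    x_ref = np.asarray(x_ref, dtype=float).reshape(len(x_ref), -1)
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    n_features = x_ref.shape[1]
    p_vals = np.zeros(n_features)
    stats = np.zeros(n_features)
    for f in range(n_features):
        # One univariate CVM test per feature column.
        result = cramervonmises_2samp(x_ref[:, f], x[:, f])
        p_vals[f] = result.pvalue
        stats[f] = result.statistic
    return p_vals, stats


rng = np.random.default_rng(0)
x_ref = rng.normal(size=(500, 3))
x = rng.normal(size=(500, 3))
x[:, 2] += 1.0  # shift only the last feature
p_vals, stats = cvm_score(x_ref, x)
print(p_vals[2] < 0.05 / 3)  # expected True: the shifted feature is flagged
```

The returned p-values could then be compared against p_val / n_features (Bonferroni) or passed through an FDR procedure to produce a single drift decision.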