dataeval.detectors.drift.DriftKS

class dataeval.detectors.drift.DriftKS(x_ref, p_val=0.05, x_ref_preprocessed=False, update_x_ref=None, preprocess_fn=None, correction='bonferroni', alternative='two-sided', n_features=None)

Drift detector employing the Kolmogorov-Smirnov (KS) distribution test.

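The KS test measures the maximum distance between the empirical cumulative distributions of the reference data and the test data, one feature at a time. For multivariate data, the per-feature p-values are combined using Bonferroni or False Discovery Rate (FDR) correction.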
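As a rough illustration of the per-feature statistic described above, the following standard-library sketch computes the two-sample KS distance column by column. The helper names `ks_statistic` and `ks_per_feature` are hypothetical and are not part of dataeval; the sketch shows only the distance, not the p-value or the detector's actual implementation.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The gap can only change at observed values, so checking those is enough.
    for v in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, v) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, v) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_per_feature(x_ref, x):
    """Apply the statistic to each column of two row-major 2D lists."""
    return [ks_statistic(r, t) for r, t in zip(zip(*x_ref), zip(*x))]


# Toy data (hypothetical): feature 0 is unchanged, feature 1 is shifted by 10.
x_ref = [[0.1, 5.0], [0.2, 6.0], [0.3, 7.0], [0.4, 8.0]]
x_shift = [[0.15, 15.0], [0.25, 16.0], [0.35, 17.0], [0.45, 18.0]]

distances = ks_per_feature(x_ref, x_shift)
print(distances)
```

A distance near 0 means the two samples of a feature look alike, while a distance of 1 means their value ranges do not overlap at all.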

Parameters:
x_ref : ArrayLike

Data used as reference distribution.

p_val : float | None, default 0.05

p-value threshold used to judge the significance of the statistical test for each feature. If the FDR correction method is used, this corresponds to the acceptable q-value.

x_ref_preprocessed : bool, default False

Whether the given reference data x_ref has already been preprocessed. If True, only the test data x will be preprocessed at prediction time. If False, the reference data will also be preprocessed.

update_x_ref : UpdateStrategy | None, default None

Reference data can optionally be updated using an UpdateStrategy class. Update using the last n instances seen by the detector with LastSeenUpdateStrategy or via reservoir sampling with ReservoirSamplingUpdateStrategy.
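To make the two update behaviours concrete, here is a minimal standard-library sketch of "keep the last n instances" and of reservoir sampling (Algorithm R). The functions `last_seen_update` and `reservoir_update` are hypothetical stand-ins written for illustration; they are not the `LastSeenUpdateStrategy` or `ReservoirSamplingUpdateStrategy` classes, whose interfaces are not shown in this section.

```python
import random


def last_seen_update(x_ref, x_new, n):
    """Keep only the n most recent instances."""
    return (list(x_ref) + list(x_new))[-n:]


def reservoir_update(x_ref, x_new, n, seen, rng):
    """Algorithm R: every instance seen so far is equally likely to be kept.

    `seen` is the number of instances observed before this call.
    """
    reservoir = list(x_ref)
    for item in x_new:
        seen += 1
        if len(reservoir) < n:
            reservoir.append(item)
        else:
            j = rng.randrange(seen)  # uniform over all instances seen so far
            if j < n:
                reservoir[j] = item
    return reservoir, seen


ref = [0, 1, 2, 3]
last = last_seen_update(ref, [4, 5], n=4)
reservoir, seen = reservoir_update(ref, [4, 5], n=4, seen=4, rng=random.Random(0))
print(last, reservoir, seen)
```

The last-seen variant tracks recent data closely and forgets older instances; the reservoir variant keeps a uniform sample over everything observed, so the reference changes more slowly.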

preprocess_fn : Callable | None, default None

Function to preprocess the data before computing the data drift metrics. Typically a dimensionality reduction technique.

correction : "bonferroni" | "fdr", default "bonferroni"

Correction type for multivariate data. Either ‘bonferroni’ or ‘fdr’ (False Discovery Rate).
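The difference between the two corrections can be sketched on a list of per-feature p-values. This is an illustrative standard-library version, assuming the usual definitions (Bonferroni divides the threshold by the number of features; FDR is taken here to be the Benjamini-Hochberg procedure). The function names are hypothetical, and the detector's exact decision rule may differ.

```python
def bonferroni_drift(p_vals, p_val=0.05):
    """Flag drift if any p-value falls below p_val / n_features."""
    threshold = p_val / len(p_vals)
    return min(p_vals) < threshold


def fdr_drift(p_vals, q_val=0.05):
    """Benjamini-Hochberg: flag drift if some ranked p-value p_(i) <= q * i / n."""
    n = len(p_vals)
    ranked = sorted(p_vals)
    return any(p <= q_val * i / n for i, p in enumerate(ranked, start=1))


# Hypothetical p-values for four features.
p_vals = [0.013, 0.02, 0.03, 0.9]

bonf = bonferroni_drift(p_vals)  # 0.013 is not below 0.05 / 4 = 0.0125
fdr = fdr_drift(p_vals)          # 0.02 is below 0.05 * 2 / 4 = 0.025
print(bonf, fdr)
```

On these values Bonferroni reports no drift while FDR does, which reflects the general pattern that Bonferroni is the more conservative of the two.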

alternative : "two-sided" | "less" | "greater", default "two-sided"

Defines the alternative hypothesis. Options are ‘two-sided’, ‘less’ or ‘greater’.
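The sketch below shows how the three options change which gap between the empirical CDFs is measured. It assumes the convention used by `scipy.stats.ks_2samp`, where the first sample's CDF is F and the second's is G: 'greater' uses the largest positive value of F - G, 'less' uses the largest negative value, and 'two-sided' uses the larger of the two. Whether this detector passes the reference data or the test data as the first sample is not stated here, so treat the argument order as an assumption; `ks_statistic_alt` is a hypothetical helper.

```python
from bisect import bisect_right


def ks_statistic_alt(a, b, alternative="two-sided"):
    """KS distance between samples a and b under the chosen alternative."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d_plus = d_minus = 0.0
    for v in a_sorted + b_sorted:
        diff = (bisect_right(a_sorted, v) / len(a_sorted)
                - bisect_right(b_sorted, v) / len(b_sorted))
        d_plus = max(d_plus, diff)    # F_a above F_b
        d_minus = max(d_minus, -diff)  # F_a below F_b
    if alternative == "greater":
        return d_plus
    if alternative == "less":
        return d_minus
    return max(d_plus, d_minus)


ref = [1.0, 2.0, 3.0, 4.0]
test = [3.5, 4.5, 5.5, 6.5]  # shifted toward larger values

results = {alt: ks_statistic_alt(ref, test, alt)
           for alt in ("two-sided", "greater", "less")}
print(results)
```

Because the test sample is shifted upward, its CDF lies below the reference CDF everywhere, so only the 'greater' and 'two-sided' variants register a gap in this example.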

n_features : int | None, default None

Number of features used in the statistical test. It does not need to be passed if no preprocessing takes place. If there is a preprocessing step, it can be inferred automatically, although doing so may be more expensive to compute.

score(x)

Compute KS p-values and statistics per feature.

Parameters:
x : ArrayLike

Batch of instances.

Returns:

Feature-level p-values and KS statistics.

Return type:

tuple[NDArray, NDArray]