dataeval.detectors.drift.DriftKS

class dataeval.detectors.drift.DriftKS(x_ref, p_val=0.05, x_ref_preprocessed=False, update_x_ref=None, preprocess_fn=None, correction='bonferroni', alternative='two-sided', n_features=None)

Drift detector employing the Kolmogorov-Smirnov (KS) distribution test.

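The KS test compares two samples through the maximum distance between their empirical cumulative distribution functions. For multivariate data, the test is applied to each feature separately and the per-feature results are combined using a Bonferroni or False Discovery Rate (FDR) correction.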
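The maximum-distance idea can be sketched in a few lines of plain Python. This is only an illustration of the two-sided KS statistic computed per feature, not the detector's implementation; the helper name ks_statistic and the toy data are assumptions made for this sketch, and the p-value computation is left out.

```python
from bisect import bisect_right


def ks_statistic(ref, test):
    """Two-sided KS statistic: largest gap between the two empirical CDFs.

    Hypothetical helper for illustration; not part of dataeval.
    """
    ref_sorted, test_sorted = sorted(ref), sorted(test)
    gap = 0.0
    # Both ECDFs are step functions that only change at observed values,
    # so evaluating the gap at every observed value is sufficient.
    for v in ref_sorted + test_sorted:
        cdf_ref = bisect_right(ref_sorted, v) / len(ref_sorted)
        cdf_test = bisect_right(test_sorted, v) / len(test_sorted)
        gap = max(gap, abs(cdf_ref - cdf_test))
    return gap


# Two features (columns); only the second one is shifted in the test batch.
x_ref = [[0.1, 1.0], [0.2, 2.0], [0.3, 3.0], [0.4, 4.0]]
x_test = [[0.15, 11.0], [0.25, 12.0], [0.35, 13.0], [0.45, 14.0]]

# One statistic per feature, mirroring the per-feature test described above.
stats = [
    ks_statistic([row[j] for row in x_ref], [row[j] for row in x_test])
    for j in range(2)
]
print(stats)  # [0.25, 1.0]
```

The first feature barely differs between the two samples, while the second has completely separated distributions and reaches the maximum statistic of 1.0.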

Parameters:

x_ref : ArrayLike: Data used as the reference distribution.
p_val : float | None, default 0.05: Significance threshold (p-value) for the statistical test on each feature. If the FDR correction method is used, this corresponds to the acceptable q-value.
x_ref_preprocessed : bool, default False: Whether the given reference data x_ref has already been preprocessed. If True, only the test data x will be preprocessed at prediction time. If False, the reference data will also be preprocessed.
update_x_ref : UpdateStrategy | None, default None: Optional strategy for updating the reference data. Use LastSeenUpdateStrategy to keep the last n instances seen by the detector, or ReservoirSamplingUpdateStrategy to update via reservoir sampling.
preprocess_fn : Callable | None, default None: Function to preprocess the data before computing the data drift metrics. Typically a dimensionality reduction technique.
correction : "bonferroni" | "fdr", default "bonferroni": Correction type for multivariate data. Either ‘bonferroni’ or ‘fdr’ (False Discovery Rate).
alternative : "two-sided" | "less" | "greater", default "two-sided": Defines the alternative hypothesis. Options are ‘two-sided’, ‘less’ or ‘greater’.
n_features : int | None, default None: Number of features used in the statistical test. Not needed if no preprocessing takes place. With a preprocessing step, it can be inferred automatically, though this may be more expensive to compute.
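To illustrate how the correction parameter interacts with p_val, the sketch below applies the textbook Bonferroni rule and the standard Benjamini-Hochberg FDR procedure to a list of per-feature p-values. The function names are hypothetical and the library's exact comparison rules may differ; treat this as an illustration of the two corrections, not as the detector's code.

```python
def bonferroni_drift(p_vals, p_val=0.05):
    """Flag drift if any feature's p-value is below p_val / n_features."""
    threshold = p_val / len(p_vals)
    return any(p < threshold for p in p_vals), threshold


def fdr_drift(p_vals, q_val=0.05):
    """Benjamini-Hochberg: flag drift if any sorted p-value p_(k) <= q * k / n."""
    n = len(p_vals)
    passing = [
        q_val * rank / n
        for rank, p in enumerate(sorted(p_vals), start=1)
        if p <= q_val * rank / n
    ]
    if not passing:
        return False, None
    # The effective threshold is the one attached to the largest passing rank.
    return True, max(passing)


# Made-up per-feature p-values, for illustration only.
p_vals = [0.02, 0.015, 0.03, 0.9]

bonf_flag, bonf_threshold = bonferroni_drift(p_vals)
fdr_flag, fdr_threshold = fdr_drift(p_vals)
print(bonf_flag, bonf_threshold)
print(fdr_flag, fdr_threshold)
```

With these values, no p-value falls below the Bonferroni threshold of 0.05 / 4, so no drift is flagged, while the FDR procedure does flag drift. This reflects the usual trade-off: Bonferroni is the more conservative of the two.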

Example

>>> from functools import partial
>>> from dataeval.detectors.drift import DriftKS, preprocess_drift

Use a preprocess function to encode images before testing for drift. Here, encoder, train_images and test_images are assumed to be defined already.

>>> preprocess_fn = partial(preprocess_drift, model=encoder, batch_size=64)
>>> drift = DriftKS(train_images, preprocess_fn=preprocess_fn)

Test incoming images for drift.

>>> drift.predict(test_images).drifted
True

predict(x)

Predict whether a batch of data has drifted from the reference data, then update the reference data using the specified update strategy.

Parameters:

x : ArrayLike: Batch of instances.

Returns:

Drift prediction and, optionally, the feature-level p-values, the threshold after multivariate correction (if applied), and the test statistics.

Return type:

DriftOutput
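The two reference-update approaches named for update_x_ref (keeping the last n instances, or reservoir sampling) can be sketched as follows. The functions below are hypothetical stand-ins written for this illustration; they do not reproduce the interfaces of LastSeenUpdateStrategy or ReservoirSamplingUpdateStrategy, which are not shown on this page.

```python
import random


def last_seen_update(x_ref, x_new, n):
    """Keep only the n most recent instances (hypothetical helper)."""
    return (x_ref + x_new)[-n:]


def reservoir_update(x_ref, x_new, n, n_seen, rng):
    """Keep a uniform random sample of size n over all instances seen so far.

    n_seen is the number of instances observed before this batch.
    """
    reservoir = list(x_ref)
    for item in x_new:
        n_seen += 1
        if len(reservoir) < n:
            reservoir.append(item)
        else:
            # Classic reservoir sampling: the new item replaces a random
            # slot with probability n / n_seen.
            slot = rng.randrange(n_seen)
            if slot < n:
                reservoir[slot] = item
    return reservoir, n_seen


x_ref = [0, 1, 2, 3]
batch = [4, 5, 6]

recent = last_seen_update(x_ref, batch, n=4)
print(recent)  # [3, 4, 5, 6]

sample, seen = reservoir_update(
    x_ref, batch, n=4, n_seen=len(x_ref), rng=random.Random(0)
)
print(len(sample), seen)  # 4 7
```

The last-seen approach tracks the most recent data and so adapts quickly to gradual change, whereas reservoir sampling keeps the reference set representative of everything observed so far.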

score(x)

Calculate p-values and test statistics per feature.

Parameters:

x : ArrayLike: Batch of instances.

Returns:

Feature-level p-values and test statistics.

Return type:

tuple[NDArray, NDArray]

property n_features : int

Get the number of features in the reference data.

If the number of features is not provided during initialization, it will be inferred from the reference data (x_ref). If a preprocessing function is provided, the number of features will be inferred after applying the preprocessing function.

Returns: Number of features in the reference data.
Return type: int

property x_ref : dataeval.typing.ArrayLike

Retrieve the reference data, applying preprocessing if not already done.

Returns: The reference dataset (x_ref), preprocessed if needed.
Return type: ArrayLike