DriftKS
- class dataeval.detectors.drift.DriftKS(x_ref: ArrayLike, p_val: float = 0.05, x_ref_preprocessed: bool = False, update_x_ref: UpdateStrategy | None = None, preprocess_fn: Callable[[ArrayLike], ArrayLike] | None = None, correction: Literal['bonferroni', 'fdr'] = 'bonferroni', alternative: Literal['two-sided', 'less', 'greater'] = 'two-sided', n_features: int | None = None)
Drift detector employing the Kolmogorov-Smirnov (KS) distribution test.
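The KS test statistic is the maximum distance between the empirical cumulative distribution functions of two samples. For multivariate data, the test is applied to each feature separately and the per-feature p-values are combined using Bonferroni or False Discovery Rate (FDR) correction.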
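As a rough illustration of the statistic described above, the sketch below computes the two-sample KS statistic for a single feature as the largest gap between the empirical CDFs of the two samples. It uses only the Python standard library. `ks_statistic` is a hypothetical helper written for this example and is not part of DataEval, whose internal implementation is not shown in this reference.

```python
from bisect import bisect_right


def ks_statistic(ref, test):
    """Two-sample KS statistic: largest gap between the two empirical CDFs.

    Hypothetical helper for illustration only (not a DataEval function).
    """
    ref_sorted = sorted(ref)
    test_sorted = sorted(test)
    n, m = len(ref_sorted), len(test_sorted)
    # The empirical CDFs only change at observed values, so it is enough
    # to evaluate the gap at every value seen in either sample.
    return max(
        abs(bisect_right(ref_sorted, v) / n - bisect_right(test_sorted, v) / m)
        for v in ref_sorted + test_sorted
    )


same = ks_statistic([1, 2, 3, 4], [1, 2, 3, 4])         # identical samples
shifted = ks_statistic([1, 2, 3, 4], [11, 12, 13, 14])  # no overlap at all
partial = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])      # half of the mass overlaps
print(same, shifted, partial)  # 0.0 1.0 0.5
```

A statistic near 0 means the two samples look alike for that feature, while a value near 1 means they barely overlap.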
- Parameters:
x_ref (ArrayLike) – Data used as reference distribution.
p_val (float, default 0.05) – Significance threshold (p-value) for the statistical test on each feature. If the FDR correction method is used, this corresponds to the acceptable q-value.
x_ref_preprocessed (bool, default False) – Whether the given reference data x_ref has already been preprocessed. If True, only the test data x will be preprocessed at prediction time. If False, the reference data will also be preprocessed.
update_x_ref (UpdateStrategy | None, default None) – Reference data can optionally be updated using an UpdateStrategy class. Update using the last n instances seen by the detector with LastSeenUpdateStrategy or via reservoir sampling with ReservoirSamplingUpdateStrategy.
preprocess_fn (Callable | None, default None) – Function to preprocess the data before computing the data drift metrics. Typically a dimensionality reduction technique.
correction ("bonferroni" | "fdr", default "bonferroni") – Correction type for multivariate data. Either ‘bonferroni’ or ‘fdr’ (False Discovery Rate).
alternative ("two-sided" | "less" | "greater", default "two-sided") – Defines the alternative hypothesis. Options are ‘two-sided’, ‘less’ or ‘greater’.
n_features (int | None, default None) – Number of features used in the statistical test. No need to pass it if no preprocessing takes place. In case of a preprocessing step, this can also be inferred automatically but could be more expensive to compute.
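To make the `correction` and `p_val` parameters more concrete, the sketch below applies the textbook Bonferroni and Benjamini-Hochberg (FDR) procedures to a list of per-feature p-values. It assumes the detector follows these standard procedures, which this reference does not confirm. `drift_decision` is a hypothetical name, and the threshold reported when no p-value passes the FDR test is an arbitrary choice made for this example.

```python
def drift_decision(p_vals, p_val=0.05, correction="bonferroni"):
    """Return (is_drift, threshold) for a list of per-feature p-values.

    Hypothetical helper: textbook procedures, not DataEval's actual code.
    """
    n = len(p_vals)
    if correction == "bonferroni":
        # Split the significance level evenly across all features.
        threshold = p_val / n
        return min(p_vals) < threshold, threshold
    if correction == "fdr":
        # Benjamini-Hochberg: compare the i-th smallest p-value with q * i / n
        # and keep the largest rank that passes.
        ranked = sorted(p_vals)
        passing = [i for i, p in enumerate(ranked, start=1) if p <= p_val * i / n]
        if not passing:
            # Nothing passes; report the strictest (rank 1) cutoff.
            # This reporting choice is an assumption of the sketch.
            return False, p_val / n
        return True, p_val * max(passing) / n
    raise ValueError(f"unknown correction: {correction!r}")


p_vals = [0.02, 0.02, 0.03, 0.5]
print(drift_decision(p_vals, 0.05, "bonferroni"))  # no drift: 0.02 is not below 0.05 / 4
print(drift_decision(p_vals, 0.05, "fdr"))         # drift: three p-values pass their rank cutoff
```

With the same p-values, Bonferroni is the more conservative choice, while FDR flags drift because several features are moderately significant at once.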
- property n_features: int
Get the number of features in the reference data.
If the number of features is not provided during initialization, it will be inferred from the reference data (x_ref). If a preprocessing function is provided, the number of features will be inferred after applying the preprocessing function.
- Returns:
Number of features in the reference data.
- Return type:
int
- predict(x: ArrayLike) -> DriftOutput
Predict whether a batch of data has drifted from the reference data and update the reference data using the specified update strategy.
- Parameters:
x (ArrayLike) – Batch of instances.
- Returns:
Dictionary containing the drift prediction and optionally the feature level p-values, threshold after multivariate correction if needed and test statistics.
- Return type:
DriftOutput
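The two reference-update approaches named for `update_x_ref` (keeping the last n instances, or reservoir sampling) can be sketched in plain Python as follows. This is only an illustration of the general techniques. `last_seen_update` and `reservoir_update` are hypothetical functions and do not reproduce the interface of `LastSeenUpdateStrategy` or `ReservoirSamplingUpdateStrategy`, and the reservoir variant shown is the textbook Algorithm R.

```python
import random


def last_seen_update(x_ref, x_new, n):
    """Keep only the n most recent instances (hypothetical helper)."""
    return (list(x_ref) + list(x_new))[-n:]


def reservoir_update(x_ref, x_new, n, seen, rng=random):
    """Reservoir sampling (Algorithm R), hypothetical helper.

    `seen` is the number of instances observed before this batch. Every
    instance observed so far ends up in the reservoir with equal probability.
    """
    reservoir = list(x_ref)
    for item in x_new:
        seen += 1
        if len(reservoir) < n:
            reservoir.append(item)      # still filling the reservoir
        else:
            j = rng.randrange(seen)     # uniform index in [0, seen)
            if j < n:
                reservoir[j] = item     # replace with probability n / seen
    return reservoir, seen


ref = [0, 1, 2, 3]
print(last_seen_update(ref, [4, 5], n=4))  # [2, 3, 4, 5]

sample, seen = reservoir_update(ref, range(4, 100), n=4, seen=len(ref), rng=random.Random(0))
print(len(sample), seen)  # 4 100
```

The last-seen approach tracks the most recent data and so follows gradual change, whereas reservoir sampling keeps a uniform sample of everything observed so far.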
- score(x: ArrayLike) -> tuple[ndarray[Any, dtype[float32]], ndarray[Any, dtype[float32]]]
Compute KS p-values and test statistics per feature.
- Parameters:
x (ArrayLike) – Batch of instances.
- Returns:
Feature-level p-values and KS statistics.
- Return type:
tuple[NDArray, NDArray]
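The shape of this computation can be sketched without any dependencies: run a two-sample KS test on each feature column and collect the p-values and statistics in that order. The p-value here uses the asymptotic Kolmogorov distribution for the two-sided alternative, which is an approximation suited to larger samples and is not claimed to be the method DataEval uses. `ks_statistic`, `ks_p_value`, and this standalone `score` function are hypothetical illustrations.

```python
import math
from bisect import bisect_right


def ks_statistic(ref, test):
    """Largest gap between the two empirical CDFs (hypothetical helper)."""
    ref_sorted, test_sorted = sorted(ref), sorted(test)
    n, m = len(ref_sorted), len(test_sorted)
    return max(
        abs(bisect_right(ref_sorted, v) / n - bisect_right(test_sorted, v) / m)
        for v in ref_sorted + test_sorted
    )


def ks_p_value(stat, n, m):
    """Asymptotic two-sided p-value from the Kolmogorov distribution."""
    lam = stat * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        # The series converges poorly here and the p-value is 1 to within ~1e-12.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * total))


def score(x_ref, x):
    """Return (p_vals, stats), one entry per feature column."""
    p_vals, stats = [], []
    for j in range(len(x_ref[0])):
        ref_col = [row[j] for row in x_ref]
        test_col = [row[j] for row in x]
        d = ks_statistic(ref_col, test_col)
        stats.append(d)
        p_vals.append(ks_p_value(d, len(ref_col), len(test_col)))
    return p_vals, stats


x_ref = [[i, i] for i in range(100)]
x = [[i, i + 50] for i in range(100)]  # only the second feature is shifted
p_vals, stats = score(x_ref, x)
print(p_vals, stats)
```

In this toy data the first feature is unchanged, so its p-value stays at 1, while the shifted second feature produces a statistic of about 0.5 and a p-value far below 0.05.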
- property x_ref: ndarray[Any, dtype[_ScalarType_co]]
Retrieve the reference data, applying preprocessing if not already done.
- Returns:
The reference dataset (x_ref), preprocessed if needed.
- Return type:
NDArray