dataeval.detectors.drift.DriftKS¶
- class dataeval.detectors.drift.DriftKS(data, p_val=0.05, update_strategy=None, correction='bonferroni', alternative='two-sided', n_features=None)¶
Drift detector employing the Kolmogorov-Smirnov (KS) distribution test.
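The KS test is applied to each feature separately; its statistic is the maximum distance between the empirical cumulative distribution functions of the reference and test data. For multivariate data, the per-feature p-values are combined using a Bonferroni or False Discovery Rate (FDR) correction.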
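To make the per-feature test and the multivariate correction concrete, here is a minimal pure-Python sketch. It is not DataEval's implementation: the helper names `ks_statistic`, `ks_p_value`, and `detect_drift` are hypothetical, the p-value uses the asymptotic Kolmogorov series (an approximation), only the two-sided alternative is covered, and the FDR branch assumes the Benjamini-Hochberg procedure.

```python
import math
from bisect import bisect_right


def ks_statistic(ref, test):
    """Two-sided KS statistic: largest gap between the two empirical CDFs."""
    ref_sorted, test_sorted = sorted(ref), sorted(test)
    d = 0.0
    for x in ref_sorted + test_sorted:
        cdf_ref = bisect_right(ref_sorted, x) / len(ref_sorted)
        cdf_test = bisect_right(test_sorted, x) / len(test_sorted)
        d = max(d, abs(cdf_ref - cdf_test))
    return d


def ks_p_value(d, n_ref, n_test, terms=100):
    """Approximate two-sided p-value from the asymptotic Kolmogorov series."""
    if d == 0.0:
        return 1.0
    en = math.sqrt(n_ref * n_test / (n_ref + n_test))
    lam = (en + 0.12 + 0.11 / en) * d
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


def detect_drift(ref, test, p_val=0.05, correction="bonferroni"):
    """ref and test are lists of rows; each row is a list of feature values."""
    n_features = len(ref[0])
    p_vals = []
    for j in range(n_features):
        ref_col = [row[j] for row in ref]
        test_col = [row[j] for row in test]
        d = ks_statistic(ref_col, test_col)
        p_vals.append(ks_p_value(d, len(ref_col), len(test_col)))

    if correction == "bonferroni":
        # Drift if any feature is significant at p_val / n_features.
        drifted = min(p_vals) < p_val / n_features
    elif correction == "fdr":
        # Benjamini-Hochberg (assumed): compare the i-th smallest p-value
        # with (i / n_features) * p_val; drift if any p-value passes.
        ranked = sorted(p_vals)
        drifted = any(
            p <= i / n_features * p_val for i, p in enumerate(ranked, start=1)
        )
    else:
        raise ValueError("correction must be 'bonferroni' or 'fdr'")
    return drifted, p_vals


# Toy data: three features, with the first feature shifted in `shifted`.
reference = [[float(i), float(i % 7), float(-i)] for i in range(50)]
shifted = [[row[0] + 1000.0, row[1], row[2]] for row in reference]
print(detect_drift(reference, reference)[0])
print(detect_drift(reference, shifted, correction="fdr")[0])
```

Bonferroni is the more conservative of the two corrections. FDR tolerates a controlled fraction of false positives among the features flagged, which generally gives it more power when many features are tested.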
- Parameters:¶
- data : Embeddings or Array¶
Data used as reference distribution.
- p_val : float or None, default 0.05¶
p-value used for significance of the statistical test for each feature. If the FDR correction method is used, this corresponds to the acceptable q-value.
- update_strategy : UpdateStrategy or None, default None¶
Reference data can optionally be updated using an UpdateStrategy class. Update using the last n instances seen by the detector with LastSeenUpdateStrategy or via reservoir sampling with ReservoirSamplingUpdateStrategy.
- correction : "bonferroni" or "fdr", default "bonferroni"¶
Correction type for multivariate data. Either ‘bonferroni’ or ‘fdr’ (False Discovery Rate).
- alternative : "two-sided", "less" or "greater", default "two-sided"¶
Defines the alternative hypothesis. Options are ‘two-sided’, ‘less’ or ‘greater’.
- n_features : int or None, default None¶
Number of features used in the univariate drift tests. If not provided, it will be inferred from the data.
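The two update strategies named under update_strategy can be illustrated with a short sketch. The functions `last_seen_update` and `reservoir_update` below are hypothetical stand-ins, not the LastSeenUpdateStrategy or ReservoirSamplingUpdateStrategy classes themselves, and the reservoir variant assumes the classic "Algorithm R" scheme.

```python
import random


def last_seen_update(reference, batch, n):
    """Keep only the n most recently seen instances."""
    return (reference + batch)[-n:]


def reservoir_update(reference, batch, n, seen, rng):
    """Reservoir sampling (Algorithm R) over a stream of instances.

    `seen` is the number of instances observed before this batch. Every
    instance seen so far ends up in the reservoir with equal probability.
    """
    reservoir = list(reference)
    for item in batch:
        seen += 1
        if len(reservoir) < n:
            # Reservoir not yet full: always keep the new instance.
            reservoir.append(item)
        else:
            # Replace a random slot with probability n / seen.
            j = rng.randrange(seen)
            if j < n:
                reservoir[j] = item
    return reservoir, seen


rng = random.Random(0)
window = last_seen_update([1, 2, 3], [4, 5], n=3)
sample, seen = reservoir_update([1, 2, 3], [4, 5, 6, 7], n=3, seen=3, rng=rng)
print(window, sample, seen)
```

A last-n window tracks recent data closely, so the reference follows gradual drift. A reservoir keeps a uniform sample of everything seen, so older data continues to influence the reference.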
Example
>>> from dataeval.utils.data import Embeddings

Use Embeddings to encode images before testing for drift

>>> train_emb = Embeddings(train_images, model=encoder, batch_size=64)
>>> drift = DriftKS(train_emb)

Test incoming images for drift

>>> drift.predict(test_images).drifted
True

- predict(data)¶
Predict whether a batch of data has drifted from the reference data and update reference data using specified update strategy.
- Parameters:¶
- data : Embeddings or Array¶
Batch of instances to predict drift on.
- Returns:¶
Dictionary containing the drift prediction and, optionally, the feature-level p-values, the threshold after multivariate correction (if needed), and the test statistics.
- Return type:¶
- property n_features : int¶
Get the number of features in the reference data.
If the number of features is not provided during initialization, it will be inferred from the reference data (x_ref).
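One plausible way to infer the feature count is sketched below. It rests on an assumption that is not confirmed by this page: that each reference instance is flattened, so the number of features is the product of the instance's dimensions. The helpers `shape_of` and `infer_n_features` are hypothetical names.

```python
import math


def shape_of(instance):
    """Shape of a rectangular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(instance, (list, tuple)):
        shape.append(len(instance))
        if not instance:
            break
        instance = instance[0]
    return tuple(shape)


def infer_n_features(x_ref):
    """Assumed rule: features per instance = product of its dimensions."""
    return math.prod(shape_of(x_ref[0]))


embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # 2 instances, 3 features each
print(infer_n_features(embeddings))
```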