Drift Detection

Drift refers to the phenomenon where the statistical properties of the data change over time. It occurs when the underlying distribution of the input features or the target variable (what the model is trying to predict) shifts, leading to a discrepancy between the training data and the real-world data the model encounters during deployment.

Drawing on the methods examined in the NeurIPS 2019 paper Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift, we can use several techniques to determine whether drift has occurred. For high-dimensional data, we typically reduce the dimensionality before running statistical tests on the dataset. To do so, we incorporate Untrained AutoEncoders (UAE) and Black-Box Shift Estimation (BBSE) predictors (the latter using the classifier’s softmax outputs) as out-of-the-box preprocessing methods. Principal Component Analysis (PCA) can also be easily implemented using scikit-learn. Preprocessing methods that do not rely on the classifier usually pick up drift in the input data, while BBSE focuses on label shift.
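As a rough illustration of the PCA option, the sketch below fits principal components on the reference data and returns a function that projects any batch into the same low-dimensional space. It uses NumPy's SVD rather than scikit-learn to stay dependency-light (`sklearn.decomposition.PCA(n_components=k).fit(x_ref).transform` would play the same role). The helper name `fit_pca_preprocess` is hypothetical and not part of DataEval.

```python
import numpy as np


def fit_pca_preprocess(x_ref: np.ndarray, n_components: int):
    """Fit PCA on reference data and return a projection function (hypothetical helper)."""
    x_flat = x_ref.reshape(len(x_ref), -1)  # flatten e.g. images into vectors
    mean = x_flat.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing explained variance
    _, _, vt = np.linalg.svd(x_flat - mean, full_matrices=False)
    components = vt[:n_components]

    def preprocess_fn(x: np.ndarray) -> np.ndarray:
        # Center with the *reference* mean, then project onto the top components
        return (x.reshape(len(x), -1) - mean) @ components.T

    return preprocess_fn


rng = np.random.default_rng(0)
x_ref = rng.normal(size=(500, 8, 8))  # e.g. 500 small single-channel images
x_new = rng.normal(size=(200, 8, 8))
preprocess_fn = fit_pca_preprocess(x_ref, n_components=5)
print(preprocess_fn(x_ref).shape, preprocess_fn(x_new).shape)  # (500, 5) (200, 5)
```

The returned function maps an `np.ndarray` to an `np.ndarray`, which is the shape of the `preprocess_fn` parameter documented for the detectors below; how DataEval invokes it internally is not shown here.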

Tutorials

Check out this tutorial to begin using the drift detection classes:

Drift Detection Tutorial

DataEval API

Cramér-von Mises

The CVM drift detector is a non-parametric drift detector, which applies feature-wise two-sample Cramér-von Mises (CVM) tests. For two empirical distributions $F(z)$ and $F_{ref}(z)$, the CVM test statistic is defined as

$$ W = \sum_{z\in k} \left| F(z) - F_{ref}(z) \right|^2 $$

where $k$ is the joint sample. The CVM test is an alternative to the Kolmogorov-Smirnov (K-S) two-sample test, which uses the maximum distance between two empirical distributions $F(z)$ and $F_{ref}(z)$. By using the full joint sample, the CVM can exhibit greater power against shifts in higher moments, such as variance changes.
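To make the formula concrete, here is a minimal NumPy sketch that evaluates both empirical CDFs on the joint sample $k$ and sums the squared differences. The extra scaling in `cvm_t` (by $nm/(n+m)^2$) is Anderson's normalization, which is what `scipy.stats.cramervonmises_2samp` reports when there are no ties. The helper names are hypothetical and this is not DataEval's code.

```python
import numpy as np


def ecdf(sample: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Empirical CDF of `sample` evaluated at `points` (fraction of values <= point)."""
    return np.searchsorted(np.sort(sample), points, side="right") / len(sample)


def cvm_w(x: np.ndarray, x_ref: np.ndarray) -> float:
    """W = sum over the joint sample k of |F(z) - F_ref(z)|^2."""
    k = np.concatenate([x, x_ref])  # joint sample
    return float(np.sum((ecdf(x, k) - ecdf(x_ref, k)) ** 2))


def cvm_t(x: np.ndarray, x_ref: np.ndarray) -> float:
    """Anderson's normalized statistic T = n*m / (n+m)^2 * W (assumes no ties)."""
    n, m = len(x), len(x_ref)
    return n * m / (n + m) ** 2 * cvm_w(x, x_ref)


rng = np.random.default_rng(0)
x_ref = rng.normal(0.0, 1.0, size=300)
x_same = rng.normal(0.0, 1.0, size=300)
x_wider = rng.normal(0.0, 2.0, size=300)  # same mean, larger variance
print(cvm_t(x_same, x_ref), cvm_t(x_wider, x_ref))
```

The variance-only shift in `x_wider` leaves the means aligned, yet the ECDF gap accumulates over the whole joint sample, which is the sensitivity to higher moments described above.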

For multivariate data, the detector applies a separate CVM test to each feature and aggregates the resulting p-values via either the Bonferroni or the False Discovery Rate (FDR) correction. The Bonferroni correction is more conservative and controls the probability of at least one false positive. The FDR correction, on the other hand, allows for an expected fraction of false positives to occur. As with other univariate detectors such as the Kolmogorov-Smirnov detector, for high-dimensional data we typically want to reduce the dimensionality before computing the feature-wise univariate CVM tests and aggregating them via the chosen correction method.
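The feature-wise procedure and both corrections can be sketched with `scipy.stats.cramervonmises_2samp` (available in SciPy 1.7 and later). This is an illustration under assumptions, not DataEval's implementation: the helper names are hypothetical, and the FDR branch uses the Benjamini-Hochberg step-up rule as one common FDR procedure.

```python
import numpy as np
from scipy.stats import cramervonmises_2samp


def bonferroni_drift(p_values: np.ndarray, p_val: float) -> bool:
    """Flag drift if any feature's p-value is below p_val / n_features."""
    return bool(np.min(p_values) < p_val / len(p_values))


def fdr_drift(p_values: np.ndarray, q_val: float) -> bool:
    """Benjamini-Hochberg: flag drift if any sorted p-value p_(i) < q_val * i / n."""
    n = len(p_values)
    thresholds = q_val * np.arange(1, n + 1) / n
    return bool(np.any(np.sort(p_values) < thresholds))


def featurewise_cvm(x_ref: np.ndarray, x: np.ndarray):
    """Run one two-sample CVM test per feature (column); return p-values and statistics."""
    results = [cramervonmises_2samp(x[:, f], x_ref[:, f]) for f in range(x.shape[1])]
    p_values = np.array([r.pvalue for r in results])
    stats = np.array([r.statistic for r in results])
    return p_values, stats


rng = np.random.default_rng(1)
x_ref = rng.normal(size=(200, 5))
x = rng.normal(size=(200, 5))
x[:, 0] += 1.0  # shift only the first feature
p_values, stats = featurewise_cvm(x_ref, x)
print(bonferroni_drift(p_values, 0.05), fdr_drift(p_values, 0.05))
```

With p-values of 0.03 and 0.04 at a 0.05 level, Bonferroni (threshold 0.025) flags nothing, while Benjamini-Hochberg flags drift because 0.04 falls below its second threshold of 0.05. This illustrates the difference in conservativeness between the two corrections.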

class dataeval.detectors.DriftCVM(x_ref: ndarray, p_val: float = 0.05, x_ref_preprocessed: bool = False, update_x_ref: UpdateStrategy | None = None, preprocess_fn: Callable[[ndarray], ndarray] | None = None, correction: Literal['bonferroni', 'fdr'] = 'bonferroni', n_features: int | None = None)

Cramér-von Mises (CVM) data drift detector, which tests for any change in the distribution of continuous univariate data. For multivariate data, a separate CVM test is applied to each feature, and the obtained p-values are aggregated via the Bonferroni or False Discovery Rate (FDR) corrections.

Parameters:

x_ref (np.ndarray) – Data used as reference distribution.
p_val (float, default 0.05) – p-value used for the significance of the statistical test for each feature. If the FDR correction method is used, this corresponds to the acceptable q-value.
x_ref_preprocessed (bool, default False) – Whether the given reference data x_ref has been preprocessed yet. If x_ref_preprocessed=True, only the test data x will be preprocessed at prediction time. If x_ref_preprocessed=False, the reference data will also be preprocessed.
update_x_ref (Optional[UpdateStrategy], default None) – Reference data can optionally be updated using an UpdateStrategy class. Update using the last n instances seen by the detector with dataeval.detectors.LastSeenUpdate or via reservoir sampling with dataeval.detectors.ReservoirSamplingUpdate.
preprocess_fn (Optional[Callable[[np.ndarray], np.ndarray]], default None) – Function to preprocess the data before computing the data drift metrics. Typically a dimensionality reduction technique.
correction (Literal["bonferroni", "fdr"], default "bonferroni") – Correction type for multivariate data. Either ‘bonferroni’ or ‘fdr’ (False Discovery Rate).
n_features (Optional[int], default None) – Number of features used in the statistical test. No need to pass it if no preprocessing takes place. In case of a preprocessing step, this can also be inferred automatically, but doing so could be more expensive to compute.

predict(x: ndarray, drift_type: Literal['batch', 'feature'] = 'batch') → Dict[str, int | float | ndarray]

Predict whether a batch of data has drifted from the reference data and update reference data using specified update strategy.

Parameters:

x (np.ndarray) – Batch of instances.
drift_type (Literal["batch", "feature"], default "batch") – Predict drift at the ‘feature’ or ‘batch’ level. For ‘batch’, the feature-level p-values are aggregated using the Bonferroni or False Discovery Rate correction (if n_features>1).

Returns:

Dictionary containing the drift prediction and, optionally, the feature-level p-values, the threshold after multivariate correction (if needed), and the test statistics.

Return type:

Dict[str, int | float | ndarray]

score(x: ndarray) → Tuple[ndarray, ndarray]

Performs the two-sample Cramér-von Mises test(s), computing the p-value and test statistic per feature.

Parameters:

x (np.ndarray) – Batch of instances.

Returns:

Feature level p-values and CVM statistics.

Return type:

Tuple[ndarray, ndarray]

Kolmogorov-Smirnov

The K-S drift detector applies feature-wise two-sample Kolmogorov-Smirnov (K-S) tests. For multivariate data, the p-values obtained for each feature are aggregated via either the Bonferroni or the False Discovery Rate (FDR) correction. The Bonferroni correction is more conservative and controls the probability of at least one false positive. The FDR correction, on the other hand, allows for an expected fraction of false positives to occur.
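Since the K-S statistic is the largest vertical gap between the two empirical CDFs, it can be computed on the joint sample much like the CVM sum. The sketch below (hypothetical helper `ks_statistic`, not DataEval's code) checks this against `scipy.stats.ks_2samp`, whose `alternative` argument accepts the same ‘two-sided’, ‘less’ and ‘greater’ options as the detector's `alternative` parameter.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_statistic(x: np.ndarray, x_ref: np.ndarray) -> float:
    """Two-sided K-S statistic: max |F(z) - F_ref(z)| over the joint sample."""
    k = np.concatenate([x, x_ref])
    f = np.searchsorted(np.sort(x), k, side="right") / len(x)
    f_ref = np.searchsorted(np.sort(x_ref), k, side="right") / len(x_ref)
    return float(np.max(np.abs(f - f_ref)))


rng = np.random.default_rng(2)
x_ref = rng.normal(size=(300, 3))
x = rng.normal(size=(300, 3))
x[:, 2] += 0.5  # drift in the last feature only

# One test per feature, as the detector does before applying a correction
for f in range(x.shape[1]):
    res = ks_2samp(x[:, f], x_ref[:, f], alternative="two-sided")
    print(f, ks_statistic(x[:, f], x_ref[:, f]), res.statistic, res.pvalue)
```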

class dataeval.detectors.DriftKS(x_ref: ndarray, p_val: float = 0.05, x_ref_preprocessed: bool = False, update_x_ref: UpdateStrategy | None = None, preprocess_fn: Callable[[ndarray], ndarray] | None = None, correction: Literal['bonferroni', 'fdr'] = 'bonferroni', alternative: Literal['two-sided', 'less', 'greater'] = 'two-sided', n_features: int | None = None)

Kolmogorov-Smirnov (K-S) data drift detector with Bonferroni or False Discovery Rate (FDR) correction for multivariate data.

Parameters:

x_ref (np.ndarray) – Data used as reference distribution.
p_val (float, default 0.05) – p-value used for the significance of the statistical test for each feature. If the FDR correction method is used, this corresponds to the acceptable q-value.
x_ref_preprocessed (bool, default False) – Whether the given reference data x_ref has been preprocessed yet. If x_ref_preprocessed=True, only the test data x will be preprocessed at prediction time. If x_ref_preprocessed=False, the reference data will also be preprocessed.
update_x_ref (Optional[UpdateStrategy], default None) – Reference data can optionally be updated using an UpdateStrategy class. Update using the last n instances seen by the detector with dataeval.detectors.LastSeenUpdate or via reservoir sampling with dataeval.detectors.ReservoirSamplingUpdate.
preprocess_fn (Optional[Callable[[np.ndarray], np.ndarray]], default None) – Function to preprocess the data before computing the data drift metrics. Typically a dimensionality reduction technique.
correction (Literal["bonferroni", "fdr"], default "bonferroni") – Correction type for multivariate data. Either ‘bonferroni’ or ‘fdr’ (False Discovery Rate).
alternative (Literal["two-sided", "less", "greater"], default "two-sided") – Defines the alternative hypothesis. Options are ‘two-sided’, ‘less’ or ‘greater’.
n_features (Optional[int], default None) – Number of features used in the statistical test. No need to pass it if no preprocessing takes place. In case of a preprocessing step, this can also be inferred automatically, but doing so could be more expensive to compute.

predict(x: ndarray, drift_type: Literal['batch', 'feature'] = 'batch') → Dict[str, int | float | ndarray]

Predict whether a batch of data has drifted from the reference data and update reference data using specified update strategy.

Parameters:

x (np.ndarray) – Batch of instances.
drift_type (Literal["batch", "feature"], default "batch") – Predict drift at the ‘feature’ or ‘batch’ level. For ‘batch’, the feature-level p-values are aggregated using the Bonferroni or False Discovery Rate correction (if n_features>1).

Returns:

Dictionary containing the drift prediction and, optionally, the feature-level p-values, the threshold after multivariate correction (if needed), and the test statistics.

Return type:

Dict[str, int | float | ndarray]

score(x: ndarray) → Tuple[ndarray, ndarray]

Compute the K-S p-values and test statistics per feature.

Parameters:

x (np.ndarray) – Batch of instances.

Returns:

Feature level p-values and K-S statistics.

Return type:

Tuple[ndarray, ndarray]

Maximum Mean Discrepancy

The Maximum Mean Discrepancy (MMD) detector is a kernel-based method for multivariate two-sample testing. The MMD is a distance-based measure between two distributions $p$ and $q$ based on their mean embeddings $\mu_{p}$ and $\mu_{q}$ in a reproducing kernel Hilbert space $F$:

$$ MMD^2(F, p, q) = || \mu_{p} - \mu_{q} ||^2_{F} $$

We can compute unbiased estimates of $MMD^2$ from samples of the two distributions by applying the kernel trick. By default, we use a radial basis function (RBF) kernel, but users are free to pass their own kernel to the detector. We obtain a $p$-value via a permutation test on the values of $MMD^2$.
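A minimal NumPy sketch of the unbiased $MMD^2$ estimate and the permutation test follows. It is an illustration under assumptions rather than DataEval's implementation: the bandwidth `sigma` is fixed by hand instead of being configured from the reference data, the helper names are hypothetical, and the p-value is taken as the fraction of permuted $MMD^2$ values that are at least as large as the observed one.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian RBF kernel matrix exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    d2 = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))


def mmd2_unbiased(k: np.ndarray, n: int) -> float:
    """Unbiased MMD^2 from a joint kernel matrix whose first n rows/cols are the reference set."""
    k_xx, k_yy, k_xy = k[:n, :n], k[n:, n:], k[:n, n:]
    m = k.shape[0] - n
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))  # exclude i == j terms
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    return float(term_xx + term_yy - 2.0 * k_xy.mean())


def mmd_permutation_test(x_ref, x, sigma=1.0, n_permutations=100, seed=0):
    """Return (p-value, observed MMD^2) from a permutation test."""
    rng = np.random.default_rng(seed)
    z = np.concatenate([x_ref, x])
    k = rbf_kernel(z, z, sigma)  # joint kernel matrix computed once
    n = len(x_ref)
    observed = mmd2_unbiased(k, n)
    permuted = []
    for _ in range(n_permutations):
        idx = rng.permutation(len(z))  # randomly reassign samples to the two sets
        permuted.append(mmd2_unbiased(k[np.ix_(idx, idx)], n))
    p_value = float(np.mean(np.array(permuted) >= observed))
    return p_value, observed


rng = np.random.default_rng(3)
x_ref = rng.normal(size=(100, 2))
x_shifted = rng.normal(loc=1.0, size=(100, 2))
p_value, mmd2 = mmd_permutation_test(x_ref, x_shifted, sigma=1.0)
print(p_value, mmd2)
```

Computing the joint kernel matrix once and re-indexing it with `np.ix_` keeps each permutation cheap, since only the assignment of samples to the two sets changes.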

class dataeval.detectors.DriftMMD(x_ref: ndarray, p_val: float = 0.05, x_ref_preprocessed: bool = False, update_x_ref: UpdateStrategy | None = None, preprocess_fn: Callable[[ndarray], ndarray] | None = None, kernel: Callable = GaussianRBF, sigma: ndarray | None = None, configure_kernel_from_x_ref: bool = True, n_permutations: int = 100, device: str | None = None)

Maximum Mean Discrepancy (MMD) data drift detector using a permutation test.

Parameters:

x_ref (np.ndarray) – Data used as reference distribution.
p_val (float, default 0.05) – p-value used for the significance of the permutation test.
x_ref_preprocessed (bool, default False) – Whether the given reference data x_ref has been preprocessed yet. If x_ref_preprocessed=True, only the test data x will be preprocessed at prediction time. If x_ref_preprocessed=False, the reference data will also be preprocessed.
preprocess_at_init (bool, default True) – Whether to preprocess the reference data when the detector is instantiated. Otherwise, the reference data will be preprocessed at prediction time. Only applies if x_ref_preprocessed=False.
update_x_ref (Optional[UpdateStrategy], default None) – Reference data can optionally be updated using an UpdateStrategy class. Update using the last n instances seen by the detector with dataeval.detectors.LastSeenUpdate or via reservoir sampling with dataeval.detectors.ReservoirSamplingUpdate.
preprocess_fn (Optional[Callable[[np.ndarray], np.ndarray]], default None) – Function to preprocess the data before computing the data drift metrics.
kernel (Callable, default dataeval.detectors.GaussianRBF) – Kernel used for the MMD computation, defaults to Gaussian RBF kernel.
sigma (Optional[np.ndarray], default None) – Optionally set the GaussianRBF kernel bandwidth. Can also pass multiple bandwidth values as an array. The kernel evaluation is then averaged over those bandwidths.
configure_kernel_from_x_ref (bool, default True) – Whether to configure the kernel bandwidth from the reference data at initialization.
n_permutations (int, default 100) – Number of permutations used in the permutation test.
device (Optional[str], default None) – Device type used. The default None uses the GPU and falls back on CPU. Can be specified by passing either ‘cuda’, ‘gpu’ or ‘cpu’.

predict(x: ndarray) → Dict[str, int | float]

Predict whether a batch of data has drifted from the reference data and then update the reference data using the specified strategy.

Parameters:

x (np.ndarray) – Batch of instances.

Returns:

Dictionary containing the drift prediction, p-value, threshold and MMD metric.

Return type:

Dict[str, int | float]

score(x: ndarray) → Tuple[float, float, float]

Compute the p-value resulting from a permutation test using the maximum mean discrepancy as a distance measure between the reference data and the data to be tested.

Parameters:

x – Batch of instances.

Returns:

p-value obtained from the permutation test, the MMD^2 between the reference and test set, and the MMD^2 threshold above which drift is flagged.

Return type:

Tuple[float, float, float]

Classifier Uncertainty

The classifier uncertainty drift detector aims to directly detect drift that is likely to affect the performance of a model of interest. The approach is to test for a change in the number of instances falling into regions of the input space on which the model is uncertain in its predictions. For each instance in the reference set, the detector obtains the model’s prediction and some associated notion of uncertainty. The same is done for the test set, and if significant differences in uncertainty are detected (via a Kolmogorov-Smirnov test), then drift is flagged. The detector’s reference set should be disjoint from the model’s training set (on which the model’s confidence may be higher).
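The procedure can be sketched in a few lines: turn each model output into a prediction entropy, then compare the reference and test entropies with `scipy.stats.ks_2samp`. The toy model, the helper names and the keys of the returned dictionary are hypothetical and chosen for illustration only; this is not DataEval's implementation.

```python
import numpy as np
from scipy.stats import ks_2samp


def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def prediction_entropy(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Shannon entropy of each row of class probabilities (higher = more uncertain)."""
    return -np.sum(probs * np.log(probs + eps), axis=1)


def uncertainty_drift(model, x_ref, x, p_val=0.05, preds_type="probs"):
    """K-S test on prediction entropies of reference vs. test data (hypothetical helper)."""
    preds_ref, preds = model(x_ref), model(x)
    if preds_type == "logits":
        preds_ref, preds = softmax(preds_ref), softmax(preds)
    result = ks_2samp(prediction_entropy(preds_ref), prediction_entropy(preds))
    return {"is_drift": int(result.pvalue < p_val), "p_val": float(result.pvalue), "threshold": p_val}


def toy_model(x: np.ndarray) -> np.ndarray:
    """Toy two-class model returning logits: confident far from x0 = 0, uncertain near it."""
    return np.stack([4.0 * x[:, 0], -4.0 * x[:, 0]], axis=1)


rng = np.random.default_rng(4)
x_ref = rng.choice([-2.0, 2.0], size=(300, 1)) + rng.normal(scale=0.3, size=(300, 1))
x_near_boundary = rng.normal(scale=0.3, size=(300, 1))  # more instances in the uncertain region
result = uncertainty_drift(toy_model, x_ref, x_near_boundary, preds_type="logits")
print(result)
```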

class dataeval.detectors.DriftUncertainty(x_ref: ndarray, model: Callable, p_val: float = 0.05, x_ref_preprocessed: bool = False, update_x_ref: UpdateStrategy | None = None, preds_type: Literal['probs', 'logits'] = 'probs', batch_size: int = 32, preprocess_batch_fn: Callable | None = None, device: str | None = None)

Test for a change in the number of instances falling into regions on which the model is uncertain. Performs a K-S test on prediction entropies.

Parameters:

x_ref (np.ndarray) – Data used as reference distribution. Should be disjoint from the data the model was trained on for accurate p-values.
model (Callable) – Classification model outputting class probabilities (or logits).
p_val (float, default 0.05) – p-value used for the significance of the test.
x_ref_preprocessed (bool, default False) – Whether the given reference data x_ref has been preprocessed yet. If x_ref_preprocessed=True, only the test data x will be preprocessed at prediction time. If x_ref_preprocessed=False, the reference data will also be preprocessed.
update_x_ref (Optional[UpdateStrategy], default None) – Reference data can optionally be updated using an UpdateStrategy class. Update using the last n instances seen by the detector with dataeval.detectors.LastSeenUpdate or via reservoir sampling with dataeval.detectors.ReservoirSamplingUpdate.
preds_type (Literal["probs", "logits"], default "probs") – Type of prediction output by the model. Options are ‘probs’ (in [0,1]) or ‘logits’ (in [-inf,inf]).
batch_size (int, default 32) – Batch size used to evaluate the model. Only relevant when backend has been specified for batch prediction.
preprocess_batch_fn (Optional[Callable], default None) – Optional batch preprocessing function, for example to convert a list of objects to a batch which can be processed by the model.
device (Optional[str], default None) – Device type used. The default None tries to use the GPU and falls back on CPU if needed. Can be specified by passing either ‘cuda’, ‘gpu’ or ‘cpu’.
input_shape (Optional[tuple], default None) – Shape of input data.

predict(x: ndarray) → Dict[str, int | float | ndarray]

Predict whether a batch of data has drifted from the reference data.

Parameters:

x (np.ndarray) – Batch of instances.

Returns:

Dictionary containing the drift prediction, p-value, and threshold statistics.

Return type:

Dict[str, int | float | ndarray]

GaussianRBF

The GaussianRBF class implements a Gaussian kernel, also known as a radial basis function (RBF) kernel. It is used to construct covariance matrices for Gaussian processes and is the default kernel used in the MMD drift detection test.

class dataeval.detectors.GaussianRBF(sigma: Tensor | None = None, init_sigma_fn: Callable | None = None, trainable: bool = False)

Gaussian RBF kernel: k(x,y) = exp(-(1/(2*sigma^2))||x-y||^2). A forward pass takes a batch of instances x [Nx, features] and y [Ny, features] and returns the kernel matrix [Nx, Ny].

Parameters:

sigma (Optional[torch.Tensor], default None) – Bandwidth used for the kernel. Need not be specified if it is being inferred or trained. Multiple values can be passed to evaluate the kernel with each bandwidth and then average the results.
init_sigma_fn (Optional[Callable], default None) – Function used to compute the bandwidth sigma. Used when sigma is to be inferred. The function’s signature should take in the tensors x, y and dist and return sigma. If None, it is set to sigma_median().
trainable (bool, default False) – Whether or not to track gradients w.r.t. sigma to allow it to be trained.
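The kernel formula, bandwidth averaging and bandwidth inference described above can be sketched in NumPy as follows. This is not DataEval's PyTorch implementation: `gaussian_rbf` and `median_sigma` are hypothetical names, and taking sigma = sqrt(0.5 * median squared distance) over all pairs is an assumed version of the median heuristic that sigma_median() refers to.

```python
import numpy as np


def squared_distances(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Pairwise squared Euclidean distances, shape [Nx, Ny]."""
    d2 = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2.0 * x @ y.T
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off


def median_sigma(d2: np.ndarray) -> np.ndarray:
    """Assumed median heuristic: sigma = sqrt(0.5 * median squared distance)."""
    return np.array([np.sqrt(0.5 * np.median(d2))])


def gaussian_rbf(x: np.ndarray, y: np.ndarray, sigma=None) -> np.ndarray:
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), averaged over all given bandwidths."""
    d2 = squared_distances(x, y)
    sigma = median_sigma(d2) if sigma is None else np.atleast_1d(np.asarray(sigma, dtype=float))
    # Broadcast to [n_sigma, Nx, Ny], then average over the bandwidth axis
    return np.exp(-d2[None, :, :] / (2.0 * sigma[:, None, None] ** 2)).mean(axis=0)


x = np.array([[0.0], [1.0]])
y = np.array([[0.0], [2.0], [3.0]])
print(gaussian_rbf(x, y, sigma=1.0).shape)  # (2, 3)
print(gaussian_rbf(x, y, sigma=[1.0, 2.0])[0, 0])  # 1.0
```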

forward(x: ndarray | Tensor, y: ndarray | Tensor, infer_sigma: bool = False) → Tensor

Computes the kernel matrix [Nx, Ny] between a batch x [Nx, features] and a batch y [Ny, features]. If infer_sigma=True, the bandwidth sigma is first inferred using init_sigma_fn.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

LastSeenUpdate

class dataeval.detectors.LastSeenUpdate(n: int)

Updates the reference dataset for a drift detector using the last-seen method.

Parameters:

n (int) – Update with the last n instances seen by the detector.
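The last-seen strategy amounts to appending the new batch and keeping the most recent n rows. A one-function NumPy sketch (hypothetical name `last_seen_update`, not DataEval's code):

```python
import numpy as np


def last_seen_update(x_ref: np.ndarray, x: np.ndarray, n: int) -> np.ndarray:
    """Append the new batch to the reference data and keep only the n most recent rows."""
    return np.concatenate([x_ref, x], axis=0)[-n:]


x_ref = np.arange(5).reshape(5, 1)  # instances 0..4 seen earlier
x_new = np.arange(5, 8).reshape(3, 1)  # new batch: 5, 6, 7
print(last_seen_update(x_ref, x_new, n=4).ravel())  # [4 5 6 7]
```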

ReservoirSamplingUpdate

class dataeval.detectors.ReservoirSamplingUpdate(n: int)

Updates the reference dataset for a drift detector using reservoir sampling.

Parameters:

n (int) – Update with a reservoir sample of size n.
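Reservoir sampling keeps a uniform random sample of size n over every instance seen so far without storing them all. The sketch below uses the classic Algorithm R; the function name and the explicit `n_seen` counter are assumptions for illustration, not DataEval's API.

```python
import numpy as np


def reservoir_update(x_ref, x, n, n_seen, rng):
    """Algorithm R: update a reservoir of at most n instances with a new batch.

    x_ref holds the current reservoir (assumed to have at most n rows) and n_seen is the
    number of instances seen so far. Returns (new_reservoir, new_n_seen).
    """
    reservoir = list(x_ref)
    for item in x:
        n_seen += 1
        if len(reservoir) < n:
            reservoir.append(item)  # fill up the reservoir first
        else:
            j = rng.integers(0, n_seen)  # uniform index in [0, n_seen)
            if j < n:
                reservoir[j] = item  # replace a random slot with probability n / n_seen
    return np.stack(reservoir), n_seen


rng = np.random.default_rng(5)
x_ref = np.arange(4, dtype=float).reshape(4, 1)  # initial reference data: instances 0..3
x_new = np.arange(4, 12, dtype=float).reshape(8, 1)  # new batch: instances 4..11
reservoir, n_seen = reservoir_update(x_ref, x_new, n=6, n_seen=4, rng=rng)
print(reservoir.shape, n_seen)  # (6, 1) 12
```

Each new instance replaces a random slot with probability n / n_seen, which gives every instance seen so far the same chance of being in the reference set.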