dataeval.detectors.drift.DriftCVM

class dataeval.detectors.drift.DriftCVM(data, p_val=0.05, update_strategy=None, correction='bonferroni', n_features=None)

Drift detector employing the Cramér-von Mises (CVM) Drift Detection test.

The CVM test detects changes in the distribution of continuous univariate data. For multivariate data, a separate CVM test is applied to each feature, and the obtained p-values are aggregated via the Bonferroni or False Discovery Rate (FDR) corrections.
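The aggregation step described above can be sketched as follows. This is a minimal illustration, not DataEval's implementation: the helper name `aggregate_p_values` is hypothetical, and the FDR branch assumes the Benjamini-Hochberg procedure, which the documentation does not name explicitly.

```python
import numpy as np


def aggregate_p_values(p_vals, p_val=0.05, correction="bonferroni"):
    """Return (drifted, threshold) from per-feature p-values.

    Hypothetical helper for illustration; not part of DataEval.
    """
    p_vals = np.asarray(p_vals, dtype=float)
    n = p_vals.size
    if correction == "bonferroni":
        # Divide the significance level by the number of univariate tests.
        threshold = p_val / n
        return bool(p_vals.min() < threshold), threshold
    if correction == "fdr":
        # Benjamini-Hochberg (assumed): compare the i-th smallest p-value
        # against q * i / n, where q is the acceptable q-value.
        sorted_p = np.sort(p_vals)
        cutoffs = p_val * np.arange(1, n + 1) / n
        below = sorted_p < cutoffs
        if not below.any():
            return False, float(cutoffs[0])
        # The largest rank that passes sets the effective threshold.
        return True, float(cutoffs[np.nonzero(below)[0].max()])
    raise ValueError("correction must be 'bonferroni' or 'fdr'")


# Made-up per-feature p-values for four features.
p = [0.02, 0.03, 0.03, 0.6]
print(aggregate_p_values(p, correction="bonferroni"))
print(aggregate_p_values(p, correction="fdr"))
```

With these made-up p-values the Bonferroni rule does not flag drift while the FDR rule does, which reflects the usual trade-off: Bonferroni is the more conservative of the two.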

Parameters:

data : Embeddings or Array: Data used as the reference distribution.
p_val : float or None, default 0.05: Significance threshold (p-value) for the statistical test applied to each feature. If the FDR correction is used, this corresponds to the acceptable q-value.
update_strategy : UpdateStrategy or None, default None: Optional strategy for updating the reference data. Use LastSeenUpdateStrategy to keep the last n instances seen by the detector, or ReservoirSamplingUpdateStrategy to update via reservoir sampling.
correction : "bonferroni" or "fdr", default "bonferroni": Correction type for multivariate data. Either ‘bonferroni’ or ‘fdr’ (False Discovery Rate).
n_features : int or None, default None: Number of features used in the univariate drift tests. If not provided, it will be inferred from the data.
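To make the two update strategies concrete, here is a rough sketch of the ideas behind them. The functions `last_seen_update` and `reservoir_update` are hypothetical stand-ins written for this illustration; the actual LastSeenUpdateStrategy and ReservoirSamplingUpdateStrategy classes may differ in signature and detail.

```python
import numpy as np


def last_seen_update(x_ref, x_new, n):
    """Keep only the n most recent instances (illustrative helper)."""
    return np.concatenate([x_ref, x_new], axis=0)[-n:]


def reservoir_update(x_ref, x_new, n, n_seen, rng):
    """Reservoir sampling (Algorithm R), illustrative helper.

    Every instance seen so far has the same chance of being in the
    reference set of size n. Returns (new_ref, new_n_seen).
    """
    x_ref = x_ref.copy()
    for row in x_new:
        n_seen += 1
        if len(x_ref) < n:
            # Reservoir not yet full: always keep the new instance.
            x_ref = np.concatenate([x_ref, row[None]], axis=0)
        else:
            # Replace a random slot with probability n / n_seen.
            j = rng.integers(0, n_seen)
            if j < n:
                x_ref[j] = row
    return x_ref, n_seen


rng = np.random.default_rng(0)
ref = np.zeros((4, 2), dtype=np.float32)
batch = np.ones((3, 2), dtype=np.float32)

print(last_seen_update(ref, batch, n=4))
new_ref, seen = reservoir_update(ref, batch, n=4, n_seen=len(ref), rng=rng)
print(new_ref.shape, seen)
```

The last-seen variant tracks recent data closely, while the reservoir variant keeps a uniform sample of everything seen, so it adapts more slowly to gradual change.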

Example

>>> from dataeval.data import Embeddings
>>> from dataeval.detectors.drift import DriftCVM

Use Embeddings to encode images before testing for drift

>>> train_emb = Embeddings(train_images, model=encoder, batch_size=64)
>>> drift = DriftCVM(train_emb)

Test incoming images for drift

>>> drift.predict(test_images).drifted
True

predict(data)

Predict whether a batch of data has drifted from the reference data, and update the reference data using the specified update strategy.

Parameters:

data : Embeddings or Array: Batch of instances to predict drift on.

Returns:

Output object containing the drift prediction and, optionally, the feature-level p-values, the threshold after multivariate correction (if needed), and the test statistics.

Return type:

DriftOutput

score(data)

Calculates p-values and test statistics per feature.

Parameters:

data : Embeddings or Array: Batch of instances to score.

Returns:

Feature-level p-values and test statistics.

Return type:

tuple[NDArray, NDArray]
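As a rough picture of what a per-feature score looks like, the sketch below runs SciPy's two-sample Cramér-von Mises test on each feature column. The function `cvm_score` is a hypothetical stand-in, and the use of `scipy.stats.cramervonmises_2samp` is an assumption made for illustration; it is not necessarily how DriftCVM computes its scores.

```python
import numpy as np
from scipy.stats import cramervonmises_2samp


def cvm_score(x_ref, x):
    """Per-feature CVM p-values and statistics (illustrative helper)."""
    # Flatten everything except the batch dimension into features.
    x_ref = np.asarray(x_ref, dtype=np.float32).reshape(len(x_ref), -1)
    x = np.asarray(x, dtype=np.float32).reshape(len(x), -1)
    n_features = x_ref.shape[1]
    p_vals = np.empty(n_features, dtype=np.float32)
    stats = np.empty(n_features, dtype=np.float32)
    for f in range(n_features):
        # One univariate two-sample test per feature column.
        result = cramervonmises_2samp(x_ref[:, f], x[:, f])
        p_vals[f] = result.pvalue
        stats[f] = result.statistic
    return p_vals, stats


rng = np.random.default_rng(0)
reference = rng.normal(size=(200, 3))
batch = rng.normal(size=(100, 3))
batch[:, 0] += 2.0  # shift only the first feature

p_vals, stats = cvm_score(reference, batch)
print(p_vals)
print(stats)
```

In this synthetic setup only the first feature is shifted, so its p-value should be far smaller than the others and its statistic larger.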

property n_features : int

Get the number of features in the reference data.

If the number of features is not provided during initialization, it will be inferred from the reference data (x_ref).

Returns: Number of features in the reference data.
Return type: int

property x_ref : numpy.typing.NDArray[numpy.float32]

Retrieve the reference data of the drift detector.

Returns: The reference data as a 32-bit floating point numpy array.
Return type: NDArray[np.float32]