dataeval.detectors.drift.DriftUncertainty

class dataeval.detectors.drift.DriftUncertainty(data, model, p_val=0.05, update_strategy=None, correction='bonferroni', preds_type='probs', batch_size=32, transforms=None, device=None)

Test for a change in the number of instances falling into regions on which the model is uncertain.

Performs a Kolmogorov-Smirnov (K-S) test on the entropies of the model's predictions.
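The idea behind the test can be sketched in plain Python. This is an illustrative approximation only, not DataEval's implementation: the helper names (softmax, entropy, ks_statistic, ks_pvalue, entropy_drift) are hypothetical, the p-value uses the asymptotic Kolmogorov series with a common small-sample correction, and the "model outputs" are synthetic logits generated for the demonstration.

```python
import math
import random


def softmax(logits):
    """Convert one row of logits into class probabilities."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of one probability vector."""
    return -sum(p * math.log(p + eps) for p in probs)


def ks_statistic(a, b):
    """Two-sample K-S statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Approximate two-sided p-value from the asymptotic Kolmogorov series."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        # The series converges slowly here; the true value is ~1.
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


def entropy_drift(ref_probs, test_probs, p_val=0.05):
    """Flag drift when the entropy distributions differ significantly."""
    h_ref = [entropy(p) for p in ref_probs]
    h_test = [entropy(p) for p in test_probs]
    d = ks_statistic(h_ref, h_test)
    p = ks_pvalue(d, len(h_ref), len(h_test))
    return p < p_val, p, d


def make_logits(rng, scale, n, classes=5):
    """Synthetic logits: a larger scale gives more confident predictions."""
    return [[rng.gauss(0.0, scale) for _ in range(classes)] for _ in range(n)]


rng = random.Random(0)
confident = [softmax(row) for row in make_logits(rng, 5.0, 500)]
uncertain = [softmax(row) for row in make_logits(rng, 0.5, 500)]

print(entropy_drift(confident, confident)[0])  # same data, no drift expected
print(entropy_drift(confident, uncertain)[0])  # more uncertain predictions
```

Because the test compares a single scalar (the entropy) per instance, it stays univariate regardless of the number of classes the model predicts.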

Parameters:
data : Array

Data used as reference distribution.

model : Callable

Classification model that outputs class probabilities (or logits).

p_val : float, default 0.05

Significance threshold for the test; drift is flagged when the test's p-value falls below it.

update_strategy : UpdateStrategy or None, default None

Reference data can optionally be updated using an UpdateStrategy class. Update using the last n instances seen by the detector with LastSeenUpdateStrategy or via reservoir sampling with ReservoirSamplingUpdateStrategy.

correction : "bonferroni" or "fdr", default "bonferroni"

Correction type for multivariate data. Either ‘bonferroni’ or ‘fdr’ (False Discovery Rate).

preds_type : "probs" or "logits", default "probs"

Type of prediction output by the model. Options are ‘probs’ (in [0,1]) or ‘logits’ (in [-inf,inf]).

batch_size : int, default 32

Batch size used when evaluating the model. Only relevant when a backend has been specified for batch prediction.

transforms : Transform, Sequence[Transform] or None, default None

Transform(s) to apply to the data.

device : DeviceLike or None, default None

Device type used. The default None tries to use the GPU and falls back on CPU if needed. Can be specified by passing either ‘cuda’ or ‘cpu’.
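The two reference-update approaches named under update_strategy can be illustrated with a small standalone sketch. The functions below are hypothetical stand-ins written for this illustration and do not reflect how LastSeenUpdateStrategy or ReservoirSamplingUpdateStrategy are implemented or called; they only show the underlying ideas on plain lists.

```python
import random


def last_seen_update(x_ref, x_new, n):
    """Keep only the n most recent instances."""
    return (list(x_ref) + list(x_new))[-n:]


def reservoir_update(x_ref, x_new, n, seen, rng):
    """Reservoir sampling (Algorithm R).

    Every instance observed so far has the same chance of being in the
    size-n reference set. `seen` is the number of instances observed
    before this batch; the updated count is returned with the new set.
    """
    ref = list(x_ref)
    for item in x_new:
        seen += 1
        if len(ref) < n:
            ref.append(item)
        else:
            j = rng.randrange(seen)  # uniform integer in [0, seen)
            if j < n:
                ref[j] = item
    return ref, seen


rng = random.Random(0)
ref = list(range(5))

print(last_seen_update(ref, [5, 6, 7], n=5))  # [3, 4, 5, 6, 7]

new_ref, seen = reservoir_update(ref, [5, 6, 7], n=5, seen=5, rng=rng)
print(len(new_ref), seen)  # 5 8
```

The last-seen approach tracks the most recent data and so adapts quickly, while reservoir sampling keeps a uniform sample of everything observed and changes the reference set more gradually.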

Example

>>> model = ClassificationModel()
>>> drift = DriftUncertainty(x_ref, model=model, batch_size=20)

Verify reference images have not drifted

>>> drift.predict(x_ref.copy()).drifted
False

Test incoming images for drift

>>> drift.predict(x_test).drifted
True

predict(x)

Predict whether a batch of data has drifted from the reference data.

Parameters:
x : Array

Batch of instances.

Returns:

Dictionary containing the drift prediction, p-value, and threshold statistics.

Return type:

DriftUnivariateOutput

property x_ref : numpy.typing.NDArray[numpy.float32]

Retrieve the reference data of the drift detector.

Returns:

The reference data as a 32-bit floating point numpy array.

Return type:

NDArray[np.float32]