dataeval.detectors.drift.DriftUncertainty
- class dataeval.detectors.drift.DriftUncertainty(data, model, p_val=0.05, update_strategy=None, correction='bonferroni', preds_type='probs', batch_size=32, transforms=None, device=None)
Drift detector using model prediction uncertainty.
Detects drift by monitoring changes in the distribution of model prediction uncertainties (entropy) rather than in the input features directly. Uses the Kolmogorov-Smirnov (K-S) test to compare uncertainty distributions between reference and test data.
This approach is particularly effective for detecting drift that affects model confidence even when input features remain statistically similar, such as out-of-domain samples or adversarial examples.
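As a rough illustration of the uncertainty measure described above, the sketch below computes the Shannon entropy of each prediction row. It is not DataEval's implementation: the helper names `softmax` and `prediction_entropy`, the use of natural logarithms, and the small `eps` constant are all assumptions made for this example.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def prediction_entropy(preds: np.ndarray, preds_type: str = "probs") -> np.ndarray:
    """Shannon entropy (in nats) of each row of class predictions."""
    if preds_type == "logits":
        probs = softmax(preds)
    elif preds_type == "probs":
        probs = preds
    else:
        raise ValueError(f"unknown preds_type: {preds_type!r}")
    eps = 1e-12  # assumption: small constant guarding against log(0)
    return -(probs * np.log(probs + eps)).sum(axis=-1)


# A confident prediction has low entropy; a uniform one has the maximum, log(n_classes).
confident = np.array([[0.98, 0.01, 0.01]])
uncertain = np.array([[1 / 3, 1 / 3, 1 / 3]])
print(prediction_entropy(confident))
print(prediction_entropy(uncertain))
```

Each sample is reduced to a single scalar, so a shift toward less confident predictions shows up as a shift in a one-dimensional distribution, regardless of the dimensionality of the inputs.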
- Parameters:
- data : Embeddings or Array
Reference dataset used as baseline distribution for drift detection. Should represent the expected “normal” data distribution.
- p_val : float, default 0.05
Significance threshold for statistical tests, between 0 and 1. For FDR correction, this represents the acceptable false discovery rate. The default of 0.05 corresponds to a 95% confidence level, i.e. a 5% chance of flagging drift when the reference and test distributions are actually the same.
- update_strategy : UpdateStrategy or None, default None
Strategy for updating reference data when new data arrives. When None, reference data remains fixed throughout detection.
- correction : "bonferroni" or "fdr", default "bonferroni"
Multiple testing correction method for multivariate drift detection. “bonferroni” provides conservative family-wise error control by dividing the significance threshold by the number of features. “fdr” uses the Benjamini-Hochberg procedure for less conservative control. The default “bonferroni” minimizes false positive drift detections.
- preds_type : "probs" or "logits", default "probs"
Format of model prediction outputs. “probs” expects normalized probabilities summing to 1. “logits” expects raw model outputs and applies softmax normalization internally. Default “probs” assumes standard classification model outputs.
- batch_size : int, default 32
Batch size for model inference during uncertainty computation. Larger batches improve GPU utilization but require more memory. Default 32 balances efficiency and memory usage.
- transforms : Transform, Sequence[Transform] or None, default None
Data transformations applied before model inference. Should match preprocessing used during model training for consistent predictions. When None, uses raw input data without preprocessing.
- device : DeviceLike or None, default None
Hardware device for computation. When None, automatically selects DataEval’s configured device, falling back to PyTorch’s default.
- model : torch.nn.Module
Classification model whose prediction uncertainties are monitored for drift.
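The two `correction` options listed above can be illustrated with a small sketch. This shows the textbook Bonferroni and Benjamini-Hochberg decision rules applied to a set of per-feature p-values; the function names `bonferroni_drift` and `fdr_drift` are hypothetical, and the code is not taken from DataEval.

```python
import numpy as np


def bonferroni_drift(p_vals, p_val: float = 0.05) -> bool:
    """Flag drift if any p-value falls below p_val divided by the number of tests."""
    p_vals = np.asarray(p_vals, dtype=float)
    return bool((p_vals < p_val / p_vals.size).any())


def fdr_drift(p_vals, p_val: float = 0.05) -> bool:
    """Benjamini-Hochberg: flag drift if any sorted p_(k) <= (k / m) * p_val."""
    p_sorted = np.sort(np.asarray(p_vals, dtype=float))
    m = p_sorted.size
    critical = np.arange(1, m + 1) / m * p_val
    return bool((p_sorted <= critical).any())


# Several moderately small p-values: Bonferroni stays silent, FDR flags drift.
p_vals = [0.02, 0.03, 0.035, 0.5]
print(bonferroni_drift(p_vals))  # False
print(fdr_drift(p_vals))         # True

# With a single test, both rules reduce to comparing the p-value with p_val.
print(bonferroni_drift([0.03]), fdr_drift([0.03]))  # True True
```

Because uncertainty is a single scalar per sample, the single-test case is the relevant one here, and the choice of correction makes no difference to the decision in that case.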
Example
>>> model = ClassificationModel()
>>> drift_detector = DriftUncertainty(x_ref, model=model, batch_size=16)
Check whether the test images have drifted
>>> result = drift_detector.predict(x_test)
>>> print(f"Drift detected: {result.drifted}")
Drift detected: True
>>> print(f"Mean uncertainty change: {result.distance:.4f}")
Mean uncertainty change: 0.8160
With data preprocessing
>>> import torch
>>> import torchvision.transforms.v2 as T
>>> transforms = T.Compose([T.ToDtype(torch.float32)])
>>> drift_detector = DriftUncertainty(x_ref, model=model, batch_size=16, transforms=transforms)
Notes
Uncertainty-based drift detection is complementary to feature-based methods. It can detect semantic drift (changes in data meaning) that may not be apparent in raw feature statistics, making it valuable for monitoring model performance in production environments.
The method assumes that model uncertainty is a reliable indicator of data quality. This works best with well-calibrated models trained on representative data. Poorly calibrated models may produce misleading uncertainty estimates.
For optimal performance, ensure the model and transforms match those used during training, and that the reference data represents the expected operational distribution where the model performs reliably.
- predict(x)
Predict whether model uncertainty distribution has drifted.
Computes prediction uncertainties for the input data and tests whether their distribution differs significantly from the reference uncertainty distribution using the Kolmogorov-Smirnov test.
- Parameters:
- x
Test data to check for drift against the reference data.
- Returns:
Drift detection results including overall prediction, p-values, test statistics, and feature-level analysis of uncertainty values.
- Return type:
DriftOutput
Notes
The returned DriftOutput treats uncertainty values as “features” for consistency with the underlying KS test implementation, even though uncertainty-based drift typically involves univariate analysis.
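To make the comparison step concrete, the following sketch computes the two-sample K-S statistic, which is the largest vertical gap between two empirical CDFs, on one-dimensional uncertainty values. It is an illustration under stated assumptions rather than DataEval's code: `ks_statistic` is a hypothetical helper, the sample data is synthetic, and the p-value calculation that a full test would perform is omitted.

```python
import numpy as np


def ks_statistic(ref: np.ndarray, test: np.ndarray) -> float:
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    ref = np.sort(ref)
    test = np.sort(test)
    # Evaluate both empirical CDFs at every observed value.
    points = np.concatenate([ref, test])
    cdf_ref = np.searchsorted(ref, points, side="right") / ref.size
    cdf_test = np.searchsorted(test, points, side="right") / test.size
    return float(np.abs(cdf_ref - cdf_test).max())


# Synthetic uncertainties: the test set is noticeably less confident.
rng = np.random.default_rng(0)
ref_unc = rng.uniform(0.0, 0.3, size=500)
test_unc = rng.uniform(0.2, 1.0, size=500)

print(ks_statistic(ref_unc, ref_unc))   # identical samples give 0.0
print(ks_statistic(ref_unc, test_unc))  # large gap for shifted uncertainties
```

The statistic ranges from 0 (identical empirical distributions) to 1 (no overlap), so larger values indicate a stronger shift in model uncertainty.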
- property x_ref : numpy.typing.NDArray[numpy.float32]
Reference data for drift detection.
Lazily encodes the reference dataset on first access. Data is flattened and converted to 32-bit floating point for consistent numerical processing across different input types.
- Returns:
Reference data as flattened 32-bit floating point array. Shape is (n_samples, n_features_flattened).
- Return type:
NDArray[np.float32]
Notes
Data is cached after first access to avoid repeated encoding overhead.
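The lazy, cached behavior described for `x_ref` can be sketched with `functools.cached_property`. The `LazyReference` class and its `encode_calls` counter are hypothetical and exist only for this illustration; how DataEval implements the caching internally is not shown in this page.

```python
from functools import cached_property

import numpy as np


class LazyReference:
    """Hypothetical helper illustrating lazy, cached encoding of reference data."""

    def __init__(self, data):
        self._data = data
        self.encode_calls = 0  # counts how often the encoding actually runs

    @cached_property
    def x_ref(self) -> np.ndarray:
        self.encode_calls += 1
        arr = np.asarray(self._data, dtype=np.float32)
        # Keep the sample axis and flatten all remaining axes.
        return arr.reshape(arr.shape[0], -1)


ref = LazyReference(np.zeros((10, 3, 8, 8)))
print(ref.encode_calls)  # 0: nothing has been encoded yet
print(ref.x_ref.shape)   # (10, 192)
_ = ref.x_ref            # second access reuses the cached array
print(ref.encode_calls)  # 1
```

Deferring the work until first access keeps construction cheap, and caching means repeated calls do not pay the encoding cost again.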