dataeval.shift.DriftDomainClassifier
class dataeval.shift.DriftDomainClassifier(n_folds=None, threshold=None, extractor=None, update_strategy=None, config=None)
Multivariate domain-classifier-based drift detector.
Detects drift by training a LightGBM classifier to distinguish between reference and test data. If the classifier can discriminate well (high AUROC), the distributions differ and drift is detected.
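As a rough illustration of the idea above, the sketch below labels reference rows 0 and test rows 1, scores held-out rows with cross-validation, and compares the pooled AUROC with 0.55. It is not dataeval's implementation: a nearest-centroid scorer stands in for LightGBM so that the example needs only the standard library, and the names auroc, domain_classifier_auroc, and THRESHOLD are hypothetical.

```python
import random


def auroc(scores, labels):
    """Rank-based AUROC: chance that a test row outscores a reference row (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def domain_classifier_auroc(ref, test, n_folds=5, seed=0):
    """Cross-validated AUROC for telling reference rows (0) from test rows (1).

    Assumption: a nearest-centroid scorer replaces LightGBM to keep the sketch small.
    """
    data = [(x, 0) for x in ref] + [(x, 1) for x in test]
    random.Random(seed).shuffle(data)
    scores, labels = [], []
    for fold in range(n_folds):
        held_out = data[fold::n_folds]
        train = [d for i, d in enumerate(data) if i % n_folds != fold]
        c_ref = centroid([x for x, y in train if y == 0])
        c_test = centroid([x for x, y in train if y == 1])
        for x, y in held_out:
            # Higher score: the row is closer to the test centroid than to the reference one.
            scores.append(sq_dist(x, c_ref) - sq_dist(x, c_test))
            labels.append(y)
    return auroc(scores, labels)


rng = random.Random(42)
ref = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
same = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(100)]
shifted = [[rng.gauss(1.5, 1.0) for _ in range(4)] for _ in range(100)]

THRESHOLD = 0.55  # the documented non-chunked default
auc_same = domain_classifier_auroc(ref, same)
auc_shifted = domain_classifier_auroc(ref, shifted)
drift_shifted = auc_shifted > THRESHOLD
print(f"same distribution: AUROC={auc_same:.3f}")
print(f"shifted mean:      AUROC={auc_shifted:.3f}, drift={drift_shifted}")
```

When the two samples come from the same distribution, the classifier has nothing to learn and the AUROC stays near 0.5. A mean shift makes the rows separable and pushes the AUROC toward 1.0.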
Uses a fit/predict lifecycle: construct with hyperparameters, call fit() with reference data, then call predict() with test data. Use chunked() to create a chunked wrapper for time-series monitoring.
Supports two modes:
- Non-chunked (default): Computes a single AUROC for the entire test set vs reference. Drift is flagged when AUROC exceeds the threshold (default 0.55).
- Chunked (via chunked()): Splits data into chunks, computes AUROC per chunk, and uses threshold bounds to flag drift per chunk.
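The chunked decision rule can be sketched as follows. This is a minimal illustration, not the library's code: split_into_chunks and flag_chunks are hypothetical helpers, the per-chunk AUROC values are made up for the example, and it assumes that a chunk is flagged when its AUROC falls outside the inclusive (lower, upper) bounds and that a trailing short chunk is kept.

```python
def split_into_chunks(rows, chunk_size):
    """Fixed-size chunks; a trailing chunk shorter than chunk_size is kept (assumption)."""
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def flag_chunks(chunk_aurocs, bounds):
    """Flag a chunk as drifted when its AUROC lies outside [lower, upper] (assumption)."""
    lower, upper = bounds
    return [not (lower <= value <= upper) for value in chunk_aurocs]


# 250 rows with chunk_size=100 give two full chunks and one short chunk.
chunks = split_into_chunks(list(range(250)), chunk_size=100)
sizes = [len(chunk) for chunk in chunks]

# Illustrative per-chunk AUROC values, not real measurements.
per_chunk_auroc = [0.51, 0.58, 0.72]
flags = flag_chunks(per_chunk_auroc, bounds=(0.45, 0.65))

for size, value, drifted in zip(sizes, per_chunk_auroc, flags):
    print(f"chunk of {size} rows: AUROC={value:.2f}, drift={drifted}")
```

A two-sided bound is useful here because an AUROC well below 0.5 is as informative as one well above it: both mean the classifier found structure that separates the chunk from the reference.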
- Parameters:
- n_folds : int, default 5
Number of cross-validation (CV) folds.
- threshold : float or tuple[float, float], default 0.55
For non-chunked mode: float threshold where AUROC > threshold means drift. For chunked mode: tuple (lower, upper) bounds on AUROC for identifying drift.
- extractor : FeatureExtractor or None, default None
Feature extractor for transforming input data before drift detection. When provided, raw data is passed through the extractor before flattening and comparison. When None, data is used as-is.
- update_strategy : UpdateStrategy or None, default None
Strategy for updating reference data when new data arrives. When None, reference data remains fixed throughout detection.
- config : DriftDomainClassifier.Config or None, default None
Optional configuration object with default parameters. Parameters specified directly in __init__ will override config defaults.
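The override rule for config can be pictured with the pattern below, which also suggests why the signature defaults are None while the documented defaults are 5 and 0.55. SketchConfig and SketchDetector are hypothetical stand-ins written for this illustration; they are not dataeval classes and make no claim about how the library resolves its parameters internally.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass
class SketchConfig:
    """Hypothetical config holding the documented defaults."""
    n_folds: int = 5
    threshold: Union[float, Tuple[float, float]] = 0.55


class SketchDetector:
    """Hypothetical detector: explicit arguments win, None falls back to the config."""

    def __init__(self, n_folds=None, threshold=None, config: Optional[SketchConfig] = None):
        base = config if config is not None else SketchConfig()
        self.n_folds = base.n_folds if n_folds is None else n_folds
        self.threshold = base.threshold if threshold is None else threshold


defaults = SketchDetector()
from_config = SketchDetector(config=SketchConfig(n_folds=10, threshold=(0.4, 0.6)))
overridden = SketchDetector(n_folds=3, config=SketchConfig(n_folds=10, threshold=(0.4, 0.6)))

print(defaults.n_folds, defaults.threshold)
print(from_config.n_folds, from_config.threshold)
print(overridden.n_folds, overridden.threshold)
```

Using None as the sentinel lets the constructor tell "not specified" apart from "explicitly set to the default value".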
See also
DriftDomainClassifier.Stats
Per-prediction statistics returned in DriftOutput.details.
Examples
Non-chunked mode:
>>> import numpy as np
>>> from dataeval.shift import DriftDomainClassifier
>>> ref = np.random.randn(200, 4).astype(np.float32)
>>> test = np.random.randn(100, 4).astype(np.float32)
>>> detector = DriftDomainClassifier().fit(ref)
>>> result = detector.predict(test)
>>> print(f"Drift: {result.drifted}")
Drift: ...
Chunked mode:
>>> chunked = DriftDomainClassifier(threshold=(0.45, 0.65)).chunked(chunk_size=100)
>>> chunked.fit(ref)
ChunkedDrift(DriftDomainClassifier(n_folds=5, threshold=(0.45, 0.65), extractor=None, update_strategy=None), chunker=SizeChunker(chunk_size=100, incomplete='keep'), fitted=True)
>>> result = chunked.predict(test)
Using configuration:
>>> config = DriftDomainClassifier.Config(n_folds=10, threshold=(0.4, 0.6))
>>> detector = DriftDomainClassifier(config=config)
- chunked(chunker=None, chunk_size=None, chunk_count=None, threshold=None)
Create a chunked wrapper around this drift detector.
Returns a ChunkedDrift that splits data into chunks during fit and predict, computing per-chunk metrics and comparing against baseline thresholds.
- Parameters:
- chunker : BaseChunker or None, default None
Explicit chunker instance.
- chunk_size : int or None, default None
Create fixed-size chunks of this many samples.
- chunk_count : int or None, default None
Split into this many equal chunks.
- threshold : Threshold or None, default None
Threshold strategy for determining drift bounds from baseline. When None, uses the detector's default threshold.
- Returns:
A chunked drift wrapper around this detector.
- Return type:
ChunkedDrift[TDetails]
- fit(reference_data)
Fit the domain classifier on the reference data.
- predict(data)
Perform inference on the test data.
- property reference_data : numpy.typing.NDArray[numpy.float32]
Reference data, lazily encoded on first access.
Overrides BaseDrift.reference_data via MRO when this mixin appears before BaseDrift in the inheritance list.
Classes
DriftDomainClassifier.Config | Configuration for DriftDomainClassifier detector.
DriftDomainClassifier.Stats | Statistics from Multivariate Domain Classifier drift detection.