dataeval.shift.DriftDomainClassifier¶
-
class dataeval.shift.DriftDomainClassifier(n_folds=
None, threshold=None, config=None)¶ Multivariate Domain Classifier based drift detector.
Detects drift by training a LightGBM classifier to distinguish between reference and test data. If the classifier can discriminate well (high AUROC), the distributions differ and drift is detected.
Uses a fit/predict lifecycle: construct with hyperparameters, call
fit()with reference data, then callpredict()with test data.Supports two modes:
Non-chunked (default): Computes a single AUROC for the entire test set vs reference. Drift is flagged when AUROC exceeds the threshold (default 0.55).
Chunked: Splits data into chunks, computes AUROC per chunk, and uses threshold bounds to flag drift per chunk. Enable by passing chunking parameters to
fit().
- Parameters:¶
- n_folds : int, default 5¶
Number of cross-validation (CV) folds.
- threshold : float or tuple[float, float], default 0.55¶
For non-chunked mode: float threshold where AUROC > threshold means drift. For chunked mode: tuple (lower, upper) bounds on AUROC for identifying drift.
- config : DriftDomainClassifier.Config or None, default None¶
Optional configuration object with default parameters. Parameters specified directly in __init__ will override config defaults.
Examples
Non-chunked mode:
>>> ref = np.random.randn(200, 4).astype(np.float32) >>> test = np.random.randn(100, 4).astype(np.float32) >>> detector = DriftDomainClassifier().fit(ref) >>> result = detector.predict(test) >>> print(f"Drift: {result.drifted}") Drift: ...Chunked mode:
>>> detector = DriftDomainClassifier(threshold=(0.45, 0.65)).fit(ref, chunk_size=100) >>> result = detector.predict(test)Using configuration:
>>> config = DriftDomainClassifier.Config(n_folds=10, threshold=(0.4, 0.6)) >>> detector = DriftDomainClassifier(config=config)-
fit(x_ref, chunker=
None, chunk_size=None, chunk_count=None, chunks=None, chunk_indices=None)¶ Fit the domain classifier on the reference data.
When chunking is enabled, the detector computes per-chunk baseline AUROC values from the reference data and derives threshold bounds. During prediction, the test data is split into chunks of the same size used here, so that per-chunk statistics are comparable to the baseline.
If
chunk_countis provided, the effective chunk size is computed aslen(x_ref) // chunk_countand locked in for prediction. Usechunk_sizedirectly when you want explicit control over the chunk size used for both fitting and prediction.- Parameters:¶
- x_ref : ArrayLike¶
Reference data with dim[n_samples, n_features].
- chunker : ArrayChunker or None, default None¶
Explicit chunker instance for chunked mode.
- chunk_size : int or None, default None¶
Create fixed-size chunks. The same size is used during prediction to keep statistics comparable.
- chunk_count : int or None, default None¶
Split into this many equal chunks. Converted to a fixed
chunk_sizebased on the reference data length.- chunks : list[ArrayLike] or None, default None¶
Pre-split reference data for chunked mode.
- chunk_indices : list[list[int]] or None, default None¶
Index groupings for chunking reference data.
- Return type:¶
Self
-
predict(x=
None, chunks=None, chunk_indices=None)¶ Perform inference on the test data.
- Parameters:¶
- x : ArrayLike or None¶
Test (analysis) data with dim[n_samples, n_features]. Required for non-chunked mode and chunked mode unless pre-built chunks are provided.
- chunks : list[ArrayLike] or None, default None¶
Pre-built test data chunks.
- chunk_indices : list[list[int]] or None, default None¶
Index groupings for chunking test data.
- Returns:¶
Non-chunked mode:
detailsis aDriftDomainClassifierStatsTypedDict. Chunked mode:detailsis apolars.DataFramewith per-chunk results.- Return type:¶
- property is_chunked : bool¶
Whether the detector is operating in chunked mode.
Classes¶
Configuration for DriftDomainClassifier detector. |