dataeval.shift.DriftDomainClassifier

class dataeval.shift.DriftDomainClassifier(n_folds=None, threshold=None, extractor=None, update_strategy=None, config=None)

Multivariate drift detector based on a domain classifier.

Detects drift by training a LightGBM classifier to distinguish reference data from test data. An AUROC near 0.5 means the classifier cannot tell the two sets apart. A high AUROC means it can discriminate well, so the distributions differ and drift is detected.
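
To make the AUROC criterion concrete, the sketch below computes AUROC from classifier scores using the rank (Mann-Whitney) formulation. It is a dependency-free illustration, not DataEval's implementation: the `auroc` helper and the toy scores are hypothetical, and the scores stand in for a classifier's predicted probability that a sample belongs to the test set.

```python
def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic.

    labels: 0 = reference sample, 1 = test sample.
    The result is the probability that a randomly chosen test sample gets a
    higher score than a randomly chosen reference sample (ties count 0.5).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample from each domain")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Scores carry no information about the domain: AUROC is 0.5.
uninformative = auroc([0.5] * 8, labels)

# Scores separate the two domains perfectly: AUROC is 1.0.
separable = auroc([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9], labels)

print(uninformative, separable)
```

With the default threshold of 0.55, the first case would not be flagged and the second would.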

Uses a fit/predict lifecycle: construct with hyperparameters, call fit() with reference data, then call predict() with test data. Use chunked() to create a chunked wrapper for time-series monitoring.

Supports two modes:

  • Non-chunked (default): Computes a single AUROC for the entire test set vs reference. Drift is flagged when AUROC exceeds the threshold (default 0.55).

  • Chunked (via chunked()): Splits data into chunks, computes AUROC per chunk, and uses threshold bounds to flag drift per chunk.
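
The two decision rules can be sketched as a single function. This is an illustration only: `is_drift` is a hypothetical helper, and treating an AUROC outside the `(lower, upper)` bounds as drift is an assumption about how the chunked bounds are applied.

```python
def is_drift(auroc, threshold=0.55):
    """Return True when an AUROC value signals drift.

    float threshold -> non-chunked rule: drift when auroc > threshold.
    (lower, upper)  -> chunked rule (assumed): drift when auroc falls
                       outside the closed interval [lower, upper].
    """
    if isinstance(threshold, tuple):
        lower, upper = threshold
        return auroc < lower or auroc > upper
    return auroc > threshold


# Non-chunked: one AUROC for the whole test set.
print(is_drift(0.52))  # False
print(is_drift(0.71))  # True

# Chunked: one AUROC per chunk, each checked against the bounds.
chunk_aurocs = [0.50, 0.58, 0.70, 0.41]
flags = [is_drift(a, (0.45, 0.65)) for a in chunk_aurocs]
print(flags)  # [False, False, True, True]
```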

Parameters:
n_folds : int, default 5

Number of cross-validation (CV) folds.

threshold : float or tuple[float, float], default 0.55

For non-chunked mode: a float; drift is flagged when AUROC > threshold. For chunked mode: a tuple (lower, upper) of bounds on AUROC used to identify drift per chunk.

extractor : FeatureExtractor or None, default None

Feature extractor for transforming input data before drift detection. When provided, raw data is passed through the extractor before flattening and comparison. When None, data is used as-is.

update_strategy : UpdateStrategy or None, default None

Strategy for updating reference data when new data arrives. When None, reference data remains fixed throughout detection.

config : DriftDomainClassifier.Config or None, default None

Optional configuration object with default parameters. Parameters specified directly in __init__ will override config defaults.
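
The override rule for config can be pictured with the following sketch. `SketchConfig` and `resolve` are hypothetical stand-ins, not DataEval classes; the fallback values mirror the documented defaults of `n_folds=5` and `threshold=0.55`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

Threshold = Union[float, Tuple[float, float]]


@dataclass
class SketchConfig:
    """Hypothetical stand-in for DriftDomainClassifier.Config."""
    n_folds: int = 5
    threshold: Threshold = 0.55


def resolve(n_folds: Optional[int] = None,
            threshold: Optional[Threshold] = None,
            config: Optional[SketchConfig] = None) -> SketchConfig:
    """Explicit arguments win; anything left as None falls back to config."""
    base = config if config is not None else SketchConfig()
    return SketchConfig(
        n_folds=base.n_folds if n_folds is None else n_folds,
        threshold=base.threshold if threshold is None else threshold,
    )


cfg = SketchConfig(n_folds=10, threshold=(0.4, 0.6))
print(resolve(config=cfg))             # both values come from cfg
print(resolve(n_folds=3, config=cfg))  # n_folds overridden, threshold from cfg
print(resolve())                       # documented defaults
```

A pattern like this would also explain why every keyword in the constructor signature defaults to None: None marks "not specified", so the config value (or the built-in default) applies.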

See also

DriftDomainClassifier.Stats

Per-prediction statistics returned in DriftOutput.details.

Examples

Non-chunked mode:

>>> ref = np.random.randn(200, 4).astype(np.float32)
>>> test = np.random.randn(100, 4).astype(np.float32)
>>> detector = DriftDomainClassifier().fit(ref)
>>> result = detector.predict(test)
>>> print(f"Drift: {result.drifted}")
Drift: ...

Chunked mode:

>>> chunked = DriftDomainClassifier(threshold=(0.45, 0.65)).chunked(chunk_size=100)
>>> chunked.fit(ref)
ChunkedDrift(DriftDomainClassifier(n_folds=5, threshold=(0.45, 0.65), extractor=None, update_strategy=None), chunker=SizeChunker(chunk_size=100, incomplete='keep'), fitted=True)
>>> result = chunked.predict(test)

Using configuration:

>>> config = DriftDomainClassifier.Config(n_folds=10, threshold=(0.4, 0.6))
>>> detector = DriftDomainClassifier(config=config)

chunked(chunker=None, chunk_size=None, chunk_count=None, threshold=None)

Create a chunked wrapper around this drift detector.

Returns a ChunkedDrift that splits data into chunks during fit and predict, computing per-chunk metrics and comparing against baseline thresholds.

Parameters:
chunker : BaseChunker or None, default None

Explicit chunker instance.

chunk_size : int or None, default None

Create fixed-size chunks of this many samples.

chunk_count : int or None, default None

Split into this many equal chunks.

threshold : Threshold or None, default None

Threshold strategy for determining drift bounds from baseline. When None, uses the detector’s default threshold.

Returns:

A chunked drift wrapper around this detector.

Return type:

ChunkedDrift[TDetails]
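
The chunk_size and chunk_count options can be illustrated with plain index arithmetic. The helpers below are hypothetical and are not DataEval's chunker classes. Keeping a short final chunk mirrors the `incomplete='keep'` setting visible in the `SizeChunker` repr in the examples, and the handling of counts that do not divide the data evenly is an assumption.

```python
def chunks_by_size(n_samples, chunk_size):
    """(start, stop) index pairs for fixed-size chunks; a short final chunk is kept."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(start, min(start + chunk_size, n_samples))
            for start in range(0, n_samples, chunk_size)]


def chunks_by_count(n_samples, chunk_count):
    """(start, stop) index pairs for chunk_count near-equal chunks.

    Assumption: when n_samples is not divisible by chunk_count, the first
    (n_samples % chunk_count) chunks each receive one extra sample.
    """
    if chunk_count <= 0:
        raise ValueError("chunk_count must be positive")
    base, extra = divmod(n_samples, chunk_count)
    bounds, start = [], 0
    for i in range(chunk_count):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


print(chunks_by_size(250, 100))  # [(0, 100), (100, 200), (200, 250)]
print(chunks_by_count(250, 4))   # [(0, 63), (63, 126), (126, 188), (188, 250)]
```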

fit(reference_data)

Fit the domain classifier on the reference data.

Parameters:
reference_data : Any

Reference data. When an extractor is configured, this can be any data type accepted by the extractor. Otherwise, must be array-like with shape (n_samples, n_features).

Return type:

Self

predict(data)

Run drift detection on the test data against the fitted reference data.

Parameters:
data : Any

Test data. When an extractor is configured, this can be any data type accepted by the extractor. Otherwise, must be array-like.

Returns:

Drift prediction with AUROC statistics.

Return type:

DriftOutput[DriftDomainClassifier.Stats]

property reference_data : numpy.typing.NDArray[numpy.float32]

Reference data, lazily encoded on first access.

Overrides BaseDrift.reference_data through the method resolution order (MRO) when this mixin appears before BaseDrift in the inheritance list.
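
Lazy encoding of this kind is commonly implemented with a cached property. The sketch below is a generic illustration with a hypothetical `LazyReference` class and a stand-in `encode` step; it does not reproduce DataEval's internals.

```python
from functools import cached_property


class LazyReference:
    """Stores raw data and encodes it only when first requested."""

    def __init__(self, raw, encode):
        self._raw = raw
        self._encode = encode
        self.encode_calls = 0  # instrumentation for this demo only

    @cached_property
    def reference_data(self):
        # Runs once; the result is then cached on the instance.
        self.encode_calls += 1
        return self._encode(self._raw)


holder = LazyReference(
    [[1, 2], [3, 4]],
    encode=lambda rows: [[float(v) for v in row] for row in rows],
)
print(holder.encode_calls)    # 0: nothing has been encoded yet
print(holder.reference_data)  # [[1.0, 2.0], [3.0, 4.0]]
print(holder.reference_data)  # served from the cache
print(holder.encode_calls)    # 1
```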

Classes

Config

Configuration for DriftDomainClassifier detector.

Stats

Statistics from Multivariate Domain Classifier drift detection.