Maximum Mean Discrepancy

Maximum Mean Discrepancy (MMD) Drift Detection is a kernel-based method for comparing two distributions by calculating the distance between their mean embeddings in a reproducing kernel Hilbert space (RKHS). The MMD test statistic is defined as:

\[ \textrm{MMD}^2(F, p, q) = || \mu_{p} - \mu_{q} ||^2_{F} \]

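where $\mu_{p}$ and $\mu_{q}$ are the mean embeddings of the distributions $p$ and $q$ in the RKHS $F$. The MMD test is particularly useful for detecting complex, multivariate distributional differences. Because the squared norm expands into expectations of kernel evaluations, an unbiased estimate of $\textrm{MMD}^2$ can be computed from samples using only the kernel function (the kernel trick), without constructing the embeddings explicitly. A permutation test is then used to obtain the p-value.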
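For concreteness, the standard unbiased estimator can be written out as follows. The notation is an assumption introduced here for illustration: $x_1, \dots, x_n$ are samples from $p$, $y_1, \dots, y_m$ are samples from $q$, and $k$ is the kernel associated with the RKHS.

\[ \widehat{\textrm{MMD}}^2_u = \frac{1}{n(n-1)} \sum_{i \neq j} k(x_i, x_j) + \frac{1}{m(m-1)} \sum_{i \neq j} k(y_i, y_j) - \frac{2}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} k(x_i, y_j) \]

The first two terms average the kernel within each sample, excluding the diagonal $i = j$, and the third term averages it across the two samples. Because the estimator is unbiased, it can take small negative values when $p$ and $q$ are close.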

A common choice for the kernel is the radial basis function (RBF) kernel, though other kernels can be used depending on the application.

Key characteristics:

Kernel trick: Compares samples in a high-dimensional (possibly infinite-dimensional) feature space implicitly, using only kernel evaluations
Multivariate: Naturally handles multiple features and their dependencies
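Universal: With universal kernels (e.g., RBF), the population MMD is zero only when the two distributions are identical, so any distributional difference can be detected given enough samples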
Non-parametric: No assumptions about distribution shapes
Interpretability: Lower than univariate tests; doesn’t identify which features drifted

Common kernels:

Radial Basis Function (RBF) / Gaussian kernel: $$ k(x, y) = \exp\left(-\frac{\|x-y\|^2}{2\sigma^2}\right) $$
- Most common choice; universal kernel
- Bandwidth $\sigma$ controls sensitivity: small values emphasize fine, local differences, while large values emphasize coarse, global differences
Polynomial kernel: $$ k(x, y) = (x^T y + c)^d $$
- Captures polynomial interactions up to degree $d$
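
The two kernels above can be sketched in NumPy as follows. This is a minimal illustration, not a library API: the function names `rbf_kernel` and `polynomial_kernel` and the default values for `sigma`, `c`, and `d` are assumptions chosen for the example. Each function returns the full kernel (Gram) matrix between the rows of `x` and the rows of `y`.

```python
import numpy as np


def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).

    x has shape (n, d), y has shape (m, d); the result has shape (n, m).
    """
    # Pairwise squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * sigma**2))


def polynomial_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel matrix: k(x, y) = (x^T y + c)^d."""
    return (x @ y.T + c) ** d
```

Both functions work on batches of row vectors, so a single call produces every pairwise kernel value needed by an MMD estimate.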

Statistical testing:

A permutation test is used to obtain the p-value:

Pool reference and test samples
Randomly permute the pooled samples and split them into two groups of the original sizes; repeat this many times
Compute MMD for each permutation
P-value = proportion of permutations with MMD ≥ observed MMD
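
The steps above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes an RBF kernel with a fixed bandwidth `sigma` and the unbiased estimate of $\textrm{MMD}^2$ as the statistic. The names `rbf_gram`, `mmd2_unbiased`, and `mmd_permutation_test`, and the defaults for `sigma`, `n_permutations`, and `seed`, are hypothetical.

```python
import numpy as np


def rbf_gram(z, sigma):
    """RBF kernel matrix between all rows of z (assumed kernel choice)."""
    sq = np.sum(z**2, axis=1)
    sq_dists = np.maximum(sq[:, None] + sq[None, :] - 2.0 * z @ z.T, 0.0)
    return np.exp(-sq_dists / (2.0 * sigma**2))


def mmd2_unbiased(K, idx_x, idx_y):
    """Unbiased MMD^2 estimate from a precomputed pooled kernel matrix K."""
    n, m = len(idx_x), len(idx_y)
    Kxx = K[np.ix_(idx_x, idx_x)]
    Kyy = K[np.ix_(idx_y, idx_y)]
    Kxy = K[np.ix_(idx_x, idx_y)]
    # Within-sample averages exclude the diagonal (i == j) terms
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_xx + term_yy - 2.0 * Kxy.mean()


def mmd_permutation_test(x_ref, x_test, sigma=1.0, n_permutations=200, seed=0):
    """Return (observed MMD^2, p-value) for reference vs. test samples."""
    rng = np.random.default_rng(seed)
    # Step 1: pool reference and test samples
    pooled = np.vstack([x_ref, x_test])
    n, total = len(x_ref), len(pooled)
    # The kernel matrix is computed once and re-indexed for each permutation
    K = rbf_gram(pooled, sigma)
    observed = mmd2_unbiased(K, np.arange(n), np.arange(n, total))
    count = 0
    for _ in range(n_permutations):
        # Step 2: permute and split into groups of the original sizes
        perm = rng.permutation(total)
        # Step 3: compute the statistic for this permutation
        if mmd2_unbiased(K, perm[:n], perm[n:]) >= observed:
            count += 1
    # Step 4: proportion of permutations at least as large as the observed value
    return observed, count / n_permutations
```

Computing the pooled kernel matrix once and only re-indexing it per permutation is a design choice of this sketch: the kernel evaluations are paid for a single time, and each permutation costs only sums over submatrices.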

When to use:

Image/video embeddings (ResNet, CLIP, ViT, etc.) - primary use case
High-dimensional data where feature interactions matter
When drift involves changes in correlations between features
Deep learning computer vision applications
Cross-domain shifts (e.g., synthetic → real, indoor → outdoor)
When univariate tests fail to detect known drift

Limitations:

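Computationally expensive for large datasets: the number of kernel evaluations is quadratic in the total sample size, and the permutation test adds further cost for every permutation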
Kernel selection and hyperparameter tuning required
Limited interpretability (doesn’t indicate which features drifted)
Requires sufficient samples for reliable permutation testing