Gaussian RBF

Drift refers to the phenomenon where the statistical properties of the data change over time. It occurs when the underlying distribution of the input features or the target variable (what the model is trying to predict) shifts, leading to a discrepancy between the training data and the real-world data the model encounters during deployment.

Following the methods examined in the NeurIPS 2019 paper Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift, several approaches can be used to test whether drift has occurred. High-dimensional data is typically reduced in dimensionality before statistical tests are run on it. Two preprocessing methods are provided out of the box: Untrained AutoEncoders (UAE) and Black-Box Shift Estimation (BBSE) predictors, the latter using the classifier's softmax outputs. Principal Component Analysis (PCA) can also be implemented easily with scikit-learn. Preprocessing methods that do not rely on the classifier usually pick up drift in the input data, while BBSE focuses on label shift.
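The dimensionality reduction step described above can be sketched as follows. This is a minimal illustration, not the library's preprocessing code: it uses a NumPy SVD in place of scikit-learn's PCA to stay dependency-free, and the helper names fit_pca and project, the synthetic data, and the shift of 0.5 are all assumptions made for the example. The projection is fitted on the reference data only and then applied unchanged to the data under test, so both sets are compared in the same low-dimensional space.

```python
import numpy as np


def fit_pca(x_ref, n_components):
    """Fit PCA on the reference data; return its mean and top principal axes."""
    mean = x_ref.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(x_ref - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(x, mean, components):
    """Project data onto the axes that were fitted on the reference set."""
    return (x - mean) @ components.T


rng = np.random.default_rng(0)
x_ref = rng.normal(size=(200, 64))               # reference (training-like) data
x_test = rng.normal(loc=0.5, size=(200, 64))     # hypothetical shifted data

mean, components = fit_pca(x_ref, n_components=8)
z_ref = project(x_ref, mean, components)         # shape (200, 8)
z_test = project(x_test, mean, components)       # shape (200, 8)
```

A drift test would then be run on z_ref and z_test rather than on the raw 64-dimensional inputs.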

How-To Guides

Check out this how-to guide to begin using the Drift Detection class:

Drift Detection Tutorial

DataEval API

GaussianRBF

The GaussianRBF class implements a Gaussian kernel, also known as a radial basis function (RBF) kernel. It is used to construct a covariance matrix for Gaussian processes and is the default kernel used in the MMD drift detection test.
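To illustrate how a Gaussian RBF kernel feeds into an MMD (maximum mean discrepancy) test, the sketch below computes an unbiased estimate of the squared MMD between two samples and obtains a p-value by permutation. It is written in plain NumPy under stated assumptions and is not the library's implementation: the function names gaussian_rbf, mmd2 and mmd_permutation_test, the fixed bandwidth of 2.0, and the synthetic data are all hypothetical.

```python
import numpy as np


def gaussian_rbf(x, y, sigma):
    """Kernel matrix [Nx, Ny] with k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = (x**2).sum(1)[:, None] + (y**2).sum(1)[None, :] - 2 * x @ y.T
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dist / (2 * sigma**2))


def mmd2(x, y, sigma):
    """Unbiased estimate of the squared MMD between samples x and y."""
    n, m = len(x), len(y)
    k_xx = gaussian_rbf(x, x, sigma)
    k_yy = gaussian_rbf(y, y, sigma)
    k_xy = gaussian_rbf(x, y, sigma)
    # Within-sample terms exclude the diagonal (each point paired with itself).
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    return term_xx + term_yy - 2 * k_xy.mean()


def mmd_permutation_test(x, y, sigma, n_permutations=200, seed=0):
    """Return the observed MMD^2 and a permutation p-value."""
    rng = np.random.default_rng(seed)
    observed = mmd2(x, y, sigma)
    pooled = np.concatenate([x, y])
    n = len(x)
    count = 0
    for _ in range(n_permutations):
        perm = rng.permutation(len(pooled))
        if mmd2(pooled[perm[:n]], pooled[perm[n:]], sigma) >= observed:
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (count + 1) / (n_permutations + 1)


rng = np.random.default_rng(0)
x_ref = rng.normal(size=(100, 5))
x_shifted = rng.normal(loc=1.0, size=(100, 5))  # hypothetical drifted sample

statistic, p_value = mmd_permutation_test(x_ref, x_shifted, sigma=2.0)
```

A small p-value suggests that the two samples come from different distributions, which is the signal a drift detector reports.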

class dataeval.detectors.GaussianRBF(sigma: Tensor | None = None, init_sigma_fn: Callable | None = None, trainable: bool = False)

Gaussian RBF kernel: k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)). A forward pass takes a batch of instances x [Nx, features] and y [Ny, features] and returns the kernel matrix [Nx, Ny].

Parameters:

sigma (Optional[torch.Tensor], default None) – Bandwidth used for the kernel. Need not be specified if it is to be inferred or trained. Multiple values can be passed, in which case the kernel is evaluated with each value and the results are averaged.
init_sigma_fn (Optional[Callable], default None) – Function used to compute the bandwidth sigma when sigma is to be inferred. The function should accept the tensors x, y and dist and return sigma. If None, it is set to sigma_median().
trainable (bool, default False) – Whether or not to track gradients w.r.t. sigma to allow it to be trained.
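The bandwidth options above can be sketched in NumPy as follows. This is an illustration under assumptions, not the class's code: rbf_kernel, squared_distance and median_sigma are hypothetical names, and the median heuristic shown (sigma^2 equal to half the median squared distance) is one common convention that may differ in detail from sigma_median().

```python
import numpy as np


def squared_distance(x, y):
    """Pairwise squared Euclidean distances, shape [Nx, Ny]."""
    sq = (x**2).sum(1)[:, None] + (y**2).sum(1)[None, :] - 2 * x @ y.T
    return np.maximum(sq, 0.0)  # guard against tiny negative round-off


def median_sigma(dist):
    """Median heuristic (assumed convention): sigma^2 = median(dist) / 2."""
    return np.sqrt(0.5 * np.median(dist))


def rbf_kernel(x, y, sigma=None):
    """Gaussian RBF kernel matrix; averages over bandwidths if several are given."""
    dist = squared_distance(x, y)
    if sigma is None:
        sigma = median_sigma(dist)  # infer the bandwidth from the data
    sigmas = np.atleast_1d(np.asarray(sigma, dtype=float))
    # One kernel matrix per bandwidth, stacked on axis 0, then averaged.
    kernels = np.exp(-dist[None, :, :] / (2 * sigmas[:, None, None] ** 2))
    return kernels.mean(axis=0)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
y = rng.normal(size=(6, 3))

k_inferred = rbf_kernel(x, y)                      # bandwidth from the median heuristic
k_fixed = rbf_kernel(x, y, sigma=1.0)              # single fixed bandwidth
k_multi = rbf_kernel(x, y, sigma=[0.5, 1.0, 2.0])  # average over three bandwidths
```

Each call returns a kernel matrix of shape [Nx, Ny], here (4, 6). Averaging over several bandwidths makes the result less sensitive to any single choice of sigma.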

forward(x: ndarray | Tensor, y: ndarray | Tensor, infer_sigma: bool = False) → Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass must be defined within this function, call the Module instance rather than forward() directly: calling the instance runs the registered hooks, while calling forward() directly silently ignores them.