Out of Distribution Data
What is it
OOD_AE is a method for detecting out-of-distribution (OOD) images via
autoencoder reconstruction error. Images that are poorly reconstructed by an
autoencoder trained on the reference dataset are likely to be qualitatively
different from those on which the model was trained. The functionality is
derived from the TensorFlow implementation in Alibi Detect.
When to use it
Use the OOD_AE class, or a similar detector, when you want to find individual
images in a dataset that are qualitatively different from those in a reference
(training) dataset. The main use case is when you have a new set of
(operational) images and want to determine whether any of them are
qualitatively different. Such images could belong to a novel, operationally
relevant class or sub-class that was not present in the training data. This
type of detection is critical because models are likely to degrade rapidly if
novel images represent a significant portion of operational data.
Theory behind it
An autoencoder is a neural network that takes input data, compresses it into a lower-dimensional space, and then attempts to reconstruct the original input from the compressed representation.
(https://www.compthree.com/blog/autoencoder/)
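The compress-then-reconstruct idea can be illustrated with a minimal sketch. This is not the OOD_AE implementation: it substitutes a linear autoencoder (a truncated SVD projection) for a neural network, uses synthetic vectors in place of images, and the helper names `encode`, `decode`, and `reconstruction_error` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for images: 200 samples with 64 features that lie
# close to an 8-dimensional subspace (assumption made for illustration).
latent = rng.normal(size=(200, 8))
mixing = rng.normal(size=(8, 64))
x_train = latent @ mixing + 0.01 * rng.normal(size=(200, 64))

# "Train" a linear autoencoder: the top 8 right singular vectors of the
# centered data serve as both encoder and decoder weights.
mean = x_train.mean(axis=0)
_, _, vt = np.linalg.svd(x_train - mean, full_matrices=False)
components = vt[:8]  # shape (8, 64)


def encode(x):
    """Compress inputs to the 8-dimensional latent space."""
    return (x - mean) @ components.T


def decode(z):
    """Reconstruct inputs from their latent codes."""
    return z @ components + mean


def reconstruction_error(x):
    """Mean squared error between each input and its reconstruction."""
    x_hat = decode(encode(x))
    return np.mean((x - x_hat) ** 2, axis=1)


train_error = reconstruction_error(x_train)

# Samples drawn from a different distribution are reconstructed poorly.
x_novel = 3.0 * rng.normal(size=(5, 64))
novel_error = reconstruction_error(x_novel)
```

In this toy setting the novel samples have a far larger reconstruction error than any training sample, which is the signal an autoencoder-based OOD detector relies on.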
If a trained autoencoder encounters an image that falls outside the data manifold on which it was trained, it will generally reconstruct it poorly. By default, the threshold for considering an image OOD is set from the top percentile of reconstruction errors on the training dataset; images whose reconstruction error exceeds that threshold are flagged as OOD.
Following OOD detection, a user can then investigate the individual images which were detected as OOD.
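The thresholding step described above can be sketched as follows. This is an illustrative example rather than the library's code: the function names `fit_threshold` and `flag_ood`, the error values, and the 95th-percentile setting are all assumptions chosen for the demonstration.

```python
import numpy as np


def fit_threshold(train_errors, percentile=95.0):
    """Return the given percentile of the training reconstruction errors.

    The percentile value is an assumption for this sketch, not the
    library's documented default.
    """
    return float(np.percentile(train_errors, percentile))


def flag_ood(errors, threshold):
    """Mark each sample whose reconstruction error exceeds the threshold."""
    return np.asarray(errors) > threshold


# Hypothetical per-image reconstruction errors on the training dataset.
train_errors = np.array([0.010, 0.012, 0.011, 0.013, 0.009, 0.014, 0.010, 0.012])
threshold = fit_threshold(train_errors, percentile=95.0)

# Hypothetical errors for new (operational) images.
new_errors = np.array([0.011, 0.250, 0.008])
is_ood = flag_ood(new_errors, threshold)

# Indices of the flagged images, for follow-up inspection by the user.
ood_indices = np.flatnonzero(is_ood)
```

Here only the second image is flagged, so `ood_indices` identifies it as the one to inspect. Raising the percentile makes the detector more conservative; lowering it flags more images at the cost of more false positives.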