How to train an autoencoder for embeddings

Problem Statement

For computer vision tasks such as image classification and object detection, the sheer size of image datasets can considerably slow down dataset analysis methods. One way to lessen this burden is to reduce the size of the images without losing important information, a process known as dimensionality reduction. Because image data is high-dimensional, an autoencoder trained on a reconstruction task is well suited to this.

To help with this, DataEval provides a lightweight, easy-to-use autoencoder training class, AETrainer, which gives you out-of-the-box functionality for this type of dimensionality reduction.

When to use

Use the AETrainer class when you have many images, very large images, or strict speed requirements.

What you will need

  1. A PyTorch Dataset with your images returned first in __getitem__

  2. (Optional) A PyTorch autoencoder model

  3. (Optional) A PyTorch autoencoder model with a defined encode function

  4. A Python environment with the following packages installed:

    • dataeval or dataeval[all]
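As a minimal sketch of the first requirement, the hypothetical ImageFirstDataset below wraps NumPy arrays and returns the image as the first element from __getitem__. The class name, the random placeholder images, and the label layout are assumptions made for illustration; they are not part of DataEval.

```python
import numpy as np
from torch.utils.data import Dataset


class ImageFirstDataset(Dataset):
    """Hypothetical map-style dataset that returns (image, label) pairs."""

    def __init__(self, images: np.ndarray, labels: np.ndarray) -> None:
        # images: (N, C, H, W) float32 array; labels: (N,) integer array
        self.images = images
        self.labels = labels

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, index: int):
        # The image comes first, followed by any other per-sample data
        return self.images[index], self.labels[index]


# Random placeholder data standing in for your own images
rng = np.random.default_rng(0)
placeholder_images = rng.random((8, 1, 28, 28), dtype=np.float32)
placeholder_labels = np.arange(8)

example_dataset = ImageFirstDataset(placeholder_images, placeholder_labels)
print(len(example_dataset), example_dataset[0][0].shape)
```

Indexing with [0][0], as the tutorial does later, picks the first sample and then its image.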

If the optional models are not given, a default architecture is used; this default has an encode function. You are encouraged to create a custom architecture that fits your data, as this typically leads to better results during training. A sample dataset is also provided so that you can run the tutorial.
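For illustration, a custom architecture with an encode function might look like the sketch below. The class name SmallAutoencoder, the layer sizes, and the latent dimension are all hypothetical choices for 1x28x28 inputs scaled to [0, 1]. Check the DataEval documentation for what AETrainer expects of a custom model before relying on this layout.

```python
import torch
from torch import nn


class SmallAutoencoder(nn.Module):
    """Hypothetical convolutional autoencoder for 1x28x28 images."""

    def __init__(self, latent_dim: int = 32) -> None:
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1),  # -> 8x14x14
            nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1),  # -> 16x7x7
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(16 * 7 * 7, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 16 * 7 * 7),
            nn.ReLU(),
            nn.Unflatten(1, (16, 7, 7)),
            # Each transposed convolution doubles the spatial size
            nn.ConvTranspose2d(16, 8, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(8, 1, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),  # keep outputs in [0, 1] to match scaled inputs
        )

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # Compression only: image -> latent vector
        return self.encoder(x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Full reconstruction: image -> latent vector -> image
        return self.decoder(self.encode(x))


sample_batch = torch.rand(4, 1, 28, 28)
custom_model = SmallAutoencoder()
print(custom_model.encode(sample_batch).shape, custom_model(sample_batch).shape)
```

With a latent size of 32, each 784-pixel image is compressed to 32 values, while the forward pass returns a tensor of the original shape.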

Setting up

Let’s import the libraries needed to set up a minimal working example.

While you can use your own dataset, this example uses the MNIST dataset throughout. Let’s import it from the maite_datasets package.

import numpy as np
import torch
from maite_datasets.image_classification import MNIST
from torch.utils.data import Subset

Now you will grab the MNIST dataset and look at its size and shape.

# Configure the dataset transforms
transforms = [
    lambda x: x / 255.0,  # scale to [0, 1]
    lambda x: x.astype(np.float32),  # convert to float32
]

training_dataset = MNIST(root="./data/", image_set="train", transforms=transforms, download=True)
testing_dataset = MNIST(root="./data/", image_set="test", transforms=transforms, download=True)
print("Training dataset size:", len(training_dataset))
print("Training image shape:", training_dataset[0][0].shape)
Training dataset size: 60000
Training image shape: (1, 28, 28)

There are 60,000 images in the training set, each a single-channel 28x28 image. Dimensionality reduction using an encoder can provide speed improvements for downstream tasks.
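To make the effect of the two transforms concrete, the sketch below applies them by hand to a placeholder image. It assumes that the raw images are 8-bit (uint8) arrays and that the transforms run in list order; neither detail is confirmed by this tutorial.

```python
import numpy as np

transforms = [
    lambda x: x / 255.0,  # scale to [0, 1]; NumPy promotes uint8 to float64 here
    lambda x: x.astype(np.float32),  # convert to float32
]

# Placeholder 8-bit image with the same shape as one MNIST sample
raw_image = np.full((1, 28, 28), 255, dtype=np.uint8)

scaled_image = raw_image
for transform in transforms:  # assumed to run in list order
    scaled_image = transform(scaled_image)

print(scaled_image.dtype, scaled_image.min(), scaled_image.max())
```

The second transform matters because dividing by 255.0 yields float64 values, whereas PyTorch models use float32 parameters by default.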

Note

MNIST images are very small compared to most operational data, so in this example the encoding does not actually reduce the image size. To use your own dataset, replace training_dataset and testing_dataset in the cells above.

Using a default trainer

Training Phase

DataEval provides a simple default trainer for autoencoder tasks. Let’s import the necessary classes. In this simple example, we will assume you do not have an autoencoder architecture to use.

from dataeval.utils.torch.models import Autoencoder
from dataeval.utils.torch.trainer import AETrainer

Now you set up the model and trainer.

device = "cuda" if torch.cuda.is_available() else "cpu"
model = Autoencoder(channels=1)
trainer = AETrainer(model, device=device, batch_size=32)

Let’s train the model on a subset (6000 images) of the MNIST data. Since this is a simpler problem, you will reduce the default 25 epochs to 10.

training_subset = Subset(training_dataset, range(6000))
training_loss = trainer.train(training_subset, epochs=10)
print(training_loss[-1])
0.11283735911104273
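The value printed above is the loss after the final epoch. As a rough sketch of what a reconstruction training loop looks like in plain PyTorch, the following trains a tiny model to reproduce its own input. The toy model, the mean squared error loss, the Adam optimizer, and the random placeholder data are assumptions for illustration; this tutorial does not state which loss or optimizer AETrainer uses internally.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Random placeholder images in [0, 1], standing in for real data
toy_images = torch.rand(64, 1, 28, 28)
toy_loader = DataLoader(TensorDataset(toy_images), batch_size=32, shuffle=True)

# Tiny fully connected autoencoder: 784 -> 32 -> 784
toy_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 32),
    nn.ReLU(),
    nn.Linear(32, 784),
    nn.Sigmoid(),
    nn.Unflatten(1, (1, 28, 28)),
)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(toy_model.parameters(), lr=1e-3)

epoch_losses = []
for epoch in range(3):
    running_loss = 0.0
    for (batch,) in toy_loader:
        optimizer.zero_grad()
        reconstruction = toy_model(batch)
        loss = criterion(reconstruction, batch)  # the target is the input itself
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * batch.size(0)
    epoch_losses.append(running_loss / len(toy_images))  # mean loss per image

print(epoch_losses[-1])
```

The defining feature of a reconstruction task is that the input doubles as the target, so no labels are needed.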

Evaluation Phase

Now that you have a trained model, let’s check its performance on the held-out test set.

eval_loss = trainer.eval(testing_dataset)
print(eval_loss)
0.11400803080953348

Great! The evaluation loss is close to the final training loss, so the model reconstructs unseen data about as well as the data it was trained on. This step only confirms that your model did not overfit the training data.
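One simple way to quantify this check is to compute the relative gap between the two losses, using the values printed above. The 5% tolerance below is a hypothetical threshold chosen for illustration, not a DataEval default.

```python
final_train_loss = 0.11283735911104273  # final training loss printed earlier
test_loss = 0.11400803080953348  # evaluation loss printed above

# Relative increase of the evaluation loss over the training loss
relative_gap = (test_loss - final_train_loss) / final_train_loss
max_gap = 0.05  # hypothetical tolerance, not a DataEval default

print(f"Relative gap: {relative_gap:.2%}")
print("Within tolerance:", relative_gap <= max_gap)
```

Here the gap is roughly 1%, which supports the conclusion that the model has not overfit; a much larger gap would suggest training for fewer epochs or on more data.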

Now you can encode the dataset and use those embeddings to speed up downstream tasks.

Encoding Phase

Encoding differs from training and evaluation. During training and evaluation, the autoencoder compresses the image and then reconstructs it at the original size. By calling only the first part of the autoencoder, the encoder, you can take advantage of this compression.

Let’s show an example using the training data.

embeddings = trainer.encode(training_subset)
print("Embedded image shape:", embeddings.shape)
Embedded image shape: torch.Size([6000, 64, 6, 6])

Now you can see how the encoder changes the overall shape of your images, which can lead to significant benefits for downstream tasks when working with large data.
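To judge whether an encoding actually compresses your data, compare the number of values per image before and after encoding. The sketch below uses the two per-image shapes printed in this tutorial; the variable names are illustrative.

```python
from math import prod

image_shape = (1, 28, 28)  # per-image shape printed for the MNIST data
embedding_shape = (64, 6, 6)  # per-image shape of the embeddings above

image_values = prod(image_shape)
embedding_values = prod(embedding_shape)

print("Values per image:", image_values)
print("Values per embedding:", embedding_values)
print(f"Embedding / image ratio: {embedding_values / image_values:.2f}")
```

For MNIST the embedding holds more values than the input, which is consistent with the earlier note that this example does not reduce the image size. Running the same comparison on your own data shows whether the encoder is compressing it.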

Additional Information

Related Notebooks

  1. Bayes Error Rate

  2. Divergence

  3. Sufficiency