dataeval.utils.data.datasets.MNIST

class dataeval.utils.data.datasets.MNIST(root, download=False, train=True, size=-1, unit_interval=False, dtype=None, channels='channels_first', flatten=False, normalize=None, corruption=None, classes=None, balance=True, randomize=True, slice_back=False, verbose=True)

MNIST Dataset and Corruptions.

There are 15 different styles of corruption. What the class downloads depends on whether you need only the original dataset or the corrupted versions as well. If you need both a corrupted version and the original, choose corruption="identity": this downloads all of the corrupted datasets and provides the original as identity. If you need only the original, corruption=None downloads just the original dataset, which saves time and space.
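As an illustration of the two download modes described above, the following sketch wraps each case in a small helper. The helper names and the "./data" root are hypothetical; the constructor arguments are taken from the signature on this page. The sketch assumes dataeval is installed and has not been checked against a specific release.

```python
def load_original_only(root="./data"):
    """Load only the original MNIST data (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require dataeval.
    from dataeval.utils.data.datasets import MNIST

    # corruption=None fetches just the original dataset.
    return MNIST(root=root, download=True, corruption=None)


def load_original_and_corrupted(root="./data", corruption="shot_noise"):
    """Load the original and one corrupted variant (hypothetical helper)."""
    from dataeval.utils.data.datasets import MNIST

    # "identity" downloads all corrupted sets and exposes the original images.
    original = MNIST(root=root, download=True, corruption="identity")
    # Any other name from the corruption list selects that corrupted variant.
    corrupted = MNIST(root=root, download=True, corruption=corruption)
    return original, corrupted
```

Calling load_original_and_corrupted() triggers the larger download once; per the download parameter's description, later calls should reuse the files already present under root.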

Parameters:

root : str or pathlib.Path: Root directory of the dataset, where the mnist folder exists.
download : bool, default False: If True, downloads the dataset from the internet and places it in the root directory. The class first checks whether the data has already been downloaded, so it does not download a duplicate copy.
train : bool, default True: If True, creates the dataset from train_images.npy and train_labels.npy.
size : int, default -1: Limits the dataset size. To apply a limit, pass a value greater than 0; the default of -1 applies no limit.
unit_interval : bool, default False: Shift the data values to the unit interval [0, 1].
dtype : type | None, default None: Change the NumPy dtype. Data is loaded as np.uint8.
channels : "channels_first" or "channels_last", default "channels_first": Location of the channel axis. The default, channels first, gives shape (N, 1, 28, 28).
flatten : bool, default False: Flatten the data into a single dimension, giving shape (N, 784). channels and flatten cannot be combined: if flatten is True, the channels parameter is ignored.
normalize : tuple[mean, std] or None, default None: Normalize images according to the provided mean and standard deviation.
corruption : "identity", "shot_noise", "impulse_noise", "glass_blur", "motion_blur", "shear", "scale", "rotate", "brightness", "translate", "stripe", "fog", "spatter", "dotted_line", "zigzag", "canny_edges" or None, default None: The desired corruption style or None.
classes : "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", int, list, or None, default None: Option to select specific classes from the dataset. Classes are 0-9; any other number is ignored.
balance : bool, default True: If True, returns an equal number of samples for each class.
randomize : bool, default True: If True, shuffles the data prior to selection, using a fixed seed for reproducibility.
slice_back : bool, default False: If True and size is greater than 0, takes the selection starting from the last image.
verbose : bool, default True: If True, outputs print statements.
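The interaction between channels, flatten, unit_interval and normalize can be sketched in plain Python. This is not the library's implementation: the function names are hypothetical, the (N, 28, 28, 1) layout for "channels_last" is inferred from the name, and both the division by 255 and the order in which scaling and normalization are applied are assumptions.

```python
def expected_shape(n, channels="channels_first", flatten=False):
    """Return the array shape the parameter descriptions imply for n images."""
    if flatten:
        # flatten takes precedence, so channels is ignored here.
        return (n, 28 * 28)
    if channels == "channels_first":
        return (n, 1, 28, 28)
    if channels == "channels_last":
        # Assumed layout; the page only states the channels-first shape.
        return (n, 28, 28, 1)
    raise ValueError(f"unknown channels value: {channels!r}")


def scale_pixel(value, unit_interval=False, normalize=None):
    """Apply the value transforms to a single uint8 pixel (0-255)."""
    x = float(value)
    if unit_interval:
        # Assumption: mapping to [0, 1] is a division by the uint8 maximum.
        x = x / 255.0
    if normalize is not None:
        mean, std = normalize
        # Standard normalization: subtract the mean, divide by the std.
        x = (x - mean) / std
    return x


print(expected_shape(60000, flatten=True))         # (60000, 784)
print(scale_pixel(255, unit_interval=True))        # 1.0
print(scale_pixel(0, unit_interval=True, normalize=(0.5, 0.5)))  # -1.0
```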

index2label

Dictionary which translates from class integers to the associated class strings.

Type: dict

label2index

Dictionary which translates from class strings to the associated class integers.

Type: dict
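The two mappings are inverses of each other. A minimal stand-alone sketch, assuming the class strings are the ten names listed under the classes parameter (the real dictionaries are attributes of an MNIST instance and may differ in detail):

```python
# Class names as listed for the classes parameter, in digit order.
CLASS_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

# Integer index -> class string.
index2label = dict(enumerate(CLASS_NAMES))

# Class string -> integer index, built by inverting the first mapping.
label2index = {name: index for index, name in index2label.items()}

print(index2label[3])        # three
print(label2index["seven"])  # 7
```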

dataset_dir

Location of the folder containing the data. Different from root if downloading data.

Type: Path

metadata

Dictionary containing dataset metadata, such as id, which gives the dataset class name.

Type: dict

class_set

The chosen set of labels to use. The default is all 10 classes (0-9), but the set can be narrowed with the classes parameter.

Type: set

num_classes

The number of classes in class_set.

Type: int
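How a classes argument might translate into class_set and num_classes can be sketched as follows. This mimics the documented behavior only; resolve_classes is a hypothetical helper, storing the labels as integers is an assumption, and so is the silent skipping of unrecognized class names (the page states only that out-of-range numbers are ignored).

```python
LABEL_TO_INDEX = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}


def resolve_classes(classes=None):
    """Turn a classes argument (name, int, list, or None) into a set of digits."""
    if classes is None:
        # Default: all ten classes.
        return set(range(10))
    if isinstance(classes, (str, int)):
        # Accept a single name or integer as well as a list.
        classes = [classes]
    selected = set()
    for item in classes:
        if isinstance(item, str):
            # Assumption: unknown names map to None and are skipped.
            item = LABEL_TO_INDEX.get(item)
        if isinstance(item, int) and 0 <= item <= 9:
            # Numbers outside 0-9 are ignored, as documented.
            selected.add(item)
    return selected


class_set = resolve_classes(["zero", 3, 42])
num_classes = len(class_set)
print(sorted(class_set), num_classes)  # [0, 3] 2
```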

info()

Pretty-prints the dataset name and information.

Return type: str