dataeval.utils.dataset.datasets.MNIST

class dataeval.utils.dataset.datasets.MNIST(root, train=True, download=False, size=-1, unit_interval=False, dtype=None, channels=None, flatten=False, normalize=None, corruption=None, classes=None, balance=True, randomize=True, slice_back=False, verbose=True)

MNIST Dataset and Corruptions.

Parameters:
root : str | pathlib.Path

str | pathlib.Path Root directory of the dataset, where the mnist_c/ folder exists.

train : bool

bool, default True If True, creates the dataset from train_images.npy and train_labels.npy.

download : bool

bool, default False If True, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.

size : int

int, default -1 Limit the dataset size. A limit is applied only when the value is greater than 0; the default of -1 applies no limit.

unit_interval : bool

bool, default False Rescale the data values to the unit interval [0, 1].

dtype : type | None

type | None, default None NumPy dtype to convert the data to. The data is loaded as np.uint8.

channels : Literal['channels_first', 'channels_last'] | None

Literal['channels_first', 'channels_last'] | None, default None Location of the channel axis, if desired. By default the data has no channel axis: (N, 28, 28).

flatten : bool

bool, default False Flatten the data into a single dimension, (N, 784). channels and flatten cannot be used together; if both are given, channels takes priority.
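To make the shapes named by channels and flatten concrete, here is a minimal NumPy sketch. It uses a zero-filled stand-in array instead of the real dataset and assumes a single grayscale channel is added. It only illustrates the shapes and is not the library's implementation.

```python
import numpy as np

# Stand-in for N=4 MNIST images as loaded: uint8 with no channel axis.
images = np.zeros((4, 28, 28), dtype=np.uint8)

# channels="channels_first": channel axis directly after N (assumed single channel).
channels_first = images[:, np.newaxis, :, :]

# channels="channels_last": channel axis at the end.
channels_last = images[..., np.newaxis]

# flatten=True: each 28 x 28 image collapsed into 784 values.
flattened = images.reshape(images.shape[0], -1)

print(channels_first.shape, channels_last.shape, flattened.shape)
# (4, 1, 28, 28) (4, 28, 28, 1) (4, 784)
```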

normalize : tuple[float, float] | None

tuple[mean, std] | None, default None Normalize images according to the provided mean and standard deviation.

corruption : CorruptionStringMap | None

Literal['identity', 'shot_noise', 'impulse_noise', 'glass_blur', 'motion_blur', 'shear', 'scale', 'rotate', 'brightness', 'translate', 'stripe', 'fog', 'spatter', 'dotted_line', 'zigzag', 'canny_edges'] | None, default None The desired corruption style or None.

classes : TClassMap | None

Literal["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"] | int | list[int] | list[Literal["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]] | None, default None Option to select specific classes from the dataset.

balance : bool

bool, default True If True, returns an equal number of samples for each class.

randomize : bool

bool, default True If True, shuffles the data prior to selection, using a fixed seed for reproducibility.

slice_back : bool

bool, default False If True and size has a value greater than 0, the selection is taken starting from the last image.

verbose : bool

bool, default True If True, prints status output.
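A minimal usage sketch built from the signature above. The keyword names and the import path come from this page. The argument values, the ./data directory, and the load_mnist helper are hypothetical choices made for illustration, and the returned object's interface is not described here.

```python
# Keyword arguments named in the documented signature; values are illustrative.
mnist_kwargs = {
    "root": "./data",              # hypothetical local directory
    "train": True,
    "download": True,              # fetch the data if it is not already in root
    "size": 1000,                  # limit the number of samples
    "unit_interval": True,
    "channels": "channels_first",
    "corruption": "shot_noise",    # one of the documented corruption styles
    "classes": [0, 1],             # list[int] form of the classes option
    "balance": True,
    "randomize": True,
    "verbose": False,
}


def load_mnist(**kwargs):
    """Hypothetical helper: imports dataeval lazily, then builds the dataset."""
    from dataeval.utils.dataset.datasets import MNIST

    return MNIST(**kwargs)
```

Calling `load_mnist(**mnist_kwargs)` requires the dataeval package to be installed and, with download=True, network access on first use.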