dataeval.utils.dataset.datasets.MNIST

class dataeval.utils.dataset.datasets.MNIST(root, train=True, download=False, size=-1, unit_interval=False, dtype=None, channels=None, flatten=False, normalize=None, corruption=None, classes=None, balance=True, randomize=True, slice_back=False, verbose=True)

MNIST Dataset and Corruptions.

Parameters:
  • root (str | pathlib.Path) – Root directory of the dataset, where the mnist_c/ folder exists.

  • train (bool) – Default: True. If True, creates the dataset from train_images.npy and train_labels.npy.

  • download (bool) – Default: False. If True, downloads the dataset from the internet and places it in the root directory. If the dataset is already downloaded, it is not downloaded again.

  • size (int) – Default: -1. Limit on the dataset size. To take effect, the value must be greater than 0.

  • unit_interval (bool) – Default: False. Shift the data values to the unit interval [0, 1].

  • dtype (type | None) – Default: None. Change the NumPy dtype. Data is loaded as np.uint8.

  • channels (Literal['channels_first', 'channels_last'] | None) – Default: None. Location of the channel axis, if desired. By default the data has no channel axis and has shape (N, 28, 28).

  • flatten (bool) – Default: False. Flatten the data into a single dimension, giving shape (N, 784). The channels and flatten options cannot be combined; if both are set, channels takes priority.

  • normalize (tuple[float, float] | None) – Default: None. A (mean, std) tuple. Normalizes images according to the provided mean and standard deviation.

  • corruption (CorruptionStringMap | None) – Literal['identity', 'shot_noise', 'impulse_noise', 'glass_blur', 'motion_blur', 'shear', 'scale', 'rotate', 'brightness', 'translate', 'stripe', 'fog', 'spatter', 'dotted_line', 'zigzag', 'canny_edges'] | None. Default: None. The desired corruption style, or None.

  • classes (TClassMap | None) – Literal['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine'] | int | list[int] | list[Literal['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine']] | None. Default: None. Selects specific classes from the dataset.

  • balance (bool) – Default: True. If True, returns an equal number of samples for each class.

  • randomize (bool) – Default: True. If True, shuffles the data prior to selection, using a fixed seed for reproducibility.

  • slice_back (bool) – Default: False. If True and size is greater than 0, takes the selection starting from the last image.

  • verbose (bool) – Default: True. If True, prints output messages.
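
The array-shaping options above (unit_interval, dtype, channels, flatten, normalize) can be pictured with plain NumPy on a synthetic batch. This is a minimal sketch, not the library's implementation: the helper name preprocess_sketch is hypothetical, and both the order of the steps and the division by 255 are assumptions made for illustration.

```python
import numpy as np


def preprocess_sketch(images, unit_interval=False, dtype=None,
                      channels=None, flatten=False, normalize=None):
    """Hypothetical helper mirroring the documented options; not dataeval code."""
    data = np.asarray(images)
    if dtype is not None:
        # Data starts as np.uint8; convert only when a dtype is requested.
        data = data.astype(dtype)
    if unit_interval:
        # Assumption: uint8 pixel values map onto [0, 1] by dividing by 255.
        data = data / 255.0
    if normalize is not None:
        mean, std = normalize
        data = (data - mean) / std
    if channels == "channels_first":
        data = data[:, np.newaxis, :, :]        # (N, 1, 28, 28)
    elif channels == "channels_last":
        data = data[..., np.newaxis]            # (N, 28, 28, 1)
    elif flatten:
        # Reached only when channels is None, so channels takes priority.
        data = data.reshape(data.shape[0], -1)  # (N, 784)
    return data


# Synthetic stand-in for four MNIST images, shape (N, 28, 28).
batch = np.zeros((4, 28, 28), dtype=np.uint8)
batch[0, 0, 0] = 255

scaled = preprocess_sketch(batch, unit_interval=True,
                           channels="channels_first", flatten=True)
print(scaled.shape)                                  # (4, 1, 28, 28)
print(preprocess_sketch(batch, flatten=True).shape)  # (4, 784)
```

Going by the signature above, the corresponding request to the class itself would be along the lines of MNIST(root, unit_interval=True, channels="channels_first"), with root pointing at the directory that holds mnist_c/.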