dataeval.utils.dataset.datasets.MNIST
- class dataeval.utils.dataset.datasets.MNIST(root, train=True, download=False, size=-1, unit_interval=False, dtype=None, channels=None, flatten=False, normalize=None, corruption=None, classes=None, balance=True, randomize=True, slice_back=False, verbose=True)
MNIST Dataset and Corruptions.
- Parameters:
root (str | pathlib.Path) – str | pathlib.Path Root directory of dataset where the mnist_c/ folder exists.
train (bool) – bool, default True If True, creates dataset from train_images.npy and train_labels.npy.
download (bool) – bool, default False If True, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
size (int) – int, default -1 Limit the dataset size to this many samples. The limit applies only when the value is greater than 0.
unit_interval (bool) – bool, default False If True, rescales the data values to the unit interval [0, 1].
dtype (type | None) – type | None, default None NumPy dtype to convert the data to. Data is loaded as np.uint8.
channels (Literal['channels_first', 'channels_last'] | None) – Literal[‘channels_first’ | ‘channels_last’] | None, default None Location of the channel axis, if one is desired. By default the data has no channel axis and has shape (N, 28, 28).
flatten (bool) – bool, default False If True, flattens each image into a single dimension, giving shape (N, 784). channels and flatten cannot be combined; if both are given, channels takes priority.
normalize (tuple[float, float] | None) – tuple[mean, std] | None, default None Normalize images according to the provided mean and standard deviation.
corruption (CorruptionStringMap | None) – Literal[‘identity’ | ‘shot_noise’ | ‘impulse_noise’ | ‘glass_blur’ | ‘motion_blur’ | ‘shear’ | ‘scale’ | ‘rotate’ | ‘brightness’ | ‘translate’ | ‘stripe’ | ‘fog’ | ‘spatter’ | ‘dotted_line’ | ‘zigzag’ | ‘canny_edges’] | None, default None The desired corruption style or None.
classes (TClassMap | None) – Literal[“zero”, “one”, “two”, “three”, “four”, “five”, “six”, “seven”, “eight”, “nine”] | int | list[int] | list[Literal[“zero”, “one”, “two”, “three”, “four”, “five”, “six”, “seven”, “eight”, “nine”]] | None, default None Option to select specific classes from dataset.
balance (bool) – bool, default True If True, returns an equal number of samples for each class.
randomize (bool) – bool, default True If True, shuffles the data prior to selection, using a fixed seed for reproducibility.
slice_back (bool) – bool, default False If True and size is greater than 0, takes the selection starting from the last image.
verbose (bool) – bool, default True If True, outputs print statements.
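
A minimal construction sketch, using only the parameters listed above. The root path `./data` and the helper name `load_mnist` are hypothetical and chosen for illustration; the import path follows the class name shown at the top of this page. Nothing here relies on attributes of the returned object, since this page does not document them.

```python
from pathlib import Path

# Keyword arguments taken from the documented signature.
# "./data" is a hypothetical root directory, not a documented default.
mnist_kwargs = {
    "root": Path("./data"),
    "train": True,
    "download": True,              # fetch the data if it is not already in root
    "size": 1000,                  # a value greater than 0 limits the dataset size
    "unit_interval": True,
    "channels": "channels_first",
    "corruption": "shot_noise",    # one of the documented corruption names
    "classes": ["zero", "one"],    # select a subset of the ten digit classes
    "balance": True,
    "verbose": False,
}


def load_mnist(**overrides):
    """Hypothetical helper: build the dataset, importing dataeval lazily."""
    from dataeval.utils.dataset.datasets import MNIST

    # Later keys win, so overrides replace the defaults defined above.
    return MNIST(**{**mnist_kwargs, **overrides})


# Example call (needs dataeval installed and network access for the download):
# dataset = load_mnist(corruption=None)
```

Because `channels` takes priority over `flatten`, passing both in one call would not yield a flattened array; pick one or the other.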
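
The shape and value options (`unit_interval`, `normalize`, `channels`, `flatten`) can be pictured with plain NumPy on a stand-in array. This is an illustration of what the parameter descriptions imply, not the library's implementation: dividing by 255 for the unit interval, applying `normalize` after that scaling, and the example mean and standard deviation are all assumptions.

```python
import numpy as np

# Stand-in for loaded images: 4 grayscale 28x28 images stored as uint8.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# unit_interval=True: map [0, 255] onto [0, 1] (assumed to be a division by 255).
scaled = images.astype(np.float32) / 255.0

# normalize=(mean, std): standardize with the provided statistics.
# These values are illustrative, not taken from the documentation.
mean, std = 0.5, 0.25
normalized = (scaled - mean) / std

# channels="channels_first" adds the axis after N; "channels_last" adds it at the end.
channels_first = images[:, np.newaxis, :, :]   # (N, 1, 28, 28)
channels_last = images[..., np.newaxis]        # (N, 28, 28, 1)

# flatten=True: one row of 28 * 28 = 784 values per image.
flat = images.reshape(len(images), -1)         # (N, 784)

print(channels_first.shape, channels_last.shape, flat.shape)
```

The channel axis has length 1 because MNIST images are grayscale.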