dataeval.utils.data.datasets.ShipDataset¶

class dataeval.utils.data.datasets.ShipDataset(root, download=False, size=-1, unit_interval=False, dtype=None, channels='channels_first', flatten=False, normalize=None, balance=False, randomize=True, slice_back=False, verbose=True)¶

A dataset that focuses on identifying ships from satellite images.

The dataset comes from Kaggle (Ships in Satellite Imagery). The images are drawn from Planet satellite imagery, released when Planet gave open access to a portion of its data.

The dataset contains 4000 images of shape 80x80x3 (HWC) showing ships, sea, and land, plus 8 larger scene images similar to what would be provided operationally.

Parameters:¶

root : str or pathlib.Path¶: Root directory of dataset where the ships-in-satellite-imagery folder exists.
download : bool, default False¶: If True, downloads the dataset from the internet and places it in the root directory. The class first checks whether the data is already present, so it does not download a duplicate copy.
size : int, default -1¶: Limits the dataset size. The limit applies only when the value is greater than 0; the default of -1 leaves the dataset size unrestricted.
unit_interval : bool, default False¶: Shift the data values to the unit interval [0, 1].
dtype : type | None, default None¶: If None, data is loaded as np.uint8. Otherwise specify the desired NumPy dtype.
channels : "channels_first" or "channels_last", default "channels_first"¶: Location of the channel axis in the returned data. The downloaded images are stored channels last.
flatten : bool, default False¶: Flattens each image into a single dimension, giving data of shape (N, 19200). channels and flatten cannot be combined: if flatten is True, the channels parameter is ignored.
normalize : tuple[mean, std] or None, default None¶: Normalize images according to the provided mean and standard deviation.
balance : bool, default False¶: If True, limits the data to equal number of samples for each class (1000 samples per class).
randomize : bool, default True¶: If True, shuffles the data prior to selection - uses a set seed for reproducibility.
slice_back : bool, default False¶: If True and size has a value greater than 0, takes the selection from the end of the data, starting at the last image.
verbose : bool, default True¶: If True, outputs print statements.
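
The selection and preprocessing parameters above can be illustrated with a minimal NumPy sketch. This is not the library's implementation: the helper names `select` and `preprocess`, the synthetic data, the seed value, and the order in which the steps are applied are all assumptions made for illustration only.

```python
import numpy as np

# Synthetic stand-in for the dataset: 8 images, 80x80 pixels, 3 channels (channels last).
images = np.random.default_rng(0).integers(0, 256, size=(8, 80, 80, 3), dtype=np.uint8)


def select(n_samples, size=-1, randomize=True, slice_back=False, seed=0):
    """Hypothetical helper: pick sample indices as size/randomize/slice_back describe."""
    idx = np.arange(n_samples)
    if randomize:
        # A fixed seed keeps the shuffle reproducible (the seed value is an assumption).
        idx = np.random.default_rng(seed).permutation(idx)
    if size > 0:
        # slice_back takes the selection from the end instead of the start.
        idx = idx[-size:] if slice_back else idx[:size]
    return idx


def preprocess(x, unit_interval=False, normalize=None,
               channels="channels_first", flatten=False):
    """Hypothetical helper: apply the documented transforms to an (N, H, W, C) batch."""
    if unit_interval or normalize is not None:
        x = x.astype(np.float32)
    if unit_interval:
        x = x / 255.0  # uint8 range [0, 255] mapped to [0, 1]
    if normalize is not None:
        mean, std = normalize  # scalars or per-channel values (broadcast over the last axis)
        x = (x - mean) / std
    if flatten:
        return x.reshape(len(x), -1)  # (N, 80 * 80 * 3) == (N, 19200); channels ignored
    if channels == "channels_first":
        x = np.transpose(x, (0, 3, 1, 2))  # (N, H, W, C) -> (N, C, H, W)
    return x


last_three = select(len(images), size=3, randomize=False, slice_back=True)
chw = preprocess(images[last_three], unit_interval=True)
flat = preprocess(images, flatten=True)
print(last_three, chw.shape, flat.shape)
```

With `randomize=False`, `size=3`, and `slice_back=True`, the sketch selects the last three samples; the flattened width of 19200 follows from 80 x 80 x 3.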

index2label¶

Dictionary which translates from class integers to the associated class strings.

Type:¶: dict

label2index¶

Dictionary which translates from class strings to the associated class integers.

Type:¶: dict
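
The relationship between the two dictionaries can be sketched as a pair of inverse mappings. The class names below are hypothetical placeholders chosen for illustration; the actual keys and values are whatever the dataset instance exposes.

```python
# Hypothetical class names, used only to illustrate the two mappings.
index2label = {0: "no ship", 1: "ship"}

# label2index is the inverse mapping: class string -> class integer.
label2index = {label: index for index, label in index2label.items()}

# Translating a predicted integer to a readable name, and back again.
predicted = 1
name = index2label[predicted]
assert label2index[name] == predicted
```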

dataset_dir¶

Location of the folder containing the data. Different from root if downloading data.

Type:¶: Path

metadata¶

Dictionary containing dataset metadata, such as id, which holds the dataset class name.

Type:¶: dict

class_set¶

The set of class labels in use. The constructor signature above does not include a classes parameter, so the set covers every class in the dataset.

Type:¶: set

num_classes¶

The number of classes in class_set.

Type:¶: int

scenes¶

Extra data samples: large satellite images that each encompass an entire scene. Useful for testing models and pipelines on “real data”.

Type:¶: list[NDArray]

info()¶

Pretty prints dataset name and information.

Return type:¶: str