dataeval.utils.data.datasets.Ships
class dataeval.utils.data.datasets.Ships(root, download=False, size=-1, unit_interval=False, dtype=None, channels='channels_first', flatten=False, crop=None, normalize=None, balance=False, slice_back=False, verbose=False)
A dataset that focuses on identifying ships from satellite images.
The dataset comes from the Kaggle dataset Ships in Satellite Imagery. The images come from Planet satellite imagery, released when Planet made a portion of its data openly accessible.
There are 4000 images of ships, sea, and land, each 80x80x3 (HWC). There are also 8 larger scene images similar to what would be provided operationally.
- Parameters:
- root : str or pathlib.Path
Root directory of the dataset where the shipdataset folder exists.
- download : bool, default False
If True, downloads the dataset from the internet and puts it in the root directory. The class checks whether the data is already downloaded so that it does not create a duplicate download.
- size : int, default -1
Limit the dataset size; when set, the value must be greater than 0. The default of -1 applies no limit.
- unit_interval : bool, default False
Rescale the data values to the unit interval [0, 1].
- dtype : type | None, default None
If None, data is loaded as np.uint8. Otherwise specify the desired NumPy dtype.
- channels : "channels_first" or "channels_last", default "channels_first"
Location of the channel axis in the returned images. The downloaded images are stored channels last (HWC); by default they are returned channels first (CHW).
- flatten : bool, default False
Flatten each image into a single dimension, giving data of shape (N, 19200). Flatten and channels cannot be used together: if True, the channels parameter is ignored.
- normalize : tuple[mean, std] or None, default None
Normalize images according to the provided mean and standard deviation.
- balance : bool, default False
If True, limits the data to an equal number of samples for each class (1000 samples per class).
- slice_back : bool, default False
If True and size is greater than 0, selects images starting from the last image instead of the first.
- verbose : bool, default False
If True, outputs print statements.
- crop : int | None, default None
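As a rough illustration, the following NumPy sketch shows how the unit_interval, normalize, flatten, and channels options could transform a stack of 80x80x3 uint8 images. The helper name preprocess and the order of the steps are assumptions made for illustration; this is not DataEval's implementation, and the dtype and crop options are omitted.

```python
import numpy as np


def preprocess(images, unit_interval=False, channels="channels_first",
               flatten=False, normalize=None):
    """Hypothetical helper mirroring the documented Ships options.

    Not DataEval's implementation. `images` is assumed to be a uint8
    array of shape (N, 80, 80, 3), i.e. channels last (HWC) as downloaded.
    """
    data = images
    if unit_interval or normalize is not None:
        # Convert to floating point before scaling or standardizing.
        data = data.astype(np.float32)
    if unit_interval:
        # Rescale uint8 values 0..255 onto the unit interval [0, 1].
        data = data / 255.0
    if normalize is not None:
        mean, std = normalize
        # Standardize with the caller-provided mean and standard deviation.
        data = (data - mean) / std
    if flatten:
        # One row per image: 80 * 80 * 3 = 19200 values; channels is ignored.
        return data.reshape(len(data), -1)
    if channels == "channels_first":
        # Move the channel axis from last (N, H, W, C) to first (N, C, H, W).
        data = np.transpose(data, (0, 3, 1, 2))
    return data


batch = np.full((4, 80, 80, 3), 255, dtype=np.uint8)
print(preprocess(batch).shape)                # (4, 3, 80, 80)
print(preprocess(batch, flatten=True).shape)  # (4, 19200)
```

With the defaults, a batch of shape (N, 80, 80, 3) comes out as (N, 3, 80, 80); with flatten=True it comes out as (N, 19200) regardless of channels.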
- Attributes:
- index2label
Dictionary which translates from class integers to the associated class strings.
- Type:
dict
- label2index¶
Dictionary which translates from class strings to the associated class integers.
- Type:
dict
- dataset_dir¶
Location of the folder containing the data. This may differ from root when the data is downloaded.
- Type:
Path
- metadata¶
Dictionary containing dataset metadata, such as id, which holds the dataset class name.
- Type:
dict
- class_set¶
The chosen set of labels to use. This is a binary dataset, so the only labels are 0 (“no ship”) and 1 (“ship”).
- Type:
set
- scenes¶
Paths to extra data samples: large satellite images that each cover an entire scene. Useful for testing models and pipelines on “real data”.
- Type:
list[str]
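Because the dataset is binary, index2label and label2index are simply inverses of each other, and the balance option reduces to keeping equal counts of the two classes. The NumPy sketch below builds both maps from the labels named in class_set and selects sample indices under balance, size, and slice_back. The helper select_indices, and the rule that balancing keeps the first samples of each class up to the size of the smallest class, are assumptions for illustration, not DataEval's code.

```python
import numpy as np

# Binary label maps as described for class_set: 0 is "no ship", 1 is "ship".
index2label = {0: "no ship", 1: "ship"}
# label2index is the inverse mapping.
label2index = {label: index for index, label in index2label.items()}


def select_indices(labels, size=-1, balance=False, slice_back=False):
    """Hypothetical helper approximating balance, size and slice_back.

    Not DataEval's implementation; assumes balancing keeps the first
    samples of each class up to the size of the smallest class.
    """
    labels = np.asarray(labels)
    indices = np.arange(len(labels))
    if balance:
        # Count per class and keep that many samples from every class.
        per_class = min(int(np.sum(labels == c)) for c in index2label)
        kept = [indices[labels == c][:per_class] for c in index2label]
        indices = np.sort(np.concatenate(kept))
    if size > 0:
        # Take `size` indices from the start, or from the end if slice_back.
        indices = indices[-size:] if slice_back else indices[:size]
    return indices


labels = [0, 0, 0, 1, 0, 1]
print(select_indices(labels, balance=True))             # [0 1 3 5]
print(select_indices(labels, size=2, slice_back=True))  # [4 5]
```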
- get_scene(index)
Get the desired satellite image (scene) by passing in the index of the desired file.
- Parameters:
- index : int¶
Index of the desired scene in scenes.
- Returns:
Scene image
- Return type:
NDArray[np.uintp]
Note
The scene will be returned with the channel axis in the position specified by the class channels parameter (default is channels first).