Image Statistics

The basic ImageStats class assists with understanding the dataset. It can be used in conjunction with the Linter class to determine whether any of the images in the dataset have issues.

This class can be used to get a big-picture view of the dataset and its underlying distribution.

The statistics delivered by the class are broken down into three main categories:

  • statistics covering image properties,

  • statistics covering the visual aspect of images,

  • and standard descriptive statistics about pixel values.

The statistics calculated in each category are listed below.

  • Image Properties

      • height

      • width

      • size

      • aspect ratio

      • number of channels

      • pixel value range

  • Image Visuals

      • image brightness

      • image blurriness

      • missing values (NaNs)

      • number of 0 value pixels

  • Pixel Statistics

      • mean pixel value

      • pixel value standard deviation

      • pixel value variance

      • pixel value skew

      • pixel value kurtosis

      • entropy of the image

      • pixel percentiles (min, max, 25th, 50th, and 75th percentile values)

      • histogram of pixel values
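To make a few of the listed statistics concrete, here is a minimal standard-library sketch that computes some of them for one small grayscale image. It is not the ImageStats implementation: the function name pixel_statistics, the dictionary keys, the width/height definition of aspect ratio, and the use of population (rather than sample) variance are all assumptions made for illustration. Skew, kurtosis, brightness, and blurriness are left out.

```python
import math
import statistics
from collections import Counter


def pixel_statistics(image):
    """Summarize a grayscale image given as a list of rows of integer pixel values.

    Hypothetical helper for illustration only; not part of DataEval.
    """
    height = len(image)
    width = len(image[0])
    pixels = [value for row in image for value in row]

    # Histogram of pixel values, then Shannon entropy (in bits) derived from it.
    histogram = Counter(pixels)
    total = len(pixels)
    entropy = -sum((n / total) * math.log2(n / total) for n in histogram.values())

    # Quartiles: the 25th, 50th, and 75th percentile values.
    q25, q50, q75 = statistics.quantiles(pixels, n=4, method="inclusive")

    return {
        "height": height,
        "width": width,
        "aspect_ratio": width / height,  # assumed definition: width over height
        "mean": statistics.fmean(pixels),
        "std": statistics.pstdev(pixels),  # population standard deviation (assumed)
        "var": statistics.pvariance(pixels),  # population variance (assumed)
        "entropy": entropy,
        "percentiles": (min(pixels), q25, q50, q75, max(pixels)),
        "zeros": histogram[0],  # number of 0 value pixels
        "histogram": dict(histogram),
    }


# A tiny 2x4 image with made-up 8-bit pixel values.
stats = pixel_statistics([[0, 0, 128, 255], [0, 64, 128, 255]])
```

The returned dictionary holds one value per statistic for a single image; a dataset-level view would collect these per image and inspect their distributions.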

In addition to the above stats, the ImageStats class also defines a hash for each image to be used in conjunction with the Duplicates class in order to identify duplicate images.
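The idea of using a per-image hash to find duplicates can be sketched as follows. The source does not say which hash ImageStats computes, so this example swaps in SHA-256 over the raw pixel bytes, which only catches exact duplicates. The names image_hash and find_exact_duplicates are hypothetical and are not part of the Duplicates class.

```python
import hashlib
from collections import defaultdict


def image_hash(image):
    """Hash a grayscale image given as a list of rows of 0-255 integers.

    Hypothetical helper; SHA-256 is an assumption, not DataEval's hash.
    """
    digest = hashlib.sha256()
    # Include the shape so images with the same pixels but different
    # dimensions do not collide.
    digest.update(f"{len(image)}x{len(image[0])}:".encode())
    for row in image:
        digest.update(bytes(row))
    return digest.hexdigest()


def find_exact_duplicates(images):
    """Return lists of indices whose images share the same hash."""
    groups = defaultdict(list)
    for index, image in enumerate(images):
        groups[image_hash(image)].append(index)
    return [indices for indices in groups.values() if len(indices) > 1]


# Images 0 and 2 are identical; image 1 is different.
images = [[[0, 10], [20, 30]], [[5, 5], [5, 5]], [[0, 10], [20, 30]]]
duplicates = find_exact_duplicates(images)
```

Grouping by hash avoids comparing every pair of images directly, which is why a precomputed hash is a convenient input for duplicate detection.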

Tutorials

To see how the ImageStats class can be used while doing exploratory data analysis, check out the EDA Part 1 tutorial.

Exploratory Data Analysis Part 1

How To Guides

There is a how-to guide that applies to the ImageStats class.

DataEval API

class dataeval.metrics.ImageStats(flags: ImageHash | ImageProperty | ImageVisuals | ImageStatistics | Sequence[ImageHash | ImageProperty | ImageVisuals | ImageStatistics] | None = None)

Calculates various image property statistics

Parameters:

flags ([ImageHash | ImageProperty | ImageStatistics | ImageVisuals], default None) – Metric(s) to calculate for each image, per channel. If None, all metrics are calculated.
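The parameter name suggests that metrics are selected with flag-style enumerations. As a sketch of how such flags generally behave in Python, the example below uses the standard enum.Flag with a made-up ExampleProperty class. Both the class and its member names are hypothetical, and whether the DataEval flag types are built on enum.Flag is an assumption; consult the flag classes themselves for the real members.

```python
from enum import Flag, auto


class ExampleProperty(Flag):
    """Hypothetical stand-in for a flag type; not a DataEval class."""

    HEIGHT = auto()
    WIDTH = auto()
    SIZE = auto()
    ASPECT_RATIO = auto()


# Combine individual flags with the bitwise OR operator.
selected = ExampleProperty.HEIGHT | ExampleProperty.WIDTH

# Work out which individual metrics were requested.
requested = [member.name for member in ExampleProperty if member in selected]
```

With this style of selection, a single argument can request any subset of metrics, which is why computing only what is needed is cheap to express.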

compute() → Dict[str, Any]

Computes the specified measures on the cached values

Returns:

Dictionary results of the specified measures

Return type:

Dict[str, Any]

evaluate(images: TBatch) → Dict[str, Any]

Calculate metric results given a single batch of images

reset()

Resets the internal metric cache

update(images: Iterable[ArrayLike]) → None

Updates internal metric cache for later calculation

Parameters:

images (Iterable[ArrayLike]) – Batch of images to be processed
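The update, compute, reset, and evaluate methods follow a common cache-then-compute pattern. The toy class below sketches that pattern with the standard library only. It is not the ImageStats implementation: the class name MeanPerImage and the "mean" result key are hypothetical, and treating evaluate as reset followed by update and compute is an assumption.

```python
class MeanPerImage:
    """Toy metric illustrating the update / compute / reset pattern.

    Hypothetical class for illustration; not part of DataEval.
    """

    def __init__(self):
        self.reset()

    def reset(self):
        # Clear the internal cache.
        self._means = []

    def update(self, images):
        # Cache one value per image; results are only assembled in compute().
        for image in images:
            pixels = [value for row in image for value in row]
            self._means.append(sum(pixels) / len(pixels))

    def compute(self):
        # Build the result dictionary from whatever has been cached so far.
        return {"mean": list(self._means)}

    def evaluate(self, images):
        # Assumed behavior: process a single batch from a clean cache.
        self.reset()
        self.update(images)
        return self.compute()


metric = MeanPerImage()
metric.update([[[0, 2]]])   # first batch: one 1x2 image
metric.update([[[4, 4]]])   # second batch: one 1x2 image
result = metric.compute()   # covers both batches
metric.reset()
after_reset = metric.compute()
```

Separating update from compute lets a large dataset be fed in batches while the results are assembled once at the end.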