How to customize the metrics for data cleaning
The ImageStat flag class provides four categories of data cleaning metrics. Each category has an alias, shown in parentheses, followed by its individual metrics:
Image Hashing (ALL_HASHES): XXHASH, PCHASH
Image Properties (ALL_PROPERTIES): WIDTH, HEIGHT, SIZE, ASPECT_RATIO, CHANNELS, DEPTH
Image Visuals (ALL_VISUALS): BRIGHTNESS, BLURRINESS, MISSING, ZERO
Pixel Statistics (ALL_PIXELSTATS): MEAN, STD, VAR, SKEW, KURTOSIS, ENTROPY, PERCENTILES, HISTOGRAM
To select a custom set of metrics, first import the ImageStat flag class:
from dataeval.metrics import ImageStat
Then combine the desired metrics with the | operator and pass the result as the flags argument of the function or class you want to use.
imagestats function example:
# Select the desired data cleaning metrics
flags = ImageStat.SIZE | ImageStat.MEAN
# Compute the stats for the dataset
result = imagestats(dataset, flags=flags)
channelstats function example:
# Select the desired data cleaning metrics
flags = ImageStat.MEAN | ImageStat.STD | ImageStat.ENTROPY
# Compute the stats for the dataset
result = channelstats(dataset, flags=flags)
Linter class example:
# Select the desired data cleaning metrics
flags = ImageStat.ALL_VISUALS
# Create the linter with the selected flags
lints = Linter(dataset, flags=flags)
# Evaluate the dataset
results = lints.evaluate()
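The flag combinations used above can be tried out without a dataset. The following is a minimal sketch that assumes ImageStat behaves like a standard Python enum.Flag; DemoStat is a hypothetical stand-in defined only for this illustration and is not part of dataeval. It shows how a category alias can be the union of its members, how a category can be mixed with a single metric from another category, and how one metric can be excluded from a category.

```python
from enum import Flag, auto


class DemoStat(Flag):
    """Hypothetical stand-in for a metric flag class (not the dataeval ImageStat)."""

    BRIGHTNESS = auto()
    BLURRINESS = auto()
    MISSING = auto()
    ZERO = auto()
    MEAN = auto()
    STD = auto()
    # A category alias is simply the union of its member flags
    ALL_VISUALS = BRIGHTNESS | BLURRINESS | MISSING | ZERO


# Combine a whole category with a single metric from another category
flags = DemoStat.ALL_VISUALS | DemoStat.MEAN

# Membership tests show which metrics the combined flag selects
has_blur = DemoStat.BLURRINESS in flags
has_std = DemoStat.STD in flags

# Exclude one metric from a category with AND NOT
visuals_without_zero = DemoStat.ALL_VISUALS & ~DemoStat.ZERO
has_zero = DemoStat.ZERO in visuals_without_zero

print(has_blur, has_std, has_zero)
```

If ImageStat follows the same Flag semantics, the same expressions should work with it directly, for example ImageStat.ALL_VISUALS | ImageStat.MEAN.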