How to customize the metrics for data cleaning
There are 4 categories of metrics for data cleaning:
ImageHash
  XXHASH
  PCHASH
ImageProperties
  WIDTH
  HEIGHT
  SIZE
  ASPECT_RATIO
  CHANNELS
  DEPTH
ImageStatistics
  MEAN
  STD
  VAR
  SKEW
  KURTOSIS
  ENTROPY
  PERCENTILES
  HISTOGRAM
ImageVisuals
  BRIGHTNESS
  BLURRINESS
  MISSING
  ZERO
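The grouping above can be pictured as four enumerations, one per category. The sketch below is a stand-in built on Python's standard enum module to illustrate that structure. It is not dataeval's implementation. Only the category and member names come from the list above, while `CATEGORIES` and `metric_names` are hypothetical helpers.

```python
from enum import Enum, auto


# Stand-in enumerations: names mirror the documented categories,
# but these classes are NOT the ones shipped in dataeval.flags.
class ImageHash(Enum):
    XXHASH = auto()
    PCHASH = auto()


class ImageProperties(Enum):
    WIDTH = auto()
    HEIGHT = auto()
    SIZE = auto()
    ASPECT_RATIO = auto()
    CHANNELS = auto()
    DEPTH = auto()


class ImageStatistics(Enum):
    MEAN = auto()
    STD = auto()
    VAR = auto()
    SKEW = auto()
    KURTOSIS = auto()
    ENTROPY = auto()
    PERCENTILES = auto()
    HISTOGRAM = auto()


class ImageVisuals(Enum):
    BRIGHTNESS = auto()
    BLURRINESS = auto()
    MISSING = auto()
    ZERO = auto()


CATEGORIES = (ImageHash, ImageProperties, ImageStatistics, ImageVisuals)


def metric_names(category):
    """Return the member names of one category in definition order."""
    return [member.name for member in category]


# Print each category followed by the metrics it contains.
for category in CATEGORIES:
    print(f"{category.__name__}: {', '.join(metric_names(category))}")
```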
To select a custom set of metrics, first import the categories you need:
from dataeval.flags import ImageHash, ImageProperties, ImageStatistics, ImageVisuals
Then collect the metrics you want in a list and pass that list as the flags argument of the class you are using.
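ImageStats class example:
# Import path assumed here; it may differ between dataeval versions
from dataeval.metrics import ImageStats
# Select the desired data cleaning metrics
flags = [ImageProperties.SIZE, ImageStatistics.MEAN]
# Create the stats object with the selected flags
stats = ImageStats(flags=flags)
# Add the dataset (your own collection of images)
stats.update(dataset)
# Compute the selected stats
result = stats.compute()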
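The update-then-compute flow in this example can be illustrated without the library. The sketch below is a hypothetical stand-in: `TinyStats`, the `Metric` enum, and the treatment of an image as a flat list of pixel values are all assumptions made for illustration, not dataeval's API. It shows only the idea that the flags chosen at construction decide which metrics get calculated.

```python
from enum import Enum, auto
from statistics import fmean, pstdev


class Metric(Enum):
    """Hypothetical stand-in for a few of the metric flags."""
    SIZE = auto()
    MEAN = auto()
    STD = auto()


# Map each flag to the function that computes it for one image.
# An "image" here is simply a flat list of pixel values (an assumption).
METRIC_FUNCS = {
    Metric.SIZE: len,
    Metric.MEAN: fmean,
    Metric.STD: pstdev,
}


class TinyStats:
    """Toy accumulator: collect images, then compute only the selected metrics."""

    def __init__(self, flags):
        self.flags = list(flags)
        self._images = []

    def update(self, images):
        # Store the images; nothing is computed yet.
        self._images.extend(images)

    def compute(self):
        # One list of per-image values for each selected flag.
        return {
            flag.name: [METRIC_FUNCS[flag](img) for img in self._images]
            for flag in self.flags
        }


dataset = [[0, 2, 4, 6], [10, 10, 10]]
stats = TinyStats(flags=[Metric.SIZE, Metric.MEAN])
stats.update(dataset)
result = stats.compute()
print(result)
```

Because `Metric.STD` was not among the flags, the toy class never computes it, which is the point of choosing a custom set.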
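ChannelStats class example:
# Import path assumed here; it may differ between dataeval versions
from dataeval.metrics import ChannelStats
# Select the desired data cleaning metrics
flags = [ImageStatistics.MEAN, ImageStatistics.STD, ImageStatistics.ENTROPY]
# Create the stats object with the selected flags
stats = ChannelStats(flags=flags)
# Add the dataset (your own collection of images)
stats.update(dataset)
# Compute the selected stats
result = stats.compute()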
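Linter class example:
# Import path assumed here; it may differ between dataeval versions
from dataeval.detectors import Linter
# Select the desired data cleaning metrics
flags = [ImageVisuals.BRIGHTNESS]
# Create the linter with the dataset and the selected flags
lints = Linter(dataset, flags=flags)
# Evaluate the dataset
results = lints.evaluate()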
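The example above does not show what evaluate() returns. As a rough illustration only, the sketch below assumes that linting means flagging images whose selected metric is a statistical outlier. Brightness as the mean pixel value, the z-score rule, the 2.5 threshold, and the function names `brightness` and `find_outliers` are all assumptions, not dataeval's implementation.

```python
from statistics import fmean, pstdev


def brightness(image):
    """Assumed definition: mean pixel value of a flat list of pixels."""
    return fmean(image)


def find_outliers(values, threshold=2.5):
    """Return {index: z-score} for values more than `threshold`
    population standard deviations away from the mean."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # All values identical: nothing can be an outlier.
        return {}
    return {
        i: (v - mean) / std
        for i, v in enumerate(values)
        if abs(v - mean) / std > threshold
    }


# Nine images of similar brightness and one much brighter image.
dataset = [[100] * 4 for _ in range(9)] + [[250] * 4]
results = find_outliers([brightness(img) for img in dataset])
print(results)
```

Only the last image is reported, since it is the only one whose brightness lies more than 2.5 standard deviations from the dataset mean.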