Outliers

class dataeval.detectors.linters.Outliers(flags: ImageStat = ImageStat.WIDTH | HEIGHT | SIZE | ASPECT_RATIO | CHANNELS | DEPTH | BRIGHTNESS | BLURRINESS | MISSING | ZERO, outlier_method: Literal['zscore', 'modzscore', 'iqr'] = 'modzscore', outlier_threshold: float | None = None)

Calculates statistical outliers of a dataset using various statistical tests applied to each image.

Parameters:
  • flags (ImageStat, default ImageStat.ALL_PROPERTIES | ImageStat.ALL_VISUALS) – Metric(s) to calculate for each image. Calculates all metrics if None. Only supports ImageStat.ALL_STATS.

  • outlier_method (["modzscore" | "zscore" | "iqr"], optional - default "modzscore") – Statistical method used to identify outliers

  • outlier_threshold (float, optional - default None) – Threshold value for the given outlier_method, above which data is considered an outlier. Uses the method-specific default if None.
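As an illustration of how a method-specific default might be resolved, here is a minimal sketch. The helper `resolve_threshold` and the `DEFAULT_THRESHOLDS` table are hypothetical names introduced for this example and are not part of the library; the default values are the ones documented in the Notes.

```python
# Default thresholds as documented in the Notes for each method.
DEFAULT_THRESHOLDS = {"zscore": 3.0, "modzscore": 3.5, "iqr": 1.5}


def resolve_threshold(outlier_method, outlier_threshold=None):
    """Hypothetical helper: return the explicit threshold, or the documented default."""
    if outlier_method not in DEFAULT_THRESHOLDS:
        raise ValueError(f"Unknown outlier_method: {outlier_method!r}")
    if outlier_threshold is None:
        # No explicit threshold given, so fall back to the per-method default.
        return DEFAULT_THRESHOLDS[outlier_method]
    return outlier_threshold


default_modz = resolve_threshold("modzscore")
explicit_z = resolve_threshold("zscore", 2.5)
```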

stats

Dictionary holding the value of each metric for each image.

Type:

dict[str, Any]

See also

Duplicates

Notes

There are 3 different statistical methods:

  • zscore

  • modzscore

  • iqr

The z score method is based on the difference between the data point and the mean of the data. The default threshold value for zscore is 3.
Z score = \(|x_i - \mu| / \sigma\)
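The z score formula can be sketched with NumPy as follows. This is an illustration only, not the library's implementation: the function name `zscore_outliers` and the sample `brightness` values are made up for the example, and the use of the population standard deviation (`ddof=0`) is an assumption.

```python
import numpy as np


def zscore_outliers(values, threshold=3.0):
    """Return a boolean mask that is True where the z score exceeds the threshold."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    std = values.std()  # population standard deviation (ddof=0), an assumption
    z = np.abs(values - mean) / std
    return z > threshold


# Hypothetical per-image brightness values; the last one is far from the rest.
brightness = [0.50, 0.52, 0.48, 0.51, 0.49, 0.50, 0.53, 0.47, 0.51, 0.49, 0.50, 0.98]
zscore_indices = np.flatnonzero(zscore_outliers(brightness)).tolist()
print(zscore_indices)
```

Because the mean and standard deviation are themselves pulled toward extreme values, a single large outlier inflates \(\sigma\) and makes the z score less sensitive on small samples.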
The modified z score method is based on the difference between the data point and the median of the data. The default threshold value for modzscore is 3.5.
Modified z score = \(0.6745 * |x_i - \tilde{x}| / MAD\), where \(\tilde{x}\) is the median of the data and \(MAD\) is the median absolute deviation
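A minimal NumPy sketch of the modified z score follows. The function name `modzscore_outliers` and the sample data are hypothetical, and the sketch does not handle the case where the median absolute deviation is zero.

```python
import numpy as np


def modzscore_outliers(values, threshold=3.5):
    """Return a boolean mask that is True where the modified z score exceeds the threshold."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    abs_dev = np.abs(values - median)
    mad = np.median(abs_dev)  # median absolute deviation; assumed non-zero here
    mod_z = 0.6745 * abs_dev / mad
    return mod_z > threshold


# Hypothetical per-image brightness values; the last one is far from the rest.
brightness = [0.50, 0.52, 0.48, 0.51, 0.49, 0.50, 0.53, 0.47, 0.51, 0.49, 0.50, 0.98]
modz_indices = np.flatnonzero(modzscore_outliers(brightness)).tolist()
print(modz_indices)
```

The median and MAD are barely affected by a few extreme values, which is one reason this method is a reasonable default for small or skewed datasets.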
The interquartile range method is based on comparing the data point against the interquartile range, the difference between the 75th and 25th percentiles (the third and first quartiles, \(Q_3\) and \(Q_1\)). The default threshold value for iqr is 1.5.
Scaled interquartile range = \(threshold * (Q_3 - Q_1)\)
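The interquartile range test can be sketched as below. This assumes the conventional fences, where a point is flagged if it falls below \(Q_1\) or above \(Q_3\) by more than the scaled interquartile range; whether the library applies the threshold in exactly this way, and which percentile interpolation it uses, is not stated here. The function name `iqr_outliers` and the sample data are hypothetical.

```python
import numpy as np


def iqr_outliers(values, threshold=1.5):
    """Return a boolean mask that is True where a value lies outside the IQR fences."""
    values = np.asarray(values, dtype=float)
    # NumPy's default (linear) percentile interpolation, an assumption.
    q1, q3 = np.percentile(values, [25, 75])
    margin = threshold * (q3 - q1)  # scaled interquartile range
    return (values < q1 - margin) | (values > q3 + margin)


# Hypothetical per-image brightness values; the last one is far from the rest.
brightness = [0.50, 0.52, 0.48, 0.51, 0.49, 0.50, 0.53, 0.47, 0.51, 0.49, 0.50, 0.98]
iqr_indices = np.flatnonzero(iqr_outliers(brightness)).tolist()
print(iqr_indices)
```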

Examples

Initialize the Outliers class:

>>> outliers = Outliers()

Specifying which metrics to analyze:

>>> outliers = Outliers(flags=ImageStat.SIZE | ImageStat.ALL_VISUALS)

Specifying an outlier method:

>>> outliers = Outliers(outlier_method="iqr")

Specifying an outlier method and threshold:

>>> outliers = Outliers(outlier_method="zscore", outlier_threshold=2.5)

evaluate(data: Iterable[ArrayLike] | StatsOutput | Sequence[StatsOutput]) → OutliersOutput

Returns the indices of outliers along with the issues identified for each.

Parameters:

data (Iterable[ArrayLike], shape - (C, H, W) | StatsOutput | Sequence[StatsOutput]) – A dataset of images in an ArrayLike format, or the output(s) from an imagestats metric analysis.

Returns:

Output class containing the indices of outliers and a dictionary showing the issues and calculated values for the given index.

Return type:

OutliersOutput

Example

Evaluate the dataset:

>>> outliers.evaluate(images)
OutliersOutput(issues={18: {'brightness': 0.78}, 25: {'brightness': 0.98}})
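The result maps each flagged image index to the metrics that triggered it. The sketch below works on a plain dictionary copied from the output above rather than on a real `OutliersOutput`, so it assumes only that the issues have this index-to-metrics shape; the variable names are illustrative.

```python
# Issues dictionary copied from the example output above.
issues = {18: {"brightness": 0.78}, 25: {"brightness": 0.98}}

# Indices of the images flagged as outliers.
flagged_indices = sorted(issues)

# Regroup by metric name to see which images were flagged for each metric.
by_metric = {}
for index, metrics in issues.items():
    for name, value in metrics.items():
        by_metric.setdefault(name, []).append((index, value))

print(flagged_indices)
print(by_metric)
```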