dataeval.detectors.linters.Outliers
- class dataeval.detectors.linters.Outliers(use_dimension=True, use_pixel=True, use_visual=True, outlier_method='modzscore', outlier_threshold=None)
Calculates statistical outliers of a dataset using various statistical tests applied to each image.
- Parameters:
- use_dimension : bool, default True
If True, use dimension statistics to identify outliers
- use_pixel : bool, default True
If True, use pixel statistics to identify outliers
- use_visual : bool, default True
If True, use visual statistics to identify outliers
- outlier_method : ["modzscore" | "zscore" | "iqr"], optional - default "modzscore"
Statistical method used to identify outliers
- outlier_threshold : float, optional - default None
Threshold value for the given outlier_method, above which data is considered an outlier - uses method specific default if None
- stats
Various stats output classes that hold the value of each metric for each image
- Type:
tuple[DimensionStatsOutput, PixelStatsOutput, VisualStatsOutput]
See also
Note
There are 3 different statistical methods:
zscore
modzscore
iqr
The z score method is based on the difference between the data point and the mean of the data. The default threshold value for zscore is 3.
Z score = \(|x_i - \mu| / \sigma\)
The modified z score method is based on the difference between the data point and the median of the data. The default threshold value for modzscore is 3.5.
Modified z score = \(0.6745 \cdot |x_i - \tilde{x}| / MAD\), where \(\tilde{x}\) is the median and \(MAD\) is the median absolute deviation
The interquartile range method is based on the difference between the data point and the interquartile range, which is the difference between the 75th and 25th percentiles (the third and first quartiles). The default threshold value for iqr is 1.5.
Interquartile range = \(threshold \cdot (Q_3 - Q_1)\)
Examples
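As an illustration of the three methods described in the note, the following NumPy sketch applies each formula to a one-dimensional array of metric values. It is not the DataEval implementation: the function names are hypothetical, the z score uses the population standard deviation, the guards for a zero standard deviation or zero MAD are assumptions, and the iqr test assumes the scaled range is used as a fence below \(Q_1\) and above \(Q_3\).

```python
import numpy as np


def zscore_flags(values, threshold=3.0):
    """Flag points where |x_i - mean| / std exceeds the threshold."""
    values = np.asarray(values, dtype=float)
    std = values.std()  # population standard deviation (assumption)
    if std == 0:
        # All values identical: nothing can be an outlier (assumed guard).
        return np.zeros(values.shape, dtype=bool)
    return np.abs(values - values.mean()) / std > threshold


def modzscore_flags(values, threshold=3.5):
    """Flag points where 0.6745 * |x_i - median| / MAD exceeds the threshold."""
    values = np.asarray(values, dtype=float)
    abs_dev = np.abs(values - np.median(values))
    mad = np.median(abs_dev)  # median absolute deviation
    if mad == 0:
        # Assumed guard; a real implementation may fall back to another scale.
        return np.zeros(values.shape, dtype=bool)
    return 0.6745 * abs_dev / mad > threshold


def iqr_flags(values, threshold=1.5):
    """Flag points farther than threshold * (Q3 - Q1) outside the quartiles."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    fence = threshold * (q3 - q1)
    # Assumption: the fence extends below Q1 and above Q3.
    return (values < q1 - fence) | (values > q3 + fence)


# Twenty made-up metric values: nineteen close to 10 and one extreme value.
values = [9, 10, 11] * 6 + [10, 100]

for name, flags in [
    ("zscore", zscore_flags(values)),
    ("modzscore", modzscore_flags(values)),
    ("iqr", iqr_flags(values)),
]:
    print(name, np.flatnonzero(flags).tolist())
```

With this made-up data, each method flags only the last value. The mean and standard deviation are themselves pulled toward extreme values, while the median and MAD are not, which is one reason a median-based score can be a reasonable default.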
Initialize the Outliers class:
>>> outliers = Outliers()
Specifying an outlier method:
>>> outliers = Outliers(outlier_method="iqr")
Specifying an outlier method and threshold:
>>> outliers = Outliers(outlier_method="zscore", outlier_threshold=3.5)
- evaluate(data)
Returns indices of Outliers with the issues identified for each.
- Parameters:
- Returns:
Output class containing the indices of outliers and a dictionary showing the issues and calculated values for the given index.
- Return type:
Example
Evaluate the dataset:
>>> outliers = Outliers(outlier_method="zscore", outlier_threshold=3.5)
>>> results = outliers.evaluate(outlier_images)
>>> list(results.issues)
[10, 12]
>>> results.issues[10]
{'contrast': 1.2499999999203126, 'entropy': 0.21278774841317422, 'zeros': 0.054931640625}
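The issues mapping shown above is keyed by image index, with each value listing the metrics that flagged that image. When reviewing results it can be handy to invert it and see which images each metric flagged. The sketch below does this on a plain dict that mimics the shape of `results.issues`; the helper name `indices_by_metric` and the sample values are hypothetical, and it assumes `results.issues` can be iterated like an ordinary mapping.

```python
from collections import defaultdict

# Hypothetical stand-in for results.issues: {image index: {metric: value}}.
issues = {
    10: {"contrast": 1.25, "entropy": 0.2128, "zeros": 0.0549},
    12: {"entropy": 0.2128, "zeros": 0.0549},
}


def indices_by_metric(issues):
    """Invert {index: {metric: value}} into {metric: [indices]}."""
    grouped = defaultdict(list)
    for index, flagged in issues.items():
        for metric in flagged:
            grouped[metric].append(index)
    # Sort indices so the result does not depend on input order.
    return {metric: sorted(indices) for metric, indices in grouped.items()}


by_metric = indices_by_metric(issues)
for metric, indices in by_metric.items():
    print(metric, indices)
```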
- from_stats(stats: OutlierStatsOutput) → dataeval.outputs.OutliersOutput[dataeval.outputs._linters.IndexIssueMap]
- from_stats(stats: collections.abc.Sequence[OutlierStatsOutput]) → dataeval.outputs.OutliersOutput[list[dataeval.outputs._linters.IndexIssueMap]]
Returns indices of Outliers with the issues identified for each.
- Parameters:
- stats : OutlierStatsOutput | ImageStatsOutput | Sequence[OutlierStatsOutput]
The output(s) from a dimensionstats, pixelstats, or visualstats metric analysis, or an aggregate ImageStatsOutput
- Returns:
Output class containing the indices of outliers and a dictionary showing the issues and calculated values for the given index.
- Return type:
See also
dimensionstats, pixelstats, visualstats
Example
Evaluate the dataset:
>>> outliers = Outliers(outlier_method="zscore", outlier_threshold=3.5)
>>> results = outliers.from_stats([stats1, stats2])
>>> len(results)
2
>>> results.issues[0]
{10: {'entropy': 0.2128, 'zeros': 0.05493}, 12: {'entropy': 0.2128, 'std': 0.00536, 'var': 2.87e-05, 'zeros': 0.05493}}
>>> results.issues[1]
{}