dataeval.detectors.linters.Outliers
===================================

.. py:class:: dataeval.detectors.linters.Outliers(use_dimension = True, use_pixel = True, use_visual = True, outlier_method = 'modzscore', outlier_threshold = None)

   Calculates statistical outliers of a dataset using various statistical tests applied to each image.

   :param outlier_method: Statistical method used to identify outliers
   :type outlier_method: ["modzscore" | "zscore" | "iqr"], optional - default "modzscore"
   :param outlier_threshold: Threshold value for the given ``outlier_method``, above which data is considered an outlier.
                             Uses a method-specific default if ``None``
   :type outlier_threshold: float, optional - default None

   .. attribute:: stats

      Various stats output classes that hold the value of each metric for each image

      :type: tuple[DimensionStatsOutput, PixelStatsOutput, VisualStatsOutput]

   .. seealso::

      :term:`Duplicates`

   .. note::

      There are 3 different statistical methods:

      - zscore
      - modzscore
      - iqr

      | The z score method is based on the difference between the data point and the mean of the data.
        The default threshold value for ``zscore`` is 3.
      | Z score = :math:`|x_i - \mu| / \sigma`, where :math:`\mu` is the mean and :math:`\sigma` is the standard deviation

      | The modified z score method is based on the difference between the data point and the median of the data.
        The default threshold value for ``modzscore`` is 3.5.
      | Modified z score = :math:`0.6745 \cdot |x_i - \tilde{x}| / \mathrm{MAD}`, where :math:`\tilde{x}` is the median and :math:`\mathrm{MAD}` is the median absolute deviation

      | The interquartile range method is based on how far the data point lies beyond the 25th and 75th percentiles
        (:math:`Q_1` and :math:`Q_3`), measured in multiples of the interquartile range :math:`Q_3 - Q_1`.
        The default threshold value for ``iqr`` is 1.5.
      | Outlier if :math:`x_i < Q_1 - \text{threshold} \cdot (Q_3 - Q_1)` or :math:`x_i > Q_3 + \text{threshold} \cdot (Q_3 - Q_1)`

   .. rubric:: Examples

   Initialize the Outliers class:

   >>> outliers = Outliers()

   Specify an outlier method:

   >>> outliers = Outliers(outlier_method="iqr")

   Specify an outlier method and threshold:

   >>> outliers = Outliers(outlier_method="zscore", outlier_threshold=3.5)
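
   The three methods in the note above can be sketched in plain Python. This is a minimal, hypothetical illustration rather than dataeval's implementation: the function ``outlier_mask`` is not part of the library, it assumes a flat list holding one value per image for a single metric, it uses the documented default thresholds, and, as a simplifying assumption, it reports no outliers when the standard deviation or MAD is zero.

   .. code-block:: python

      import statistics


      def outlier_mask(values, method="modzscore", threshold=None):
          """Hypothetical helper: flag outliers in one metric's per-image values."""
          values = [float(v) for v in values]
          if method == "zscore":
              threshold = 3.0 if threshold is None else threshold
              mean = statistics.fmean(values)
              std = statistics.pstdev(values)
              if std == 0:
                  # Constant metric (simplifying assumption): nothing is an outlier
                  return [False] * len(values)
              return [abs(v - mean) / std > threshold for v in values]
          if method == "modzscore":
              threshold = 3.5 if threshold is None else threshold
              median = statistics.median(values)
              abs_diff = [abs(v - median) for v in values]
              mad = statistics.median(abs_diff)  # median absolute deviation
              if mad == 0:
                  return [False] * len(values)
              return [0.6745 * d / mad > threshold for d in abs_diff]
          if method == "iqr":
              threshold = 1.5 if threshold is None else threshold
              # "inclusive" interpolates between data points like a linear percentile
              q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
              fence = threshold * (q3 - q1)
              return [v < q1 - fence or v > q3 + fence for v in values]
          raise ValueError(f"Unknown outlier method: {method!r}")

   For example, with ``values = list(range(1, 20)) + [200]``, each of the three methods flags only the last value (index 19).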

   .. py:method:: evaluate(data)

      Returns the indices of outliers, along with the issues identified for each

      :param data: A dataset of images in an ArrayLike format
      :type data: Iterable[ArrayLike], shape - (C, H, W)

      :returns: Output class containing the indices of outliers and a dictionary mapping each
                outlier index to its issues and calculated values.
      :rtype: OutliersOutput

      .. rubric:: Example

      Evaluate the dataset:

      >>> outliers = Outliers(outlier_method="zscore", outlier_threshold=3.5)
      >>> results = outliers.evaluate(outlier_images)
      >>> list(results.issues)
      [10, 12]
      >>> results.issues[10]
      {'skew': -3.906, 'kurtosis': 13.266, 'entropy': 0.2128, 'contrast': 1.25, 'zeros': 0.05493}

   .. py:method:: from_stats(stats: OutlierStatsOutput | dataeval.metrics.stats.datasetstats.DatasetStatsOutput) -> OutliersOutput[IndexIssueMap]
                  from_stats(stats: Sequence[OutlierStatsOutput]) -> OutliersOutput[list[IndexIssueMap]]

      Returns the indices of outliers, along with the issues identified for each

      :param stats: The output(s) from a dimensionstats, pixelstats, or visualstats metric analysis, or an aggregate DatasetStatsOutput
      :type stats: OutlierStatsOutput | DatasetStatsOutput | Sequence[OutlierStatsOutput]

      :returns: Output class containing the indices of outliers and a dictionary mapping each
                outlier index to its issues and calculated values. When ``stats`` is a sequence,
                the issues are a list with one such dictionary per stats output.
      :rtype: OutliersOutput

      .. seealso::

         :obj:`dimensionstats`, :obj:`pixelstats`, :obj:`visualstats`

      .. rubric:: Example

      Evaluate two sets of precomputed stats:

      >>> outliers = Outliers(outlier_method="zscore", outlier_threshold=3.5)
      >>> results = outliers.from_stats([stats1, stats2])
      >>> len(results)
      2
      >>> results.issues[0]
      {10: {'skew': -3.906, 'kurtosis': 13.266, 'entropy': 0.2128}, 12: {'std': 0.00536, 'var': 2.87e-05, 'skew': -3.906, 'kurtosis': 13.266, 'entropy': 0.2128}}
      >>> results.issues[1]
      {}
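
   Both methods report issues keyed by image index, where each value maps a metric name to its calculated value. The following is a minimal, hypothetical sketch of how such a mapping could be assembled from per-image metric values, assuming a plain z-score test. The helper ``build_issue_map`` and its dict-of-lists input are illustrative assumptions, not dataeval's API.

   .. code-block:: python

      import statistics


      def build_issue_map(metrics, threshold=3.0):
          """Hypothetical helper: return {image index: {metric name: value}}.

          ``metrics`` maps a metric name (e.g. "skew") to one value per image.
          """
          issues = {}
          for name, values in metrics.items():
              mean = statistics.fmean(values)
              std = statistics.pstdev(values)
              if std == 0:
                  continue  # constant metric: no z-score outliers
              for idx, value in enumerate(values):
                  if abs(value - mean) / std > threshold:
                      # Group every flagged metric under its image index
                      issues.setdefault(idx, {})[name] = float(value)
          return dict(sorted(issues.items()))

   With ``{"brightness": list(range(1, 20)) + [200], "contrast": [1.0] * 20}``, only index 19 is flagged, giving ``{19: {"brightness": 200.0}}``; the constant ``contrast`` metric is skipped.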