dataeval.utils.thresholds.AdaptiveThreshold
class dataeval.utils.thresholds.AdaptiveThreshold(multiplier=3.5, *, lower_multiplier=_UNSET, upper_multiplier=_UNSET, lower_limit=None, upper_limit=None)

Threshold using tail-weighted Double-MAD for robust asymmetric outlier detection.
Computes separate dispersion metrics for data below and above the median (Double-MAD), producing naturally asymmetric bounds. On each side the multiplier is automatically scaled up when the tail is heavier than a normal distribution, preventing over-flagging on skewed or heavy-tailed metrics while keeping tight bounds on well-behaved data.
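The Double-MAD idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not dataeval's implementation: the helper name double_mad is hypothetical, points equal to the median are counted on both sides, and no 1.4826 normal-consistency constant is applied to the half-MADs.

```python
import numpy as np


def double_mad(x):
    """Hypothetical helper: return (median, left_mad, right_mad) of 1-D data.

    Raw half-MADs only; no normal-consistency constant is applied.
    """
    x = np.asarray(x, dtype=float)
    med = float(np.median(x))
    # Absolute deviations of the points at or below / at or above the median;
    # points equal to the median contribute to both halves.
    left_dev = med - x[x <= med]
    right_dev = x[x >= med] - med
    return med, float(np.median(left_dev)), float(np.median(right_dev))


med, left_mad, right_mad = double_mad([1.0, 1.0, 1.0, 2.0, 10.0, 50.0])
# med = 1.5, left_mad = 0.5, right_mad = 8.5
```

With a multiplier of 3.5 and no tail adjustment, these scales would give bounds of 1.5 - 3.5 * 0.5 = -0.25 and 1.5 + 3.5 * 8.5 = 31.25: the heavier right tail pushes only the upper bound outward.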
Tail-weight adjustment: for each half, the ratio of the 90th-percentile deviation to the MAD is compared against the expected ratio for normal data (about 1.91). When the observed ratio exceeds this, the effective multiplier is increased by log1p(excess), widening the bound on that side only.

Point-mass handling: when both half-MADs are zero (more than 50% of the data at one value), a gap-ratio test determines whether the non-mode values form a smooth continuous tail (wider bounds via the non-mode MAD) or discrete categorical jumps (tight bounds via the global mean absolute deviation).
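The tail-weight adjustment can be sketched the same way. Several details here are assumptions rather than documented behavior: the "90th-percentile deviation" is taken as the distance from the median to the 10th (left) or 90th (right) percentile of the data, which for a normal distribution gives a ratio of about 1.2816 / 0.6745 ≈ 1.90, close to the quoted 1.91; "excess" is taken as the amount by which the observed ratio exceeds 1.91; and tail_factor = 1 + log1p(excess) multiplies that side's multiplier. The gap-ratio test for point masses is not sketched; a zero half-MAD simply leaves the factor at 1.0.

```python
import numpy as np

# Expected (percentile distance) / half-MAD ratio for normal data, as quoted above.
EXPECTED_NORMAL_RATIO = 1.91


def _side_factor(deviation: float, mad: float) -> float:
    # Zero MAD is the point-mass case, which the real class handles with a
    # separate gap-ratio test; this sketch just leaves the multiplier unscaled.
    if mad == 0:
        return 1.0
    excess = max(0.0, deviation / mad - EXPECTED_NORMAL_RATIO)
    return 1.0 + float(np.log1p(excess))


def tail_factors(x) -> tuple[float, float]:
    """Hypothetical helper: return (left_tail_factor, right_tail_factor)."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    left_mad = np.median(med - x[x <= med])
    right_mad = np.median(x[x >= med] - med)
    # Assumption: "90th-percentile deviation" means the distance from the
    # median to the 10th (left) or 90th (right) percentile of the data.
    left_dev = med - np.percentile(x, 10)
    right_dev = np.percentile(x, 90) - med
    return _side_factor(left_dev, left_mad), _side_factor(right_dev, right_mad)


rng = np.random.default_rng(0)
print(tail_factors(rng.normal(size=100_000)))       # both close to 1.0
print(tail_factors(rng.exponential(size=100_000)))  # right side widened
```

For exponential data the right-hand ratio is about 2.3, so only the upper bound is widened, while the short left tail keeps a factor of 1.0.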
Parameters:
- multiplier : float or None, default 3.5
Symmetric multiplier applied to both bounds. Overridden per-side by lower_multiplier / upper_multiplier when provided.
- lower_multiplier : float or None
Override for the lower bound: median - lower_multiplier * tail_factor * scale_left. None means no lower bound.
- upper_multiplier : float or None
Override for the upper bound: median + upper_multiplier * tail_factor * scale_right. None means no upper bound.
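The per-side override rules in the parameter list can be expressed as a small pure function. This is a hypothetical sketch, not the class's internal code: _UNSET here is a local stand-in for the library's sentinel, the median, scales, and tail factors are passed in rather than computed, and treating multiplier=None as disabling any side that is not overridden is an assumption of the sketch.

```python
_UNSET = object()  # local stand-in for the library's "not provided" sentinel


def resolve_bounds(median, scale_left, scale_right, multiplier=3.5,
                   lower_multiplier=_UNSET, upper_multiplier=_UNSET,
                   tail_left=1.0, tail_right=1.0):
    """Hypothetical helper returning (lower, upper); None means no bound."""
    # A per-side override wins; otherwise fall back to the symmetric multiplier.
    lo_m = multiplier if lower_multiplier is _UNSET else lower_multiplier
    hi_m = multiplier if upper_multiplier is _UNSET else upper_multiplier
    # Assumption: a None multiplier (symmetric or per-side) disables that side.
    lower = None if lo_m is None else median - lo_m * tail_left * scale_left
    upper = None if hi_m is None else median + hi_m * tail_right * scale_right
    return lower, upper


# Disable the lower bound only: (None, 31.25)
print(resolve_bounds(1.5, 0.5, 8.5, lower_multiplier=None))
# Widen the upper side only: (0.5, 35.5)
print(resolve_bounds(1.5, 0.5, 8.5, multiplier=2.0, upper_multiplier=4.0))
```

Using a sentinel rather than None as the default lets None keep its documented meaning of "no bound on this side".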
Examples
>>> import numpy as np
>>> from dataeval.utils.thresholds import AdaptiveThreshold
>>> symmetric = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
>>> t = AdaptiveThreshold(2.0)
>>> lower, upper = t(symmetric)
>>> skewed = np.array([1.0, 1.0, 1.0, 2.0, 10.0, 50.0])
>>> lower, upper = t(skewed)

classmethod parse_object(obj)
Instantiate a Threshold subclass from a dictionary.

The dictionary must contain a "type" key whose value matches a registered threshold_type string (e.g. "constant", "standard_deviation", "zscore"). The remaining key/value pairs are forwarded as keyword arguments to the matching subclass constructor.
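The dictionary dispatch described above follows a common registry pattern. The sketch below is purely illustrative: ToyThreshold, ToyConstant, and its lower/upper arguments are hypothetical names, and nothing here reflects how dataeval actually stores its registry or which keyword arguments its subclasses accept.

```python
from __future__ import annotations

from typing import Any, ClassVar


class ToyThreshold:
    """Hypothetical base class that records subclasses by threshold_type."""

    threshold_type: ClassVar[str] = ""
    _registry: ClassVar[dict[str, type[ToyThreshold]]] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Register every subclass that declares a threshold_type string.
        if cls.threshold_type:
            ToyThreshold._registry[cls.threshold_type] = cls

    @classmethod
    def parse_object(cls, obj: dict[str, Any]) -> ToyThreshold:
        params = dict(obj)  # copy so the caller's dictionary is not mutated
        kind = params.pop("type")
        if kind not in ToyThreshold._registry:
            raise ValueError(f"unknown threshold type: {kind!r}")
        # Remaining keys become constructor keyword arguments.
        return ToyThreshold._registry[kind](**params)


class ToyConstant(ToyThreshold):
    threshold_type = "constant"

    def __init__(self, lower: float | None = None, upper: float | None = None) -> None:
        self.lower = lower
        self.upper = upper


t = ToyThreshold.parse_object({"type": "constant", "lower": 0.0, "upper": 1.0})
print(type(t).__name__, t.lower, t.upper)  # ToyConstant 0.0 1.0
```

Registering in __init_subclass__ means a new subclass becomes parseable simply by defining it with a threshold_type, without editing a central lookup table.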