coverage
- dataeval.metrics.bias.coverage(embeddings: ArrayLike, radius_type: Literal['adaptive', 'naive'] = 'adaptive', k: int = 20, percent: float64 = 0.01) → CoverageOutput
Evaluates coverage and identifies images/samples that are in undercovered regions.
- Parameters:
embeddings (ArrayLike, shape - (N, P)) – A dataset in an ArrayLike format. The function expects the data to have 2 dimensions: N observations in a P-dimensional space.
radius_type (Literal["adaptive", "naive"], default "adaptive") – The function used to determine radius.
k (int, default 20) – Number of observations required in order to be covered. [1] suggests that a minimum of 20-50 samples is necessary.
percent (np.float64, default np.float64(0.01)) – Percent of observations to be considered uncovered. Only applies to adaptive radius.
- Returns:
Array of uncovered indices, critical value radii, and the radius for coverage
- Return type:
CoverageOutput
- Raises:
ValueError – If length of embeddings is less than or equal to k
ValueError – If radius_type is unknown
Note
Embeddings should be on the unit interval [0, 1].
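One way to meet the unit-interval requirement is a global min-max rescaling before calling coverage. The sketch below is illustrative only: the helper name scale_to_unit_interval is hypothetical and not part of dataeval, and using a single global minimum and maximum (rather than per-feature scaling) is an assumption made here so that relative distances between observations are preserved.

```python
import numpy as np


def scale_to_unit_interval(embeddings):
    """Rescale an (N, P) array so that all values lie in [0, 1].

    Hypothetical helper, not part of dataeval. It uses one global
    minimum and maximum, so relative distances are preserved.
    """
    x = np.asarray(embeddings, dtype=np.float64)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # Every value is identical; map everything to 0 to avoid dividing by zero.
        return np.zeros_like(x)
    return (x - lo) / (hi - lo)


raw = np.array([[-2.0, 0.0], [2.0, 6.0]])
scaled = scale_to_unit_interval(raw)
```

The rescaled array can then be passed as the embeddings argument, for example coverage(scale_to_unit_interval(raw_embeddings)).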
Example
>>> results = coverage(embeddings)
>>> results.indices
array([447, 412, 8, 32, 63])
>>> results.critical_value
0.8459038956941765
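To build intuition for how k and percent interact, the following NumPy sketch shows one plausible reading of the adaptive radius. It is not dataeval's implementation. It assumes that each observation's critical value is the distance to its k-th nearest neighbor, that percent is applied as a fraction of N, and that the observations with the largest critical values are reported as uncovered. The function name adaptive_coverage_sketch and its return layout are hypothetical.

```python
import numpy as np


def adaptive_coverage_sketch(embeddings, k=20, percent=0.01):
    """Illustrative k-nearest-neighbor coverage check (not dataeval's code)."""
    x = np.asarray(embeddings, dtype=np.float64)
    n = x.shape[0]
    if n <= k:
        # Mirrors the documented ValueError condition.
        raise ValueError("length of embeddings must be greater than k")

    # Pairwise Euclidean distances, shape (N, N). This costs O(N^2 * P) memory,
    # which is acceptable for a small demonstration only.
    diffs = x[:, None, :] - x[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))

    # After sorting each row, column 0 is the self-distance (0), so column k
    # holds the distance to the k-th nearest neighbor.
    kth = np.sort(dists, axis=1)[:, k]

    # Assumption: flag the `percent` fraction with the largest k-th distances.
    n_uncovered = max(1, int(n * percent))
    uncovered = np.argsort(kth)[::-1][:n_uncovered]

    # Assumption: the coverage radius is the smallest flagged critical value.
    radius = kth[uncovered[-1]]
    return uncovered, kth, radius


# A tight cluster of 30 points plus one isolated point at the origin.
rng = np.random.default_rng(0)
points = np.vstack([rng.uniform(0.4, 0.6, size=(30, 2)), [[0.0, 0.0]]])
indices, critical_values, radius = adaptive_coverage_sketch(points, k=5, percent=0.05)
```

In this toy data the isolated point (index 30) is the one flagged, because its fifth-nearest neighbor is much farther away than for any point inside the cluster.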
Reference
This implementation is based on https://dl.acm.org/doi/abs/10.1145/3448016.3457315.
[1] Seymour Sudman. 1976. Applied sampling. Academic Press New York (1976).