coverage

dataeval.metrics.bias.coverage(embeddings: ArrayLike, radius_type: Literal['adaptive', 'naive'] = 'adaptive', k: int = 20, percent: float64 = 0.01) → CoverageOutput

Evaluates coverage and identifies images/samples that lie in undercovered regions.

Parameters:
  • embeddings (ArrayLike, shape - (N, P)) – A dataset in an ArrayLike format. The function expects 2-dimensional data: N observations in a P-dimensional space.

  • radius_type (Literal["adaptive", "naive"], default "adaptive") – The method used to determine the radius.

  • k (int, default 20) – Number of observations required in order to be covered. [1] suggests that a minimum of 20-50 samples is necessary.

  • percent (np.float64, default np.float64(0.01)) – Percent of observations to be considered uncovered. Only applies to adaptive radius.

Returns:

Array of uncovered indices, critical value radii, and the radius for coverage

Return type:

CoverageOutput

Raises:
  • ValueError – If length of embeddings is less than or equal to k

  • ValueError – If radius_type is unknown

Note

Embeddings should be scaled to the unit interval [0, 1].
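One possible way to meet this requirement is a global min-max rescaling, sketched below with NumPy. The helper name `minmax_scale` is hypothetical and not part of dataeval, and using a single global minimum and maximum (rather than per-feature values) is an assumption made here so that relative distances between observations are preserved.

```python
import numpy as np


def minmax_scale(embeddings):
    """Rescale an (N, P) array so that all values fall in [0, 1].

    Hypothetical helper, not part of dataeval. A single global min and max
    are used, so every pairwise distance shrinks by the same factor.
    """
    x = np.asarray(embeddings, dtype=np.float64)
    lo = x.min()
    span = x.max() - lo
    if span == 0:
        # All values are identical; map everything to 0 to avoid dividing by zero.
        return np.zeros_like(x)
    return (x - lo) / span


# Synthetic embeddings for illustration only: 500 observations in 8 dimensions.
rng = np.random.default_rng(0)
raw = rng.normal(size=(500, 8))
scaled = minmax_scale(raw)
```

The resulting `scaled` array could then be passed as the `embeddings` argument.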

Example

>>> results = coverage(embeddings)
>>> results.indices
array([447, 412,   8,  32,  63])
>>> results.critical_value
0.8459038956941765

Reference

This implementation is based on https://dl.acm.org/doi/abs/10.1145/3448016.3457315.

[1] Seymour Sudman. 1976. Applied sampling. Academic Press New York (1976).