dataeval.metrics.bias.coverage

dataeval.metrics.bias.coverage(embeddings, radius_type='adaptive', num_observations=20, percent=0.01)

Evaluate dataset coverage and identify images/samples that lie in undercovered regions of the embedding space.

Parameters:
embeddings : ArrayLike, shape - (N, P)

Dataset embeddings on the unit interval [0, 1]. The function expects 2-dimensional data: N observations in a P-dimensional space.

radius_type : {"adaptive", "naive"}, default "adaptive"

The method used to determine the coverage radius.

num_observations : int, default 20

Number of observations required for a sample to be considered covered. [1] suggests that a minimum of 20-50 samples is necessary.

percent : float, default 0.01

Proportion of observations to be considered uncovered (the default 0.01 corresponds to 1%). Only applies to the adaptive radius.
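To make the roles of num_observations and percent more concrete, here is a minimal NumPy sketch of an adaptive scheme. It assumes that a sample's critical value is the Euclidean distance to its num_observations-th nearest neighbor, and that the percent fraction of samples with the largest critical values are the ones reported as uncovered. The helper `adaptive_coverage_sketch` and its choice of coverage radius (the smallest critical value among the flagged samples) are hypothetical and are not taken from the dataeval source.

```python
import numpy as np


def adaptive_coverage_sketch(embeddings, num_observations=20, percent=0.01):
    """Flag the most isolated samples using k-nearest-neighbor distances.

    Hypothetical helper for illustration; not the dataeval implementation.
    """
    x = np.asarray(embeddings, dtype=np.float64)
    n = x.shape[0]
    # Pairwise Euclidean distances, shape (N, N); the diagonal is 0 (self-distance).
    diffs = x[:, None, :] - x[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself, so column k holds the
    # distance to its k-th nearest neighbor: the assumed "critical value" radius.
    crit = np.sort(dists, axis=1)[:, num_observations]
    # Flag the `percent` fraction of points with the largest critical values,
    # ordered from most to least isolated.
    num_uncovered = int(round(n * percent))
    uncovered = np.argsort(crit)[::-1][:num_uncovered]
    # Assumed coverage radius: smallest critical value among the flagged points.
    radius = float(crit[uncovered].min()) if num_uncovered else float("nan")
    return uncovered, crit, radius
```

The dense (N, N, P) difference array keeps the sketch short but needs O(N²P) memory; a practical implementation would more likely use a pairwise-distance routine or a nearest-neighbor index. Indexing column num_observations also requires N > num_observations, so in this sketch the same condition the documented ValueError checks is what keeps the lookup valid.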

Returns:

An output object holding the array of uncovered indices, the critical value radii, and the coverage radius.

Return type:

CoverageOutput

Raises:
  • ValueError – If embeddings are not on the unit interval [0, 1]

  • ValueError – If the number of embeddings is less than or equal to num_observations

  • ValueError – If radius_type is unknown
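The documented error conditions can be mirrored by a small pre-check before calling the function. This is a hypothetical helper written for illustration; its name and error messages are assumptions, not the dataeval code.

```python
import numpy as np


def validate_coverage_inputs(embeddings, radius_type="adaptive", num_observations=20):
    """Hypothetical pre-check mirroring the documented ValueError conditions."""
    x = np.asarray(embeddings, dtype=np.float64)
    # Values must lie on the unit interval [0, 1].
    if x.size and (x.min() < 0.0 or x.max() > 1.0):
        raise ValueError("embeddings must be on the unit interval [0, 1]")
    # There must be more embeddings than num_observations.
    if len(x) <= num_observations:
        raise ValueError("number of embeddings must exceed num_observations")
    # Only the two documented radius types are accepted.
    if radius_type not in ("adaptive", "naive"):
        raise ValueError(f"unknown radius_type: {radius_type!r}")
    return x
```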

Note

Embeddings should be on the unit interval [0, 1].
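If raw embeddings fall outside [0, 1], one option is per-feature min-max scaling. This is an assumption about preprocessing, not something the function is documented to do for you, and `minmax_to_unit_interval` is a hypothetical helper.

```python
import numpy as np


def minmax_to_unit_interval(embeddings):
    """Rescale each feature (column) of an (N, P) array to [0, 1]."""
    x = np.asarray(embeddings, dtype=np.float64)
    lo = x.min(axis=0)
    span = x.max(axis=0) - lo
    # Constant columns would divide by zero; map them to 0 instead.
    span[span == 0] = 1.0
    return (x - lo) / span
```

Per-feature scaling gives every dimension the same range, which changes the relative distances that coverage relies on. Subtracting a single global minimum and dividing by a single global range instead preserves the original geometry up to a uniform scale. Which choice is appropriate depends on the embeddings.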

Example

>>> results = coverage(embeddings)
>>> results.uncovered_indices
array([447, 412,   8,  32,  13])
>>> results.coverage_radius
0.1703530195830698

Reference

This implementation is based on https://dl.acm.org/doi/abs/10.1145/3448016.3457315.

[1] Seymour Sudman. 1976. Applied Sampling. Academic Press, New York.