dataeval.metrics.bias.coverage

dataeval.metrics.bias.coverage(embeddings, radius_type='adaptive', num_observations=20, percent=0.01)

Evaluates coverage and identifies images/samples that lie in undercovered regions.

Parameters:

embeddings : ArrayLike, shape - (N, P): Dataset embeddings scaled to the unit interval [0, 1]. The function expects a 2-dimensional array of N observations in a P-dimensional space.
radius_type : {"adaptive", "naive"}, default "adaptive": Method used to determine the coverage radius.
num_observations : int, default 20: Number of observations required within the radius for a sample to be considered covered. Sudman [1] suggests that a minimum of 20 to 50 samples is necessary.
percent : float, default 0.01: Percent of observations to be considered uncovered, expressed as a fraction (0.01 means 1%). Only applies to the adaptive radius.

Returns:

Array of uncovered indices, critical value radii, and the radius for coverage

Return type:

CoverageOutput

Raises:

ValueError – If embeddings are not on the unit interval [0, 1]
ValueError – If length of embeddings is less than or equal to num_observations
ValueError – If radius_type is unknown
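
The three conditions above can be checked before calling the function. The following is a minimal sketch, not part of dataeval: the helper name `check_coverage_inputs` and its error messages are hypothetical, and it only mirrors the conditions as they are worded on this page.

```python
import numpy as np


def check_coverage_inputs(embeddings, radius_type="adaptive", num_observations=20):
    """Hypothetical pre-check mirroring the documented ValueError conditions."""
    arr = np.asarray(embeddings)
    # Condition 1: every value must lie on the unit interval [0, 1].
    if arr.min() < 0 or arr.max() > 1:
        raise ValueError("embeddings must lie on the unit interval [0, 1]")
    # Condition 2: there must be more embeddings than num_observations.
    if len(arr) <= num_observations:
        raise ValueError("number of embeddings must exceed num_observations")
    # Condition 3: only the two documented radius types are accepted.
    if radius_type not in ("adaptive", "naive"):
        raise ValueError(f"unknown radius_type: {radius_type!r}")


# Illustrative data only: 30 random points in the unit square.
valid = np.random.default_rng(0).random((30, 2))
check_coverage_inputs(valid)  # passes silently
```

Running such a check first makes it easier to tell which of the three conditions a given dataset violates.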

Note

Embeddings should be on the unit interval [0, 1].
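
One way to meet this requirement is global min-max scaling. The sketch below is an assumption about preprocessing, not something dataeval prescribes; the helper name `to_unit_interval` is hypothetical. It also flattens each observation so the result has the expected (N, P) shape.

```python
import numpy as np


def to_unit_interval(embeddings):
    """Hypothetical helper: flatten to (N, P) and min-max scale into [0, 1]."""
    arr = np.asarray(embeddings, dtype=np.float64)
    # Collapse any trailing dimensions so each row is one observation.
    arr = arr.reshape(len(arr), -1)
    lo, hi = arr.min(), arr.max()
    if hi == lo:
        # Constant input: map everything to 0 to avoid dividing by zero.
        return np.zeros_like(arr)
    # A single global scale keeps relative distances between rows intact.
    return (arr - lo) / (hi - lo)


# Illustrative data only: 100 observations of shape (4, 4).
rng = np.random.default_rng(0)
raw = rng.normal(size=(100, 4, 4))
scaled = to_unit_interval(raw)
```

Scaling with one global minimum and maximum, rather than per feature, multiplies all pairwise distances by the same factor, so the ranking of sparse and dense regions is unchanged.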

Example

>>> from dataeval.metrics.bias import coverage
>>> results = coverage(embeddings)
>>> results.uncovered_indices
array([447, 412,   8,  32,  13])
>>> results.coverage_radius
0.1703530195830698
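
This page does not describe how the critical value radii are computed. As an illustration of the general idea only, the sketch below assumes that a sample's radius is the distance to its `num_observations`-th nearest neighbour, and that the adaptive mode flags the `percent` fraction of samples with the largest radii. Both are assumptions made for this example, not a description of dataeval's implementation, and the helper name `knn_radii` is hypothetical.

```python
import numpy as np


def knn_radii(embeddings, k):
    """Distance from each row to its k-th nearest other row (brute force)."""
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting, column 0 of each row is the zero self-distance,
    # so column k holds the distance to the k-th nearest neighbour.
    return np.sort(dists, axis=1)[:, k]


# Illustrative data only: a dense cluster plus one isolated point.
rng = np.random.default_rng(0)
points = rng.random((200, 2)) * 0.3
points[0] = [1.0, 1.0]

radii = knn_radii(points, k=20)
# Assumed adaptive rule: flag the top `percent` fraction, at least one sample.
n_flagged = max(int(len(points) * 0.01), 1)
flagged = np.argsort(radii)[::-1][:n_flagged]
```

Under these assumptions the isolated point at index 0 has the largest radius and is flagged first, which matches the intuition behind an "undercovered" sample.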

Reference

[1] Seymour Sudman. 1976. Applied Sampling. Academic Press, New York.