dataeval.core.coverage_adaptive

dataeval.core.coverage_adaptive(embeddings, num_observations, percent)

Evaluate coverage using an adaptive radius calculation method.

This method calculates a data-adaptive coverage radius from the distribution of critical value radii, marking the fraction of observations given by percent with the largest radii as uncovered.

Parameters:
embeddings : Array

Dataset embeddings scaled to the unit interval [0, 1]. The function expects a 2-dimensional array of N observations in a P-dimensional space.

num_observations : int

Number of observations required for an observation to be considered covered. [1] suggests that a minimum of 20-50 samples is necessary.

percent : float

Fraction of observations to be considered uncovered, expressed as a value between 0 and 1.

Returns:

A tuple containing:
  • uncovered_indices : Array of indices for uncovered observations

  • critical_value_radii : Array of critical value radii for each observation

  • coverage_radius : The adaptive radius threshold for coverage

Return type:

tuple[NDArray[np.intp], NDArray[np.float64], float]

Raises:
  • ValueError – If embeddings are not on the unit interval [0, 1]

  • ValueError – If length of embeddings is less than or equal to num_observations
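The two error conditions above can be checked before calling the function. The following is a minimal sketch, not the library's own validation: the helper name check_coverage_inputs is hypothetical, and the 2-dimensional shape check is an assumption drawn from the embeddings parameter description rather than from the documented exceptions.

```python
import numpy as np


def check_coverage_inputs(embeddings, num_observations):
    """Hypothetical pre-check mirroring the documented ValueError conditions."""
    arr = np.asarray(embeddings, dtype=np.float64)

    # Assumption: the parameter description asks for an (N, P) array.
    if arr.ndim != 2:
        raise ValueError(f"expected 2-D embeddings of shape (N, P), got {arr.shape}")

    # Documented condition: values must lie on the unit interval [0, 1].
    if arr.min() < 0.0 or arr.max() > 1.0:
        raise ValueError("embeddings must lie on the unit interval [0, 1]")

    # Documented condition: N must be strictly greater than num_observations.
    if len(arr) <= num_observations:
        raise ValueError(
            f"need more than {num_observations} embeddings, got {len(arr)}"
        )

    return arr
```

Running a check like this up front surfaces a scaling or sample-size problem with a message that names the offending condition.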

Notes

Embeddings should be scaled to the unit interval [0, 1] before calling this function.
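One way to meet this requirement is min-max scaling. The sketch below uses a single global minimum and maximum so that relative distances between observations are preserved. This is an assumption about a reasonable preprocessing step, not something the library prescribes, and the helper name scale_to_unit_interval is hypothetical. Per-feature scaling is an alternative, but it changes the geometry of the embedding space.

```python
import numpy as np


def scale_to_unit_interval(embeddings):
    """Hypothetical helper: global min-max scaling onto [0, 1]."""
    arr = np.asarray(embeddings, dtype=np.float64)
    lo, hi = arr.min(), arr.max()

    # A constant array has no spread; map it to all zeros to avoid dividing by 0.
    if hi == lo:
        return np.zeros_like(arr)

    return (arr - lo) / (hi - lo)
```

The scaled array keeps the original (N, P) shape and can be passed as the embeddings argument.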

The adaptive method determines the coverage radius based on the data distribution, selecting the top percent of observations with the largest critical value radii as uncovered. This approach is more flexible than the naive method and adapts to the actual distribution of the data.
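The selection step described above can be illustrated with plain NumPy. This is a sketch under stated assumptions, not the library's implementation. It assumes that an observation's critical value radius is its Euclidean distance to its num_observations-th nearest neighbor, and that the coverage radius is the smallest radius among the observations marked uncovered. The function name adaptive_coverage_sketch is hypothetical.

```python
import numpy as np


def adaptive_coverage_sketch(embeddings, num_observations, percent):
    """Illustrative sketch of adaptive coverage; assumptions noted inline."""
    x = np.asarray(embeddings, dtype=np.float64)

    # Pairwise Euclidean distances, shape (N, N). Memory grows as N^2 * P.
    diff = x[:, None, :] - x[None, :, :]
    dists = np.sqrt((diff**2).sum(axis=-1))

    # Assumption: critical value radius = distance to the k-th nearest
    # neighbor. After sorting each row, column 0 is the point itself (0.0),
    # so column k holds the k-th neighbor. Requires N > num_observations.
    radii = np.sort(dists, axis=1)[:, num_observations]

    # Mark the fraction `percent` of observations with the largest radii.
    n_uncovered = int(len(x) * percent)
    uncovered = np.argsort(radii)[::-1][:n_uncovered]

    # Assumption: threshold = smallest radius among the uncovered points.
    if n_uncovered > 0:
        coverage_radius = float(radii[uncovered].min())
    else:
        coverage_radius = float(radii.max())

    return uncovered, radii, coverage_radius
```

With a tight cluster plus one distant point, the distant point has the largest radius and is the first index returned as uncovered.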

Reference

This implementation is based on https://dl.acm.org/doi/abs/10.1145/3448016.3457315.

[1] Seymour Sudman. 1976. Applied sampling. Academic Press New York (1976).