dataeval.core.coverage_adaptive

dataeval.core.coverage_adaptive(embeddings, num_observations, percent, force_unit_interval=False)

Evaluate coverage using an adaptive radius calculation method.

This method calculates a data-adaptive coverage radius from the distribution of critical value radii and marks the requested fraction of observations with the largest radii as uncovered.

Parameters:
embeddings : Array2D[float]

Dataset embeddings on the unit interval [0, 1]. Can be a 2D list, array-like object, or tensor. The data must have 2 dimensions: N observations in a P-dimensional space.

num_observations : int

Number of observations required in order to be covered. [1] suggests that a minimum of 20-50 samples is necessary.

percent : float

Fraction of observations to be considered uncovered, expressed as a value between 0 and 1.

force_unit_interval : bool, default False

If True, embeddings will be automatically rescaled to the unit interval [0, 1]. If False, a ValueError is raised if embeddings are outside [0, 1].
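As an illustration of what rescaling to the unit interval can look like, the following is a minimal sketch using global min-max scaling with NumPy. The helper name `rescale_to_unit_interval` is hypothetical, and the choice of a single global minimum and maximum is an assumption; the rescaling that `force_unit_interval=True` actually applies is not described here and may differ.

```python
import numpy as np


def rescale_to_unit_interval(embeddings):
    """Min-max rescale an (N, P) array so that all values lie in [0, 1].

    Hypothetical helper for illustration only; not part of dataeval.
    """
    x = np.asarray(embeddings, dtype=np.float64)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # Constant input: avoid division by zero and map everything to 0.
        return np.zeros_like(x)
    return (x - lo) / (hi - lo)


raw = np.array([[-2.0, 0.0], [2.0, 6.0]])
scaled = rescale_to_unit_interval(raw)
```

Rescaling beforehand in this way is one option for embeddings that fall outside [0, 1] when force_unit_interval is left at False.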

Returns:

Mapping with keys:

  • uncovered_indices: NDArray[np.intp] - Array of indices for uncovered observations

  • critical_value_radii: NDArray[np.float64] - Array of critical value radii for each observation

  • coverage_radius: float - The adaptive radius threshold for coverage

Return type:

CoverageResult

Raises:
  • ValueError – If embeddings are not on the unit interval [0, 1] and force_unit_interval is False

  • ValueError – If the number of embeddings is less than or equal to num_observations
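The two documented error conditions can be checked ahead of time. The sketch below mirrors them with plain NumPy; the function name `check_coverage_inputs` and the error messages are hypothetical, and the library's own validation may be stricter or worded differently.

```python
import numpy as np


def check_coverage_inputs(embeddings, num_observations):
    """Raise ValueError for the two documented failure cases.

    Hypothetical pre-check for illustration only; not part of dataeval.
    """
    x = np.asarray(embeddings, dtype=np.float64)
    # Case 1: values outside the unit interval [0, 1].
    if x.min() < 0.0 or x.max() > 1.0:
        raise ValueError("embeddings must lie on the unit interval [0, 1]")
    # Case 2: not enough observations relative to num_observations.
    if len(x) <= num_observations:
        raise ValueError("number of embeddings must exceed num_observations")
    return x


valid = check_coverage_inputs(np.random.default_rng(0).random((30, 4)), 20)
```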

Notes

Embeddings should be on the unit interval [0, 1].

The adaptive method determines the coverage radius based on the data distribution, selecting the top percent of observations with the largest critical value radii as uncovered. This approach is more flexible than the naive method and adapts to the actual distribution of the data.
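The idea described above can be sketched in a few lines of NumPy. This is an illustrative approximation, not the library's implementation. It assumes that the critical value radius of an observation is the Euclidean distance to its num_observations-th nearest neighbor, and that the coverage radius is the smallest radius among the flagged observations. Both are assumptions, and the function name `adaptive_coverage_sketch` is hypothetical.

```python
import numpy as np


def adaptive_coverage_sketch(embeddings, num_observations, percent):
    """Illustrative adaptive coverage; assumptions are noted in comments."""
    x = np.asarray(embeddings, dtype=np.float64)
    n = len(x)

    # Pairwise Euclidean distances, shape (N, N).
    diffs = x[:, None, :] - x[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))

    # After sorting each row, column 0 is the distance to the point itself (0),
    # so column k is the distance to the k-th nearest other observation.
    # Assumption: this distance plays the role of the critical value radius.
    critical_value_radii = np.sort(dists, axis=1)[:, num_observations]

    # Flag the requested fraction of observations, at least one.
    n_uncovered = max(int(percent * n), 1)
    order = np.argsort(critical_value_radii)[::-1]  # largest radius first
    uncovered_indices = order[:n_uncovered]

    # Assumption: threshold is the smallest radius among flagged observations.
    coverage_radius = float(critical_value_radii[uncovered_indices].min())

    return {
        "uncovered_indices": uncovered_indices,
        "critical_value_radii": critical_value_radii,
        "coverage_radius": coverage_radius,
    }


# A tight cluster of 19 points plus one isolated point at the origin.
rng = np.random.default_rng(0)
cluster = 0.5 + 0.01 * rng.random((19, 2))
points = np.vstack([cluster, [[0.0, 0.0]]])
result = adaptive_coverage_sketch(points, num_observations=3, percent=0.05)
```

In this toy data set the isolated point at index 19 has by far the largest radius, so it is the single observation flagged as uncovered.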

Reference

This implementation is based on https://dl.acm.org/doi/abs/10.1145/3448016.3457315.

[1] Seymour Sudman. 1976. Applied Sampling. Academic Press, New York.