dataeval.core.coverage_adaptive
- dataeval.core.coverage_adaptive(embeddings, num_observations, percent)
Evaluate coverage using an adaptive radius calculation method.
This method calculates a data-adaptive coverage radius from the distribution of critical value radii: the fraction of observations given by percent with the largest radii is marked as uncovered.
- Parameters:
- embeddings : Array
Dataset embeddings scaled to the unit interval [0, 1]. The function expects a 2-dimensional array of shape (N, P): N observations in a P-dimensional space.
- num_observations : int
Number of observations required for an observation to be considered covered. [1] suggests that a minimum of 20-50 samples is necessary.
- percent : float
Fraction of observations to be considered uncovered. Should be between 0 and 1.
- Returns:
A tuple containing:
- uncovered_indices : Array of indices for uncovered observations
- critical_value_radii : Array of critical value radii for each observation
- coverage_radius : The adaptive radius threshold for coverage
- Return type:
tuple[NDArray[np.intp], NDArray[np.float64], float]
- Raises:
ValueError – If embeddings are not on the unit interval [0, 1]
ValueError – If length of embeddings is less than or equal to num_observations
Notes
Embeddings should be on the unit interval [0, 1].
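If raw embeddings fall outside this range, one common way to bring them onto the unit interval is per-feature min-max scaling. The snippet below is a minimal sketch of that idea using NumPy only. The helper name `minmax_scale` is hypothetical and not part of dataeval, and scaling each feature independently is an assumption that changes relative distances between features, so it may not suit every embedding space.

```python
import numpy as np


def minmax_scale(embeddings):
    """Rescale each column of a 2-D array to [0, 1] (hypothetical helper)."""
    x = np.asarray(embeddings, dtype=np.float64)
    lo = x.min(axis=0)
    span = x.max(axis=0) - lo
    # Constant columns have zero span; map them to 0 instead of dividing by 0.
    span[span == 0] = 1.0
    return (x - lo) / span


raw = np.array([[-2.0, 10.0], [0.0, 10.0], [2.0, 30.0]])
scaled = minmax_scale(raw)
```

The resulting `scaled` array has shape (N, P) with every value in [0, 1], which matches the input requirement stated above.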
The adaptive method determines the coverage radius from the data distribution: the fraction of observations given by percent with the largest critical value radii is marked as uncovered. This approach is more flexible than the naive method because the threshold adapts to the actual distribution of the data.
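The selection step described above can be illustrated with a short NumPy sketch. This is not the dataeval implementation. It assumes that the critical value radius of an observation is the Euclidean distance to its num_observations-th nearest neighbor, and that the coverage radius is the smallest radius among the observations marked uncovered. Both are assumptions made for illustration, and the function name `adaptive_coverage_sketch` is hypothetical.

```python
import numpy as np


def adaptive_coverage_sketch(embeddings, num_observations, percent):
    """Illustrative sketch only; the radius definitions are assumptions."""
    x = np.asarray(embeddings, dtype=np.float64)
    if x.ndim != 2:
        raise ValueError("embeddings must be 2-dimensional (N, P)")
    if x.min() < 0 or x.max() > 1:
        raise ValueError("embeddings must be on the unit interval [0, 1]")
    if len(x) <= num_observations:
        raise ValueError("need more observations than num_observations")

    # Brute-force pairwise Euclidean distances, shape (N, N).
    diffs = x[:, None, :] - x[None, :, :]
    dists = np.sqrt((diffs**2).sum(axis=-1))

    # After sorting each row, column 0 is the distance to the point itself
    # (zero), so column k is the distance to the k-th nearest other point.
    radii = np.sort(dists, axis=1)[:, num_observations]

    # Mark the fraction `percent` with the largest radii as uncovered.
    n_uncovered = int(len(x) * percent)
    uncovered = np.argsort(radii)[::-1][:n_uncovered]

    # Assumed threshold: smallest radius among the uncovered observations,
    # falling back to the largest radius when nothing is marked uncovered.
    if n_uncovered > 0:
        coverage_radius = float(radii[uncovered].min())
    else:
        coverage_radius = float(radii.max())
    return uncovered, radii, coverage_radius


# A tight cluster of 20 points plus one isolated point at the origin.
rng = np.random.default_rng(0)
cluster = 0.5 + 0.01 * rng.random((20, 2))
data = np.vstack([cluster, [[0.0, 0.0]]])
uncovered, radii, radius = adaptive_coverage_sketch(data, 3, 0.05)
```

In this toy dataset the isolated point (index 20) has by far the largest radius, so it is the single observation marked uncovered. The pairwise distance matrix needs O(N²) memory, so this sketch is only suitable for small N.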
Reference
This implementation is based on https://dl.acm.org/doi/abs/10.1145/3448016.3457315.
[1] Seymour Sudman. 1976. Applied Sampling. Academic Press, New York.