dataeval.core.coverage_naive
- dataeval.core.coverage_naive(embeddings, num_observations)
Evaluate coverage using a naive radius calculation method.
This method calculates a fixed coverage radius from the dimensionality of the embedding space, the number of embeddings, and the desired number of observations per covered region.
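The overall computation can be sketched in pure Python. This is an illustration under stated assumptions, not dataeval's implementation: the name `naive_coverage_sketch` is hypothetical, and this page does not define how critical value radii are computed, so the sketch assumes each one is the distance to the observation's k-th nearest neighbor (excluding itself) and that an observation is uncovered when that distance exceeds the fixed radius.

```python
import math


def naive_coverage_sketch(
    embeddings: list[list[float]], k: int
) -> tuple[list[int], list[float], float]:
    """Hypothetical illustration of naive coverage; not the dataeval code.

    Assumption: an observation's critical value radius is the distance to
    its k-th nearest neighbor, with the observation itself excluded.
    """
    n = len(embeddings)
    d = len(embeddings[0])

    # Critical value radius per observation (brute force, O(N^2 log N)).
    radii = []
    for i, p in enumerate(embeddings):
        dists = sorted(math.dist(p, q) for j, q in enumerate(embeddings) if j != i)
        radii.append(dists[k - 1])

    # Fixed radius: r = (1/sqrt(pi)) * ((2 * k * Gamma(d/2 + 1)) / n) ** (1/d)
    radius = (1 / math.sqrt(math.pi)) * ((2 * k * math.gamma(d / 2 + 1)) / n) ** (1 / d)

    # Assumption: observations whose neighborhood is wider than r are uncovered.
    uncovered = [i for i, r in enumerate(radii) if r > radius]
    return uncovered, radii, radius


points = [[0.10, 0.10], [0.12, 0.11], [0.11, 0.13], [0.90, 0.90]]
uncovered, radii, radius = naive_coverage_sketch(points, k=1)
```

With three points clustered near (0.1, 0.1) and one isolated at (0.9, 0.9), k = 1 gives a fixed radius of about 0.40, and only the isolated point (index 3) is flagged as uncovered.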
- Parameters:
- embeddings : Array
Dataset image embeddings with values on the unit interval [0, 1]. The function expects a 2-dimensional array of shape (N, P): N observations in a P-dimensional space.
- num_observations : int
Minimum number of observations that must fall within the coverage radius of a point for it to be considered covered. [1] suggests a minimum of 20 to 50 samples.
- Returns:
A tuple containing:
  - uncovered_indices : Array of indices of uncovered observations
  - critical_value_radii : Array of critical value radii, one per observation
  - coverage_radius : The radius threshold for coverage
- Return type:
tuple[NDArray[np.intp], NDArray[np.float64], float]
- Raises:
ValueError – If embeddings are not on the unit interval [0, 1]
ValueError – If the number of embeddings is less than or equal to num_observations
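Both documented error conditions can be mirrored in a small input guard. The helper name `validate_coverage_inputs` and its error messages are hypothetical; the sketch only restates the two conditions listed above and does not reproduce dataeval's own checks.

```python
def validate_coverage_inputs(embeddings: list[list[float]], num_observations: int) -> None:
    """Hypothetical guard mirroring the two documented ValueError conditions."""
    values = [v for row in embeddings for v in row]

    # Condition 1: every value must lie on the unit interval [0, 1].
    if min(values) < 0 or max(values) > 1:
        raise ValueError("Embeddings must be on the unit interval [0, 1].")

    # Condition 2: there must be more embeddings than num_observations.
    if len(embeddings) <= num_observations:
        raise ValueError("Number of embeddings must exceed num_observations.")
```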
Notes
Embeddings should be on the unit interval [0, 1].
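If raw embeddings fall outside [0, 1], one common option is to rescale them before calling the function. This page does not prescribe a normalization, so the per-feature min-max scaling below, and the name `min_max_scale`, are assumptions for illustration.

```python
def min_max_scale(embeddings: list[list[float]]) -> list[list[float]]:
    """Rescale each feature (column) to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*embeddings))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]

    scaled = []
    for row in embeddings:
        scaled.append([
            # Guard against division by zero when a column is constant.
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        ])
    return scaled
```

Per-feature scaling changes the relative spread of features and therefore the distances between observations; scaling with a single global minimum and maximum instead keeps relative distances intact up to a uniform factor.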
The naive method calculates a fixed radius based on the formula: r = (1/√π) * ((2 * k * Γ(d/2 + 1)) / n)^(1/d) where k is num_observations, d is the dimensionality of the embedding space (P), and n is the number of samples (N).
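The radius formula translates directly into Python using `math.gamma`. The function name `naive_coverage_radius` is hypothetical; the sketch only evaluates the documented expression and is not taken from dataeval's source.

```python
import math


def naive_coverage_radius(n: int, k: int, d: int) -> float:
    """Evaluate r = (1/sqrt(pi)) * ((2 * k * Gamma(d/2 + 1)) / n) ** (1/d).

    n: number of samples, k: num_observations, d: embedding dimensionality.
    """
    return (1 / math.sqrt(math.pi)) * ((2 * k * math.gamma(d / 2 + 1)) / n) ** (1 / d)
```

As a sanity check, for d = 1 the term Γ(3/2) equals √π/2, so the formula reduces to r = k/n; with n = 100 and k = 5 the function returns 0.05 up to floating-point rounding.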
Reference
This implementation is based on https://dl.acm.org/doi/abs/10.1145/3448016.3457315.
[1] Seymour Sudman. 1976. Applied Sampling. Academic Press, New York.