dataeval.core.coverage_naive
dataeval.core.coverage_naive(embeddings, num_observations, force_unit_interval=False)
Evaluate coverage using a naive radius calculation method.
This method calculates a fixed coverage radius based on the dimensionality of the embedding space and the desired number of observations per covered region.
- Parameters:
- embeddings : Array2D[float]
Dataset image embeddings on the unit interval [0, 1]. Can be a 2D list, array-like object, or tensor. The function expects 2-dimensional data: N observations in a P-dimensional space.
- num_observations : int
Number of observations required for a region to be considered covered. Sudman [1] suggests a minimum of 20-50 samples.
- force_unit_interval : bool, default False
If True, embeddings are automatically rescaled to the unit interval [0, 1]. If False, a ValueError is raised when embeddings fall outside [0, 1].
- Returns:
Mapping with keys:
uncovered_indices: NDArray[np.intp] - Array of indices for uncovered observations
critical_value_radii: NDArray[np.float64] - Array of critical value radii for each observation
coverage_radius: float - The radius threshold for coverage
- Return type:
- Raises:
ValueError – If embeddings are not on the unit interval [0, 1] and force_unit_interval is False
ValueError – If the number of embeddings is less than or equal to num_observations
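The input checks described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the helper name `check_embeddings` is hypothetical, and the global min-max rescaling is an assumption, since the documentation only says that embeddings are rescaled to the unit interval.

```python
import numpy as np


def check_embeddings(embeddings, num_observations, force_unit_interval=False):
    """Hypothetical helper mirroring the documented input checks."""
    # Accept a 2D list, array-like object, or anything NumPy can convert;
    # the expected shape is (N observations, P dimensions).
    arr = np.asarray(embeddings, dtype=np.float64)

    # Documented check: there must be more embeddings than num_observations.
    if len(arr) <= num_observations:
        raise ValueError("Number of embeddings must exceed num_observations")

    lo, hi = arr.min(), arr.max()
    if lo < 0.0 or hi > 1.0:
        if not force_unit_interval:
            # Documented check: values outside [0, 1] are rejected.
            raise ValueError("Embeddings must be on the unit interval [0, 1]")
        # Assumption: global min-max rescaling; the library may rescale differently.
        span = hi - lo
        arr = (arr - lo) / span if span > 0 else np.zeros_like(arr)
    return arr
```

With `force_unit_interval=True`, out-of-range inputs are rescaled instead of rejected, which matches the documented behavior of the flag but not necessarily the exact scaling the library applies.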
Notes
Embeddings should be on the unit interval [0, 1].
The naive method calculates a fixed radius from the formula:
r = (1/√π) * ((2 * k * Γ(d/2 + 1)) / n)^(1/d)
where k is num_observations, d is the dimensionality of the embedding space, and n is the number of samples.
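The radius formula can be evaluated directly with the standard library. The sketch below is only an illustration of the arithmetic; the function name `naive_coverage_radius` is hypothetical and is not part of dataeval.

```python
import math


def naive_coverage_radius(n, d, k):
    """Evaluate r = (1/sqrt(pi)) * ((2 * k * Gamma(d/2 + 1)) / n) ** (1/d).

    n: number of samples, d: embedding dimensionality, k: num_observations.
    """
    return (1 / math.sqrt(math.pi)) * ((2 * k * math.gamma(d / 2 + 1)) / n) ** (1 / d)


# Example: 1000 samples in a 2-dimensional space, 20 observations per region.
radius = naive_coverage_radius(n=1000, d=2, k=20)
print(f"coverage radius: {radius:.4f}")  # prints "coverage radius: 0.1128"
```

For d = 1 the expression reduces to k/n, which is a convenient sanity check. Note that `math.gamma` overflows once d/2 + 1 exceeds roughly 171, so for embeddings with more than about 340 dimensions a log-space evaluation with `math.lgamma` would be needed.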
Reference
This implementation is based on https://dl.acm.org/doi/abs/10.1145/3448016.3457315.
[1] Seymour Sudman. 1976. Applied Sampling. Academic Press, New York.