dataeval.core.coverage_naive

dataeval.core.coverage_naive(embeddings, num_observations, force_unit_interval=False)

Evaluate coverage using a naive radius calculation method.

This method calculates a fixed coverage radius based on the dimensionality of the embedding space and the desired number of observations per covered region.

Parameters:

embeddings : Array2D[float]: Dataset image embeddings scaled to the unit interval [0, 1]. Can be a 2D list, array-like object, or tensor. The data must have 2 dimensions: N observations in a P-dimensional space.
num_observations : int: Number of observations a region must contain to be considered covered. [1] suggests that a minimum of 20-50 samples is necessary.
force_unit_interval : bool, default False: If True, embeddings are automatically rescaled to the unit interval [0, 1]. If False, a ValueError is raised if any embedding values fall outside [0, 1].
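The interaction between embeddings and force_unit_interval can be pictured with a small sketch. The helper name ensure_unit_interval is hypothetical and not part of dataeval, and the global min-max rescaling shown here is an assumption; the library's actual rescaling may differ.

```python
def ensure_unit_interval(embeddings, force_unit_interval=False):
    """Hypothetical helper illustrating the documented behavior.

    Not part of dataeval; min-max rescaling is an assumed strategy.
    """
    flat = [v for row in embeddings for v in row]
    lo, hi = min(flat), max(flat)

    # Already on [0, 1]: return a copy unchanged.
    if lo >= 0.0 and hi <= 1.0:
        return [list(row) for row in embeddings]

    # Out of range and rescaling not requested: mirror the documented error.
    if not force_unit_interval:
        raise ValueError("Embeddings must be on the unit interval [0, 1].")

    # Out of range and rescaling requested: global min-max rescale.
    span = (hi - lo) or 1.0  # guard against a constant input
    return [[(v - lo) / span for v in row] for row in embeddings]
```

With this sketch, ensure_unit_interval([[0, 5], [10, 5]], True) rescales the values, while the same call with force_unit_interval=False raises ValueError.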

Returns:

Mapping with keys:

uncovered_indices: NDArray[np.intp] - Array of indices for uncovered observations
critical_value_radii: NDArray[np.float64] - Array of critical value radii for each observation
coverage_radius: float - The radius threshold for coverage
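How the three keys relate can be sketched as follows. This is an illustration under a stated assumption, not the dataeval implementation: it assumes that an observation's critical value radius is the Euclidean distance to its num_observations-th nearest other observation, and that an observation is uncovered when that distance exceeds coverage_radius. The function name find_uncovered is hypothetical.

```python
import math


def find_uncovered(embeddings, num_observations, coverage_radius):
    """Hypothetical sketch; assumes k-th nearest neighbor distances."""
    radii = []
    for i, point in enumerate(embeddings):
        # Distances from this observation to every other observation.
        dists = sorted(
            math.dist(point, other)
            for j, other in enumerate(embeddings)
            if j != i
        )
        # Assumed definition: distance to the k-th nearest other point.
        radii.append(dists[num_observations - 1])

    # An observation is flagged when its radius exceeds the threshold.
    uncovered = [i for i, r in enumerate(radii) if r > coverage_radius]
    return uncovered, radii


points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0]]
uncovered, radii = find_uncovered(points, num_observations=2, coverage_radius=0.5)
```

In this toy input the isolated point at index 3 is the only one whose radius exceeds the threshold. The sketch needs more observations than num_observations, which is consistent with the ValueError documented for this function.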

Return type:

CoverageResult

Raises:

ValueError – If embeddings are not on the unit interval [0, 1] and force_unit_interval is False
ValueError – If the number of embeddings is less than or equal to num_observations

Notes

Embeddings should be on the unit interval [0, 1].

The naive method calculates a fixed radius from the formula

r = (1/√π) * ((2 * k * Γ(d/2 + 1)) / n)^(1/d)

where k is num_observations, d is the dimensionality of the embedding space (P above), n is the number of samples (N above), and Γ is the gamma function.
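The radius formula can be evaluated directly with the standard library. The following is a minimal sketch that transcribes the formula as stated in these notes; the function name naive_coverage_radius is hypothetical and not part of dataeval.

```python
import math


def naive_coverage_radius(n, d, k):
    """Evaluate r = (1/sqrt(pi)) * ((2 * k * Gamma(d/2 + 1)) / n) ** (1/d).

    n: number of samples, d: embedding dimensionality, k: num_observations.
    Hypothetical helper transcribing the documented formula.
    """
    return (1 / math.sqrt(math.pi)) * (
        (2 * k * math.gamma(d / 2 + 1)) / n
    ) ** (1 / d)


# 1000 samples in a 2-dimensional space with 20 observations per region.
radius = naive_coverage_radius(n=1000, d=2, k=20)  # approximately 0.1128
```

Note that math.gamma overflows for large arguments, so for embeddings with several hundred dimensions a log-space evaluation with math.lgamma would be a safer choice.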

Reference

This implementation is based on https://dl.acm.org/doi/abs/10.1145/3448016.3457315.

[1] Seymour Sudman. 1976. Applied sampling. Academic Press New York (1976).