dataeval.core.coverage_naive

dataeval.core.coverage_naive(embeddings, num_observations)

Evaluate coverage using a naive radius calculation method.

This method calculates a fixed coverage radius based on the dimensionality of the embedding space and the desired number of observations per covered region.

Parameters:

embeddings : Array: Dataset image embeddings scaled to the unit interval [0, 1]. The function expects a 2-dimensional array of N observations in a P-dimensional space.
num_observations : int: Number of observations a region must contain in order to be considered covered. [1] suggests that a minimum of 20-50 samples is necessary.

Returns:

A tuple containing:

- uncovered_indices : Array of indices for uncovered observations
- critical_value_radii : Array of critical value radii for each observation
- coverage_radius : The radius threshold for coverage

Return type:

tuple[NDArray[np.intp], NDArray[np.float64], float]

Raises:

ValueError – If embeddings are not on the unit interval [0, 1]
ValueError – If the number of embeddings is less than or equal to num_observations
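The two documented error conditions can be illustrated with a minimal sketch. The helper name `check_coverage_inputs` is hypothetical and is not part of dataeval; the 2-dimensional shape check is an assumption based on the parameter description, since the source does not state which exception (if any) is raised for a wrong shape.

```python
import numpy as np


def check_coverage_inputs(embeddings, num_observations):
    """Hypothetical helper mirroring the documented ValueError conditions."""
    arr = np.asarray(embeddings, dtype=np.float64)

    # Assumption: the docs say 2 dimensions are expected; raising here is our choice.
    if arr.ndim != 2:
        raise ValueError(f"Expected a 2-dimensional array, got {arr.ndim} dimension(s).")

    # Documented: the number of embeddings must exceed num_observations.
    if len(arr) <= num_observations:
        raise ValueError(
            f"Number of embeddings ({len(arr)}) must be greater than "
            f"num_observations ({num_observations})."
        )

    # Documented: embeddings must lie on the unit interval [0, 1].
    if arr.min() < 0.0 or arr.max() > 1.0:
        raise ValueError("Embeddings must be on the unit interval [0, 1].")

    return arr
```

Embeddings that fall outside [0, 1] would typically be rescaled (for example with min-max normalization) before being passed in.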

Notes

Embeddings should be on the unit interval [0, 1].

The naive method calculates a fixed radius based on the formula:

r = (1/√π) * ((2 * k * Γ(d/2 + 1)) / n)^(1/d)

where k is num_observations, d is the dimensionality of the embedding space, and n is the number of samples.
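As an illustration, the radius formula can be evaluated directly with the standard library. This is a sketch of the formula as written in these notes, not the library's own code; the function name `naive_coverage_radius` and its argument names are hypothetical.

```python
import math


def naive_coverage_radius(num_observations: int, n_dim: int, n_samples: int) -> float:
    """Evaluate r = (1/sqrt(pi)) * ((2 * k * Gamma(d/2 + 1)) / n) ** (1/d)."""
    k, d, n = num_observations, n_dim, n_samples
    return (1 / math.sqrt(math.pi)) * ((2 * k * math.gamma(d / 2 + 1)) / n) ** (1 / d)


# Example: k = 20 observations, d = 2 dimensions, n = 1000 samples.
# Gamma(2) = 1, so the inner term is 40 / 1000 = 0.04 and its square root is 0.2.
radius = naive_coverage_radius(20, 2, 1000)
print(radius)  # 0.2 / sqrt(pi), roughly 0.1128
```

For fixed k and d, the radius shrinks as n grows, since more samples mean a smaller region is needed to contain k of them. Note that `math.gamma` overflows for large arguments, so for embeddings with several hundred dimensions a log-space evaluation using `math.lgamma` may be a safer choice.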

Reference

This implementation is based on https://dl.acm.org/doi/abs/10.1145/3448016.3457315.

[1] Seymour Sudman. 1976. Applied sampling. Academic Press New York (1976).