dataeval.core.coverage_naive

dataeval.core.coverage_naive(embeddings, num_observations)

Evaluate coverage using a naive radius calculation method.

This method calculates a fixed coverage radius based on the dimensionality of the embedding space and the desired number of observations per covered region.

Parameters:
embeddings : Array

Dataset image embeddings scaled to the unit interval [0, 1]. The function expects a 2-dimensional array of N observations in a P-dimensional space.

num_observations : int

Number of observations a region must contain in order to be considered covered. [1] suggests a minimum of 20 to 50 samples.

Returns:

A tuple containing:

  • uncovered_indices : Array of indices for uncovered observations

  • critical_value_radii : Array of critical value radii for each observation

  • coverage_radius : The radius threshold for coverage

Return type:

tuple[NDArray[np.intp], NDArray[np.float64], float]
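To illustrate how the three returned values could relate to one another, here is a minimal NumPy sketch. It is not the dataeval implementation: the function name coverage_naive_sketch is hypothetical, and it assumes that the critical value radius of an observation is the Euclidean distance to its k-th nearest other observation and that an observation is uncovered when that distance exceeds the fixed radius from the Notes.

```python
import math

import numpy as np


def coverage_naive_sketch(embeddings: np.ndarray, num_observations: int):
    """Illustrative only; the real function may differ in its details."""
    n, d = embeddings.shape
    # Pairwise Euclidean distances, shape (n, n).
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    pairwise = np.sqrt((diffs**2).sum(axis=-1))
    # Sort each row; column 0 is the distance of a point to itself (0).
    sorted_dists = np.sort(pairwise, axis=1)
    # ASSUMPTION: critical value radius = distance to the k-th nearest neighbor.
    critical_value_radii = sorted_dists[:, num_observations]
    # Fixed radius from the formula in the Notes.
    coverage_radius = float(
        (1 / math.sqrt(math.pi))
        * ((2 * num_observations * math.gamma(d / 2 + 1)) / n) ** (1 / d)
    )
    # ASSUMPTION: uncovered means the critical value radius exceeds the threshold.
    uncovered_indices = np.nonzero(critical_value_radii > coverage_radius)[0]
    return uncovered_indices, critical_value_radii, coverage_radius


# A tight cluster of 50 points plus one isolated point at the origin.
rng = np.random.default_rng(0)
cluster = 0.5 + 0.01 * rng.random((50, 2))
outlier = np.array([[0.0, 0.0]])
embeddings = np.vstack([cluster, outlier])

uncovered, radii, radius = coverage_naive_sketch(embeddings, num_observations=5)
print(uncovered)  # only the isolated point (index 50) is flagged
```

The broadcasted distance matrix needs O(N² · P) memory, so this sketch is only suitable for small N.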

Raises:
  • ValueError – If embeddings are not on the unit interval [0, 1]

  • ValueError – If the number of embeddings is less than or equal to num_observations
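The two documented error conditions can be pictured with a short sketch. The helper name validate_inputs_sketch and the error messages are hypothetical; only the conditions themselves come from the list above.

```python
import numpy as np


def validate_inputs_sketch(embeddings, num_observations: int) -> np.ndarray:
    """Hypothetical helper mirroring the documented ValueError conditions."""
    arr = np.asarray(embeddings, dtype=np.float64)
    # Condition 1: every value must lie on the unit interval [0, 1].
    if arr.min() < 0 or arr.max() > 1:
        raise ValueError("Embeddings must be on the unit interval [0, 1].")
    # Condition 2: there must be strictly more embeddings than num_observations.
    if len(arr) <= num_observations:
        raise ValueError("Number of embeddings must exceed num_observations.")
    return arr


checked = validate_inputs_sketch([[0.0, 0.5], [1.0, 0.25], [0.3, 0.3]], 2)
```

Note that the second check uses "less than or equal", so three embeddings with num_observations=3 would be rejected.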

Notes

Embeddings should be on the unit interval [0, 1].

The naive method calculates a fixed radius based on the formula:

r = (1/√π) * ((2 * k * Γ(d/2 + 1)) / n)^(1/d)

where k is num_observations, d is the dimensionality of the embedding space (P), and n is the number of samples (N).
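As a worked instance of the radius formula, the following sketch evaluates it with the standard library only. The function name naive_coverage_radius is hypothetical and chosen for illustration.

```python
import math


def naive_coverage_radius(n: int, d: int, k: int) -> float:
    """r = (1/sqrt(pi)) * ((2 * k * Gamma(d/2 + 1)) / n) ** (1/d)."""
    return (1 / math.sqrt(math.pi)) * ((2 * k * math.gamma(d / 2 + 1)) / n) ** (1 / d)


# 1000 samples in a 2-dimensional space with k = 20:
# Gamma(2) = 1, so r = sqrt(40 / 1000) / sqrt(pi) = 0.2 / sqrt(pi), about 0.1128.
radius = naive_coverage_radius(n=1000, d=2, k=20)
print(round(radius, 4))  # 0.1128
```

For very high-dimensional embeddings, math.gamma overflows once d/2 + 1 exceeds roughly 171; working in log space with math.lgamma is one way around that.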

Reference

This implementation is based on https://dl.acm.org/doi/abs/10.1145/3448016.3457315.

[1] Seymour Sudman. 1976. Applied sampling. Academic Press New York (1976).