dataeval.core.coverage_adaptive

dataeval.core.coverage_adaptive(embeddings, num_observations, percent, force_unit_interval=False)

Evaluate coverage using an adaptive radius calculation method.

This method calculates a data-adaptive coverage radius from the distribution of critical value radii and marks the fraction of observations given by percent, those with the largest radii, as uncovered.

Parameters:

embeddings : Array2D[float]: Dataset embeddings scaled to the unit interval [0, 1]. Can be a 2D list, array-like object, or tensor. The data must have 2 dimensions: N observations in a P-dimensional space.
num_observations : int: Number of observations required in order to be covered. [1] suggests that a minimum of 20 to 50 samples is necessary.
percent : float: Fraction of observations to be considered uncovered, given as a value between 0 and 1 (for example, 0.01 for 1%).
force_unit_interval : bool, default False: If True, embeddings are automatically rescaled to the unit interval [0, 1]. If False, a ValueError is raised when embeddings fall outside [0, 1].
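The page does not say how force_unit_interval rescales the data. As a rough illustration only, the sketch below uses global min-max rescaling, which is one plausible way to bring embeddings onto [0, 1]. The helper name rescale_to_unit_interval is hypothetical and is not part of dataeval; the library may rescale differently (for example, per feature).

```python
def rescale_to_unit_interval(rows):
    """Min-max rescale a 2D list of floats so every value lies in [0, 1].

    Hypothetical helper: it uses the global minimum and maximum of the
    data, which is an assumption and may differ from the library.
    """
    flat = [value for row in rows for value in row]
    low, high = min(flat), max(flat)
    span = high - low
    if span == 0:
        # All values are identical: map everything to 0.0 to avoid
        # dividing by zero.
        return [[0.0 for _ in row] for row in rows]
    return [[(value - low) / span for value in row] for row in rows]


raw = [[-2.0, 0.0], [2.0, 6.0]]
scaled = rescale_to_unit_interval(raw)
print(scaled)  # [[0.0, 0.25], [0.5, 1.0]]
```

Rescaling before the call keeps the transformation explicit, which may be preferable when the same scaling has to be reused on other data.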

Returns:

Mapping with keys:

uncovered_indices: NDArray[np.intp] - Array of indices for uncovered observations
critical_value_radii: NDArray[np.float64] - Array of critical value radii for each observation
coverage_radius: float - The adaptive radius threshold for coverage

Return type:

CoverageResult

Raises:

ValueError – If embeddings are not on the unit interval [0, 1] and force_unit_interval is False
ValueError – If length of embeddings is less than or equal to num_observations
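The two documented error conditions can be checked up front before calling the function. The following is a minimal sketch that mirrors them on a plain 2D list; check_inputs is a hypothetical helper, and its error messages are made up for illustration rather than taken from the library.

```python
def check_inputs(rows, num_observations):
    """Hypothetical pre-checks mirroring the two documented ValueError cases."""
    # Case 1: some value lies outside the unit interval [0, 1].
    if any(value < 0.0 or value > 1.0 for row in rows for value in row):
        raise ValueError("embeddings must lie on the unit interval [0, 1]")
    # Case 2: the number of embeddings is less than or equal to
    # num_observations.
    if len(rows) <= num_observations:
        raise ValueError("need more embeddings than num_observations")


examples = [
    ([[0.2, 0.4], [0.6, 1.5]], 1),  # 1.5 is outside [0, 1]
    ([[0.2, 0.4], [0.6, 0.8]], 2),  # only 2 rows for num_observations=2
]
for rows, k in examples:
    try:
        check_inputs(rows, k)
    except ValueError as err:
        print(err)
```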

Notes

Embeddings should be on the unit interval [0, 1].

The adaptive method determines the coverage radius based on the data distribution, selecting the top percent of observations with the largest critical value radii as uncovered. This approach is more flexible than the naive method and adapts to the actual distribution of the data.
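The selection step described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: it assumes that the critical value radius of an observation is the Euclidean distance to its num_observations-th nearest neighbor (excluding itself), that the number of flagged points is rounded down, and that the coverage radius is the smallest radius among the flagged points. None of these details is confirmed by this page, and the function name adaptive_coverage_sketch is hypothetical.

```python
import math


def adaptive_coverage_sketch(points, num_observations, percent):
    """Illustrative only; not the dataeval implementation.

    Assumption: the critical value radius of a point is the distance to
    its num_observations-th nearest neighbor, excluding the point itself.
    """
    n = len(points)
    if n <= num_observations:
        raise ValueError("need more points than num_observations")

    radii = []
    for i, p in enumerate(points):
        # Sorted distances from point i to every other point.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[num_observations - 1])

    # Number of points to flag as uncovered, rounded down (assumption).
    cutoff = int(n * percent)
    # Indices ordered from the largest radius to the smallest.
    order = sorted(range(n), key=lambda i: radii[i], reverse=True)
    uncovered = order[:cutoff]
    # Assumed threshold: the smallest radius among the flagged points.
    coverage_radius = radii[uncovered[-1]] if uncovered else float("inf")
    return {
        "uncovered_indices": uncovered,
        "critical_value_radii": radii,
        "coverage_radius": coverage_radius,
    }


# Four points clustered near the origin and one isolated point.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.9, 0.9]]
result = adaptive_coverage_sketch(points, num_observations=2, percent=0.2)
print(result["uncovered_indices"])  # [4]
```

With percent=0.2 and five points, exactly one point is flagged. The isolated point has by far the largest radius, so it is the one reported as uncovered.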

Reference

This implementation is based on https://dl.acm.org/doi/abs/10.1145/3448016.3457315.

[1] Seymour Sudman. 1976. Applied Sampling. Academic Press, New York.