dataeval.metrics.bias.completeness¶

dataeval.metrics.bias.completeness(embeddings, quantiles)¶

Calculate the fraction of boxes in a grid defined by quantiles that contain at least one data point. Also returns the center coordinates of each empty box.

Parameters:¶

embeddings : Array¶: Embedded dataset (or other low-dimensional data) (nxp)
quantiles : int¶: number of quantile values to use for partitioning each dimension e.g., 1 would create a grid of 2^p boxes, 2, 3^p etc..

Returns:¶

fraction_filled: float - Fraction of boxes that contain at least one data point
empty_box_centers: List[np.ndarray] - List of coordinates for centers of empty boxes

Return type:¶

CompletenessOutput

Raises:¶

ValueError – If embeddings are too high-dimensional (>10)
ValueError – If there are too many quantiles (>2)
ValueError – If embedding is invalid shape

Example

>>> embs = np.array([[1, 0], [0, 1], [1, 1]])
>>> quantiles = 1
>>> result = completeness(embs, quantiles)
>>> result.fraction_filled
0.75

Reference¶

This implementation is based on https://arxiv.org/abs/2002.03147.

[1] Byun, Taejoon, and Sanjai Rayadurgam. “Manifold for Machine Learning Assurance.” Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering