dataeval.core.completeness
- dataeval.core.completeness(embeddings)
Measure the dimensional utilization of embeddings.
Completeness measures how fully the data uses the available dimensions of its embedding space. This implementation takes a directional-diversity approach based on eigenvalue entropy, which is more robust for high-dimensional data than traditional box-counting or neighbor-distance methods. Isotropy is a related score: it measures directional diversity relative to the subspace actually spanned by the embeddings, rather than the full ambient space.
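To make the eigenvalue-entropy idea concrete, here is a minimal sketch of one way to turn the spread of covariance eigenvalues into two scores, one normalized by the ambient dimension and one by the spanned subspace. The exact formulas used by dataeval are not given on this page, so the function name `entropy_scores`, the variance threshold, and both normalizations are assumptions for illustration only. Its outputs should not be expected to match the values returned by `completeness`.

```python
import numpy as np


def entropy_scores(embeddings):
    """Illustrative eigenvalue-entropy scores (hypothetical, not dataeval's code).

    Returns (ambient_score, spanned_score), each in [0, 1].
    """
    x = np.asarray(embeddings, dtype=float)
    if x.ndim != 2:
        raise ValueError("embeddings must be 2D")
    if 0 in x.shape:
        raise ValueError("embeddings must not have a zero-sized dimension")

    n_dims = x.shape[1]

    # Covariance eigenvalues give the variance along each principal direction.
    centered = x - x.mean(axis=0)
    cov = centered.T @ centered / max(len(x) - 1, 1)
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)

    if eigvals.sum() == 0:
        # Assumption: identical points use no directions at all.
        return 0.0, 0.0

    # Assumption: directions with negligible variance count as unused.
    used = eigvals[eigvals > 1e-10 * eigvals.max()]

    # Shannon entropy of the normalized spectrum; exp(entropy) is an
    # "effective number of directions" between 1 and len(used).
    p = used / used.sum()
    effective_dims = np.exp(-np.sum(p * np.log(p)))

    ambient_score = effective_dims / n_dims      # relative to the full space
    spanned_score = effective_dims / len(used)   # relative to the spanned subspace
    return float(ambient_score), float(spanned_score)


if __name__ == "__main__":
    # Four points spread evenly in the xy-plane of a 3D space.
    flat = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0]]
    print(entropy_scores(flat))
```

For the four planar points in the usage example, this sketch gives roughly 0.667 for the ambient-relative score and 1.0 for the subspace-relative score: the data is evenly spread within its plane but leaves one of the three ambient dimensions unused.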
- Parameters:
- embeddings : Array
Array of image embeddings, shape (n_samples, n_dimensions). Can be a 2D list, array-like object, or tensor.
- Returns:
Mapping with keys:
completeness: float - Completeness score between 0 and 1
isotropy: float - Isotropy score between 0 and 1
nearest_neighbor_pairs: Sequence[tuple[int, int]] - Pairs of point indices and their nearest neighbors, sorted by decreasing distance
- Return type:
- Raises:
ValueError – If embeddings are not 2D
ValueError – If embeddings have a zero dimension
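The `nearest_neighbor_pairs` entry under Returns pairs each point with its nearest neighbor and orders the pairs from most to least distant. A brute-force sketch of that ordering, using only the standard library, might look as follows. The helper name `sketch_nearest_neighbor_pairs`, the Euclidean metric, and the choice to emit one pair per point are assumptions; the page does not say how dataeval computes or deduplicates the pairs.

```python
import math


def sketch_nearest_neighbor_pairs(points):
    """Hypothetical helper: (i, j) pairs where j is i's nearest neighbor,
    sorted so the most isolated points come first."""
    scored = []
    for i, p in enumerate(points):
        best_j, best_d = None, math.inf
        # Brute-force search over all other points: O(n^2), fine for small inputs.
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)  # Euclidean distance (assumed metric)
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            scored.append((best_d, i, best_j))

    # Largest nearest-neighbor distance first; ties keep their input order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(i, j) for _, i, j in scored]


if __name__ == "__main__":
    print(sketch_nearest_neighbor_pairs([[0, 0], [1, 0], [5, 0]]))
```

With the three collinear points in the usage example, the point at `[5, 0]` is farthest from its nearest neighbor, so its pair `(2, 1)` is listed first.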
Examples
Well-spread data across 3 dimensions:
>>> rng = np.random.default_rng(42)
>>> embeddings = rng.random((50, 3))
>>> result = completeness(embeddings)
>>> result["completeness"]
0.9963684026790749
>>> result["isotropy"]
0.9865994134108708

Single plane data across 3 dimensions:

>>> directions = rng.normal(size=(2, 3))  # 2 random lines
>>> directions /= np.linalg.norm(directions, axis=1, keepdims=True)
>>> t = np.random.uniform(0, 0.5, (len(directions), 25, 1))
>>> embeddings = ([0.5] * 3 + t * directions[:, np.newaxis, :]).reshape(-1, 3)
>>> result = completeness(embeddings)
>>> result["completeness"]
0.6001089325287554
>>> result["isotropy"]
0.40470070513943307

Completeness can be less than isotropy:

>>> X_low = rng.normal(size=(50, 2))
>>> Q, _ = np.linalg.qr(rng.normal(size=(3, 2)))
>>> embeddings = X_low @ Q.T
>>> result = completeness(embeddings)
>>> result["completeness"]  # penalized by unused ambient dimension
0.6844547029590969
>>> result["isotropy"]  # close to 1, isotropic within 2D subspace
0.9869106459012913