dataeval.metrics.estimators.divergence
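dataeval.metrics.estimators.divergence(emb_a, emb_b, method='FNN')
Calculates the divergence between two sets of embeddings by counting the number of “between dataset” edges (edges that join a point from emb_a to a point from emb_b) in a graph built over the pooled data: the minimum spanning tree when method="MST", or the nearest-neighbor graph when method="FNN".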
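To make the edge counting concrete, here is a minimal sketch of the MST variant using SciPy. It is not the dataeval implementation: count_mst_cross_edges is a hypothetical helper, and the Euclidean metric is an assumption.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist


def count_mst_cross_edges(emb_a: np.ndarray, emb_b: np.ndarray) -> int:
    """Count minimum-spanning-tree edges that join a point of emb_a to a point of emb_b."""
    data = np.vstack([emb_a, emb_b])  # pooled embeddings, shape (N_a + N_b, P)
    labels = np.concatenate([np.zeros(len(emb_a)), np.ones(len(emb_b))])
    # Dense Euclidean distance matrix; SciPy treats zero entries as "no edge",
    # so duplicate rows would be left unconnected in this sketch
    dists = cdist(data, data)
    mst = minimum_spanning_tree(dists)  # sparse matrix holding the tree's edges
    rows, cols = mst.nonzero()
    # A "between dataset" edge joins two points carrying different labels
    return int(np.sum(labels[rows] != labels[cols]))


rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(50, 2))
near = count_mst_cross_edges(a, rng.normal(0.0, 1.0, size=(50, 2)))  # same distribution
far = count_mst_cross_edges(a, rng.normal(5.0, 1.0, size=(50, 2)))   # shifted by 5 per axis
print(near, far)  # many cross edges for `near`, very few for `far`
```

Because the tree must connect both datasets, at least one between-dataset edge always exists; well-separated data keeps the count close to that minimum.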
- Parameters:
- emb_a : ArrayLike, shape - (N, P)
Image embeddings to compare, in an ArrayLike format. The data must be 2-dimensional: N observations in a P-dimensional space.
- emb_b : ArrayLike, shape - (N, P)
Image embeddings to compare, in an ArrayLike format. The data must be 2-dimensional: N observations in a P-dimensional space.
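- method : Literal["MST", "FNN"], default "FNN"
Method used to estimate dataset divergence: "MST" builds a minimum spanning tree over the pooled embeddings, while "FNN" links each point to its nearest neighbor.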
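The FNN method can be sketched as follows, assuming each point is linked to its single nearest neighbor in the pooled data. count_fnn_cross_edges is a hypothetical helper, not dataeval's code; the brute-force distance matrix keeps it dependent on NumPy alone but costs O(N²) memory.

```python
import numpy as np


def count_fnn_cross_edges(emb_a: np.ndarray, emb_b: np.ndarray) -> int:
    """Count points whose nearest neighbor in the pooled data belongs to the other dataset."""
    data = np.vstack([emb_a, emb_b])
    labels = np.concatenate([np.zeros(len(emb_a)), np.ones(len(emb_b))])
    # Squared Euclidean distances via broadcasting (same argmin as plain distances)
    diff = data[:, None, :] - data[None, :, :]
    sq_dists = np.sum(diff**2, axis=-1)
    np.fill_diagonal(sq_dists, np.inf)  # a point may not be its own neighbor
    nearest = np.argmin(sq_dists, axis=1)
    # Each point contributes one edge; count those that cross datasets
    return int(np.sum(labels != labels[nearest]))


rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=(50, 2))
near = count_fnn_cross_edges(a, rng.normal(0.0, 1.0, size=(50, 2)))  # roughly half of 100
far = count_fnn_cross_edges(a, rng.normal(5.0, 1.0, size=(50, 2)))   # close to 0
print(near, far)
```

Since every point contributes its own edge, a pair of points that are mutual nearest neighbors across datasets is counted twice; unlike the MST, this graph need not be connected, which is why the count can drop to 0.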
- Returns:
The divergence value (0.0..1.0) and the number of between-dataset edges (reported as errors).
- Return type:
DivergenceOutput
Note
The divergence value indicates how different the two datasets are: 0 indicates approximately identical data distributions, and values approaching 1 indicate distributions that are increasingly easy to tell apart.
References
For more information about this divergence, its formal definition, and its associated estimators see https://arxiv.org/abs/1412.6534.
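The edge count R becomes a divergence value through the estimator in the cited paper. The sketch below uses our own notation (m and n are the number of rows in emb_a and emb_b); the outer max(0, ·) is an assumption, added so the result stays in the 0.0..1.0 range stated under Returns, and the sizes in the worked example are hypothetical.

```latex
\begin{aligned}
\hat{D}_p &= \max\!\left(0,\; 1 - R\,\frac{m+n}{2mn}\right) \\
% hypothetical sizes m = n = 100 with R = 36 between-dataset edges
\frac{m+n}{2mn} &= \frac{200}{20\,000} = 0.01 \\
\hat{D}_p &= 1 - 36 \times 0.01 = 0.64
\end{aligned}
```

When both samples come from the same distribution, R is on average close to 2mn/(m+n), which drives the estimate toward 0; when the samples are separable, R is small and the estimate approaches 1.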
Examples
Evaluate the datasets:
>>> from dataeval.metrics.estimators import divergence
>>> divergence(datasetA, datasetB)
DivergenceOutput(divergence=0.28, errors=36)
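For experimenting without dataeval installed, the whole pipeline can be sketched end to end. This assumes the FNN method means a 1-nearest-neighbor graph and that the estimate is clamped at 0; divergence_sketch and DivergenceSketch are hypothetical names, not part of dataeval, and the synthetic Gaussian data stands in for real embeddings.

```python
from typing import NamedTuple

import numpy as np
from scipy.spatial import cKDTree


class DivergenceSketch(NamedTuple):
    """Hypothetical stand-in for DivergenceOutput."""
    divergence: float
    errors: int


def divergence_sketch(emb_a: np.ndarray, emb_b: np.ndarray) -> DivergenceSketch:
    """Nearest-neighbor divergence estimate; assumes no duplicate rows in the pooled data."""
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    m, n = len(emb_a), len(emb_b)
    data = np.vstack([emb_a, emb_b])
    labels = np.concatenate([np.zeros(m), np.ones(n)])
    # k=2: the closest hit is the query point itself, the second is its nearest neighbor
    _, idx = cKDTree(data).query(data, k=2)
    errors = int(np.sum(labels != labels[idx[:, 1]]))
    # Convert the cross-edge count to a divergence, clamped at 0 (assumption)
    div = max(0.0, 1.0 - errors * (m + n) / (2.0 * m * n))
    return DivergenceSketch(divergence=div, errors=errors)


rng = np.random.default_rng(0)
same = divergence_sketch(rng.normal(size=(200, 8)), rng.normal(size=(200, 8)))
apart = divergence_sketch(rng.normal(size=(200, 8)), rng.normal(loc=4.0, size=(200, 8)))
print(same)   # divergence close to 0
print(apart)  # divergence close to 1
```

A KD-tree avoids materializing the full N×N distance matrix, which matters once the embeddings number in the tens of thousands.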