dataeval.core.divergence_fnn
- dataeval.core.divergence_fnn(emb_a, emb_b)
Calculate the divergence between two datasets by counting the label disagreements between nearest neighbors.
- Parameters:
- emb_a : ArrayLike, shape - (N, P)
Image embeddings of the first dataset to compare, in an ArrayLike format. The data must be 2-dimensional: N observations in a P-dimensional space.
- emb_b : ArrayLike, shape - (N, P)
Image embeddings of the second dataset to compare, in an ArrayLike format. The data must be 2-dimensional: N observations in a P-dimensional space.
- Returns:
Mapping with keys:
divergence: float - The divergence value between 0.0 and 1.0
errors: int - The number of label disagreements
- Return type:
Examples
Return the divergence of two datasets (0: no divergence, 1: complete divergence):
>>> import numpy as np
>>> import sklearn.datasets as dsets
>>> from dataeval.core import divergence_fnn
>>> datasetA = dsets.make_blobs(
...     n_samples=50, centers=np.array([(-1, -1), (1, 1)]), cluster_std=0.3, random_state=712
... )[0]
>>> datasetB = (
...     dsets.make_blobs(
...         n_samples=50, centers=np.array([(-0.5, -0.5), (1, 1)]), cluster_std=0.3, random_state=712
...     )[0]
...     + 0.05
... )
>>> datasetC = dsets.make_blobs(
...     n_samples=50, centers=np.array([(-0.5, 0.5), (1, -1)]), cluster_std=0.3, random_state=712
... )[0]

Overlapping datasets - divergence == 0:

>>> divergence_fnn(datasetA, datasetB)
{'divergence': 0.0, 'errors': 54}

Completely separated datasets - divergence == 1:

>>> divergence_fnn(datasetA, datasetC)
{'divergence': 1.0, 'errors': 0}
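
To make the idea of "counting label disagreements between nearest neighbors" more concrete, here is a minimal NumPy sketch. It is not dataeval's implementation, and several details are assumptions: the function name `fnn_divergence_sketch` is hypothetical, each point's "label" is taken to be the dataset it came from, neighbors are found by brute-force Euclidean distance, and the normalization that maps the error count onto the 0.0 to 1.0 range is an assumed formula chosen to match the documented range.

```python
import numpy as np


def fnn_divergence_sketch(emb_a, emb_b):
    """Illustrative first-nearest-neighbor divergence estimate (not dataeval's code)."""
    a = np.asarray(emb_a, dtype=float)
    b = np.asarray(emb_b, dtype=float)
    if a.ndim != 2 or b.ndim != 2 or a.shape[1] != b.shape[1]:
        raise ValueError("expected two 2-D arrays with the same number of columns")

    n, m = len(a), len(b)
    data = np.vstack((a, b))
    # Assumption: the "label" of a point is the dataset it came from (0 or 1).
    labels = np.concatenate((np.zeros(n, dtype=int), np.ones(m, dtype=int)))

    # Brute-force pairwise squared Euclidean distances; fine for small inputs.
    diff = data[:, None, :] - data[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a point must not be its own neighbor

    # Index of each point's nearest neighbor in the stacked data.
    nearest = dist.argmin(axis=1)
    # A disagreement is a point whose nearest neighbor has the other label.
    errors = int(np.sum(labels != labels[nearest]))

    # Assumed normalization: equals 1 - 2 * errors / (n + m) when n == m,
    # clipped at zero so the result stays within [0.0, 1.0].
    divergence = max(0.0, 1.0 - (n + m) / (2.0 * n * m) * errors)
    return {"divergence": divergence, "errors": errors}


# Interleaved points: every nearest neighbor comes from the other dataset.
mixed = fnn_divergence_sketch([[0.0], [2.0], [4.0]], [[1.0], [3.0], [5.0]])
# Far-apart points: every nearest neighbor comes from the same dataset.
apart = fnn_divergence_sketch([[0.0], [1.0], [2.0]], [[100.0], [101.0], [102.0]])
print(mixed)  # {'divergence': 0.0, 'errors': 6}
print(apart)  # {'divergence': 1.0, 'errors': 0}
```

Under these assumptions, many disagreements mean the two datasets are mixed together (divergence near 0), while few disagreements mean they occupy separate regions (divergence near 1), which matches the pattern in the examples above.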