Divergence
- class dataeval.metrics.Divergence(method: Literal['MST', 'FNN'] = 'MST')
Calculates the estimated Henze-Penrose (HP) divergence between two datasets.
- Parameters:
method (Literal["MST", "FNN"], default "MST") – Method used to estimate dataset divergence
Warning
MST is very slow in this implementation, unlike in MATLAB, where the two methods have comparable speeds. Overall, MST takes ~25x longer. Sources of the slowdown: conversion to and from CSR format accounts for ~10% of the time; the difference between the 1-NN search and the SciPy MST function accounts for the remaining ~90%.
References
For more information about this divergence, its formal definition, and its associated estimators see https://arxiv.org/abs/1412.6534.
Examples
Initialize the Divergence class:
>>> divert = Divergence()
Specify the method:
>>> divert = Divergence(method="FNN")
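The two method options differ in which graph is built over the pooled data before cross-dataset connections are counted. The following is a minimal pure-Python sketch of that idea, not the library's implementation: `fnn_error_count` and `mst_error_count` are hypothetical helper names, Euclidean distance is assumed, and the brute-force loops are meant for illustration on small inputs only.

```python
import math


def _pool(data_a, data_b):
    """Concatenate both datasets and remember which dataset each point came from."""
    points = list(data_a) + list(data_b)
    labels = [0] * len(data_a) + [1] * len(data_b)
    return points, labels


def fnn_error_count(data_a, data_b):
    """Count points whose first nearest neighbor belongs to the other dataset."""
    points, labels = _pool(data_a, data_b)
    errors = 0
    for i, p in enumerate(points):
        # Nearest neighbor of point i among all other pooled points.
        nearest = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        if labels[nearest] != labels[i]:
            errors += 1
    return errors


def mst_error_count(data_a, data_b):
    """Count minimum-spanning-tree edges that join points from different datasets."""
    points, labels = _pool(data_a, data_b)
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known edge from the tree to each vertex
    best_from = [-1] * n         # tree endpoint of that cheapest edge
    best_dist[0] = 0.0
    errors = 0
    for _ in range(n):
        # Prim's algorithm: add the closest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0 and labels[u] != labels[best_from[u]]:
            errors += 1
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return errors
```

On two well-separated clusters the nearest-neighbor count is zero and the spanning tree contains a single bridging edge, while on fully interleaved data nearly every connection crosses between the datasets.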
- evaluate(data_a: ArrayLike, data_b: ArrayLike) → Dict[str, Any]
Calculates the divergence and any errors between the datasets
- Parameters:
data_a (ArrayLike, shape - (N, P)) – A dataset in an ArrayLike format to compare. Function expects the data to have 2 dimensions: N observations in a P-dimensional space.
data_b (ArrayLike, shape - (N, P)) – A dataset in an ArrayLike format to compare. Function expects the data to have 2 dimensions: N observations in a P-dimensional space.
- Returns:
- divergence (float)
divergence value between 0.0 and 1.0
- error (int)
the number of differing edges between the datasets
- Return type:
Dict[str, Any]
Notes
The divergence value indicates how similar the two datasets are, with 0 indicating approximately identical data distributions and values near 1 indicating well-separated distributions.
Examples
Evaluate the datasets:
>>> divert.evaluate(datasetA, datasetB)
{'divergence': 0.28, 'error': 36.0}
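To show how the two returned values relate, here is a small sketch that maps an error count to a divergence using the estimator form from the referenced paper, 1 - errors * (N + M) / (2 * N * M), clipped at zero. The function name `hp_divergence` is hypothetical, and it is an assumption (not confirmed by this page) that the library applies exactly this formula.

```python
def hp_divergence(errors, n_a, n_b):
    """Map a cross-dataset error count to a divergence estimate in [0.0, 1.0].

    errors: number of differing edges between the datasets
    n_a, n_b: number of observations in each dataset
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both datasets must contain at least one observation")
    estimate = 1.0 - errors * (n_a + n_b) / (2.0 * n_a * n_b)
    # The raw estimate can dip below zero for heavily mixed data, so clip it.
    return max(0.0, estimate)
```

Under this assumption, two datasets of 50 observations each (sizes chosen here for illustration) with 36 differing edges would give a divergence of approximately 0.28, which is consistent with the output above. Fewer differing edges push the value toward 1, and more push it toward 0.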