dataeval.metrics.estimators.divergence
======================================

.. py:function:: dataeval.metrics.estimators.divergence(data_a, data_b, method = 'FNN')

   Calculates the :term:`divergence` and any errors between the datasets.

   :param data_a: A dataset in an ArrayLike format to compare. The function expects
                  2-dimensional data: N observations in a P-dimensional space.
   :type data_a: ArrayLike, shape - (N, P)
   :param data_b: A dataset in an ArrayLike format to compare. The function expects
                  2-dimensional data: N observations in a P-dimensional space.
   :type data_b: ArrayLike, shape - (N, P)
   :param method: Method used to estimate dataset :term:`divergence`
   :type method: Literal["MST", "FNN"], default "FNN"

   :returns: The divergence value (0.0..1.0) and the number of differing edges between the datasets
   :rtype: DivergenceOutput

   .. note::

      The divergence value indicates how similar the 2 datasets are, with 0
      indicating approximately identical data distributions.

   .. warning::

      MST is very slow in this implementation, unlike in MATLAB, where the two
      methods have comparable speeds. Overall, MST takes ~25x longer.

      Source of the slowdown: conversion to and from CSR format adds ~10% of the
      extra time; the difference between the 1-NN search and the SciPy MST
      function accounts for the remaining ~90%.

   .. rubric:: References

   For more information about this divergence, its formal definition,
   and its associated estimators, see https://arxiv.org/abs/1412.6534.

   .. rubric:: Examples

   Evaluate the datasets:

   >>> divergence(datasetA, datasetB)
   DivergenceOutput(divergence=0.28, errors=36)
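
To give a feel for what the ``"FNN"`` method and the ``errors`` count might mean, here is a minimal sketch of a first-nearest-neighbor divergence estimate. It is not dataeval's implementation. It assumes that ``errors`` counts the points whose nearest neighbor in the pooled data comes from the other dataset, and that the divergence follows the Friedman-Rafsky style estimate ``1 - errors * (N + M) / (2 * N * M)`` described in the referenced paper, clipped at 0. The function name ``fnn_divergence_sketch`` is hypothetical, and the sketch assumes there are no duplicate points.

```python
import numpy as np
from scipy.spatial import cKDTree


def fnn_divergence_sketch(data_a, data_b):
    """Illustrative nearest-neighbor divergence estimate (hypothetical helper).

    Assumptions (not taken from dataeval's source): `errors` is the number of
    points whose first nearest neighbor belongs to the other dataset, and the
    inputs contain no duplicate points.
    """
    a = np.asarray(data_a, dtype=float)
    b = np.asarray(data_b, dtype=float)
    n, m = len(a), len(b)

    # Pool both datasets and remember which dataset each row came from.
    pooled = np.vstack([a, b])
    labels = np.concatenate([np.zeros(n, dtype=int), np.ones(m, dtype=int)])

    # Query k=2 neighbors: the closest hit is the point itself,
    # so the second column holds the true nearest neighbor.
    _, idx = cKDTree(pooled).query(pooled, k=2)
    nearest = idx[:, 1]

    # Count points whose nearest neighbor carries the other label.
    errors = int(np.sum(labels[nearest] != labels))

    # Friedman-Rafsky style estimate, clipped to stay non-negative.
    divergence = max(0.0, 1.0 - errors * (n + m) / (2.0 * n * m))
    return divergence, errors
```

Under these assumptions, two well-separated datasets produce no cross-dataset neighbors, so the estimate is 1.0, while two samples drawn from the same distribution produce roughly one mismatch per two points, so the estimate falls close to 0.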