dataeval.metadata.metadata_distance

dataeval.metadata.metadata_distance(metadata1, metadata2)

Measures the feature-wise distance between two continuous metadata distributions and computes a p-value to evaluate its significance.

Applies the Earth Mover’s Distance and the Kolmogorov-Smirnov two-sample test to each feature independently.
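To make the two measures concrete, the following is a minimal sketch that computes them per feature directly with SciPy. It is not the library's implementation: the helper name `featurewise_distance`, the plain-dict input format, and the output keys are assumptions made for illustration. Only `scipy.stats.ks_2samp` and `scipy.stats.wasserstein_distance` are real library calls.

```python
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance


def featurewise_distance(reference, comparison):
    """Hypothetical helper: compare two sets of continuous features.

    Both arguments are plain dicts mapping a feature name to a 1-D array
    of continuous values (an assumed stand-in for the Metadata class).
    """
    results = {}
    for name in reference:
        a = np.asarray(reference[name], dtype=float)
        b = np.asarray(comparison[name], dtype=float)
        ks = ks_2samp(a, b)  # two-sample Kolmogorov-Smirnov test
        results[name] = {
            "statistic": float(ks.statistic),
            "pvalue": float(ks.pvalue),
            # Earth Mover's Distance (1-D Wasserstein distance)
            "emd": float(wasserstein_distance(a, b)),
        }
    return results


# Synthetic data: "time" is shifted by 3.0, "altitude" is left unchanged.
rng = np.random.default_rng(0)
reference = {
    "time": rng.normal(0.0, 1.0, 200),
    "altitude": rng.uniform(0.0, 1.0, 200),
}
comparison = {
    "time": reference["time"] + 3.0,
    "altitude": reference["altitude"].copy(),
}

out = featurewise_distance(reference, comparison)
for name, result in out.items():
    print(name, result)
```

In this sketch the shifted feature yields a small p-value and an Earth Mover's Distance close to the size of the shift, while the unchanged feature yields a KS statistic of 0 and a p-value of 1.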

Parameters:
metadata1 : Metadata

Class containing continuous factor names and values to be used as the reference

metadata2 : Metadata

Class containing continuous factor names and values to be compared with the reference

Returns:

A dictionary with keys corresponding to metadata feature names, and values that are KstestResult objects, as defined by scipy.stats.ks_2samp.

Return type:

dict[str, KstestResult]

See also

Earth Mover’s Distance, Kolmogorov-Smirnov

Note

This function applies only to continuous data.

Examples

>>> output = metadata_distance(metadata1, metadata2)
>>> list(output)
['time', 'altitude']
>>> output["time"]
MetadataKSResult(statistic=1.0, location=0.44354838709677413, dist=2.7, pvalue=0.0)