dataeval.core.feature_distance

dataeval.core.feature_distance(continuous_data_1, continuous_data_2)

Measures the feature-wise distance between two continuous distributions and computes a p-value to evaluate its significance.

For each feature, computes the Earth Mover’s Distance between the two samples and runs the Kolmogorov-Smirnov two-sample test.
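As a rough illustration of the two measures named above, the following sketch computes them per feature with SciPy. It is not this function's implementation: `featurewise_distance_sketch` is a hypothetical helper, treating 1D input as a single feature is an assumption, and any scaling or normalization the library may apply (including the `location` value) is left out.

```python
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance


def featurewise_distance_sketch(reference, comparison):
    """Hypothetical helper: per-feature EMD and two-sample KS test."""
    ref = np.asarray(reference, dtype=float)
    cmp_ = np.asarray(comparison, dtype=float)

    # Assumption: a 1D input is one feature observed over many samples.
    if ref.ndim == 1:
        ref = ref[:, None]
    if cmp_.ndim == 1:
        cmp_ = cmp_[:, None]
    if ref.shape[1] != cmp_.shape[1]:
        raise ValueError("Both inputs must have the same number of features.")

    results = []
    for j in range(ref.shape[1]):
        # Two-sample KS test on column j of each input.
        ks = ks_2samp(ref[:, j], cmp_[:, j])
        results.append(
            {
                "statistic": float(ks.statistic),
                "p_value": float(ks.pvalue),
                # 1D Earth Mover's (Wasserstein-1) distance on raw values.
                "dist": float(wasserstein_distance(ref[:, j], cmp_[:, j])),
            }
        )
    return results


# Illustrative data: feature 0 is unchanged, feature 1 is shifted.
rng = np.random.default_rng(0)
reference = rng.normal(size=(200, 2))
comparison = reference.copy()
comparison[:, 1] += 5.0

for index, result in enumerate(featurewise_distance_sketch(reference, comparison)):
    print(index, result)
```

In this sketch the unchanged feature should report a distance of zero and a large p-value, while the shifted feature should report a large distance and a small p-value.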

Parameters:
continuous_data_1 : Array1D[float] | Array2D[float]

Array of values to be used as reference. Can be a 1D or 2D list, or array-like object.

continuous_data_2 : Array1D[float] | Array2D[float]

Array of values to be compared with the reference. Can be a 1D or 2D list, or array-like object.

Returns:

Sequence of mappings, one per feature, each with keys:

  • statistic: float - The Kolmogorov-Smirnov test statistic

  • location: float - The normalized location where the KS statistic was achieved

  • dist: float - The Earth Mover’s Distance between distributions

  • p_value: float - The p-value from the KS test

Return type:

Sequence[FeatureDistanceResult]

See also

Earth, Kolmogorov-Smirnov