dataeval.metrics.bias.label_parity

dataeval.metrics.bias.label_parity(expected_labels, observed_labels, num_classes=None)

Calculate the chi-square statistic to assess the parity between expected and observed label distributions.

This function computes the frequency distribution of classes in both expected and observed labels, normalizes the expected distribution to match the total number of observed labels, and then calculates the chi-square statistic to determine if there is a significant difference between the two distributions.
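The steps described above can be approximated with NumPy and SciPy. This is a minimal sketch, not DataEval's actual implementation: the name label_parity_sketch is hypothetical, and it assumes labels are non-negative integers 0..K-1 and that every class appears at least once in the expected labels.

```python
import numpy as np
from scipy.stats import chisquare


def label_parity_sketch(expected_labels, observed_labels, num_classes=None):
    """Illustrative sketch only; not DataEval's actual implementation."""
    expected = np.asarray(expected_labels, dtype=int)
    observed = np.asarray(observed_labels, dtype=int)
    if expected.size == 0:
        raise ValueError("expected label distribution is empty")

    # Assumption: labels are 0..K-1, so the class count is the largest label + 1
    if num_classes is None:
        num_classes = int(np.concatenate([expected, observed]).max()) + 1

    # Per-class frequency counts, padded to the same length
    expected_counts = np.bincount(expected, minlength=num_classes)
    observed_counts = np.bincount(observed, minlength=num_classes)

    # Rescale expected counts so both distributions have the same total
    scale = observed_counts.sum() / expected_counts.sum()
    expected_scaled = expected_counts * scale

    # One-way chi-square test of observed vs. expected frequencies
    statistic, p_value = chisquare(f_obs=observed_counts, f_exp=expected_scaled)
    return float(statistic), float(p_value)


score, p_value = label_parity_sketch([0, 0, 1, 1, 2, 2], [0] * 6 + [1] * 3 + [2] * 3)
print(score)  # 1.5
```

Here the expected counts [2, 2, 2] are rescaled to [4, 4, 4] to match the 12 observed labels, and the observed counts [6, 3, 3] give a statistic of (4 + 1 + 1) / 4 = 1.5.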

Parameters:

expected_labels : ArrayLike – List of class labels in the expected dataset.
observed_labels : ArrayLike – List of class labels in the observed dataset.
num_classes : int or None, default None – The number of unique classes in the datasets. If not provided, the function infers it from the set of unique labels in expected_labels and observed_labels.

Returns:

Chi-square score and p-value of the test

Return type:

LabelParityOutput

Raises:

ValueError – If expected label distribution is empty, is all zeros, or if there is a mismatch in the number of unique classes between the observed and expected distributions.

Note

Providing num_classes can be helpful if there are classes with zero instances in one of the distributions.
The function first validates the observed distribution and normalizes the expected distribution so that it has the same total number of labels as the observed distribution.
It then performs a chi-square goodness-of-fit test (the one-way test implemented by scipy.stats.chisquare) to determine if there is a statistically significant difference between the observed and expected label distributions.
This function acts as an interface to the scipy.stats.chisquare method, which is documented at https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html
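To illustrate why a fixed class count helps when a class has zero instances in one distribution, the sketch below counts labels with np.bincount. It shows the general counting problem under the assumption of integer labels 0..K-1; it is not taken from DataEval's source.

```python
import numpy as np

expected = np.array([0, 1, 2, 3, 3])
observed = np.array([0, 0, 1, 2])  # class 3 never appears here

# Without a fixed class count, the two count vectors differ in length
print(np.bincount(expected).shape, np.bincount(observed).shape)  # (4,) (3,)

# With an explicit class count, both vectors cover the same classes
num_classes = 4
expected_counts = np.bincount(expected, minlength=num_classes)
observed_counts = np.bincount(observed, minlength=num_classes)
print(expected_counts)  # [1 1 1 2]
print(observed_counts)  # [2 1 1 0]
```

With the counts aligned, the missing class shows up as an explicit zero instead of a shape mismatch.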

Examples

Randomly create some label distributions using np.random.default_rng:

>>> import numpy as np
>>> from dataeval.metrics.bias import label_parity
>>> rng = np.random.default_rng(175)
>>> expected_labels = rng.choice([0, 1, 2, 3, 4], 100)
>>> observed_labels = rng.choice([2, 3, 0, 4, 1], 100)
>>> label_parity(expected_labels, observed_labels)
LabelParityOutput(score=14.007374204742625, p_value=0.0072715574616218)