dataeval.metrics.bias.parity
============================

.. py:function:: dataeval.metrics.bias.parity(metadata)

   Calculate chi-square statistics to assess the linear relationship between multiple factors
   and class labels.

   This function computes the chi-square statistic for each metadata factor to determine if there is
   a significant relationship between the factor values and class labels. The chi-square statistic is
   only valid for linear relationships. If non-linear relationships exist, use :obj:`balance` instead.

   :param metadata: Preprocessed metadata from :func:`dataeval.utils.metadata.preprocess`
   :type metadata: Metadata

   :returns: Arrays of length ``num_factors`` whose ``i``-th element corresponds to the
             chi-square score and :term:`p-value<P-Value>` for the relationship between factor ``i`` and
             the class labels in the dataset.
   :rtype: ParityOutput[NDArray[np.float64]]

   :raises Warning: If any cell in the contingency matrix has a value greater than 0 but less than 5,
       a warning is issued because such low counts can lead to inaccurate chi-square calculations.
       It is recommended to ensure that each label co-occurs with each factor value either 0 times
       or at least 5 times.

   .. note::

      - A high score with a low p-value suggests that a metadata factor is strongly correlated with a class label.
      - The function creates a contingency matrix for each factor, where each entry represents the frequency of a
        specific factor value co-occurring with a particular class label.
      - Rows containing only zeros in the contingency matrix are removed before performing the chi-square test
        to prevent errors in the calculation.

   .. seealso:: :obj:`balance`

   .. rubric:: Examples

   Create some random "continuous" and categorical variables using ``np.random.default_rng``:

   >>> import numpy as np
   >>> from dataeval.metrics.bias import parity
   >>> from dataeval.utils.metadata import preprocess
   >>> rng = np.random.default_rng(175)
   >>> labels = rng.choice([0, 1, 2], (100))
   >>> metadata_dict = [
   ...     {
   ...         "age": list(rng.choice([25, 30, 35, 45], (100))),
   ...         "income": list(rng.choice([50000, 65000, 80000], (100))),
   ...         "gender": list(rng.choice(["M", "F"], (100))),
   ...     }
   ... ]
   >>> continuous_factor_bincounts = {"age": 4, "income": 3}
   >>> metadata = preprocess(metadata_dict, labels, continuous_factor_bincounts)
   >>> parity(metadata)
   ParityOutput(score=array([7.35731943, 5.46711299, 0.51506212]), p_value=array([0.28906231, 0.24263543, 0.77295762]), metadata_names=['age', 'income', 'gender'])
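
The per-factor steps described in the note above (build a contingency matrix, warn on low counts, drop all-zero rows, run the chi-square test) can be sketched independently of the library. This is a minimal illustration under stated assumptions, not the implementation used by ``parity``: the helper name ``chi_square_from_counts`` is hypothetical, the counts are made up for illustration, and ``scipy.stats.chi2_contingency`` is assumed as the test routine.

```python
import warnings

import numpy as np
from scipy.stats import chi2_contingency


def chi_square_from_counts(contingency):
    """Hypothetical helper (not part of dataeval): chi-square test on one factor.

    Rows are factor values (or bins), columns are class labels, and each
    entry counts how often that factor value co-occurs with that label.
    """
    contingency = np.asarray(contingency)

    # Counts greater than 0 but less than 5 make the chi-square
    # approximation unreliable, so flag them.
    if np.any((contingency > 0) & (contingency < 5)):
        warnings.warn("Contingency matrix has cells with counts between 0 and 5.")

    # Drop factor values that never occur: an all-zero row yields zero
    # expected frequencies, which chi2_contingency rejects with an error.
    contingency = contingency[contingency.sum(axis=1) > 0]

    result = chi2_contingency(contingency)
    # The first two entries of the result are the statistic and the p-value.
    return float(result[0]), float(result[1])


# Illustrative counts: 4 factor bins x 3 class labels; the last bin is empty.
counts = [
    [20, 10, 10],
    [10, 20, 10],
    [10, 10, 20],
    [0, 0, 0],
]
score, p_value = chi_square_from_counts(counts)
print(score, p_value)
```

Repeating this for every factor and stacking the results would give one score and one p-value per factor, which matches the shape of the arrays returned by ``parity``.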