Parity

How-To Guides

Check out this how-to to begin using the parity metric.

Class Label Analysis Tutorial

DataEval API

dataeval.metrics.parity(expected_labels: ArrayLike, observed_labels: ArrayLike, num_classes: int | None = None) → ParityOutput

Perform a one-way chi-squared test comparing the observed label frequencies with the expected label frequencies. The null hypothesis is that the observed data has the expected frequencies.

This function is a wrapper around scipy.stats.chisquare, which is documented at https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html
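A minimal sketch of the underlying SciPy call, assuming the labels have already been reduced to per-class counts (the counts below are made-up illustrative values). scipy.stats.chisquare requires the observed and expected frequencies to have the same total, so the sketch rescales the expected counts to the observed total; whether parity does exactly this internally is an assumption.

```python
import numpy as np
from scipy.stats import chisquare

# Hypothetical per-class counts; the two datasets have different sizes
expected_counts = np.array([500, 300, 200], dtype=np.float64)  # total 1000
observed_counts = np.array([45, 35, 20], dtype=np.float64)     # total 100

# Rescale expected counts so both frequency vectors share the same total,
# which scipy.stats.chisquare checks before running the test
expected_freq = expected_counts * observed_counts.sum() / expected_counts.sum()

result = chisquare(f_obs=observed_counts, f_exp=expected_freq)
print(f"chi-squared = {result.statistic:.4f}, p-value = {result.pvalue:.4f}")
```

With these counts the observed distribution is close to the expected one, so the p-value is large and the null hypothesis is not rejected.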

Parameters:
  • expected_labels (ArrayLike) – List of class labels in the expected dataset

  • observed_labels (ArrayLike) – List of class labels in the observed dataset

  • num_classes (Optional[int]) – The number of unique classes in the datasets. If not specified, it is inferred from the set of unique labels in expected_labels and observed_labels.
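The step that turns two label lists into per-class counts can be sketched with numpy.bincount, assuming integer class labels in the range 0 to num_classes - 1. The helper name label_counts is hypothetical, and inferring num_classes as one more than the largest label (which matches the number of unique labels only when labels are contiguous from 0) is an assumption, not a description of DataEval's code.

```python
import numpy as np

def label_counts(expected_labels, observed_labels, num_classes=None):
    """Hypothetical helper: per-class counts for two label lists.

    If num_classes is None, it is inferred as one more than the largest
    label seen in either list, so both count vectors share one length.
    """
    expected = np.asarray(expected_labels, dtype=np.intp)
    observed = np.asarray(observed_labels, dtype=np.intp)
    if num_classes is None:
        num_classes = int(max(expected.max(), observed.max())) + 1
    # minlength pads classes that never occur with a count of zero
    expected_counts = np.bincount(expected, minlength=num_classes)
    observed_counts = np.bincount(observed, minlength=num_classes)
    return expected_counts, observed_counts

exp_counts, obs_counts = label_counts([0, 0, 1, 2, 2, 2], [0, 1, 1, 1])
print(exp_counts, obs_counts)  # [2 1 3] [1 3 0]
```

Class 2 never appears in the observed labels, yet it still gets a zero entry, which keeps both count vectors aligned class by class.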

Returns:

chi-squared score and p-value of the test

Return type:

ParityOutput[np.float64]

Raises:

ValueError – If the input labels are empty

dataeval.metrics.parity_metadata(data_factors: Mapping[str, ArrayLike], continuous_factor_bincounts: Dict[str, int] | None = None) → ParityMetadataOutput

Evaluates the statistical independence of metadata factors from class labels. For each metadata factor, this performs a chi-square test of independence against the class labels and returns a score and a p-value. A high score with a low p-value suggests that the metadata factor is associated with the class labels rather than independent of them.
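A sketch of the per-factor independence test, assuming every factor is already discrete. It builds a contingency table (factor value by class label) for each metadata factor and passes it to scipy.stats.chi2_contingency. The helper contingency_table, the factor names, the data, and keeping the labels in a separate class_labels array are all illustrative assumptions, not DataEval's internals.

```python
import numpy as np
from scipy.stats import chi2_contingency

def contingency_table(factor_values, class_labels):
    """Hypothetical helper: count each (factor value, class label) pair."""
    factor_levels, factor_idx = np.unique(factor_values, return_inverse=True)
    class_levels, class_idx = np.unique(class_labels, return_inverse=True)
    table = np.zeros((len(factor_levels), len(class_levels)), dtype=np.int64)
    np.add.at(table, (factor_idx, class_idx), 1)
    return table

# Hypothetical per-image data: one class label and two metadata factors
class_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
factors = {
    # weather perfectly separates the two classes
    "weather": np.array(["sun", "sun", "sun", "sun", "rain", "rain", "rain", "rain"]),
    # camera is split evenly across both classes
    "camera": np.array(["a", "b", "a", "b", "a", "b", "a", "b"]),
}

scores, p_values = [], []
for name, values in factors.items():
    # For 2x2 tables SciPy applies Yates' continuity correction by default
    chi2, p, _dof, _expected = chi2_contingency(contingency_table(values, class_labels))
    scores.append(chi2)
    p_values.append(p)
    print(f"{name}: chi2 = {chi2:.3f}, p = {p:.3f}")
```

Here weather gets a high score and a low p-value because it tracks the class label, while camera scores zero with a p-value of 1.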

Parameters:
  • data_factors (Mapping[str, ArrayLike]) – The dataset factors, which are per-image attributes including class label and metadata. Each key of data_factors names a factor, and its value holds that factor's per-image values.

  • continuous_factor_bincounts (Optional[Dict[str, int]], default None) – The factors in data_factors that have continuous values, each mapped to the number of bins used to discretize it. All factors are treated as discrete unless they appear as keys in this dictionary. Each key of this dictionary must also be a key in data_factors.
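For a factor listed in continuous_factor_bincounts, the values have to be mapped to a fixed number of bins before they can be counted. A minimal sketch using equal-width bins from numpy.histogram_bin_edges and numpy.digitize; the helper name discretize, the sun_elevation factor, and the equal-width binning scheme are assumptions, not necessarily what DataEval does.

```python
import numpy as np

def discretize(values, num_bins):
    """Hypothetical helper: map continuous values to bin indices 0..num_bins-1."""
    values = np.asarray(values, dtype=np.float64)
    edges = np.histogram_bin_edges(values, bins=num_bins)
    # Use only the interior edges so the minimum lands in bin 0 and the
    # maximum lands in the last bin instead of overflowing past it
    return np.digitize(values, edges[1:-1])

# Hypothetical continuous factor discretized into 3 bins
continuous_factor_bincounts = {"sun_elevation": 3}
sun_elevation = [0.0, 10.0, 25.0, 40.0, 55.0, 60.0]
binned = discretize(sun_elevation, continuous_factor_bincounts["sun_elevation"])
print(binned)  # [0 0 1 2 2 2]
```

The resulting integer bin indices can then be treated like any other discrete factor in the chi-square test.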

Returns:

Arrays of length num_factors whose i-th element is the chi-square score and p-value for the relationship between factor i and the class labels in the dataset.

Return type:

ParityOutput[NDArray[np.float64]]