balance

Balance and classwise balance are metrics that measure the distributional correlation between metadata factors and class labels. They can indicate opportunities for shortcut learning and disproportionate dataset sampling, either with respect to class labels or between metadata factors.

dataeval.metrics.bias.balance(metadata: MetadataOutput, num_neighbors: int = 5) → BalanceOutput

Computes the mutual information (MI) between factors (class label, metadata, label/image properties).

Parameters:

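metadata (MetadataOutput) – Output after running metadata_preprocessing

num_neighbors (int, default 5) – Number of nearest neighbors used when estimating mutual information for continuous variables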

Returns:

A (num_factors+1) x (num_factors+1) matrix of mutual information estimates between the num_factors metadata factors and the class label. Symmetry is enforced.
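
The source does not say how symmetry is enforced. One common approach, shown here purely as an assumption, is to average a pairwise estimate matrix with its transpose. The helper name `symmetrize` and the numbers in `raw` are hypothetical and are not part of dataeval.

```python
import numpy as np


def symmetrize(mi):
    """Average a square matrix with its transpose (hypothetical helper)."""
    mi = np.asarray(mi, dtype=float)
    return 0.5 * (mi + mi.T)


# Made-up pairwise MI estimates that are slightly asymmetric,
# as can happen when MI(a, b) and MI(b, a) are estimated separately.
raw = np.array(
    [
        [1.00, 0.30, 0.10],
        [0.32, 1.00, 0.05],
        [0.08, 0.07, 1.00],
    ]
)

sym = symmetrize(raw)
print(sym)
```

Averaging keeps the diagonal unchanged and replaces each off-diagonal pair with its mean, so the result is symmetric by construction.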

Return type:

BalanceOutput

Note

We use mutual_info_classif from sklearn since the class label is categorical. The outputs of mutual_info_classif are consistent up to O(1e-4) and depend on a random seed. MI is computed differently for categorical and continuous variables.
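
To make the note concrete, here is a minimal sketch that calls sklearn's mutual_info_classif directly on synthetic data. It is not the balance implementation: the data, the variable names, and the choice of `random_state=0` are assumptions made for illustration. It shows one categorical and one continuous factor being handled through `discrete_features`, and a fixed seed giving repeatable estimates.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 2000

# Synthetic binary class label (illustrative only).
labels = rng.integers(0, 2, size=n)

# Categorical factor that matches the label about 90% of the time.
flip = rng.random(n) < 0.1
cat_factor = np.where(flip, 1 - labels, labels)

# Continuous factor drawn independently of the label.
cont_factor = rng.normal(size=n)

X = np.column_stack([cat_factor, cont_factor]).astype(float)
discrete_mask = np.array([True, False])  # first column categorical, second continuous

# MI in nats; n_neighbors only affects the continuous column.
mi = mutual_info_classif(
    X, labels, discrete_features=discrete_mask, n_neighbors=5, random_state=0
)

# Repeating the call with the same seed reproduces the estimate.
mi_repeat = mutual_info_classif(
    X, labels, discrete_features=discrete_mask, n_neighbors=5, random_state=0
)

print(mi)
```

The correlated categorical factor should receive a clearly larger estimate than the independent continuous one. Without a fixed `random_state`, the continuous estimate can vary slightly between calls because sklearn adds a small amount of noise to continuous features.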

Example

Return balance (mutual information) of factors with class_labels

>>> bal = balance(metadata)
>>> bal.balance
array([0.9999982 , 0.2494567 , 0.02994455, 0.13363788, 0.        ,
       0.        ])

Return intra/interfactor balance (mutual information)

>>> bal.factors
array([[0.99999935, 0.31360499, 0.26925848, 0.85201924, 0.36653548],
       [0.31360499, 0.99999856, 0.09725766, 0.15836905, 1.98031993],
       [0.26925848, 0.09725766, 0.99999846, 0.03713108, 0.01544656],
       [0.85201924, 0.15836905, 0.03713108, 0.47450653, 0.25509664],
       [0.36653548, 1.98031993, 0.01544656, 0.25509664, 1.06260686]])

Return classwise balance (mutual information) of factors with individual class_labels

>>> bal.classwise
array([[0.9999982 , 0.2494567 , 0.02994455, 0.13363788, 0.        ,
        0.        ],
       [0.9999982 , 0.2494567 , 0.02994455, 0.13363788, 0.        ,
        0.        ]])
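
As a rough illustration of how the bal.balance values from the example might be read, the sketch below flags factors whose mutual information with the class label exceeds a cutoff. It assumes the first entry corresponds to the class label itself. The factor names and the 0.1 threshold are hypothetical and should be chosen per dataset.

```python
import numpy as np

# Values copied from the bal.balance example output.
balance_scores = np.array(
    [0.9999982, 0.2494567, 0.02994455, 0.13363788, 0.0, 0.0]
)

# Hypothetical names; index 0 is assumed to be the class label itself.
factor_names = [
    "class_label",
    "factor_1",
    "factor_2",
    "factor_3",
    "factor_4",
    "factor_5",
]

threshold = 0.1  # hypothetical cutoff, not a dataeval default

# Skip index 0 and collect factors that share notable information with the label.
flagged = [
    name
    for name, score in zip(factor_names[1:], balance_scores[1:])
    if score > threshold
]

print(flagged)
```

Flagged factors are candidates for closer inspection, since a model could use them as a shortcut for predicting the class label.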

See also

sklearn.feature_selection.mutual_info_classif, sklearn.feature_selection.mutual_info_regression, sklearn.metrics.mutual_info_score