balance
Balance and classwise balance are metrics that measure distributional correlation, estimated as mutual information (MI), between metadata factors and class labels. High values can indicate opportunities for shortcut learning, as well as disproportionate dataset sampling with respect to class labels or between metadata factors.
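As a toy illustration of what a high balance value flags, the sketch below computes MI directly with sklearn's `mutual_info_score` on hypothetical integer-coded data. The factor names (`background`, `camera`) are made up for illustration, and the values are raw MI in nats, not the scaled values `balance` reports. A factor that tracks the label exactly reaches the label's entropy (ln 2 for a balanced binary label), while a factor sampled evenly within each class scores 0.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

# Hypothetical toy dataset: 8 images with a binary class label
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Hypothetical "background" factor that tracks the label exactly -> potential shortcut
background = np.array([0, 0, 0, 0, 1, 1, 1, 1])
# Hypothetical "camera" factor sampled evenly within each class -> no dependence
camera = np.array([0, 1, 0, 1, 0, 1, 0, 1])

mi_background = mutual_info_score(labels, background)  # raw MI in nats
mi_camera = mutual_info_score(labels, camera)

print(f"MI(label, background) = {mi_background:.4f}")  # 0.6931 (= ln 2)
print(f"MI(label, camera)     = {mi_camera:.4f}")      # approximately 0
```

In a real dataset the dependence is rarely this clean, which is why the metric is read as a warning sign rather than proof of a shortcut.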
- dataeval.metrics.bias.balance(metadata: MetadataOutput, num_neighbors: int = 5) -> BalanceOutput
Compute the mutual information (MI) between factors (class label, metadata, label/image properties).
- Parameters:
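metadata (MetadataOutput) – Output after running metadata_preprocessing
num_neighbors (int, default 5) – Number of nearest neighbors used to estimate MI for continuous variables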
- Returns:
(num_factors+1) x (num_factors+1) estimate of the mutual information between the num_factors metadata factors and the class label. Symmetry is enforced.
- Return type:
BalanceOutput
Note
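We use sklearn's mutual_info_classif because the class label is categorical. MI is computed differently for categorical and continuous variables: a categorical factor is compared with the label through contingency counts, while a continuous factor uses a k-nearest-neighbor estimator. mutual_info_classif adds a small amount of random noise to continuous variables, so its outputs depend on a random seed and are consistent only up to O(1e-4).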
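The categorical/continuous split can be reproduced directly with sklearn's `mutual_info_classif` by passing a boolean `discrete_features` mask. The sketch below uses hypothetical synthetic data (`site`, `brightness` are invented names) and assumes that `num_neighbors` corresponds to sklearn's `n_neighbors`; it is not the library's own code. Fixing `random_state` makes repeated runs identical, and the categorical column matches `mutual_info_score` exactly because no k-NN estimate is involved.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(42)
n = 200
labels = rng.integers(0, 2, size=n)                 # categorical class label
site = (labels + rng.integers(0, 2, size=n)) % 3    # hypothetical categorical factor
brightness = labels + rng.normal(0.0, 1.0, size=n)  # hypothetical continuous factor
X = np.column_stack([site, brightness])

# Column 0 is categorical (contingency counts), column 1 is continuous (k-NN estimate)
discrete_mask = np.array([True, False])

# n_neighbors=5 mirrors the documented num_neighbors default (assumed mapping)
mi_a = mutual_info_classif(
    X, labels, discrete_features=discrete_mask, n_neighbors=5, random_state=0
)
mi_b = mutual_info_classif(
    X, labels, discrete_features=discrete_mask, n_neighbors=5, random_state=0
)

print(mi_a)                               # MI of each factor with the label, in nats
print(mutual_info_score(labels, site))    # equals mi_a[0]
```

Leaving `random_state` unset lets the noise added to continuous columns vary between calls, which is the source of the small run-to-run differences described in the note.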
Example
Return balance (mutual information) of factors with class_labels
>>> bal = balance(metadata)
>>> bal.balance
array([0.9999982 , 0.2494567 , 0.02994455, 0.13363788, 0. , 0. ])
Return intra/interfactor balance (mutual information)
>>> bal.factors
array([[0.99999935, 0.31360499, 0.26925848, 0.85201924, 0.36653548],
       [0.31360499, 0.99999856, 0.09725766, 0.15836905, 1.98031993],
       [0.26925848, 0.09725766, 0.99999846, 0.03713108, 0.01544656],
       [0.85201924, 0.15836905, 0.03713108, 0.47450653, 0.25509664],
       [0.36653548, 1.98031993, 0.01544656, 0.25509664, 1.06260686]])
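The factors matrix is symmetric. One way to obtain a symmetric MI matrix is to estimate each row with a different variable as the target and then average the matrix with its transpose. The helper `mi_matrix` below is hypothetical, assumes every variable is discrete (integer-coded), and returns raw MI in nats; it is not necessarily how the library enforces symmetry or scales its values.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif


def mi_matrix(class_labels, factors, random_state=0):
    """Hypothetical helper: symmetric MI matrix over [class_label, factor_1, ..., factor_k].

    Assumes every column is discrete (integer-coded); values are raw MI in nats.
    """
    data = np.column_stack([class_labels, factors])  # shape (n_samples, k + 1)
    n_vars = data.shape[1]
    mi = np.zeros((n_vars, n_vars))
    for i in range(n_vars):
        # Treat column i as the target and estimate its MI with every column
        mi[i] = mutual_info_classif(
            data, data[:, i], discrete_features=True, random_state=random_state
        )
    # Enforce symmetry by averaging the two estimates for each pair
    return (mi + mi.T) / 2


labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
# Column 0 duplicates the label; column 1 is independent of it
factors = np.array([[0, 0], [0, 1], [0, 0], [0, 1], [1, 0], [1, 1], [1, 0], [1, 1]])
m = mi_matrix(labels, factors)
print(m.round(4))
```

For purely discrete data the two estimates of each pair already agree; averaging matters when continuous variables make the estimator direction-dependent.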
Return classwise balance (mutual information) of factors with individual class_labels
>>> bal.classwise
array([[0.9999982 , 0.2494567 , 0.02994455, 0.13363788, 0. , 0. ],
       [0.9999982 , 0.2494567 , 0.02994455, 0.13363788, 0. , 0. ]])
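A plausible reading of "individual class_labels" is a one-vs-rest comparison: for each class, compute MI between a binary indicator (this class vs. all others) and each factor. The sketch below implements that reading with a hypothetical `classwise_mi` helper on discrete factors; it is an assumption, not the library's documented procedure. With a binary label the two indicators are complements of each other, so their rows coincide, which is consistent with the two identical rows shown above.

```python
import numpy as np
from sklearn.metrics import mutual_info_score


def classwise_mi(class_labels, factors):
    """Hypothetical one-vs-rest sketch: rows are classes, columns are discrete factors."""
    classes = np.unique(class_labels)
    out = np.zeros((len(classes), factors.shape[1]))
    for row, c in enumerate(classes):
        indicator = (class_labels == c).astype(int)  # 1 for class c, 0 otherwise
        for col in range(factors.shape[1]):
            out[row, col] = mutual_info_score(indicator, factors[:, col])
    return out


# Three classes, two discrete factors (column 1 is independent of the label)
labels = np.array([0, 0, 1, 1, 2, 2])
factors = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [1, 0], [1, 1]])
cw = classwise_mi(labels, factors)
print(cw.round(4))

# Binary label: both one-vs-rest rows are identical
cw_bin = classwise_mi(np.array([0, 0, 1, 1]), np.array([[0], [0], [1], [1]]))
print(cw_bin.round(4))
```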
See also
sklearn.feature_selection.mutual_info_classif, sklearn.feature_selection.mutual_info_regression, sklearn.metrics.mutual_info_score