Balance
DataEval API
- dataeval.metrics.balance(class_labels: Sequence[int], metadata: List[Dict], num_neighbors: int = 5) → BalanceOutput
Compute the mutual information (MI) between factors (class label, metadata, label/image properties).
- Parameters:
class_labels (Sequence[int]) – List of class labels for each image
metadata (List[Dict]) – List of metadata factors for each image
num_neighbors (int, default 5) – Number of nearest neighbors to use for computing MI between discrete and continuous variables.
- Returns:
A (num_factors+1) x (num_factors+1) matrix estimating the mutual information between the class label and each of the num_factors metadata factors, and between pairs of metadata factors. Symmetry is enforced.
- Return type:
BalanceOutput
Notes
We use mutual_info_classif from sklearn because the class label is categorical. mutual_info_classif outputs are consistent up to O(1e-4) and depend on a random seed. MI is computed differently for categorical and continuous variables, so we attempt to infer whether a variable is categorical from the fraction of unique values in the dataset.
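The notes describe two behaviours: a variable is treated as categorical based on its fraction of unique values, and mutual_info_classif results depend on a random seed. The following is a minimal sketch of both using scikit-learn directly, not DataEval's implementation. The helper is_categorical_guess and its 0.2 threshold are hypothetical, since the actual cutoff is not stated here.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif


def is_categorical_guess(values, max_unique_fraction=0.2):
    """Hypothetical heuristic: treat a variable as categorical when its number
    of distinct values is a small fraction of the number of samples."""
    values = np.asarray(values)
    return np.unique(values).size / values.size < max_unique_fraction


rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=500)           # categorical class labels
angle = labels * 10.0 + rng.normal(size=500)    # continuous factor tied to the label
site = rng.integers(0, 4, size=500)             # categorical factor, independent of the label

X = np.column_stack([angle, site])
discrete = np.array([is_categorical_guess(col) for col in X.T])

# n_neighbors is used by the nearest-neighbour estimator for continuous
# features; random_state fixes the tiny noise sklearn adds to continuous
# features, which is why results vary slightly from seed to seed.
mi = mutual_info_classif(X, labels, discrete_features=discrete,
                         n_neighbors=5, random_state=0)
print(discrete, mi)
```

Here angle (500 distinct values) is inferred as continuous and site (4 distinct values) as categorical, so mutual_info_classif uses a different estimator for each column.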
See also
sklearn.feature_selection.mutual_info_classif, sklearn.feature_selection.mutual_info_regression, sklearn.metrics.mutual_info_score
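As a rough illustration of the kind of matrix balance returns, this sketch builds a (num_factors+1) x (num_factors+1) MI matrix with scikit-learn alone. It assumes row/column 0 is the class label, reuses a hypothetical unique-fraction heuristic to detect categorical variables, uses mutual_info_classif or mutual_info_regression depending on whether the target variable is categorical, and enforces symmetry by averaging the matrix with its transpose. The function name mi_matrix and all of these choices are assumptions for illustration, not DataEval's actual algorithm, and the sketch does not construct a BalanceOutput.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression


def mi_matrix(class_labels, metadata, num_neighbors=5, max_unique_fraction=0.2, seed=0):
    """Hypothetical sketch of a (num_factors+1) x (num_factors+1) MI matrix.

    Row/column 0 is the class label; the remaining rows/columns follow the
    keys of the first metadata dict, in order.
    """
    keys = list(metadata[0].keys())
    raw = [np.asarray(class_labels)] + [np.asarray([m[k] for m in metadata]) for k in keys]
    n_samples = raw[0].size

    # Assumed heuristic: few distinct values relative to n_samples => categorical.
    is_cat = np.array([np.unique(col).size / n_samples < max_unique_fraction for col in raw])
    is_cat[0] = True  # the class label is always treated as categorical

    # Encode categorical columns as integer codes so string values also work.
    cols = [np.unique(col, return_inverse=True)[1] if cat else col.astype(float)
            for col, cat in zip(raw, is_cat)]
    X = np.column_stack(cols).astype(float)

    n_vars = X.shape[1]
    mi = np.zeros((n_vars, n_vars))
    for i in range(n_vars):
        # Column i is the target: classification MI if categorical, else regression MI.
        estimate = mutual_info_classif if is_cat[i] else mutual_info_regression
        mi[i] = estimate(X, X[:, i], discrete_features=is_cat,
                         n_neighbors=num_neighbors, random_state=seed)

    # MI(i, j) and MI(j, i) are estimated separately and can differ slightly,
    # so enforce symmetry by averaging with the transpose.
    return keys, (mi + mi.T) / 2


rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=300)
metadata = [
    {"brightness": float(lab + rng.normal(scale=0.3)), "camera": int(rng.integers(0, 3))}
    for lab in labels
]
keys, mi = mi_matrix(labels, metadata)
print(keys)
print(np.round(mi, 3))
```

In this synthetic data, brightness depends on the label while camera does not, so the label/brightness entry should be much larger than the label/camera entry. Diagonal entries are self-information estimates and are only loosely meaningful for continuous variables.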
- dataeval.metrics.balance_classwise(class_labels: Sequence[int], metadata: List[Dict], num_neighbors: int = 5) → BalanceOutput
Compute mutual information (analogous to correlation) between metadata factors (class label, metadata, label/image properties) and individual class labels.
- Parameters:
class_labels (Sequence[int]) – List of class labels for each image
metadata (List[Dict]) – List of metadata factors for each image
num_neighbors (int, default 5) – Number of nearest neighbors to use for computing MI between discrete and continuous variables.
- Returns:
A (num_classes x num_factors) matrix estimating the mutual information between the num_factors metadata factors and each individual class label.
- Return type:
BalanceOutput
Notes
We use mutual_info_classif from sklearn because the class label is categorical. mutual_info_classif outputs are consistent up to O(1e-4) and depend on a random seed. MI is computed differently for categorical and continuous variables, so whether each variable is categorical must be specified with is_categorical.
See also
sklearn.feature_selection.mutual_info_classif, sklearn.feature_selection.mutual_info_regression, sklearn.metrics.mutual_info_score, compute_mutual_information
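balance_classwise reports one row per class. One way to obtain such a (num_classes x num_factors) array, sketched here under the assumption of a one-vs-rest encoding, is to binarize the labels for each class and call mutual_info_classif on the factors. The function classwise_mi and its discrete_mask argument are hypothetical stand-ins (discrete_mask plays the role the notes assign to is_categorical); DataEval's internals may differ.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif


def classwise_mi(class_labels, factors, discrete_mask, num_neighbors=5, seed=0):
    """Hypothetical sketch: MI between each factor and a one-vs-rest indicator per class.

    factors: (num_samples, num_factors) numeric array.
    discrete_mask: boolean array marking which factor columns are categorical.
    Returns the sorted class values and a (num_classes, num_factors) array.
    """
    class_labels = np.asarray(class_labels)
    classes = np.unique(class_labels)
    out = np.empty((classes.size, factors.shape[1]))
    for row, c in enumerate(classes):
        is_class = (class_labels == c).astype(int)  # 1 for class c, 0 otherwise
        out[row] = mutual_info_classif(factors, is_class, discrete_features=discrete_mask,
                                       n_neighbors=num_neighbors, random_state=seed)
    return classes, out


rng = np.random.default_rng(1)
labels = rng.integers(0, 3, size=600)
# "blur" is elevated only for class 2; "sensor" is unrelated to the label.
blur = np.where(labels == 2, 5.0, 0.0) + rng.normal(size=600)
sensor = rng.integers(0, 4, size=600)
factors = np.column_stack([blur, sensor]).astype(float)

classes, mi = classwise_mi(labels, factors, discrete_mask=np.array([False, True]))
print(classes)
print(np.round(mi, 3))
```

Because blur only separates class 2 from the rest, the class 2 row should show the largest MI for blur, a class-specific dependence that a single label-versus-factor value would not localize.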