dataeval.metrics.bias.diversity
- dataeval.metrics.bias.diversity(metadata, method='simpson')
Compute diversity and classwise diversity for discrete/categorical variables and, through standard histogram binning, for continuous variables.
We define diversity as a normalized form of the inverse Simpson diversity index.
diversity = 1 implies that samples are evenly distributed across a particular factor.
diversity = 0 implies that all samples belong to one category/bin.
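The definition above can be illustrated with a minimal, standard-library sketch. It is not the library's implementation: the helper names (`simpson_diversity`, `shannon_diversity`, `bin_continuous`), the equal-width binning, and the choice to normalize by the number of observed categories are all assumptions made for illustration.

```python
import math
from collections import Counter


def simpson_diversity(values):
    """Normalized inverse Simpson index in [0, 1] (illustrative sketch)."""
    counts = Counter(values)
    k = len(counts)  # number of observed categories/bins
    if k <= 1:
        return 0.0  # a single category carries no diversity
    n = sum(counts.values())
    inv_simpson = 1.0 / sum((c / n) ** 2 for c in counts.values())
    # Assumed normalization: inverse Simpson ranges over [1, k], map it to [0, 1].
    return (inv_simpson - 1.0) / (k - 1)


def shannon_diversity(values):
    """Shannon entropy divided by its maximum, log(k) (illustrative sketch)."""
    counts = Counter(values)
    k = len(counts)
    if k <= 1:
        return 0.0
    n = sum(counts.values())
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(k)


def bin_continuous(values, n_bins):
    """Hypothetical equal-width binning of a continuous factor into bin indices."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


uneven = ["a", "b", "b", "b"]            # proportions 0.25 / 0.75
print(simpson_diversity(uneven))         # approximately 0.6
print(shannon_diversity(uneven))         # approximately 0.8113
print(simpson_diversity(["a", "b", "c"]))  # approximately 1.0 (evenly distributed)
print(simpson_diversity(["a", "a", "a"]))  # 0.0 (one category)

bins = bin_continuous([0.0, 0.1, 0.2, 0.9, 1.0], n_bins=2)
print(bins, simpson_diversity(bins))
```

Both helpers return 1 for an evenly distributed factor and 0 when every sample falls into one category or bin, which matches the two boundary cases described above.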
- Parameters:
metadata (Metadata) – Preprocessed metadata from dataeval.utils.metadata.preprocess()
method (Literal['simpson', 'shannon']) – Diversity index to compute; defaults to 'simpson'
- Return type:
Note
The order-q form of the diversity expression is undefined for q=1, but it approaches the Shannon entropy in the limit q → 1.
If there is only one category, the diversity index takes a value of 0.
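The limit mentioned in the note can be checked numerically. The sketch below assumes that "the expression" refers to the order-q family (Rényi entropy, whose exponential is the Hill number), which the text does not spell out; the function names are hypothetical and this is not the library's code.

```python
import math


def shannon_entropy(p):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)


def renyi_entropy(p, q):
    """Renyi entropy of order q; the formula divides by (1 - q)."""
    if q == 1:
        raise ValueError("undefined at q = 1; use the Shannon limit instead")
    return math.log(sum(x ** q for x in p if x > 0)) / (1.0 - q)


p = [0.25, 0.75]
for q in (2.0, 1.1, 1.01, 1.001):
    # Values move toward the Shannon entropy as q approaches 1.
    print(q, renyi_entropy(p, q))
print("shannon", shannon_entropy(p))

# At q = 2 the exponential of the Renyi entropy is the inverse Simpson index.
print(math.exp(renyi_entropy(p, 2.0)), 1.0 / sum(x ** 2 for x in p))

# With a single category both entropies are 0.
print(renyi_entropy([1.0], 2.0), shannon_entropy([1.0]))
```

Under this assumption, q = 2 recovers the inverse Simpson index and q → 1 recovers the Shannon case, which is consistent with the two `method` options.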
- Returns:
Diversity index for each column of self.data (or each factor in self.names), and classwise diversity of shape [n_class x n_factor]
Example
Compute Simpson diversity index of metadata and class labels
>>> div_simp = diversity(metadata, method="simpson")
>>> div_simp.diversity_index
array([0.6       , 0.80882353, 1.        , 0.8       ])
>>> div_simp.classwise
array([[0.5       , 0.8       , 0.8       ],
       [0.63043478, 0.97560976, 0.52830189]])
Compute Shannon diversity index of metadata and class labels
>>> div_shan = diversity(metadata, method="shannon")
>>> div_shan.diversity_index
array([0.81127812, 0.9426312 , 1.        , 0.91829583])
>>> div_shan.classwise
array([[0.68260619, 0.91829583, 0.91829583],
       [0.81443569, 0.99107606, 0.76420451]])
See also
scipy.stats.entropy