Diversity
DataEval API
- dataeval.metrics.diversity(class_labels: Sequence[int], metadata: List[Dict], method: Literal['shannon', 'simpson'] = 'simpson') -> DiversityOutput
Compute diversity for discrete/categorical variables and, through standard histogram binning, for continuous variables.
diversity = 1 implies that samples are evenly distributed across a particular factor.
diversity = 0 implies that all samples belong to one category/bin.
- Parameters:
class_labels (Sequence[int]) – List of class labels for each image
metadata (List[Dict]) – List of metadata factors for each image
method (Literal["shannon", "simpson"], default "simpson") – Diversity index to use. Permissible values are "simpson" and "shannon"
Notes
For continuous variables, histogram bins are chosen automatically. See numpy.histogram for details.
- Returns:
Diversity index for each metadata factor
- Return type:
DiversityOutput
See also
numpy.histogram
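The two indices can be illustrated with a minimal NumPy sketch. This is not DataEval's implementation: the helper names `simpson_diversity` and `shannon_diversity`, the rescaling used to map each index onto [0, 1], the choice to return 0 for a single bin, and the use of `bins="auto"` for the continuous case are all assumptions made for illustration.

```python
import numpy as np


def simpson_diversity(counts):
    """Inverse Simpson index rescaled to [0, 1] (assumed normalization)."""
    counts = np.asarray(counts, dtype=float)
    n_bins = counts.size
    if n_bins < 2:
        return 0.0  # assumption: a single bin carries no diversity
    p = counts / counts.sum()
    inv_simpson = 1.0 / np.sum(p**2)  # ranges from 1 to n_bins
    return float((inv_simpson - 1.0) / (n_bins - 1))


def shannon_diversity(counts):
    """Shannon entropy divided by its maximum, log(n_bins)."""
    counts = np.asarray(counts, dtype=float)
    n_bins = counts.size
    if n_bins < 2:
        return 0.0  # assumption: a single bin carries no diversity
    p = counts / counts.sum()
    p = p[p > 0]  # empty bins contribute nothing to the entropy
    entropy = -np.sum(p * np.log(p))
    return float(entropy / np.log(n_bins))


# Categorical factor: per-category sample counts.
even = [5, 5, 5, 5]      # evenly distributed
single = [20, 0, 0, 0]   # all samples in one category

# Continuous factor: bin first, then treat bin counts like categories.
rng = np.random.default_rng(0)
values = rng.uniform(size=1000)
bin_counts, _ = np.histogram(values, bins="auto")
continuous_diversity = simpson_diversity(bin_counts)
```

Under these assumptions the evenly distributed counts score 1 and the single-category counts score 0 for both indices, matching the two limiting cases described for this function.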
- dataeval.metrics.diversity_classwise(class_labels: Sequence[int], metadata: List[Dict], method: Literal['shannon', 'simpson'] = 'simpson') -> DiversityOutput
Compute diversity for discrete/categorical variables and, through standard histogram binning, for continuous variables.
We define diversity as a normalized form of the inverse Simpson diversity index.
diversity = 1 implies that samples are evenly distributed across a particular factor.
diversity = 1/num_categories implies that all samples belong to one category/bin.
- Parameters:
class_labels (Sequence[int]) – List of class labels for each image
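metadata (List[Dict]) – List of metadata factors for each image
method (Literal["shannon", "simpson"], default "simpson") – Diversity index to use. Permissible values are "simpson" and "shannon"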
Notes
For continuous variables, histogram bins are chosen automatically. See numpy.histogram for details.
Here q denotes the order of the diversity index; the Simpson index corresponds to q=2. The expression is undefined for q=1, but it approaches the Shannon entropy in the limit.
If there is only one category, the diversity index takes a value of 1 = 1/N = 1/1. Entropy will take a value of 0.
- Returns:
Diversity index for each class and factor, with shape [n_class x n_factor]
- Return type:
DiversityOutput
See also
numpy.histogram
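A classwise computation can be sketched as follows. This is an illustration rather than DataEval's implementation: the helper names `normalized_inverse_simpson` and `classwise_diversity` are hypothetical, only categorical factors are handled, and the normalization is assumed to be the inverse Simpson index divided by the number of categories, which yields 1/num_categories when all samples of a class fall in one category.

```python
import numpy as np


def normalized_inverse_simpson(counts):
    """Inverse Simpson index divided by the number of categories."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    return float(1.0 / np.sum(p**2) / counts.size)


def classwise_diversity(class_labels, metadata):
    """Return (classes, matrix) where matrix has shape [n_class x n_factor]."""
    class_labels = np.asarray(class_labels)
    classes = np.unique(class_labels)
    factor_names = list(metadata[0].keys())
    out = np.empty((classes.size, len(factor_names)))
    for j, name in enumerate(factor_names):
        values = np.asarray([sample[name] for sample in metadata])
        # Categories are fixed over the whole dataset so that every class
        # is scored against the same set of possible values.
        categories = np.unique(values)
        for i, cls in enumerate(classes):
            subset = values[class_labels == cls]
            counts = np.array([(subset == cat).sum() for cat in categories])
            out[i, j] = normalized_inverse_simpson(counts)
    return classes, out


# Hypothetical data: class 0 is split evenly between day and night,
# class 1 contains only daytime images.
class_labels = [0, 0, 0, 0, 1, 1, 1, 1]
metadata = [{"time": t} for t in
            ["day", "night", "day", "night", "day", "day", "day", "day"]]
classes, result = classwise_diversity(class_labels, metadata)
```

In this example the row for class 0 is 1 (even split across two categories) and the row for class 1 is 1/2, that is, 1/num_categories.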