Diversity

DataEval API

dataeval.metrics.diversity(class_labels: Sequence[int], metadata: List[Dict], method: Literal['shannon', 'simpson'] = 'simpson') → DiversityOutput

Compute diversity for discrete/categorical variables and, through standard histogram binning, for continuous variables.

diversity = 1 implies that samples are evenly distributed across a particular factor.

diversity = 0 implies that all samples belong to one category/bin.
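
The two endpoints above can be illustrated with a minimal NumPy sketch. The helper names `simpson_diversity` and `shannon_diversity` are hypothetical, and the normalizations shown (inverse Simpson rescaled to [0, 1], Shannon entropy divided by its maximum) are assumptions chosen to reproduce the stated endpoints, not necessarily what DataEval computes internally.

```python
import numpy as np

def simpson_diversity(counts):
    """Inverse Simpson index rescaled to [0, 1] (assumed normalization)."""
    counts = np.asarray(counts, dtype=float)
    n_bins = counts.size
    if n_bins <= 1:
        return 0.0  # assumed convention for this sketch: a single bin has no spread
    p = counts / counts.sum()
    inv_simpson = 1.0 / np.sum(p ** 2)  # ranges from 1 (one bin) to n_bins (uniform)
    return float((inv_simpson - 1.0) / (n_bins - 1))

def shannon_diversity(counts):
    """Shannon entropy divided by its maximum, log(n_bins) (assumed normalization)."""
    counts = np.asarray(counts, dtype=float)
    n_bins = counts.size
    if n_bins <= 1:
        return 0.0  # assumed convention, as above
    p = counts / counts.sum()
    p = p[p > 0]  # empty bins contribute nothing to the entropy
    entropy = -np.sum(p * np.log(p))
    return float(entropy / np.log(n_bins))

even = [5, 5, 5, 5]      # samples spread evenly over four categories
single = [20, 0, 0, 0]   # all samples in one category
```

Under these assumptions, both helpers evaluate to 1 for `even` and to 0 for `single`; intermediate distributions fall in between.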

Parameters:
  • class_labels (Sequence[int]) – List of class labels for each image

  • metadata (List[Dict]) – List of metadata factors for each image

  • method (Literal["shannon", "simpson"], default "simpson") – Diversity index to use. Permissible values are "simpson" and "shannon".

Notes

  • For continuous variables, histogram bins are chosen automatically. See numpy.histogram for details.
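
As a rough illustration of the binning step, the sketch below converts a continuous factor into per-bin counts with `numpy.histogram`. The helper `binned_counts`, the factor name `brightness`, and the choice of `bins="auto"` are assumptions for this example; the note above does not state which binning rule DataEval applies.

```python
import numpy as np

def binned_counts(values, bins="auto"):
    """Histogram a continuous factor and return the number of samples per bin."""
    counts, _edges = np.histogram(np.asarray(values, dtype=float), bins=bins)
    return counts

rng = np.random.default_rng(0)
brightness = rng.uniform(0.0, 1.0, size=1000)  # hypothetical continuous metadata factor

# bins="auto" asks NumPy to choose the number of bins from the data
counts = binned_counts(brightness)
```

The resulting counts can then be treated like category counts for a discrete factor when computing a diversity index.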

Returns:

Diversity index for each factor

Return type:

DiversityOutput

See also

numpy.histogram

dataeval.metrics.diversity_classwise(class_labels: Sequence[int], metadata: List[Dict], method: Literal['shannon', 'simpson'] = 'simpson') → DiversityOutput

Compute diversity for discrete/categorical variables and, through standard histogram binning, for continuous variables.

We define diversity as a normalized form of the inverse Simpson diversity index.

diversity = 1 implies that samples are evenly distributed across a particular factor.

diversity = 1/num_categories implies that all samples belong to one category/bin.

Parameters:
  • class_labels (Sequence[int]) – List of class labels for each image

  • metadata (List[Dict]) – List of metadata factors for each image

  • method (Literal["shannon", "simpson"], default "simpson") – Diversity index to use. Permissible values are "simpson" and "shannon".

Notes

  • For continuous variables, histogram bins are chosen automatically. See numpy.histogram for details.

  • The expression is undefined for order q = 1, but it approaches the Shannon entropy in the limit q → 1.

  • If there is only one category, the diversity index takes a value of 1 = 1/N = 1/1. Entropy will take a value of 0.
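
The classwise endpoints described above (1 for an even spread, 1/num_categories when a class falls entirely in one category) can be reproduced with a short sketch. The helper `classwise_simpson` is hypothetical and handles a single discrete factor; it assumes the index is the inverse Simpson index divided by the number of categories, which matches the stated endpoints but is not confirmed to be DataEval's implementation.

```python
import numpy as np

def classwise_simpson(class_labels, factor_values):
    """Per-class inverse Simpson index divided by the number of categories."""
    class_labels = np.asarray(class_labels)
    # Encode the factor's categories as integer codes 0..n_cat-1
    categories, codes = np.unique(np.asarray(factor_values), return_inverse=True)
    n_cat = categories.size
    result = {}
    for cls in np.unique(class_labels):
        # Count how the samples of this class spread over the factor's categories
        counts = np.bincount(codes[class_labels == cls], minlength=n_cat)
        p = counts / counts.sum()
        result[int(cls)] = float(1.0 / np.sum(p ** 2) / n_cat)
    return result

labels = [0, 0, 0, 0, 1, 1, 1, 1]
time_of_day = ["day", "night", "day", "night", "day", "day", "day", "day"]  # hypothetical factor
per_class = classwise_simpson(labels, time_of_day)
```

Here class 0 is split evenly between the two categories and scores 1, while class 1 appears only in "day" and scores 1/2. Repeating the computation for every factor would give one row per class and one column per factor.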

Returns:

Diversity index [n_class x n_factor]

Return type:

DiversityOutput

See also

numpy.histogram