dataeval.metrics.bias.diversity
===============================

.. py:function:: dataeval.metrics.bias.diversity(metadata, method = 'simpson')

   Compute :term:`diversity` and classwise diversity for discrete/categorical variables and,
   through standard histogram binning, for continuous variables.

   With the default ``method="simpson"``, diversity is defined as a normalized form of the
   inverse Simpson diversity index. With ``method="shannon"``, the Shannon diversity index is
   used instead.

   - diversity = 1 implies that samples are evenly distributed across a particular factor.
   - diversity = 0 implies that all samples belong to one category/bin.

   :param metadata: Preprocessed metadata from :func:`dataeval.utils.metadata.preprocess`
   :type metadata: Metadata
   :param method: Diversity index to compute, either ``"simpson"`` or ``"shannon"``. Defaults to ``"simpson"``.
   :type method: str

   .. note::

      - The general diversity expression of order q is undefined for q=1, but it approaches the Shannon entropy in the limit.
      - If there is only one category, the diversity index takes a value of 0.

   :returns: Diversity index per column of self.data or each factor in self.names and classwise diversity [n_class x n_factor]
   :rtype: DiversityOutput

   .. rubric:: Example

   Compute Simpson diversity index of metadata and class labels

   >>> div_simp = diversity(metadata, method="simpson")
   >>> div_simp.diversity_index
   array([0.6       , 0.80882353, 1.        , 0.8       ])
   >>> div_simp.classwise
   array([[0.5       , 0.8       , 0.8       ],
          [0.63043478, 0.97560976, 0.52830189]])

   Compute Shannon diversity index of metadata and class labels

   >>> div_shan = diversity(metadata, method="shannon")
   >>> div_shan.diversity_index
   array([0.81127812, 0.9426312 , 1.        , 0.91829583])
   >>> div_shan.classwise
   array([[0.68260619, 0.91829583, 0.91829583],
          [0.81443569, 0.99107606, 0.76420451]])

   .. seealso:: :obj:`scipy.stats.entropy`
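
To make the normalization concrete, the following is a minimal standalone sketch of how a normalized inverse Simpson index and a normalized Shannon index could be computed from raw category counts. It is not the ``dataeval`` implementation. The helper names ``simpson_diversity`` and ``shannon_diversity`` are hypothetical, and the sketch assumes that the number of categories N counts only non-empty categories and that the Shannon form is scaled by its maximum, log(N).

.. code-block:: python

   import math


   def _proportions(counts):
       """Return the proportions of the non-empty categories (assumption: empty bins are ignored)."""
       nonzero = [c for c in counts if c > 0]
       total = sum(nonzero)
       return [c / total for c in nonzero]


   def simpson_diversity(counts):
       """Normalized inverse Simpson index: (1 / sum(p_i^2) - 1) / (N - 1)."""
       p = _proportions(counts)
       n = len(p)
       if n <= 1:
           # A single category (or no data) has no diversity.
           return 0.0
       inverse_simpson = 1.0 / sum(pi * pi for pi in p)
       return (inverse_simpson - 1.0) / (n - 1)


   def shannon_diversity(counts):
       """Shannon entropy divided by its maximum value, log(N)."""
       p = _proportions(counts)
       n = len(p)
       if n <= 1:
           return 0.0
       entropy = -sum(pi * math.log(pi) for pi in p)
       return entropy / math.log(n)


   # A two-category factor with a 1:3 split between categories.
   print(simpson_diversity([1, 3]))
   print(shannon_diversity([1, 3]))

Under these assumptions, a 1:3 split gives approximately 0.6 for the Simpson form and approximately 0.811 for the Shannon form. An even split gives 1 and a single category gives 0 for both, which matches the interpretation of the two extremes described above.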