dataeval.selection.ClassBalance

class dataeval.selection.ClassBalance(method, num_samples=None, background_class=None, num_empty=None, aggregation_func='mean', oversample_factor=1.0, minimize_duplicates=False)

Select a balanced subset of images based on class distribution.

This selection strategy balances class representation in datasets for classification, object detection, and segmentation tasks. It supports two balancing methods: global (weighted sampling that favors rare classes, based on the inverse square root of class frequency) and interclass (an equal number of samples per class).

Parameters:
method : {'global', 'interclass'}

Balancing strategy to use:

  • ‘global’: Sample images with probability proportional to the inverse square root of class frequencies, giving higher weight to rare classes.

  • ‘interclass’: Sample an equal number of images from each class.

num_samples : int or None, optional

Total number of samples to select. If None, selects as many samples as there are images in the dataset.

background_class : int or str or None, optional

Class label to treat as background. For the ‘global’ method, the background class is assigned a frequency of 1.0. For the ‘interclass’ method, the background class is excluded from sampling.

num_empty : int or float or None, optional

Number of empty images (no targets) to include. If a float, it is treated as a proportion of the dataset size. If None, empty images receive no special handling.

aggregation_func : {'mean', 'max'}, default='mean'

How to aggregate repeat factors when an image contains multiple classes. Only used by the ‘global’ method.

oversample_factor : float, default=1.0

Scaling factor for class repeat factors in ‘global’ method. Higher values increase oversampling of rare classes.

minimize_duplicates : bool, default=False

If True, use probability scoring to reduce duplicate selections in ‘interclass’ method when sampling with replacement.
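
The two balancing methods and the aggregation option can be illustrated with a small standard-library sketch. This is not DataEval's implementation: the helper names (class_repeat_factors, image_repeat_factor, equal_class_quotas) are hypothetical, class frequency is assumed to be the share of images containing the class, and the rule for distributing a remainder in the equal split is an assumption made for illustration.

```python
import math
from collections import Counter


def class_repeat_factors(image_classes):
    """Map each class to 1 / sqrt(f), where f is the share of images containing it.

    Assumption: frequency is counted per image, not per annotation.
    """
    num_images = len(image_classes)
    counts = Counter(c for classes in image_classes for c in set(classes))
    return {c: 1.0 / math.sqrt(n / num_images) for c, n in counts.items()}


def image_repeat_factor(factors, classes, aggregation_func="mean"):
    """Combine the per-class factors of one image using 'mean' or 'max'."""
    values = [factors[c] for c in set(classes)]
    if aggregation_func == "mean":
        return sum(values) / len(values)
    if aggregation_func == "max":
        return max(values)
    raise ValueError(f"unsupported aggregation_func: {aggregation_func!r}")


def equal_class_quotas(classes, num_samples):
    """Split num_samples evenly across classes ('interclass'-style quota).

    Assumption: any remainder goes to the first classes in sorted order.
    """
    ordered = sorted(set(classes))
    base, extra = divmod(num_samples, len(ordered))
    return {c: base + (1 if i < extra else 0) for i, c in enumerate(ordered)}


# Four images; 'car' appears in all of them, 'bike' in only one.
images = [["car"], ["car"], ["car", "bike"], ["car"]]
factors = class_repeat_factors(images)

# Weight of the image that contains both classes under each aggregation.
mixed_mean = image_repeat_factor(factors, images[2], "mean")
mixed_max = image_repeat_factor(factors, images[2], "max")

# Equal per-class quota for a 10-sample selection over three classes.
quotas = equal_class_quotas(["car", "bike", "bus"], 10)
```

In this toy data the rare class gets twice the factor of the common one, and 'max' weights the mixed image by its rarest class while 'mean' averages over the classes it contains.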

Notes

  • Empty images (those with no detection/segmentation targets) are tracked separately from class labels

  • The selection may contain duplicate indices depending on method and parameters

  • Uses a NumPy random number generator seeded from the DataEval configuration
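
The handling of empty images can be pictured with the following sketch. It is an illustration under stated assumptions rather than the library's code: the helper names (resolve_num_empty, split_empty) are hypothetical, and rounding a fractional num_empty to the nearest integer is an assumption, since the documentation only says that a float is treated as a proportion of the dataset size.

```python
def resolve_num_empty(num_empty, dataset_size):
    """Turn num_empty into an absolute count, or None if it was not set.

    Assumption: a float proportion is rounded to the nearest integer.
    """
    if num_empty is None:
        return None
    if isinstance(num_empty, float):
        return int(round(num_empty * dataset_size))
    return int(num_empty)


def split_empty(targets_per_image):
    """Return (empty_indices, non_empty_indices) based on target counts."""
    empty = [i for i, t in enumerate(targets_per_image) if len(t) == 0]
    non_empty = [i for i, t in enumerate(targets_per_image) if len(t) > 0]
    return empty, non_empty


# Eight images, three of which have no targets.
targets = [["car"], [], ["bike"], [], ["car", "bike"], ["car"], [], ["bus"]]
empty_idx, labeled_idx = split_empty(targets)

# A proportion of 0.25 on eight images resolves to two empty images.
requested_empty = resolve_num_empty(0.25, len(targets))
```

Keeping the empty indices in their own list mirrors the note that empty images are tracked separately from class labels, so they can be added to the selection without affecting the class statistics.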