dataeval.selection.ClassBalance
class dataeval.selection.ClassBalance(method, num_samples=None, background_class=None, num_empty=None, aggregation_func='mean', oversample_factor=1.0, minimize_duplicates=False)
Select a balanced subset of images based on class distribution.
This selection strategy balances class representation in datasets for classification, object detection, and segmentation tasks. It supports two balancing methods: global (weighted sampling based on inverse square-root class frequency, favoring rare classes) and interclass (an equal number of samples per class).
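The ‘global’ strategy (inverse-square-root weighting of classes, aggregated per image) can be illustrated with a small NumPy sketch. It is not DataEval's implementation. It assumes that a class's frequency is the fraction of images containing it, uses `1 / sqrt(frequency)` as the repeat factor, gives empty images a neutral weight of 1.0, and leaves out `oversample_factor` and `background_class`. The helpers `global_probabilities` and `sample_global` and the list-of-labels input format are hypothetical.

```python
import numpy as np


def global_probabilities(image_labels, aggregation_func="mean"):
    """Per-image sampling probabilities that favor rare classes.

    image_labels is a hypothetical input: one list of class labels per image.
    """
    n_images = len(image_labels)
    # Count how many images contain each class (each image counted once per class).
    counts = {}
    for labels in image_labels:
        for c in set(labels):
            counts[c] = counts.get(c, 0) + 1
    # Repeat factor: inverse square root of the class's image frequency.
    repeat = {c: 1.0 / np.sqrt(k / n_images) for c, k in counts.items()}
    aggregate = np.mean if aggregation_func == "mean" else np.max
    # Combine the factors of every class in an image; empty images get 1.0 here.
    weights = np.array(
        [aggregate([repeat[c] for c in set(labels)]) if labels else 1.0 for labels in image_labels]
    )
    return weights / weights.sum()


def sample_global(image_labels, num_samples, aggregation_func="mean", seed=0):
    rng = np.random.default_rng(seed)
    p = global_probabilities(image_labels, aggregation_func)
    # Sampling with replacement, so duplicate indices are possible.
    return rng.choice(len(image_labels), size=num_samples, replace=True, p=p)


labels = [[0], [0], [0], [1], [0, 1]]  # class 1 is the rare class
print(global_probabilities(labels))
print(sample_global(labels, num_samples=8, seed=0))
```

With `aggregation_func="max"`, an image containing both a common and a rare class is weighted as if it contained only the rare class; `"mean"` places it between the two.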
- Parameters:
- method : {'global', 'interclass'}
Balancing strategy to use:
  - ‘global’: sample images with probability proportional to the inverse square root of class frequencies, giving higher weight to rare classes.
  - ‘interclass’: sample an equal number of images from each class.
- num_samples : int or None, optional
Total number of samples to select. If None, selects as many samples as there are images in the dataset.
- background_class : int or str or None, optional
Class label to treat as background. With the ‘global’ method, the background class is assigned a frequency of 1.0; with the ‘interclass’ method, it is excluded from sampling.
- num_empty : int or float or None, optional
Number of empty images (images with no targets) to include. If a float, it is treated as a proportion of the dataset size. If None, empty images receive no special handling.
- aggregation_func : {'mean', 'max'}, default='mean'
How to aggregate per-class repeat factors when an image contains multiple classes. Only used by the ‘global’ method.
- oversample_factor : float, default=1.0
Scaling factor applied to class repeat factors in the ‘global’ method. Higher values increase oversampling of rare classes.
- minimize_duplicates : bool, default=False
If True, use probability scoring to reduce duplicate selections in the ‘interclass’ method when sampling with replacement.
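The ‘interclass’ strategy, together with `background_class` and `num_empty`, can be sketched as follows. This is an illustration under stated assumptions, not DataEval's code: the helper `sample_interclass` and its list-of-labels input are hypothetical, empty images are assumed to count toward `num_samples`, any remainder from the per-class split is simply dropped, and the probability scoring behind `minimize_duplicates` is not modeled.

```python
import numpy as np


def sample_interclass(image_labels, num_samples, background_class=None, num_empty=None, seed=0):
    """Equal number of images per class (hypothetical helper, not the DataEval API)."""
    rng = np.random.default_rng(seed)
    n_images = len(image_labels)
    # A float num_empty is read as a proportion of the dataset size.
    if isinstance(num_empty, float):
        num_empty = int(round(num_empty * n_images))
    num_empty = num_empty or 0
    empty_idx = [i for i, labels in enumerate(image_labels) if len(labels) == 0]
    # Group image indices by class, excluding the background class.
    by_class = {}
    for i, labels in enumerate(image_labels):
        for c in set(labels):
            if c != background_class:
                by_class.setdefault(c, []).append(i)
    # Assumption: empty images count toward num_samples; any remainder is dropped.
    per_class = (num_samples - num_empty) // len(by_class)
    selected = []
    for c in sorted(by_class):
        pool = by_class[c]
        # Fall back to sampling with replacement only when a class is too small.
        picks = rng.choice(pool, size=per_class, replace=len(pool) < per_class)
        selected.extend(picks.tolist())
    if num_empty and empty_idx:
        picks = rng.choice(empty_idx, size=num_empty, replace=len(empty_idx) < num_empty)
        selected.extend(picks.tolist())
    return selected


labels = [[0], [0], [0], [1], [0, 1], [], [], [2]]  # class 2 plays the background role
print(sample_interclass(labels, num_samples=10, background_class=2, num_empty=2))
```

Here class 1 appears in only two images, so its four picks must repeat indices; this is the situation `minimize_duplicates` is meant to address.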
Notes
- Empty images (those with no detection or segmentation targets) are tracked separately from class labels.
- The selection may contain duplicate indices, depending on the method and parameters.
- Uses a NumPy random number generator seeded from the DataEval configuration.
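The notes on duplicate indices and seeding can be checked with a few lines of NumPy. This sketch uses `numpy.random.default_rng` with an explicit seed as a stand-in for DataEval's configured generator (how DataEval sets that seed is not shown here), and `draw_indices` is a hypothetical helper. `np.unique(..., return_counts=True)` reports how often each index was selected.

```python
import numpy as np


def draw_indices(seed, n_images=10, num_samples=8):
    # Stand-in for a seeded selection; DataEval seeds its own generator from its config.
    rng = np.random.default_rng(seed)
    return rng.choice(n_images, size=num_samples, replace=True)


first = draw_indices(seed=42)
second = draw_indices(seed=42)
print(np.array_equal(first, second))  # True: same seed, same selection
# Count how often each index was selected and list the duplicated ones.
unique, counts = np.unique(first, return_counts=True)
print(dict(zip(unique.tolist(), counts.tolist())))
print(unique[counts > 1])
```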