dataeval.utils.metadata.preprocess
- dataeval.utils.metadata.preprocess(raw_metadata, class_labels, continuous_factor_bins=None, auto_bin_method='uniform_width', exclude=None)
Restructures the metadata into the format expected by the bias functions.
This function identifies whether the incoming metadata is discrete or continuous, and whether the data is already binned or still needs binning. It accepts a list of dictionaries containing the per-image metadata and automatically adjusts for multiple targets in an image.
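As a rough illustration of the input shape described above, the sketch below builds a small list of per-image metadata dictionaries and expands it to one row per target. The keys (`time_of_day`, `altitude`, `label`) and the helper `expand_per_target` are hypothetical, and the expansion rule (list-valued fields are per-target, scalar fields are repeated) is an assumption made for illustration. It is not the library's implementation.

```python
from typing import Any, Mapping


def expand_per_target(image_meta: Mapping[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical helper: return one metadata row per target in an image."""
    # Lengths of all list-valued (per-target) fields in this image
    lengths = {len(v) for v in image_meta.values() if isinstance(v, (list, tuple))}
    if len(lengths) > 1:
        raise ValueError("per-target fields must all have the same length")
    n_targets = lengths.pop() if lengths else 1

    rows = []
    for i in range(n_targets):
        # Per-target fields are indexed; image-level scalars are repeated
        rows.append(
            {k: (v[i] if isinstance(v, (list, tuple)) else v) for k, v in image_meta.items()}
        )
    return rows


# Two images: the first contains two targets, the second contains one
raw_metadata = [
    {"time_of_day": "day", "altitude": 120.5, "label": ["car", "truck"]},
    {"time_of_day": "night", "altitude": 98.0, "label": ["car"]},
]

rows = [row for meta in raw_metadata for row in expand_per_target(meta)]
for row in rows:
    print(row)
```

With this assumed rule, the two images yield three rows, and the image-level values of the first image appear on both of its target rows.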
- Parameters:
- raw_metadata : Iterable[Mapping[str, Any]]
Iterable collection of metadata dictionaries to flatten and merge.
- class_labels : ArrayLike or str
If ArrayLike, expects the labels for each image (image classification) or each object (object detection). If the labels are included in the metadata dictionaries, pass in the key under which they are stored.
- continuous_factor_bins : Mapping[str, int or Iterable[float]] or None, default None
User-provided dictionary specifying how to bin the continuous metadata factors. Each key is a factor name, and each value is either an int giving the number of bins or a list of floats giving the bin edges.
- auto_bin_method : "uniform_width" or "uniform_count" or "clusters", default "uniform_width"
Method by which the function automatically bins continuous metadata factors. Providing the bins explicitly through continuous_factor_bins is recommended.
- exclude : Iterable[str] or None, default None
User-provided collection of metadata keys to exclude when processing metadata.
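To make the binning options more concrete, here is a minimal sketch of what equal-width binning, equal-count binning, and user-supplied edges generally mean. The helper names (`uniform_width_edges`, `uniform_count_edges`, `assign_bins`) and the `altitudes` data are hypothetical, and the edge conventions are assumptions. The library's own computation may differ, and the "clusters" method is not shown.

```python
from bisect import bisect_right


def uniform_width_edges(values, n_bins):
    """Edges that split the value range into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins + 1)]


def uniform_count_edges(values, n_bins):
    """Edges chosen so each bin holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    interior = [ordered[(k * n) // n_bins] for k in range(1, n_bins)]
    return [ordered[0], *interior, ordered[-1]]


def assign_bins(values, edges):
    """Map each value to a bin index using only the interior edges."""
    interior = edges[1:-1]
    return [bisect_right(interior, v) for v in values]


altitudes = [1.0, 2.0, 3.0, 4.0, 50.0, 100.0]

# Equal width: the outliers stretch the range, so most values share a bin
width_bins = assign_bins(altitudes, uniform_width_edges(altitudes, 2))
print(width_bins)   # [0, 0, 0, 0, 0, 1]

# Equal count: edges follow the data, so the bins are balanced
count_bins = assign_bins(altitudes, uniform_count_edges(altitudes, 2))
print(count_bins)   # [0, 0, 0, 1, 1, 1]

# Explicit edges, analogous to passing a list of floats for a factor
explicit_bins = assign_bins(altitudes, [0.0, 10.0, 1000.0])
print(explicit_bins)  # [0, 0, 0, 0, 1, 1]
```

The skewed sample shows why explicit edges are often preferable: automatic equal-width bins can leave almost every value in a single bin, while edges chosen with knowledge of the data keep the bins meaningful.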
- Returns:
Output class containing the binned metadata.
- Return type: