dataeval.protocols.MetadataLike

class dataeval.protocols.MetadataLike

Minimal protocol for metadata objects used in bias and quality analysis.

This protocol defines the minimum interface required for metadata objects to be used with DataEval’s bias evaluators (Balance, Diversity, Parity) and quality evaluators (Outliers). Users can implement lightweight custom metadata containers that satisfy this protocol.

index2label

Optional mapping from class label indices to human-readable names.

Type:

NotRequired[Mapping[int, str]]
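
As a minimal sketch of how such a mapping might be used, the snippet below translates integer class labels into readable names. The class names and the fallback to the stringified index are illustrative assumptions, not DataEval behavior.

```python
import numpy as np

# Illustrative mapping from label index to name (assumed values).
index2label = {0: "cat", 1: "dog"}

# Integer class labels, one per target.
class_labels = np.array([0, 1, 0, 2], dtype=np.intp)

# Look up each name, falling back to the index itself when it is unmapped.
label_names = [index2label.get(int(i), str(int(i))) for i in class_labels]

print(label_names)  # ['cat', 'dog', 'cat', '2']
```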

item_indices

Optional array mapping each label back to its source item/image. If not provided, a 1:1 mapping is assumed (one label per image).

Type:

NotRequired[NDArray[np.intp]]
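
For object detection, where one image can hold several labels, the relationship between the flat label array and item_indices can be sketched as follows. This assumes a hypothetical list of per-image label arrays (`per_image_labels` is an illustrative name, not part of DataEval).

```python
import numpy as np

# Hypothetical per-image detection labels: image 0 has two detections,
# image 1 has none, image 2 has three.
per_image_labels = [
    np.array([0, 2], dtype=np.intp),
    np.array([], dtype=np.intp),
    np.array([1, 1, 0], dtype=np.intp),
]

# Flatten the labels into one entry per detection.
class_labels = np.concatenate(per_image_labels)

# For each detection, record the index of the image it came from.
counts = [len(labels) for labels in per_image_labels]
item_indices = np.repeat(np.arange(len(per_image_labels), dtype=np.intp), counts)

print(class_labels.tolist())  # [0, 2, 1, 1, 0]
print(item_indices.tolist())  # [0, 0, 2, 2, 2]
```

In the image classification case each image contributes exactly one label, so the equivalent array would simply be `np.arange(n_images)`, which matches the 1:1 default described above.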

Example

Creating a simple metadata container:

>>> import numpy as np
>>> from dataeval.protocols import MetadataLike
>>>
>>> class SimpleMetadata(MetadataLike):
...     def __init__(self, factors, labels, names, discrete):
...         self._factors = factors
...         self._labels = labels
...         self._names = names
...         self._discrete = discrete
...
...     @property
...     def factor_names(self):
...         return self._names
...
...     @property
...     def factor_data(self):
...         return self._factors
...
...     @property
...     def class_labels(self):
...         return self._labels
...
...     @property
...     def is_discrete(self):
...         return self._discrete
>>>
>>> meta = SimpleMetadata(
...     factors=np.array([[0, 1], [1, 0], [0, 1]]),
...     labels=np.array([0, 1, 0]),
...     names=["age_bin", "gender"],
...     discrete=[True, True],
... )
>>> isinstance(meta, MetadataLike)
True

property class_labels : numpy.typing.NDArray[numpy.intp]

Flat array of class labels with one entry per target/detection.

For image classification, the length equals the number of images. For object detection, the length equals the total number of detections across all images.

property factor_data : numpy.typing.NDArray[numpy.int64]

Metadata factors as an array of shape (n_samples, n_factors).

Continuous factors or non-integer data should be preprocessed into discrete integer bins before being returned here.
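
One way to do this preprocessing is with `numpy.digitize`. The sketch below is an assumption about a reasonable workflow rather than a DataEval requirement: the factor values and bin edges are made up, and the right edges depend on your data.

```python
import numpy as np

# Hypothetical continuous factor (for example, ages in years).
ages = np.array([3.5, 17.0, 42.1, 65.0, 80.2])

# Illustrative bin edges: <18, [18, 40), [40, 65), >=65.
edges = np.array([18.0, 40.0, 65.0])

# np.digitize returns the index of the bin each value falls into.
age_bin = np.digitize(ages, edges).astype(np.int64)

# A factor that is already categorical and integer encoded.
gender = np.array([0, 1, 1, 0, 1], dtype=np.int64)

# Stack the columns into shape (n_samples, n_factors).
factor_data = np.column_stack([age_bin, gender])

print(age_bin.tolist())   # [0, 0, 2, 3, 3]
print(factor_data.shape)  # (5, 2)
```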

property factor_names : SequenceLike[str]

Names of the metadata factors.

property is_discrete : collections.abc.Sequence[bool]

Whether each factor is discrete (True) or continuous (False).

Must have the same length as factor_names.
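
The length constraints documented above can be checked before handing a container to an evaluator. The helper below is a hypothetical sketch (`check_metadata_shapes` is not a DataEval function) that relies only on the four required properties; a `SimpleNamespace` stands in for a real metadata object.

```python
from types import SimpleNamespace

import numpy as np


def check_metadata_shapes(meta):
    """Hypothetical helper: raise ValueError on inconsistent factor shapes."""
    factor_data = np.asarray(meta.factor_data)
    if factor_data.ndim != 2:
        raise ValueError("factor_data must have shape (n_samples, n_factors)")
    n_factors = factor_data.shape[1]
    if len(meta.factor_names) != n_factors:
        raise ValueError("factor_names must have one entry per factor column")
    if len(meta.is_discrete) != len(meta.factor_names):
        raise ValueError("is_discrete must have the same length as factor_names")


# Stand-in object exposing the required attributes.
meta = SimpleNamespace(
    factor_names=["age_bin", "gender"],
    factor_data=np.array([[0, 1], [1, 0], [0, 1]]),
    class_labels=np.array([0, 1, 0]),
    is_discrete=[True, True],
)
check_metadata_shapes(meta)  # passes silently

# A mismatched is_discrete list is rejected.
meta.is_discrete = [True]
try:
    check_metadata_shapes(meta)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```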