dataeval.protocols.Metadata

class dataeval.protocols.Metadata

Minimal protocol for metadata objects used in bias and quality analysis.

This protocol defines the minimum interface required for metadata objects to be used with DataEval’s bias evaluators (Balance, Diversity, Parity) and quality evaluators (Outliers). Users can implement lightweight custom metadata containers that satisfy this protocol.

factor_names

Names of the metadata factors.

Type:

Sequence[str]

factor_data

Metadata factors as an array of shape (n_samples, n_factors). Continuous or otherwise non-integer factors should be binned into discrete integers before being returned here.

Type:

NDArray[np.int64]
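
The binning step described for factor_data could look like the following minimal sketch. The factor values, the bin edges, and the variable names are assumptions chosen for illustration; they are not part of DataEval.

```python
import numpy as np

# Hypothetical continuous factor: ages in years for five samples.
ages = np.array([23.5, 41.0, 67.2, 18.9, 35.4])

# Assumed bin edges; pick edges that suit your own data.
edges = np.array([30.0, 50.0])

# np.digitize returns the index of the bin each value falls into:
# 0 for age < 30, 1 for 30 <= age < 50, 2 for age >= 50.
age_bin = np.digitize(ages, edges).astype(np.int64)

# A second factor that is already discrete, encoded as integers.
gender = np.array([0, 1, 1, 0, 1], dtype=np.int64)

# Stack the factors column-wise into shape (n_samples, n_factors).
factor_data = np.column_stack([age_bin, gender])
```

Here factor_data has shape (5, 2) and dtype int64, with one column per entry in factor_names.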

class_labels

Flat array of class labels with one entry per target/detection. For image classification, the length equals the number of images. For object detection, the length equals the total number of detections across all images.

Type:

NDArray[np.intp]

is_discrete

Whether each factor is discrete (True) or continuous (False). Must have the same length as factor_names.

Type:

Sequence[bool]
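
The length requirements stated for the factor attributes can be checked before handing a container to an evaluator. The helper below is a hypothetical sketch, not a DataEval function; it only verifies the sizes described above.

```python
import numpy as np


def check_factor_consistency(factor_names, factor_data, is_discrete):
    """Hypothetical helper: raise ValueError if the factor attributes disagree in size."""
    factor_data = np.asarray(factor_data)
    if factor_data.ndim != 2:
        raise ValueError("factor_data must have shape (n_samples, n_factors)")
    if factor_data.shape[1] != len(factor_names):
        raise ValueError("factor_data needs one column per entry in factor_names")
    if len(is_discrete) != len(factor_names):
        raise ValueError("is_discrete must have the same length as factor_names")


# Passes silently: two names, two columns, two flags.
check_factor_consistency(
    ["age_bin", "gender"],
    np.array([[0, 1], [1, 0], [0, 1]]),
    [True, True],
)
```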

index2label

Optional mapping from class label indices to human-readable names.

Type:

NotRequired[Mapping[int, str]]
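
As an illustration of what index2label enables, the sketch below reports label counts by name and falls back to the integer index when no name is available. The label values and class names are made up for the example.

```python
import numpy as np

# Hypothetical labels and an incomplete name mapping (index 2 has no name).
class_labels = np.array([0, 1, 0, 2], dtype=np.intp)
index2label = {0: "cat", 1: "dog"}

# Count how often each label index occurs.
values, counts = np.unique(class_labels, return_counts=True)

# Use the readable name when present, otherwise the index as a string.
counts_by_name = {
    index2label.get(int(v), str(int(v))): int(c) for v, c in zip(values, counts)
}
```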

item_indices

Optional array mapping each label back to its source item/image. If not provided, a 1:1 mapping is assumed (one label per image).

Type:

NotRequired[NDArray[np.intp]]
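
For object detection, class_labels and item_indices can be built together from per-image label lists. This is a minimal sketch with made-up detections; the variable labels_per_image is an assumption for illustration.

```python
import numpy as np

# Hypothetical detections: image 0 has two objects, image 1 has one, image 2 has three.
labels_per_image = [[1, 0], [2], [2, 2, 1]]

# Number of detections in each image.
counts = [len(labels) for labels in labels_per_image]

# Flatten all detections into one array, one entry per detection.
class_labels = np.array(
    [label for labels in labels_per_image for label in labels], dtype=np.intp
)

# Repeat each image index once per detection in that image.
item_indices = np.repeat(np.arange(len(labels_per_image)), counts).astype(np.intp)
```

Both arrays have length 6 here, and item_indices[i] gives the image that class_labels[i] came from.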

Example

Creating a simple metadata container:

>>> import numpy as np
>>> from dataeval.protocols import Metadata
>>>
>>> class SimpleMetadata(Metadata):
...     def __init__(self, factors, labels, names, discrete):
...         self._factors = factors
...         self._labels = labels
...         self._names = names
...         self._discrete = discrete
...
...     @property
...     def factor_names(self):
...         return self._names
...
...     @property
...     def factor_data(self):
...         return self._factors
...
...     @property
...     def class_labels(self):
...         return self._labels
...
...     @property
...     def is_discrete(self):
...         return self._discrete
>>>
>>> meta = SimpleMetadata(
...     factors=np.array([[0, 1], [1, 0], [0, 1]]),
...     labels=np.array([0, 1, 0]),
...     names=["age_bin", "gender"],
...     discrete=[True, True],
... )
>>> isinstance(meta, Metadata)
True