dataeval.core.label_stats

dataeval.core.label_stats(labels, index2label=None)

Calculates statistics for data labels.

This function computes counting metrics (e.g., total per class, total per image) on the labels. This is a core computation function that operates on basic data structures without dependencies on complex domain objects.

Parameters:
labels : Iterable[int] | Iterable[Iterable[int]]

Integer labels for each image, given either as a flat iterable (one label per image) or as a sequence of label sequences, where each inner sequence contains the labels for a single image. For image classification, each inner sequence typically contains a single label. For object detection, each inner sequence contains multiple labels (one per detected object). Empty sequences represent images with no labels or detections.

index2label : Mapping[int, str] | None, optional

A mapping from label integers to class names. If None, class names will be generated as string representations of the label integers.
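
The fallback naming described for index2label can be illustrated with a minimal sketch. The helper name default_index2label is hypothetical and is not part of dataeval; it only assumes that every label observed in the input gets its own string form as a name.

```python
def default_index2label(labels):
    """Hypothetical helper: name each observed class by str(label).

    This is an illustration of the documented fallback, not dataeval's code.
    """
    # Collect the distinct integer labels across all images.
    classes = sorted({label for image_labels in labels for label in image_labels})
    return {label: str(label) for label in classes}


names = default_index2label([[0, 0, 1], [1, 2], [], [0, 1, 2, 3]])
print(names)  # {0: '0', 1: '1', 2: '2', 3: '3'}
```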

Returns:

A mapping containing the computed counting metrics for the labels with keys:

  • label_counts_per_class: Mapping[int, int] - Total count of each class

  • label_counts_per_image: Sequence[int] - Number of labels per image

  • image_counts_per_class: Mapping[int, int] - How many images contain each label

  • image_indices_per_class: Mapping[int, Sequence[int]] - Which images contain each label

  • image_count: int - Total number of images

  • class_count: int - Total number of classes

  • label_count: int - Total number of labels

  • index2label: Mapping[int, str] - Direct mapping from class index to class name

  • empty_image_indices: Sequence[int] - Indices of images with no labels

  • empty_image_count: int - Number of images with no labels

Return type:

LabelStatsResult
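
How the returned keys relate to one another can be sketched in plain Python. This is a rough approximation written for illustration, not the library's implementation: the function name sketch_label_stats is hypothetical, it handles only the nested input form, it omits index2label, and it assumes class_count means the number of distinct labels observed.

```python
from collections import Counter, defaultdict


def sketch_label_stats(labels):
    """Hypothetical re-derivation of the counting metrics for nested labels."""
    label_counts_per_class = Counter()
    image_indices_per_class = defaultdict(list)
    label_counts_per_image = []
    empty_image_indices = []

    for image_index, image_labels in enumerate(labels):
        image_labels = list(image_labels)
        label_counts_per_image.append(len(image_labels))
        if not image_labels:
            empty_image_indices.append(image_index)
        # Every occurrence counts toward the per-class label total.
        label_counts_per_class.update(image_labels)
        # Each image is recorded at most once per class it contains.
        for label in sorted(set(image_labels)):
            image_indices_per_class[label].append(image_index)

    return {
        "label_counts_per_class": dict(sorted(label_counts_per_class.items())),
        "label_counts_per_image": label_counts_per_image,
        "image_counts_per_class": {
            label: len(indices)
            for label, indices in sorted(image_indices_per_class.items())
        },
        "image_indices_per_class": dict(sorted(image_indices_per_class.items())),
        "image_count": len(label_counts_per_image),
        # Assumption: classes are the distinct labels seen in the input.
        "class_count": len(label_counts_per_class),
        "label_count": sum(label_counts_per_image),
        "empty_image_indices": empty_image_indices,
        "empty_image_count": len(empty_image_indices),
    }


sketch = sketch_label_stats([[0, 0, 1], [1, 2], [], [0, 1, 2, 3]])
print(sketch["image_counts_per_class"])  # {0: 2, 1: 3, 2: 2, 3: 1}
```

The sketch makes the difference between the two per-class counts visible: class 0 appears three times in total but in only two images, because the first image contains it twice.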

Examples

Calculate basic statistics on labels for object detection.

>>> from dataeval.core import label_stats
>>> labels = [[0, 0, 1], [1, 2], [], [0, 1, 2, 3]]
>>> index2label = {0: "horse", 1: "cow", 2: "sheep", 3: "pig"}
>>> stats = label_stats(labels, index2label)
>>> stats["label_counts_per_class"]
{0: 3, 1: 3, 2: 2, 3: 1}
>>> stats["label_counts_per_image"]
[3, 2, 0, 4]
>>> stats["empty_image_indices"]
[2]
>>> stats["empty_image_count"]
1
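
Because the per-class results are keyed by integer label, a readable report needs a join against index2label. The following is a small sketch of that join; the dictionary values are copied from the object detection example above rather than computed, and the variable name named_counts is illustrative only.

```python
# Values copied from the documented example output, not computed here.
label_counts_per_class = {0: 3, 1: 3, 2: 2, 3: 1}
index2label = {0: "horse", 1: "cow", 2: "sheep", 3: "pig"}

# Re-key the counts by class name for display.
named_counts = {
    index2label[label]: count for label, count in label_counts_per_class.items()
}
print(named_counts)  # {'horse': 3, 'cow': 3, 'sheep': 2, 'pig': 1}
```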

Calculate basic statistics on labels for image classification.

>>> labels = [[0], [1], [2], [0]]
>>> index2label = {0: "cat", 1: "dog", 2: "bird"}
>>> stats = label_stats(labels, index2label)
>>> stats["label_counts_per_class"]
{0: 2, 1: 1, 2: 1}
>>> stats["label_counts_per_image"]
[1, 1, 1, 1]
>>> stats["empty_image_count"]
0