How to wrap a DataFrame-backed object detection dataset
Problem statement
Object detection catalogs are commonly stored as tabular data, but unlike image classification, where each image carries a single label, an image can hold many bounding boxes. The natural way to express this in a table is the long (tidy) format: one row per box, where rows that share an image identifier belong to the same image. This is what most annotation-tool CSV exports and flattened COCO/Pascal-VOC catalogs look like.
DataEval does not require any particular dataset class. Its evaluators consume
any object that satisfies the AnnotatedDataset protocol - a minimal
interface of __len__, __getitem__, and a metadata property. For object
detection, __getitem__ returns an ObjectDetectionTarget (bounding
boxes, labels, and scores) instead of a single class label.
This guide shows you how to wrap a long-format DataFrame in an object detection dataset that DataEval can analyze, grouping the per-box rows into one target per image.
When to use
Wrap a DataFrame when your images and their bounding-box annotations are described by a tabular catalog and you want to run DataEval analyses without first exporting to a directory layout or a built-in dataset format.
What you will need
- A tabular catalog of your annotations, one row per box (here, a pandas DataFrame)
- A Python environment with the following packages installed:
  - dataeval
  - pandas
  - pillow
Getting started
First, import the libraries needed to set up the example.
import tempfile
from dataclasses import dataclass
from pathlib import Path
import numpy as np
import pandas as pd
from PIL import Image
from dataeval import Metadata
from dataeval.protocols import AnnotatedDataset, DatasetMetadata, DatumMetadata, ObjectDetectionTarget
Build an example catalog
In a real project your catalog would already exist and point at images on disk.
To keep this guide self-contained, you will write a small handful of images to a
temporary directory and describe their boxes in a DataFrame.
The DataFrame has one row per bounding box and these columns:
- filepath - where the image lives on disk (repeated for each box in the image)
- x, y, w, h - the box in COCO-style top-left + width/height pixels
- label - the integer class index for the box
- weather, altitude_m - image-level factors (constant across an image’s boxes)
- occlusion - a box-level factor that varies from one box to the next
data_dir = Path(tempfile.mkdtemp())
rng = np.random.default_rng(0)
index2label = {0: "person", 1: "car", 2: "bicycle"}
weather_options = ["clear", "rainy", "foggy"]
occlusion_levels = ["none", "partial", "heavy"]
rows = []
# Create 30 random 128x128 images
for i in range(30):
    # Stand-in for a real image file - replace with your own images on disk
    pixels = rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8)
    filepath = data_dir / f"img_{i:03d}.png"
    Image.fromarray(pixels).save(filepath)
    # A variable number of boxes per image is exactly why OD catalogs use one row
    # per box rather than one row per image.
    n_boxes = int(rng.integers(1, 5))
    for _ in range(n_boxes):
        x, y = rng.integers(0, 80, size=2)
        w, h = rng.integers(16, 40, size=2)
        rows.append({
            "filepath": str(filepath),
            "x": int(x),
            "y": int(y),
            "w": int(w),
            "h": int(h),
            "label": int(rng.integers(0, 3)),
            "weather": weather_options[i % 3],  # image-level factor (same for every box)
            "altitude_m": float(50 + i),  # image-level factor (same for every box)
            "occlusion": occlusion_levels[int(rng.integers(0, 3))],  # box-level factor (per box)
        })
catalog = pd.DataFrame(rows)
catalog.head()
| | filepath | x | y | w | h | label | weather | altitude_m | occlusion |
|---|---|---|---|---|---|---|---|---|---|
| 0 | /tmp/tmp7a9te4wn/img_000.png | 55 | 54 | 37 | 37 | 1 | clear | 50.0 | partial |
| 1 | /tmp/tmp7a9te4wn/img_000.png | 34 | 42 | 17 | 16 | 0 | clear | 50.0 | heavy |
| 2 | /tmp/tmp7a9te4wn/img_000.png | 23 | 11 | 35 | 24 | 2 | clear | 50.0 | none |
| 3 | /tmp/tmp7a9te4wn/img_000.png | 1 | 28 | 32 | 38 | 0 | clear | 50.0 | none |
| 4 | /tmp/tmp7a9te4wn/img_001.png | 44 | 15 | 30 | 20 | 0 | rainy | 51.0 | partial |
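Before deciding which columns to treat as image-level, it can help to check the "constant across an image's boxes" claim against the data itself. The sketch below is one way to do that with a pandas group-by. The helper name `split_factor_levels` and the toy frame are hypothetical, made up for illustration, and are not part of DataEval.

```python
import pandas as pd


def split_factor_levels(df, image_col, candidate_cols):
    """Split columns into image-level (constant per image) and box-level.

    Hypothetical helper: a column counts as image-level when every image
    group holds at most one distinct value for it.
    """
    per_image_unique = df.groupby(image_col, sort=False)[candidate_cols].nunique(dropna=False)
    image_level = [c for c in candidate_cols if (per_image_unique[c] <= 1).all()]
    box_level = [c for c in candidate_cols if c not in image_level]
    return image_level, box_level


# Toy catalog (assumed data): two images, five boxes
toy = pd.DataFrame({
    "filepath": ["a.png", "a.png", "b.png", "b.png", "b.png"],
    "weather": ["clear", "clear", "rainy", "rainy", "rainy"],
    "occlusion": ["none", "heavy", "partial", "partial", "none"],
})

image_level, box_level = split_factor_levels(toy, "filepath", ["weather", "occlusion"])
print(image_level, box_level)
```

On a small catalog a genuinely box-level column can happen to be constant within every image, so treat the result as a hint and not as a definition.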
Write the adapter
DataEval’s evaluators only need three things from a dataset, which together make
up the AnnotatedDataset protocol:
- __len__() - the number of items, which for object detection is the number of unique images, not the number of rows
- __getitem__(index) - an (image, target, datum_metadata) tuple for one image
  - image - an array of shape (C, H, W)
  - target - an ObjectDetectionTarget: boxes of shape (N, 4) in (x0, y0, x1, y1) format, labels of shape (N,), and scores of shape (N,)
  - datum_metadata - a dict of per-item metadata, which must contain an id
- a metadata property - a DatasetMetadata describing the dataset as a whole, including the index2label class-name mapping
ObjectDetectionTarget is a runtime-checkable protocol, so any object that
exposes boxes, labels, and scores qualifies. A small dataclass is enough.
@dataclass
class BoxTarget:
    """A minimal object detection target.

    Implements the ObjectDetectionTarget protocol by exposing the three
    attributes DataEval reads.
    """

    boxes: np.ndarray  # (N, 4) bounding boxes in x0, y0, x1, y1 pixel format
    labels: np.ndarray  # (N,) integer class index per box
    scores: np.ndarray  # (N,) confidence per box; 1.0 for ground truth
The adapter groups the catalog by image once up front so that positional
indexing in __getitem__ maps to a stable list of images. Each lookup decodes
the image, converts that image’s boxes to the format DataEval expects, and
assembles the target.
class DataFrameODDataset:
    """An object detection dataset backed by a long-format pandas DataFrame.

    Each row describes one bounding box. Rows that share the image-path column
    belong to the same image and are grouped into a single target. Metadata is
    surfaced at two levels: ``metadata_cols`` are image-level (one value per image)
    and ``box_metadata_cols`` are box-level (one value per box).
    """

    def __init__(
        self,
        dataframe: pd.DataFrame,
        index2label: dict[int, str],
        image_col: str = "filepath",
        box_cols: tuple[str, str, str, str] = ("x", "y", "w", "h"),
        label_col: str = "label",
        metadata_cols: list[str] | None = None,
        box_metadata_cols: list[str] | None = None,
        dataset_id: str = "dataframe-od-catalog",
    ) -> None:
        self._df = dataframe.reset_index(drop=True)
        self.index2label = index2label
        self._box_cols = box_cols
        self._label_col = label_col
        self._metadata_cols = metadata_cols or []
        self._box_metadata_cols = box_metadata_cols or []
        # Group rows by image, preserving first-seen order, so __getitem__(i)
        # always refers to the same image. groupby yields (key, sub-frame) pairs;
        # only the sub-frames are kept.
        self._groups = [rows for _, rows in self._df.groupby(image_col, sort=False)]
        self._image_col = image_col
        # DatasetMetadata advertises the class-name mapping to DataEval
        self.metadata: DatasetMetadata = DatasetMetadata(id=dataset_id, index2label=index2label)

    def __len__(self) -> int:
        return len(self._groups)

    def __getitem__(self, index: int) -> tuple[np.ndarray, BoxTarget, DatumMetadata]:
        rows = self._groups[index]
        first = rows.iloc[0]
        # Decode the image and convert to channels-first (C, H, W)
        image = np.asarray(Image.open(first[self._image_col]).convert("RGB"), dtype=np.uint8).transpose(2, 0, 1)
        # Convert COCO-style (x, y, w, h) boxes to the (x0, y0, x1, y1) format
        # ObjectDetectionTarget expects.
        x, y, w, h = (rows[col].to_numpy(dtype=np.float32) for col in self._box_cols)
        boxes = np.stack([x, y, x + w, y + h], axis=1)
        labels = rows[self._label_col].to_numpy(dtype=np.intp)
        # Ground-truth boxes are certain, so every score is 1.0
        scores = np.ones(len(labels), dtype=np.float32)
        target = BoxTarget(boxes=boxes, labels=labels, scores=scores)
        # Image-level metadata is the same for every box, so take it from the first
        # row as a scalar. Box-level metadata is one value per box, passed as a list.
        # DataEval broadcasts the scalars across the image's detections and expands
        # the lists to one value per detection. Per-item metadata must include an "id".
        datum_metadata = DatumMetadata(
            id=index,
            **{col: first[col] for col in self._metadata_cols},
            **{col: rows[col].tolist() for col in self._box_metadata_cols},
        )
        return image, target, datum_metadata
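The adapter depends on `groupby(..., sort=False)` keeping images in the order they first appear in the catalog. A small standalone sketch shows how that differs from the default sorted grouping; the toy frame below is assumed data, made up for illustration.

```python
import pandas as pd

# Toy long-format catalog (assumed data): rows for the same image are not contiguous
toy = pd.DataFrame({
    "filepath": ["b.png", "a.png", "b.png", "c.png", "a.png"],
    "label": [0, 1, 2, 0, 1],
})

# sort=False keeps groups in order of first appearance
groups = [rows for _, rows in toy.groupby("filepath", sort=False)]
first_seen_order = [rows["filepath"].iloc[0] for rows in groups]
boxes_per_image = [len(rows) for rows in groups]

# The default (sort=True) orders groups by key instead
sorted_order = [key for key, _ in toy.groupby("filepath")]

print(first_seen_order)  # images in catalog order
print(boxes_per_image)   # number of box rows gathered for each image
print(sorted_order)      # images in alphabetical order
```

Either ordering gives a stable index-to-image mapping. First-seen order simply keeps dataset indices aligned with the order of the catalog.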
Instantiate the adapter over your catalog, pointing it at the metadata columns you want DataEval to see.
dataset = DataFrameODDataset(
    catalog,
    index2label,
    metadata_cols=["weather", "altitude_m"],  # image-level
    box_metadata_cols=["occlusion"],  # box-level
)
print(f"Catalog rows (boxes): {len(catalog)}")
print(f"Dataset length (images): {len(dataset)}")
Catalog rows (boxes): 72
Dataset length (images): 30
Inspect a single item to confirm the shapes and types match what DataEval
expects. Verify that images have channels first, that targets have
boxes (N, 4), labels (N,), and scores (N,), and that datum metadata holds an
id, the image-level factors as scalars, and the box-level occlusion as a list
with one entry per box.
image, target, datum_metadata = dataset[0]
print(f"image shape: {image.shape} ({image.dtype})")
print(f"boxes shape: {target.boxes.shape} (x0, y0, x1, y1)")
print(f"labels: {target.labels}")
print(f"scores: {target.scores}")
print(f"datum metadata: {datum_metadata}")
image shape: (3, 128, 128) (uint8)
boxes shape: (4, 4) (x0, y0, x1, y1)
labels: [1 0 2 0]
scores: [1. 1. 1. 1.]
datum metadata: {'id': 0, 'weather': 'clear', 'altitude_m': np.float64(50.0), 'occlusion': ['partial', 'heavy', 'none', 'none']}
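If you would rather run the checks above over many items than read them off a printout, you could fold them into a small function. This is a minimal sketch: `find_item_problems` is a hypothetical helper, not a DataEval API, and the "box inside the image" check is an assumption about what you want to enforce.

```python
from types import SimpleNamespace

import numpy as np


def find_item_problems(image, target):
    """Return a list of human-readable problems for one (image, target) pair."""
    problems = []
    if image.ndim != 3:
        return [f"image has {image.ndim} dims, expected 3 for (C, H, W)"]
    boxes = np.asarray(target.boxes)
    labels = np.asarray(target.labels)
    scores = np.asarray(target.scores)
    if labels.ndim != 1:
        return [f"labels has shape {labels.shape}, expected (N,)"]
    n = labels.shape[0]
    if boxes.shape != (n, 4):
        return [f"boxes has shape {boxes.shape}, expected ({n}, 4)"]
    if scores.shape != (n,):
        problems.append(f"scores has shape {scores.shape}, expected ({n},)")
    # In (x0, y0, x1, y1) format the second corner can never precede the first
    if np.any(boxes[:, 2] < boxes[:, 0]) or np.any(boxes[:, 3] < boxes[:, 1]):
        problems.append("a box has x1 < x0 or y1 < y0 (possibly still in xywh format)")
    _, height, width = image.shape
    if np.any(boxes[:, :2] < 0) or np.any(boxes[:, 2] > width) or np.any(boxes[:, 3] > height):
        problems.append("a box extends outside the image")
    return problems


image = np.zeros((3, 128, 128), dtype=np.uint8)
good = SimpleNamespace(
    boxes=np.array([[55, 54, 92, 91], [34, 42, 51, 58]], dtype=np.float32),
    labels=np.array([1, 0]),
    scores=np.ones(2, dtype=np.float32),
)
# Same first box, but left in (x, y, w, h) form by mistake
bad = SimpleNamespace(
    boxes=np.array([[55, 54, 37, 37]], dtype=np.float32),
    labels=np.array([1]),
    scores=np.ones(1, dtype=np.float32),
)

good_problems = find_item_problems(image, good)
bad_problems = find_item_problems(image, bad)
print(good_problems)
print(bad_problems)
```

The corner-order check only catches unconverted xywh boxes whose width or height is smaller than the box origin, so it is a heuristic and not a guarantee.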
Because the protocols are runtime-checkable, you can verify structurally that both the dataset and its targets are valid for DataEval.
print(f"Is an AnnotatedDataset: {isinstance(dataset, AnnotatedDataset)}")
print(f"Target is ObjectDetectionTarget: {isinstance(target, ObjectDetectionTarget)}")
Is an AnnotatedDataset: True
Target is ObjectDetectionTarget: True
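To see why no inheritance is needed, here is a standard-library sketch of the same mechanism. `TargetLike` is a hypothetical stand-in written for this example, not DataEval's actual protocol definition; it only assumes the three attribute names described above.

```python
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class TargetLike(Protocol):
    """Hypothetical stand-in for a protocol such as ObjectDetectionTarget."""

    boxes: Any
    labels: Any
    scores: Any


@dataclass
class CompleteTarget:
    # Does not inherit from TargetLike; it just has the right attributes
    boxes: list
    labels: list
    scores: list


@dataclass
class MissingScores:
    boxes: list
    labels: list


complete_ok = isinstance(CompleteTarget([[0, 0, 1, 1]], [0], [1.0]), TargetLike)
missing_ok = isinstance(MissingScores([[0, 0, 1, 1]], [0]), TargetLike)
print(complete_ok, missing_ok)
```

A runtime protocol check of this kind only confirms that the attributes exist. It says nothing about array shapes, dtypes, or box format, so it complements the item inspection and does not replace it.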
Analyze it with DataEval
The adapter now works anywhere a DataEval object detection dataset is expected.
Build a Metadata object from it - DataEval reads the per-box labels and
both the image-level and box-level metadata you exposed.
metadata = Metadata(dataset)
# "id" is a per-item identifier, not a meaningful factor for bias analysis
metadata.exclude = ["id"]
print(f"Factor names: {metadata.factor_names}")
Factor names: ['altitude_m', 'occlusion', 'weather']
Image-level and box-level factors
Object detection metadata lives at two levels, and DataEval models every detection as its own row. Each factor is one of:
- Image-level - one value per image (e.g. weather, altitude_m). You pass it as a scalar and DataEval broadcasts it to every box in the image.
- Box-level - one value per box (e.g. occlusion). You pass it as a list the length of the image’s boxes and DataEval keeps one value per detection.
That split is why you give the adapter metadata_cols and box_metadata_cols
separately. Metadata lays the dataset out one row per detection, so you
can see the difference directly: within a single image weather repeats while
occlusion changes from box to box.
print(metadata.target_data.select(["item_index", "target_index", "class_label", "weather", "occlusion"]).head(8))
shape: (8, 5)
┌────────────┬──────────────┬─────────────┬─────────┬───────────┐
│ item_index ┆ target_index ┆ class_label ┆ weather ┆ occlusion │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str ┆ str │
╞════════════╪══════════════╪═════════════╪═════════╪═══════════╡
│ 0 ┆ 0 ┆ 1 ┆ clear ┆ partial │
│ 0 ┆ 1 ┆ 0 ┆ clear ┆ heavy │
│ 0 ┆ 2 ┆ 2 ┆ clear ┆ none │
│ 0 ┆ 3 ┆ 0 ┆ clear ┆ none │
│ 1 ┆ 0 ┆ 0 ┆ rainy ┆ partial │
│ 1 ┆ 1 ┆ 0 ┆ rainy ┆ partial │
│ 1 ┆ 2 ┆ 1 ┆ rainy ┆ partial │
│ 1 ┆ 3 ┆ 2 ┆ rainy ┆ heavy │
└────────────┴──────────────┴─────────────┴─────────┴───────────┘
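The one-row-per-detection layout can be reproduced in a few lines of plain Python, which may make the scalar-versus-list rule easier to reason about. This is only an illustration of the layout, not DataEval's implementation; `expand_to_detections` and its input shape are assumptions made for the example.

```python
def expand_to_detections(items):
    """Expand per-image metadata into one row per detection.

    `items` is a list of (n_boxes, metadata) pairs. Scalar values are repeated
    for every box; list values must hold exactly one entry per box.
    """
    rows = []
    for item_index, (n_boxes, metadata) in enumerate(items):
        for target_index in range(n_boxes):
            row = {"item_index": item_index, "target_index": target_index}
            for name, value in metadata.items():
                if isinstance(value, list):
                    if len(value) != n_boxes:
                        raise ValueError(f"{name!r} has {len(value)} values for {n_boxes} boxes")
                    row[name] = value[target_index]  # box-level: pick this box's value
                else:
                    row[name] = value  # image-level: broadcast the scalar
            rows.append(row)
    return rows


items = [
    (2, {"weather": "clear", "occlusion": ["partial", "heavy"]}),
    (1, {"weather": "rainy", "occlusion": ["none"]}),
]
detection_rows = expand_to_detections(items)
for row in detection_rows:
    print(row)
```

The length check mirrors the practical constraint on the adapter: a box-level list must line up with the boxes returned for the same image, or the per-detection rows cannot be formed.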
That is all it takes: a small adapter that groups per-box rows - and tells DataEval which factors are image-level and which are box-level - turns any DataFrame-described object detection catalog into something DataEval can analyze, with no need to restructure your files on disk.
Adapting to your data
- Box format. ObjectDetectionTarget uses pixel (x0, y0, x1, y1). If your catalog already stores that, drop the conversion and read the four columns straight through. For other layouts - xywh, cxcywh, or YOLO-normalized - BoundingBox converts to xyxy for you and handles the YOLO image-size scaling, e.g. BoundingBox(*row, bbox_format=BoundingBoxFormat.YOLO, image_shape=image.shape).xyxy.
- Nested boxes. If each row is one image with a list or JSON column of boxes instead of one row per box, only the row handling changes: parse that cell in __getitem__ rather than grouping the DataFrame.
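For reference, the arithmetic behind these box-format conversions is short enough to write out in NumPy. The sketch below is a hand-rolled illustration, not the BoundingBox utility mentioned above. The function name `to_xyxy` and its string format names are hypothetical, and the "yolo" branch assumes center-based coordinates normalized by image width and height.

```python
import numpy as np


def to_xyxy(boxes, box_format, image_shape=None):
    """Convert (N, 4) boxes to pixel (x0, y0, x1, y1).

    Hypothetical helper. Supported formats: "xyxy", "xywh" (top-left + size),
    "cxcywh" (center + size), and "yolo" (cxcywh normalized to [0, 1]).
    """
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    a, b, c, d = boxes.T
    if box_format == "xyxy":
        return boxes
    if box_format == "xywh":
        return np.stack([a, b, a + c, b + d], axis=1)
    if box_format == "cxcywh":
        return np.stack([a - c / 2, b - d / 2, a + c / 2, b + d / 2], axis=1)
    if box_format == "yolo":
        if image_shape is None:
            raise ValueError("image_shape is required to scale normalized boxes")
        height, width = image_shape[-2:]  # works for (C, H, W) or (H, W)
        cx, cy, w, h = a * width, b * height, c * width, d * height
        return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    raise ValueError(f"unknown box format: {box_format!r}")


from_xywh = to_xyxy([[55, 54, 37, 37]], "xywh")
from_cxcywh = to_xyxy([[64, 64, 32, 16]], "cxcywh")
from_yolo = to_xyxy([[0.5, 0.5, 0.25, 0.5]], "yolo", image_shape=(3, 128, 128))
print(from_xywh, from_cxcywh, from_yolo)
```

The xywh case matches the conversion the adapter already performs in __getitem__; the other branches show what would replace that line for a differently formatted catalog.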