dataeval.data.DetectionCrops

class dataeval.data.DetectionCrops(dataset, *, region='object', padding=0.0, min_size=1, square='expand', fill='mean')

Present an object-detection dataset’s ground-truth boxes as an image-classification dataset.

Each kept detection becomes one classification datum — a crop derived from the detection’s box, labeled (one-hot) with the detection’s class. The view satisfies the ImageClassificationDataset shape, so it drops into Embeddings, Coverage, ber_mst(), and Balance — every per-(image, label) tool — unchanged, with crops aligned 1:1 to labels by construction.

This makes object-detection feasibility (the bounding-box-classification reduction behind Upper-Bound Average Precision (UAP)) and embedding-space coverage available to object-detection datasets without computing detection-level embeddings by hand. Crops are produced lazily on access, so an extractor’s transforms still handle resize/normalize and Embeddings still batches and caches.
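The reduction described above can be pictured with a minimal sketch. This is not DataEval's implementation: `LazyCropView` is a hypothetical name, the target is a plain dict with `boxes` and `labels` keys rather than a MAITE ObjectDetectionTarget, and images are nested lists in (C, H, W) layout so the example has no dependencies. It shows only the two ideas stated above: one datum per detection with a one-hot label, and cropping deferred until access.

```python
from typing import Any, List, Sequence, Tuple


class LazyCropView:
    """Hypothetical illustration of a lazy per-detection view (not DataEval's code)."""

    def __init__(self, dataset: Sequence[Tuple[Any, dict, dict]], num_classes: int):
        self._dataset = dataset
        self._num_classes = num_classes
        # Index every detection once as (image position, detection position).
        # No pixels are touched until a crop is requested.
        self._index: List[Tuple[int, int]] = [
            (i, j)
            for i, (_, target, _) in enumerate(dataset)
            for j in range(len(target["boxes"]))
        ]

    def __len__(self) -> int:
        return len(self._index)

    def __getitem__(self, k: int):
        i, j = self._index[k]
        image, target, _ = self._dataset[i]
        x0, y0, x1, y1 = (int(v) for v in target["boxes"][j])
        # Image is (C, H, W): slice rows (y), then columns (x), in each channel.
        crop = [[row[x0:x1] for row in channel[y0:y1]] for channel in image]
        one_hot = [0.0] * self._num_classes
        one_hot[target["labels"][j]] = 1.0
        return crop, one_hot, {"id": k}


# Toy source: one single-channel 4x4 image with two boxes (classes 0 and 2).
image = [[[r * 4 + c for c in range(4)] for r in range(4)]]
target = {"boxes": [[0, 0, 2, 2], [1, 1, 4, 3]], "labels": [0, 2]}
view = LazyCropView([(image, target, {"id": "img-0"})], num_classes=3)
crop, label, meta = view[1]  # the crop is computed here, on access
```

Because the view's length equals the number of indexed detections, crop `k` and label `k` refer to the same detection by construction.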

Parameters:
dataset : ObjectDetectionDataset

The source object-detection dataset. Each datum is a MAITE (image, ObjectDetectionTarget, metadata) 3-tuple; images are read in (C, H, W) layout and boxes in absolute-pixel [x0, y0, x1, y1] format.

region : {"object", "context", "surround"}, default "object"

Which pixels each crop retains. "object" and "context" both return the box widened by padding ("context" is the conventional name when padding is large enough to bring in surroundings); "surround" returns the widened box with the original box masked to fill, leaving only the background ring — a probe for whether the surroundings alone predict the class. "surround" requires padding > 0. Only "object" is exercised by the shipped tutorials.
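As a rough illustration of the "surround" idea, the sketch below widens a box, crops it, and overwrites the original box with a constant so only the background ring remains. The function name `surround_crop`, the constant `fill_value`, and the choice to clip the widened window to the image bounds are assumptions made for this example; the class's own edge handling and fill options may differ.

```python
import numpy as np


def surround_crop(image: np.ndarray, box, padding: float, fill_value: float = 0.0) -> np.ndarray:
    """Hypothetical helper: widened crop with the original box masked out."""
    if padding <= 0:
        raise ValueError("a background ring needs padding > 0")
    _, height, width = image.shape  # (C, H, W)
    x0, y0, x1, y1 = (int(v) for v in box)
    dx, dy = padding * (x1 - x0), padding * (y1 - y0)
    # Widened window, clipped to the image (an assumption of this sketch).
    wx0, wy0 = max(int(round(x0 - dx)), 0), max(int(round(y0 - dy)), 0)
    wx1, wy1 = min(int(round(x1 + dx)), width), min(int(round(y1 + dy)), height)
    crop = image[:, wy0:wy1, wx0:wx1].astype(np.float64)  # astype returns a copy
    # Mask the original box, expressed in crop coordinates.
    crop[:, y0 - wy0:y1 - wy0, x0 - wx0:x1 - wx0] = fill_value
    return crop


image = np.ones((1, 8, 8))
ring = surround_crop(image, [2, 2, 6, 6], padding=0.25)
```

Here the 4x4 box grows to a 6x6 window, and the 16 interior pixels are replaced, leaving a one-pixel ring of original content.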

padding : float, default 0.0

Context margin added to each box, as a fraction of that box dimension: each side is extended by padding times the box’s width (left/right) or height (top/bottom). 0.1 grows a 100x200 box to 120x240, centered. Must be >= 0.
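The padding arithmetic can be checked with a small sketch. `pad_box` is a hypothetical helper written for this example, not part of the DataEval API, and it ignores what happens when the widened box crosses the image boundary.

```python
def pad_box(box, padding):
    """Widen [x0, y0, x1, y1] by `padding` times its width/height on each side."""
    if padding < 0:
        raise ValueError("padding must be >= 0")
    x0, y0, x1, y1 = box
    dx = padding * (x1 - x0)  # margin added on the left and on the right
    dy = padding * (y1 - y0)  # margin added on the top and on the bottom
    return [x0 - dx, y0 - dy, x1 + dx, y1 + dy]


# A 100x200 box with padding=0.1 gains 10 px per side in x and 20 px per side in y.
padded = pad_box([50.0, 50.0, 150.0, 250.0], 0.1)
width = padded[2] - padded[0]
height = padded[3] - padded[1]
```

The result is a 120x240 box with the same center as the original, matching the figures quoted above.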

min_size : int, default 1

Drop detections whose box’s shorter side is below this many pixels (degenerate or zero-area boxes are always dropped). The number dropped is logged and exposed as n_dropped; because dropping shrinks the view, len(crops) may be less than the source’s total detection count.
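A minimal sketch of the filtering rule, assuming boxes are plain [x0, y0, x1, y1] lists; `keep_box` is a hypothetical name and the real class may test sizes at a different stage (for example, before or after rounding).

```python
def keep_box(box, min_size=1):
    """Return True if the box survives the min_size filter described above."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    if w <= 0 or h <= 0:
        return False  # degenerate or zero-area boxes are always dropped
    return min(w, h) >= min_size  # compare the shorter side


boxes = [[0, 0, 10, 10], [0, 0, 3, 40], [5, 5, 5, 20], [8, 8, 2, 2]]
kept = [b for b in boxes if keep_box(b, min_size=4)]
n_dropped = len(boxes) - len(kept)
```

In this toy list the second box is too thin, the third has zero width, and the fourth is inverted, so only the first survives.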

square : {"off", "expand", "pad"}, default "expand"

How a non-square crop is reconciled with a square model input. "expand" squares the crop by extending the shorter side into real image pixels (shifting the window inward at image edges, falling back to fill only for unavoidable overflow) — no synthetic fill for interior boxes, but it brings in real background, which for extreme aspect ratios can dilute thin objects. "pad" squares by padding the shorter side with synthetic fill, keeping the embedding object-focused (prefer this for strict feasibility / BER). "off" leaves crops rectangular for the extractor’s resize to stretch (the prior default behavior).
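The "expand" window can be sketched as below. `square_window` is a hypothetical helper, and two details are assumptions of this sketch rather than documented behavior: the window starts out centered on the box, and when the square is larger than the image along an axis it is left centered on the box so the overflow falls on both sides.

```python
def square_window(box, image_w, image_h):
    """Return a square [x0, y0, x1, y1] window around `box`, kept on real pixels where possible."""
    x0, y0, x1, y1 = box
    side = max(x1 - x0, y1 - y0)

    def fit(lo, hi, limit):
        # Center a window of length `side` on the interval [lo, hi].
        start = (lo + hi - side) / 2
        if side <= limit:
            # Shift inward at the image edge so no synthetic fill is needed.
            start = min(max(start, 0), limit - side)
        # Otherwise overflow is unavoidable and would be covered by fill.
        return start, start + side

    wx0, wx1 = fit(x0, x1, image_w)
    wy0, wy1 = fit(y0, y1, image_h)
    return [wx0, wy0, wx1, wy1]


interior = square_window([40, 45, 60, 55], image_w=100, image_h=100)
at_edge = square_window([0, 20, 10, 60], image_w=100, image_h=100)
too_big = square_window([0, 0, 10, 200], image_w=100, image_h=200)
```

The interior box grows symmetrically, the edge box is shifted right so it starts at column 0, and the last case extends past both sides of a 100-pixel-wide image, which is where fill would be used.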

fill : {"mean", "zero"}, default "mean"

Value for invented pixels — used by square="pad", by edge overflow in square="expand", and to mask the object in region="surround". "mean" uses the per-crop, per-channel mean (normalization-agnostic, minimal contrast at the fill boundary); "zero" uses 0 (set this to your normalization mean if you need strict post-normalization neutrality).
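To make the two fill values concrete, here is a sketch of square="pad" using NumPy. `pad_to_square` is a hypothetical function, and centering the crop inside the padded square is an assumption of this example.

```python
import numpy as np


def pad_to_square(crop: np.ndarray, fill: str = "mean") -> np.ndarray:
    """Pad a (C, H, W) crop to (C, S, S), S = max(H, W), with synthetic fill."""
    c, h, w = crop.shape
    side = max(h, w)
    if fill == "mean":
        # One fill value per channel, computed from this crop only.
        value = crop.mean(axis=(1, 2), keepdims=True)  # shape (C, 1, 1)
    elif fill == "zero":
        value = np.zeros((c, 1, 1))
    else:
        raise ValueError(f"unknown fill: {fill!r}")
    out = np.broadcast_to(value, (c, side, side)).astype(np.float64).copy()
    top, left = (side - h) // 2, (side - w) // 2
    out[:, top:top + h, left:left + w] = crop  # original pixels, centered
    return out


crop = np.arange(2 * 2 * 4, dtype=np.float64).reshape(2, 2, 4)
squared = pad_to_square(crop, fill="mean")
zeroed = pad_to_square(crop, fill="zero")
```

With "mean", each channel's padding rows take that channel's own average (3.5 and 11.5 here), so the boundary between real and invented pixels carries little contrast.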

n_dropped

Number of detections dropped by min_size (or for being degenerate).

Type:

int

index2label

Mapping from class index to name, inherited from the source dataset.

Type:

dict[int, str]

Notes

Each datum’s third element is its metadata — a plain dict at runtime, conforming to DatumMetadata — with the protocol-required id plus three keys added by this view that trace a crop back to its source detection:

  • id (int) — the crop’s own identifier: its position in this view, 0 to len(crops) - 1, aligned 1:1 with the labels and embeddings.

  • source_id (int | str) — the source datum’s own DatumMetadata id (not a positional index), so a crop flagged downstream (e.g. as low-dispersion or uncovered) still resolves to the correct image after the source has been filtered, sorted, or otherwise re-indexed by a view such as Select (which renumbers positions but passes each datum’s id through unchanged). Falls back to the positional index for source data that omits the protocol-required id.

  • target (int) — the detection’s index within its source image (its position in that image’s target arrays).

  • box (list[float]) — the detection’s absolute-pixel [x0, y0, x1, y1] in the source image.
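The keys above are enough to map flagged crops back to their source detections. The sketch below works on hand-written metadata dicts with the documented keys; the values and the helper name `group_by_source` are made up for illustration.

```python
def group_by_source(crop_metadata):
    """Group crops by source datum id, keeping (detection index, box) pairs."""
    by_source = {}
    for meta in crop_metadata:
        by_source.setdefault(meta["source_id"], []).append((meta["target"], meta["box"]))
    return by_source


# Illustrative metadata for three crops. Detection 1 of "img-a" is absent,
# as it would be if min_size had dropped it.
crop_metadata = [
    {"id": 0, "source_id": "img-a", "target": 0, "box": [10.0, 10.0, 50.0, 40.0]},
    {"id": 1, "source_id": "img-a", "target": 2, "box": [5.0, 60.0, 25.0, 90.0]},
    {"id": 2, "source_id": "img-b", "target": 0, "box": [0.0, 0.0, 32.0, 32.0]},
]

flagged = [1, 2]  # crop ids reported by some downstream tool
hits = group_by_source(crop_metadata[i] for i in flagged)
```

Note that a crop's `id` and its `target` need not agree: `id` counts positions in the view, while `target` indexes into the source image's own target arrays.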

Examples

Wrap an object-detection dataset and run the classification-only tools on it:

>>> from dataeval.data import DetectionCrops
>>> crops = DetectionCrops(od_dataset)
>>> emb = Embeddings(crops, extractor=extractor, batch_size=64)
>>> Coverage().evaluate(crops, embeddings=emb)  # per-class dispersion over OD classes

property metadata : dataeval.protocols.DatasetMetadata

MAITE dataset metadata for the crop view (id and inherited index2label).