dataeval.data.Select

class dataeval.data.Select(dataset, selections=None)

Dataset wrapper that applies selection criteria for filtering.

Wraps an existing dataset and applies one or more selection filters to create a subset view without modifying the original dataset. Supports chaining multiple selection criteria for complex filtering operations.

Parameters:
dataset : AnnotatedDataset[_TDatum]

Source dataset to wrap and filter. Must implement the AnnotatedDataset interface, with indexed access to data tuples.

selections : Selection or Sequence[Selection] or None, default None

Selection criteria used to filter the dataset. When None (the default), the wrapper is an unfiltered view that returns every item from the source dataset, so filtered and unfiltered datasets share the same interface.

Examples

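>>> import numpy as np
>>> from dataeval.data import Select
>>> from dataeval.data.selections import ClassFilter, Limit
>>> # Construct a sample dataset with size of 100 and class count of 10
>>> # (SampleDataset is a helper assumed to be defined for this example)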
>>> # Elements at index `idx` are returned as tuples:
>>> # - f"data_{idx}", one_hot_encoded(idx % class_count), {"id": idx}
>>> dataset = SampleDataset(size=100, class_count=10)
>>> # Apply selection criteria to the dataset
>>> selections = [Limit(size=5), ClassFilter(classes=[0, 2])]
>>> selected_dataset = Select(dataset, selections=selections)
>>> # Iterate over the selected dataset
>>> for data, target, meta in selected_dataset:
...     print(f"({data}, {np.argmax(target)}, {meta})")
(data_0, 0, {'id': 0})
(data_2, 2, {'id': 2})
(data_10, 0, {'id': 10})
(data_12, 2, {'id': 12})
(data_20, 0, {'id': 20})

Notes

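Selection criteria are applied in the order provided. The wrapper preserves the source dataset's metadata and exposes the same interface as the original dataset. Note that in the example above, Limit(size=5) is listed before ClassFilter, yet the output contains five items that all match the class filter, so the limit caps the size of the filtered result rather than truncating the source before filtering.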
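
As a rough illustration of the wrapper pattern described in these notes, the sketch below builds a read-only subset view that stores source indices instead of copying data. `SubsetView` and its `keep` and `limit` parameters are hypothetical names invented for this sketch; they are not part of dataeval, and the library's actual implementation may differ. The sketch assumes the size cap is applied after filtering, which is consistent with the output shown in the Examples section.

```python
from typing import Callable, Optional, Sequence


class SubsetView:
    """Hypothetical read-only view over a dataset; NOT dataeval's implementation."""

    def __init__(
        self,
        dataset: Sequence,
        keep: Optional[Callable[[tuple], bool]] = None,
        limit: Optional[int] = None,
    ) -> None:
        self._dataset = dataset
        # Record matching positions in the source; the source is never copied or mutated.
        indices = [i for i in range(len(dataset)) if keep is None or keep(dataset[i])]
        # Cap the size after filtering so the limit applies to the final result.
        self._indices = indices if limit is None else indices[:limit]

    def __len__(self) -> int:
        return len(self._indices)

    def __getitem__(self, index: int):
        # Translate a position in the view into a position in the source.
        return self._dataset[self._indices[index]]

    def resolve_indices(self) -> list:
        # Return a copy so callers cannot alter the view's internal state.
        return list(self._indices)


# Toy source: (data, class label, metadata) tuples with label = idx % 10.
source = [(f"data_{i}", i % 10, {"id": i}) for i in range(100)]
view = SubsetView(source, keep=lambda datum: datum[1] in (0, 2), limit=5)

print(view.resolve_indices())  # [0, 2, 10, 12, 20]
print(len(source))             # 100, the source is unchanged
```

Storing indices rather than items keeps the view cheap to build and leaves the source dataset untouched, which is the property the wrapper description relies on.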

resolve_indices()

Return the list of dataset indices after all selections have been applied.

Returns:

The list of selected indices from the original dataset.

Return type:

list[int]
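
Because the returned indices refer to positions in the original dataset, they can be used to look items up in the source directly or to work out what a selection left out. The sketch below uses a plain list as a stand-in for the return value; the index values are an assumption inferred from the `id` fields printed in the Examples section, and the toy `source` list is not a dataeval object.

```python
# Toy source dataset of (data, class label, metadata) tuples; label = idx % 10.
source = [(f"data_{i}", i % 10, {"id": i}) for i in range(100)]

# Stand-in for the list[int] returned by resolve_indices();
# values assumed from the ids printed in the Examples output.
selected_indices = [0, 2, 10, 12, 20]

# Look up the selected items directly in the source dataset.
selected_items = [source[i] for i in selected_indices]

# Everything the selection left out, still expressed as source indices.
selected_set = set(selected_indices)
excluded_indices = [i for i in range(len(source)) if i not in selected_set]

print([item[2]["id"] for item in selected_items])  # [0, 2, 10, 12, 20]
print(len(excluded_indices))                       # 95
```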

property metadata : dataeval.typing.DatasetMetadata

Dataset metadata, including the dataset identifier and configuration.

Return type:

dataeval.typing.DatasetMetadata