dataeval.data.Select
- class dataeval.data.Select(dataset, selections=None)
Dataset wrapper that applies selection criteria for filtering.
Wraps an existing dataset and applies one or more selection filters to create a subset view without modifying the original dataset. Supports chaining multiple selection criteria for complex filtering operations.
- Parameters:
- dataset : AnnotatedDataset[_TDatum]
Source dataset to wrap and filter. Must implement the AnnotatedDataset interface with indexed access to data tuples.
- selections : Selection or Sequence[Selection] or None, default None
Selection criteria to apply when filtering the dataset. When None, the wrapper is an unfiltered view that returns every item from the source dataset, so filtered and unfiltered datasets share the same interface.
Examples
>>> import numpy as np
>>> from dataeval.data import Select
>>> from dataeval.data.selections import ClassFilter, Limit

>>> # Construct a sample dataset with size of 100 and class count of 10
>>> # Elements at index `idx` are returned as tuples:
>>> # - f"data_{idx}", one_hot_encoded(idx % class_count), {"id": idx}
>>> dataset = SampleDataset(size=100, class_count=10)

>>> # Apply selection criteria to the dataset
>>> selections = [Limit(size=5), ClassFilter(classes=[0, 2])]
>>> selected_dataset = Select(dataset, selections=selections)

>>> # Iterate over the selected dataset
>>> for data, target, meta in selected_dataset:
...     print(f"({data}, {np.argmax(target)}, {meta})")
(data_0, 0, {'id': 0})
(data_2, 2, {'id': 2})
(data_10, 0, {'id': 10})
(data_12, 2, {'id': 12})
(data_20, 0, {'id': 20})

Notes
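Selection criteria are applied in the order provided, allowing for efficient sequential filtering. In the example above, the output contains five items of classes 0 and 2, so Limit appears to cap the filtered result rather than truncate the source dataset before filtering. The wrapper preserves the metadata of the original dataset and remains interface-compatible with it.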
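The subset-view idea described in these notes can be sketched without DataEval at all. The following is a minimal illustration, not the library's implementation: `SubsetView`, `keep_classes`, and `limit` are hypothetical stand-ins, and each criterion is assumed to be a function that narrows a list of source indices. Because this sketch applies criteria strictly in list order, the class filter is listed before the limit.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in: a criterion maps (dataset, current indices) -> new indices.
Criterion = Callable[[Sequence, List[int]], List[int]]


def keep_classes(classes: set) -> Criterion:
    """Keep only indices whose label (second tuple element) is in `classes`."""
    def apply(dataset, indices):
        return [i for i in indices if dataset[i][1] in classes]
    return apply


def limit(size: int) -> Criterion:
    """Keep at most the first `size` indices."""
    def apply(dataset, indices):
        return indices[:size]
    return apply


class SubsetView:
    """Read-only view over `dataset`; the source is never modified."""

    def __init__(self, dataset, criteria=None):
        self._dataset = dataset
        self._indices = list(range(len(dataset)))
        # Criteria are applied strictly in the order given.
        for criterion in criteria or []:
            self._indices = criterion(dataset, self._indices)

    def __len__(self):
        return len(self._indices)

    def __getitem__(self, index):
        # Translate a view position into a source position.
        return self._dataset[self._indices[index]]

    def resolve_indices(self):
        # Return a copy so callers cannot alter the view.
        return list(self._indices)


# Items are (data, label, metadata) tuples, loosely mirroring the example above.
source = [(f"data_{i}", i % 10, {"id": i}) for i in range(100)]
view = SubsetView(source, [keep_classes({0, 2}), limit(5)])
selected_names = [item[0] for item in view]
print(selected_names)
```

Storing only a list of indices keeps the view cheap to build and leaves the source dataset untouched, which is the property the wrapper description emphasizes.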
- resolve_indices()
Return the list of dataset indices after all selections have been applied.
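As an illustration of what such an index list is useful for, the sketch below maps indices back onto a source dataset. It assumes the returned values are positions in the original dataset; the `indices` list is a hypothetical value written by hand to match the class example, not output captured from DataEval.

```python
from collections import Counter

# Stand-in source dataset of (data, label, metadata) tuples.
source = [(f"data_{i}", i % 10, {"id": i}) for i in range(100)]

# Hypothetical result of the kind resolve_indices() might return.
indices = [0, 2, 10, 12, 20]

# Look up the selected items in the untouched source dataset.
selected_ids = [source[i][2]["id"] for i in indices]

# Summarize the label balance of the selection.
label_counts = Counter(source[i][1] for i in indices)

# Everything not selected, e.g. for building a complementary split.
excluded = sorted(set(range(len(source))) - set(indices))

print(selected_ids, dict(label_counts), len(excluded))
```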
- property metadata : dataeval.typing.DatasetMetadata
Dataset metadata information including identifier and configuration.
- Return type: