dataeval.protocols.Chunker

class dataeval.protocols.Chunker

Protocol for chunking datasets into subsets by returning index arrays.

Implementations must provide a __call__ method that takes the number of samples in the dataset and returns a list of integer index arrays, one per chunk.
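Because conformance is structural, any callable with a matching __call__ can pass an isinstance check like the one in the Examples section, provided the protocol is runtime-checkable. The sketch below illustrates that mechanism with a hypothetical ChunkerLike protocol written for this example. It is an assumption about the general shape, not the library's actual definition, and the halves function is likewise made up for illustration.

```python
from typing import Protocol, runtime_checkable

import numpy as np
from numpy.typing import NDArray


@runtime_checkable
class ChunkerLike(Protocol):
    """Hypothetical stand-in for illustration; not the library's definition."""

    def __call__(self, n: int) -> list[NDArray[np.intp]]: ...


def halves(n: int) -> list[NDArray[np.intp]]:
    # A plain function also exposes __call__, so it passes the structural check.
    return [idx.astype(np.intp) for idx in np.array_split(np.arange(n), 2)]


print(isinstance(halves, ChunkerLike))  # True
print(isinstance(42, ChunkerLike))      # False: int has no __call__
```

Note that a runtime isinstance check on a protocol only verifies that the method exists. It does not validate the argument or return types, so a static type checker is still needed to catch signature mismatches.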

Examples

Creating a simple chunker that splits the dataset into n_chunks parts of (nearly) equal size:

>>> import numpy as np
>>> from numpy.typing import NDArray
>>> from dataeval.protocols import Chunker
>>>
>>> class EqualChunker:
...     def __init__(self, n_chunks: int):
...         self.n_chunks = n_chunks
...
...     def __call__(self, n: int) -> list[NDArray[np.intp]]:
...         return [idx.astype(np.intp) for idx in np.array_split(np.arange(n), self.n_chunks)]
>>>
>>> chunker = EqualChunker(n_chunks=5)
>>> isinstance(chunker, Chunker)
True
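The same call signature supports other chunking strategies. As a further illustration, here is a minimal sketch of a chunker that caps the size of each chunk instead of fixing the number of chunks. SizeChunker and its chunk_size parameter are hypothetical names introduced for this example and are not part of dataeval. The block uses only NumPy so that it runs on its own.

```python
import numpy as np
from numpy.typing import NDArray


class SizeChunker:
    """Hypothetical chunker: consecutive chunks of at most `chunk_size` indices."""

    def __init__(self, chunk_size: int) -> None:
        if chunk_size < 1:
            raise ValueError("chunk_size must be a positive integer")
        self.chunk_size = chunk_size

    def __call__(self, n: int) -> list[NDArray[np.intp]]:
        # Build the full index range once, then slice it into consecutive windows.
        indices = np.arange(n, dtype=np.intp)
        return [indices[start:start + self.chunk_size] for start in range(0, n, self.chunk_size)]


chunks = SizeChunker(chunk_size=4)(10)
print([c.tolist() for c in chunks])  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

The last chunk is shorter whenever n is not a multiple of chunk_size, and an empty dataset (n = 0) produces an empty list.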