dataeval.data.Embeddings

class dataeval.data.Embeddings(dataset, batch_size, transforms=None, model=None, device=None, cache=False, verbose=False)

Collection of image embeddings from a dataset.

Embeddings are accessed by index or slice and are loaded only on demand.
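The following minimal sketch shows what on-demand access can look like, using lazy, batched encoding with an in-memory cache. It is not DataEval's implementation. `LazyEmbeddings` and `fake_encoder` are hypothetical names, the "model" is a fixed random linear projection, and disk caching is not modeled. Indexing encodes only the requested items that are not cached yet, `batch_size` at a time, and reuses cached rows on later access.

```python
import numpy as np


class LazyEmbeddings:
    """Hypothetical illustration of on-demand, batched, cached encoding."""

    def __init__(self, images, encode, batch_size):
        self.images = images            # sequence of images (NumPy arrays here)
        self.encode = encode            # callable: (N, ...) batch -> (N, D) embeddings
        self.batch_size = batch_size
        self._cache = {}                # in-memory cache: index -> embedding row
        self.batches_encoded = 0        # number of encoder calls, for illustration

    def __len__(self):
        return len(self.images)

    def _ensure(self, indices):
        # Encode only indices that are not cached yet, batch_size at a time.
        missing = [i for i in indices if i not in self._cache]
        for start in range(0, len(missing), self.batch_size):
            chunk = missing[start : start + self.batch_size]
            batch = np.stack([self.images[i] for i in chunk])
            for i, row in zip(chunk, self.encode(batch)):
                self._cache[i] = row
            self.batches_encoded += 1

    def __getitem__(self, key):
        # range() indexing normalizes negative ints and slices to concrete indices.
        idx = range(len(self))[key]
        if isinstance(idx, range):
            self._ensure(idx)
            return np.stack([self._cache[i] for i in idx])
        self._ensure([idx])
        return self._cache[idx]


rng = np.random.default_rng(0)
projection = rng.standard_normal((3 * 8 * 8, 16))  # stand-in "model" weights


def fake_encoder(batch):
    # Flatten each (3, 8, 8) image and project it to a 16-dim embedding.
    return batch.reshape(len(batch), -1) @ projection


images = [rng.standard_normal((3, 8, 8)) for _ in range(10)]
emb = LazyEmbeddings(images, fake_encoder, batch_size=4)
first = emb[0]    # encodes index 0 only (one encoder call)
block = emb[0:6]  # reuses index 0; encodes 1-5 as batches [1, 2, 3, 4] and [5]
print(first.shape, block.shape, emb.batches_encoded)  # (16,) (6, 16) 3
```

The slice `emb[0:6]` triggers two encoder calls instead of six. Index 0 is already cached, so only the five remaining indices are encoded, grouped into batches of at most four.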

Parameters:

dataset : ImageClassificationDataset or ObjectDetectionDataset: Dataset to access original images from.
batch_size : int: Batch size to use when encoding images.
transforms : Transform or Sequence[Transform] or None, default None: Transforms to apply to images before encoding.
model : torch.nn.Module or None, default None: Model to use for encoding images.
device : DeviceLike or None, default None: Hardware device to use. If None, the DataEval default or the torch default device is used.
cache : Path, str, or bool, default False: Whether and where to cache the embeddings. If True, embeddings are cached in memory; if a Path or str is provided, they are cached to disk at that location; if False, they are not cached.
verbose : bool, default False: Whether to display a progress bar while encoding images.
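The `cache` parameter accepts three kinds of value. The sketch below spells out that mapping with a hypothetical `resolve_cache` helper, which is not part of DataEval: `False` means no caching, `True` means caching in memory, and a `Path` or `str` means caching to disk at that location. The file name `emb_cache.pt` is made up for the example.

```python
from pathlib import Path
from typing import Optional, Tuple, Union


def resolve_cache(cache: Union[Path, str, bool]) -> Tuple[str, Optional[Path]]:
    """Hypothetical helper: map a ``cache`` argument to a (mode, path) pair."""
    if isinstance(cache, bool):
        # True -> cache in memory; False -> do not cache.
        return ("memory", None) if cache else ("none", None)
    if isinstance(cache, (str, Path)):
        # Any path-like value -> cache to disk at that location.
        return ("disk", Path(cache))
    raise TypeError(f"cache must be a Path, str, or bool, not {type(cache).__name__}")


print(resolve_cache(False))  # ('none', None)
print(resolve_cache(True))   # ('memory', None)
mode, path = resolve_cache("emb_cache.pt")
print(mode, path.name)       # disk emb_cache.pt
```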

batch_size

Batch size to use when encoding images.

Type: int

cache

The file path the embeddings are cached to, True if they are cached in memory, or False if caching is disabled.

Type: Path or bool
Return type: pathlib.Path | bool

device

The hardware device used for encoding. This is the device passed at construction if one was specified, otherwise the DataEval default or torch default.

Type: torch.device

verbose

Whether to display a progress bar while encoding images.

Type: bool

classmethod from_array(array, device=None)

Instantiates a shallow Embeddings object using an array.

Parameters:

array : ArrayLike: The array to convert to embeddings.
device : DeviceLike or None, default None: Hardware device to use. If None, the DataEval default or the torch default device is used.

Return type:

Embeddings

Example

>>> import numpy as np
>>> from dataeval.data import Embeddings
>>> array = np.random.randn(100, 3, 224, 224)
>>> embeddings = Embeddings.from_array(array)
>>> print(embeddings.to_tensor().shape)
torch.Size([100, 3, 224, 224])

classmethod load(path)

Loads the embeddings from disk.

Parameters:

path : Path or str: The file path to load the embeddings from.

Return type:

Embeddings

new(dataset)

Creates a new Embeddings object with the same parameters but a different dataset.

Parameters:

dataset : ImageClassificationDataset or ObjectDetectionDataset: Dataset to access original images from.

Return type:

Embeddings
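Conceptually, `new` keeps every encoding setting and swaps only the dataset. This helps when two datasets, for instance a reference split and a test split, need to be encoded identically so that their embeddings are comparable. The sketch below illustrates the copy-and-replace idea with the standard library's `dataclasses.replace`. `EncodingSettings` and its fields are hypothetical and do not reflect how DataEval stores its parameters.

```python
from dataclasses import dataclass, replace
from typing import Optional, Sequence


@dataclass(frozen=True)
class EncodingSettings:
    """Hypothetical bundle of the settings an embeddings object is built with."""

    dataset: Sequence[str]
    batch_size: int
    device: Optional[str] = None
    verbose: bool = False


reference = EncodingSettings(dataset=("img0", "img1"), batch_size=32, device="cpu")
# Copy every setting, swapping in a different dataset (analogous to new(dataset)).
candidate = replace(reference, dataset=("img2", "img3", "img4"))
print(candidate.batch_size, candidate.device, len(candidate.dataset))  # 32 cpu 3
```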

save(path)

Saves the embeddings to disk.

Parameters:

path : Path or str: The file path to save the embeddings to.

Return type:

None

to_numpy(indices=None)

Converts the dataset to embeddings and returns them as a NumPy array.

Parameters:

indices : Sequence[int] or None, default None: The indices to convert to embeddings.

Return type:

NDArray[Any]

Warning

Processing large quantities of data can be resource intensive.

to_tensor(indices=None)

Converts the dataset to embeddings and returns them as a torch.Tensor.

Parameters:

indices : Sequence[int] or None, default None: The indices to convert to embeddings.

Return type:

torch.Tensor

Warning

Processing large quantities of data can be resource intensive.