dataeval.data.Embeddings

class dataeval.data.Embeddings(dataset, batch_size, transforms=None, model=None, device=None, cache=False, verbose=False)

Collection of image embeddings from a dataset.

Embeddings are accessed by index or slice and are loaded only on demand.

Parameters:
dataset : ImageClassificationDataset or ObjectDetectionDataset

Dataset to access original images from.

batch_size : int

Batch size to use when encoding images.

transforms : Transform or Sequence[Transform] or None, default None

Transforms to apply to images before encoding.

model : torch.nn.Module or None, default None

Model to use for encoding images.

device : DeviceLike or None, default None

The hardware device to use. If not specified, the DataEval default or torch default is used.

cache : Path, str, or bool, default False

Whether and where to cache the embeddings. When a Path or string is provided, embeddings are cached to disk at that path; when True, they are cached in memory.

verbose : bool, default False

Whether to print a progress bar when encoding images.
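
The on-demand access described above (index or slice, encoded in batches, optionally cached in memory) can be illustrated with a minimal pure-Python sketch. This is not DataEval's implementation: `LazyEmbeddings`, `encode_calls`, and the toy `double` encoder are hypothetical names used only to show the access pattern.

```python
from __future__ import annotations

from typing import Callable, Sequence


class LazyEmbeddings:
    """Hypothetical stand-in that encodes items only when they are requested."""

    def __init__(self, items: Sequence[float], encode: Callable, batch_size: int) -> None:
        self._items = items
        self._encode = encode
        self.batch_size = batch_size
        self._cache: dict = {}  # in-memory cache: index -> embedding
        self.encode_calls = 0   # number of batches sent to the encoder

    def __len__(self) -> int:
        return len(self._items)

    def _compute(self, indices: list) -> None:
        # Encode only the indices that are not cached yet, one batch at a time.
        missing = [i for i in indices if i not in self._cache]
        for start in range(0, len(missing), self.batch_size):
            batch = missing[start:start + self.batch_size]
            outputs = self._encode([self._items[i] for i in batch])
            self.encode_calls += 1
            self._cache.update(zip(batch, outputs))

    def __getitem__(self, key):
        if isinstance(key, slice):
            indices = list(range(*key.indices(len(self))))
            self._compute(indices)
            return [self._cache[i] for i in indices]
        index = key if key >= 0 else key + len(self)
        self._compute([index])
        return self._cache[index]


def double(batch):
    # Toy "model": the embedding of x is 2 * x.
    return [2 * x for x in batch]


emb = LazyEmbeddings(items=[1.0, 2.0, 3.0, 4.0, 5.0], encode=double, batch_size=2)
first_three = emb[0:3]                # triggers encoding of indices 0, 1, 2 only
calls_after_slice = emb.encode_calls
second = emb[1]                       # served from the in-memory cache
calls_after_index = emb.encode_calls
```

In this sketch, nothing is encoded at construction time, and a repeated request for an already-encoded index does not call the encoder again.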

batch_size

Batch size to use when encoding images.

Type:

int

cache

The path to cache embeddings to file, or True if caching to memory.

Type:

Path or bool

Return type:

pathlib.Path | bool
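
The `cache` parameter accepts a Path, a string, or a bool, while the `cache` attribute is documented as a Path or bool. One plausible reading is that string arguments are normalized to `pathlib.Path`. The sketch below shows that normalization under this assumption; `resolve_cache` is a hypothetical helper, not part of DataEval.

```python
from pathlib import Path
from typing import Union


def resolve_cache(cache: Union[Path, str, bool]) -> Union[Path, bool]:
    """Hypothetical helper: normalize a cache argument to a Path or a bool."""
    # bool is checked first so that True/False pass through unchanged.
    if isinstance(cache, bool):
        return cache
    # Strings and Path objects both become a Path (cache to disk).
    return Path(cache)


disk_cache = resolve_cache("embeddings.cache")  # assumed example file name
memory_cache = resolve_cache(True)
no_cache = resolve_cache(False)
```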

device

The hardware device to use. If not specified, the DataEval default or torch default is used.

Type:

torch.device

verbose

Whether to print a progress bar when encoding images.

Type:

bool

classmethod from_array(array, device=None)

Instantiates a shallow Embeddings object using an array.

Parameters:
array : ArrayLike

The array to convert to embeddings.

device : DeviceLike or None, default None

The hardware device to use. If not specified, the DataEval default or torch default is used.

Return type:

Embeddings

Example

>>> import numpy as np
>>> from dataeval.data import Embeddings
>>> array = np.random.randn(100, 3, 224, 224)
>>> embeddings = Embeddings.from_array(array)
>>> print(embeddings.to_tensor().shape)
torch.Size([100, 3, 224, 224])

classmethod load(path)

Loads the embeddings from disk.

Parameters:
path : Path or str

The file path to load the embeddings from.

Return type:

Embeddings

new(dataset)

Creates a new Embeddings object with the same parameters but a different dataset.

Parameters:
dataset : ImageClassificationDataset or ObjectDetectionDataset

Dataset to access original images from.

Return type:

Embeddings
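
The "same parameters, different dataset" behavior can be pictured as copying a configuration and swapping one field. The sketch below uses `dataclasses.replace` to show the idea; `EmbeddingConfig` is a hypothetical stand-in and says nothing about how DataEval implements `new` internally.

```python
from dataclasses import dataclass, replace
from typing import Sequence


@dataclass(frozen=True)
class EmbeddingConfig:
    """Hypothetical stand-in holding a dataset plus encoding parameters."""

    dataset: Sequence
    batch_size: int
    verbose: bool = False

    def new(self, dataset: Sequence) -> "EmbeddingConfig":
        # Copy every field unchanged except the dataset.
        return replace(self, dataset=dataset)


train = EmbeddingConfig(dataset=(1, 2, 3), batch_size=8, verbose=True)
test = train.new((4, 5))  # same batch_size and verbose, different dataset
```

A typical use of this pattern is to encode a training set and a test set with identical settings, so the resulting embeddings are comparable.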

save(path)

Saves the embeddings to disk.

Parameters:
path : Path or str

The file path to save the embeddings to.

Return type:

None

to_numpy(indices=None)

Converts the dataset to embeddings and returns them as a NumPy array.

Parameters:
indices : Sequence[int] or None, default None

The indices to convert to embeddings.

Return type:

NDArray[Any]

Warning

Processing large quantities of data can be resource intensive.

to_tensor(indices=None)

Converts the dataset to embeddings and returns them as a torch.Tensor.

Parameters:
indices : Sequence[int] or None, default None

The indices to convert to embeddings.

Return type:

torch.Tensor

Warning

Processing large quantities of data can be resource intensive.
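
Given the resource warning above, one way to bound memory use is to request embeddings for a subset of indices at a time rather than for the whole dataset. The sketch below only shows how a range of indices might be split into chunks; `chunked` and the chunk size are hypothetical and not part of DataEval.

```python
from typing import Iterator, List, Sequence


def chunked(indices: Sequence[int], chunk_size: int) -> Iterator[List[int]]:
    """Hypothetical helper: yield consecutive chunks of at most chunk_size indices."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(indices), chunk_size):
        yield list(indices[start:start + chunk_size])


# Split ten dataset indices into chunks of at most four.
chunks = list(chunked(range(10), 4))
```

Each chunk is a `Sequence[int]`, so it has the type that the `indices` parameter of `to_numpy` and `to_tensor` is documented to accept.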