dataeval.encoders.TorchEmbeddingEncoder
class dataeval.encoders.TorchEmbeddingEncoder(model, batch_size=None, transforms=None, device=None, layer_name=None, use_output=True)
PyTorch-based embedding encoder.
Encapsulates all PyTorch-specific logic for embedding extraction:
- Model management (torch.nn.Module)
- Device handling
- Transform pipeline
- Batch processing via DataLoader
- Layer hooking for intermediate layer extraction
- Parameters:
- model : torch.nn.Module
PyTorch model for embedding extraction.
- batch_size : int or None, default None
Number of samples per batch. When None, uses DataEval's configured batch size.
- transforms : Transform or Sequence[Transform] or None, default None
Preprocessing transforms to apply before encoding. When None, uses raw images.
- device : DeviceLike or None, default None
Device for computation. When None, uses DataEval's configured device.
- layer_name : str or None, default None
Layer to extract embeddings from. When None, uses model output.
- use_output : bool, default True
If True, captures layer output; if False, captures layer input. Only used when layer_name is specified.
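The difference between capturing a layer's output and its input can be illustrated with a plain PyTorch forward hook. This is a minimal sketch, not DataEval's actual implementation: it assumes that layer_name is resolved against the names returned by `named_modules()`, and the `captured` dictionary and `hook` function are hypothetical names used only for illustration.

```python
import torch
import torch.nn as nn

# Toy model matching the class example: "0" is Flatten, "1" is Linear.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128))
model.eval()

captured = {}  # hypothetical holder for the hooked tensors


def hook(module, inputs, output):
    # `inputs` is a tuple of the positional arguments passed to the layer.
    captured["input"] = inputs[0].detach()   # analogous to use_output=False
    captured["output"] = output.detach()     # analogous to use_output=True


# Assumption: a layer name maps to an entry of named_modules().
layer = dict(model.named_modules())["1"]
handle = layer.register_forward_hook(hook)

with torch.no_grad():
    model(torch.zeros(4, 1, 28, 28))  # batch of 4 fake 28x28 images
handle.remove()  # always detach hooks once finished

print(captured["input"].shape)   # torch.Size([4, 784])
print(captured["output"].shape)  # torch.Size([4, 128])
```

For the Linear layer, the captured input is the flattened image (784 values per sample), while the captured output is the 128-dimensional projection.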
Example
Basic usage with a model:
>>> import torch.nn as nn
>>> from dataeval.encoders import TorchEmbeddingEncoder
>>> from dataeval import Embeddings
>>>
>>> model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128))
>>> encoder = TorchEmbeddingEncoder(model, batch_size=32, device="cpu")
>>> embeddings = Embeddings(dataset, encoder=encoder)

Extracting from an intermediate layer:

>>> encoder = TorchEmbeddingEncoder(
...     model,
...     batch_size=32,
...     layer_name="0",  # Extract from Flatten layer
...     use_output=True,
... )

- encode(dataset: dataeval.protocols.Dataset[tuple[dataeval.protocols.ArrayLike, Any, Any]] | dataeval.protocols.Dataset[dataeval.protocols.ArrayLike], indices: collections.abc.Sequence[int], stream: True) → collections.abc.Iterator[tuple[collections.abc.Sequence[int], numpy.typing.NDArray[Any]]]
- encode(dataset: dataeval.protocols.Dataset[tuple[dataeval.protocols.ArrayLike, Any, Any]] | dataeval.protocols.Dataset[dataeval.protocols.ArrayLike], indices: collections.abc.Sequence[int], stream: False = ...) → numpy.typing.NDArray[Any]
Encode images at specified indices to embeddings.
- property batch_size : int
Return the batch size used for encoding.
- property layer_name : str | None
Return the layer name for intermediate extraction, if set.
- property use_output : bool
Return whether output (True) or input (False) is captured from the layer.
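The two `encode` overloads differ only in return shape: with stream set to True the result is an iterator of (indices, embeddings) pairs, otherwise a single array. The sketch below shows one way such a stream might be consumed. It does not call DataEval: `fake_encode_stream` is a hypothetical stand-in that only mimics the streamed return type from the signature, and the batching order it uses is an assumption.

```python
from collections.abc import Iterator, Sequence

import numpy as np


def fake_encode_stream(
    indices: Sequence[int], batch_size: int, dim: int = 8
) -> Iterator[tuple[Sequence[int], np.ndarray]]:
    """Hypothetical stand-in yielding (batch_indices, batch_embeddings)."""
    for start in range(0, len(indices), batch_size):
        batch = list(indices[start : start + batch_size])
        # Fake embeddings: every value in a row equals the sample index.
        column = np.asarray(batch, dtype=np.float32)[:, None]
        yield batch, np.repeat(column, dim, axis=1)


indices = [5, 2, 9, 0, 7]
out = np.empty((len(indices), 8), dtype=np.float32)

# Fill a preallocated array batch by batch instead of holding all
# intermediate results in memory at once.
offset = 0
for batch_indices, batch_embeddings in fake_encode_stream(indices, batch_size=2):
    out[offset : offset + len(batch_indices)] = batch_embeddings
    offset += len(batch_indices)

print(out.shape)  # (5, 8)
```

Streaming suits cases where embeddings are written to disk or a preallocated buffer as they arrive; the non-streaming form is simpler when the full array fits in memory.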