dataeval.extractors.TorchExtractor
class dataeval.extractors.TorchExtractor(model, transforms=None, device=None, layer_name=None, use_output=True, flatten=True)
Extracts embeddings from a PyTorch model, with optional intermediate layer hooking.
Encapsulates all PyTorch-specific logic for feature extraction:
- Model management (torch.nn.Module)
- Device handling
- Transform pipeline
- Layer hooking for intermediate layer extraction
Implements the FeatureExtractor protocol.
Parameters:
- model : torch.nn.Module
PyTorch model for feature extraction.
- transforms : Transform or Sequence[Transform] or None, default None
Preprocessing transforms to apply before encoding. When None, uses raw images.
- device : DeviceLike or None, default None
Device for computation. When None, uses DataEval’s configured device.
- layer_name : str or None, default None
Layer to extract embeddings from. When None, uses model output.
- use_output : bool, default True
If True, captures layer output; if False, captures layer input. Only used when layer_name is specified.
- flatten : bool, default True
If True, flattens outputs with more than 2 dimensions to (N, D) shape. If False, preserves the original output shape.
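The layer_name and use_output parameters can be pictured with PyTorch's own forward-hook mechanism. The following is a minimal sketch in plain PyTorch, not DataEval's implementation: it assumes layer_name follows the names reported by named_modules() (which the layer_name="0" example suggests but this page does not confirm), and the `captured` dictionary and `hook` function are hypothetical names used only for illustration.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128))

# Submodule names as PyTorch reports them; for nn.Sequential these are
# "0", "1", ... (the empty name is the root module itself, so skip it).
layer_names = [name for name, _ in model.named_modules() if name]

captured = {}  # hypothetical container for the hooked tensors


def hook(module, inputs, output):
    # A forward hook receives the layer's positional inputs as a tuple
    # and the layer's output.
    captured["input"] = inputs[0].detach()   # analogous to use_output=False
    captured["output"] = output.detach()     # analogous to use_output=True


# Hook the Linear layer (name "1") and run one batch of 4 images.
layer = model.get_submodule("1")
handle = layer.register_forward_hook(hook)
try:
    with torch.no_grad():
        model(torch.zeros(4, 1, 28, 28))
finally:
    handle.remove()  # always detach the hook so later calls are unaffected

input_shape = tuple(captured["input"].shape)
output_shape = tuple(captured["output"].shape)
```

Here the hooked layer's input is the flattened image batch and its output is the 128-dimensional projection, which is the distinction use_output selects between.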
Example
Basic usage with a model:
>>> import torch.nn as nn
>>> from dataeval import Embeddings
>>> from dataeval.extractors import TorchExtractor
>>>
>>> model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128))
>>> extractor = TorchExtractor(model, device="cpu")
>>> embeddings = Embeddings(dataset, extractor=extractor, batch_size=32)
Extracting from an intermediate layer:
>>> extractor = TorchExtractor(
...     model,
...     layer_name="0",  # Extract from Flatten layer
...     use_output=True,
... )
- property flatten : bool
Return whether outputs are flattened to 2D.
- property layer_name : str | None
Return the layer name for intermediate extraction, if set.
- property use_output : bool
Return whether output (True) or input (False) is captured from the layer.
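As an illustration of what the flatten setting means for output shapes, here is a small sketch in plain PyTorch. It is one plausible reading of "flattens outputs with more than 2 dimensions to (N, D)", not the library's code; `flatten_to_2d` is a hypothetical helper introduced only for this example.

```python
import torch


def flatten_to_2d(features: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: collapse every non-batch dimension into one."""
    if features.ndim > 2:
        # Keep dim 0 (the batch) and merge the rest: (N, C, H, W) -> (N, C*H*W)
        return torch.flatten(features, start_dim=1)
    # Tensors that are already at most 2D are returned unchanged.
    return features


feature_maps = torch.zeros(8, 16, 4, 4)  # e.g. a convolutional layer's output
vectors = torch.zeros(8, 128)            # e.g. a linear layer's output

flat_shape = tuple(flatten_to_2d(feature_maps).shape)
vector_shape = tuple(flatten_to_2d(vectors).shape)
```

With flatten=False, the equivalent of `feature_maps` would instead keep its original 4D shape.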