dataeval.extractors.OnnxExtractor
class dataeval.extractors.OnnxExtractor(model, transforms=None, output_name=None, flatten=True, batch_size=None, image_size=None)
Extracts embeddings via ONNX Runtime with lazy model loading.
Encapsulates ONNX-specific logic for feature extraction:
- Model loading from ONNX files or in-memory bytes
- Automatic GPU/CPU provider selection with fallback
- Transform pipeline
- Output layer selection for multi-output models
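The GPU/CPU provider fallback listed above can be pictured as a preference filter over whatever execution providers are installed. The sketch below is an assumption for illustration and not OnnxExtractor's actual code: `select_providers` and `PREFERRED_PROVIDERS` are hypothetical names, and the GPU-first ordering is assumed. Only the provider strings are real ONNX Runtime identifiers.

```python
from typing import List, Sequence

# Assumed preference order for illustration: try the GPU first, fall back to CPU.
PREFERRED_PROVIDERS = ("CUDAExecutionProvider", "CPUExecutionProvider")


def select_providers(available: Sequence[str]) -> List[str]:
    """Return the preferred providers that are actually available, in preference order."""
    chosen = [name for name in PREFERRED_PROVIDERS if name in available]
    if not chosen:
        raise RuntimeError("no supported execution provider is available")
    return chosen


# A GPU build typically reports both providers; a CPU-only build reports one.
print(select_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
print(select_providers(["CPUExecutionProvider"]))
```

With ONNX Runtime installed, `onnxruntime.get_available_providers()` would supply the `available` list, and the result could be passed as the `providers` argument of `onnxruntime.InferenceSession`, which treats the list as ordered by decreasing precedence.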
Implements the FeatureExtractor protocol.
Parameters:
- model : str, Path, or bytes
Path to the ONNX model file, or serialized model bytes from to_encoding_model().
- transforms : Transform or Sequence[Transform] or None, default None
Preprocessing transforms to apply before encoding. When None, uses raw images.
- output_name : str or None, default None
Name of the output to extract embeddings from. When None, uses the first output. Required for models with multiple outputs.
- flatten : bool, default True
If True, flattens outputs with more than 2 dimensions to (N, D) shape. If False, preserves the original output shape.
- batch_size : int or None, default None
Forward-pass (compute) batch size: how many images go through the model at once. None runs a single inference pass over all inputs. When wrapped by Embeddings, Embeddings loads images in its own (I/O) chunks and this extractor sub-batches each chunk by this value, so the smaller of the two bounds the forward pass.
- image_size : tuple[int, int] or None, default None
Optional user-imposed model input size as (height, width). When set, each CHW image is bilinearly resized to this size before inference, overriding the model’s native expected input size. The configured size is preferred over any size declared in a model metadata file. None leaves images at their incoming size.
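The interaction between the Embeddings I/O chunk size and this extractor's batch_size can be sketched in plain Python. This is a simplified model of the behavior described for batch_size above, not the library's implementation: `sub_batches` and `forward_pass_sizes` are hypothetical helpers, and integers stand in for images.

```python
from typing import Iterator, List, Optional, Sequence


def sub_batches(chunk: Sequence[int], batch_size: Optional[int]) -> Iterator[List[int]]:
    """Split one loaded chunk into forward-pass batches (None means one pass)."""
    if batch_size is None:
        yield list(chunk)
        return
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer or None")
    for start in range(0, len(chunk), batch_size):
        yield list(chunk[start:start + batch_size])


def forward_pass_sizes(n_images: int, io_chunk: int, batch_size: Optional[int]) -> List[int]:
    """Return the number of images in every forward pass for n_images inputs."""
    images = list(range(n_images))  # stand-ins for real images
    sizes: List[int] = []
    for start in range(0, n_images, io_chunk):
        chunk = images[start:start + io_chunk]        # outer (I/O) chunk
        for batch in sub_batches(chunk, batch_size):  # inner (compute) batch
            sizes.append(len(batch))
    return sizes


print(forward_pass_sizes(10, io_chunk=4, batch_size=8))     # [4, 4, 2]
print(forward_pass_sizes(10, io_chunk=8, batch_size=3))     # [3, 3, 2, 2]
print(forward_pass_sizes(10, io_chunk=4, batch_size=None))  # [4, 4, 2]
```

In each case no forward pass exceeds the smaller of the two sizes, which is the bound stated for batch_size.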
Example
Basic usage with a model file:
>>> from dataeval import Embeddings
>>> from dataeval.extractors import OnnxExtractor
>>>
>>> extractor = OnnxExtractor("model.onnx")
>>> embeddings = Embeddings(dataset, extractor=extractor, batch_size=32)
Notes
The extractor expects images in CHW format (channels, height, width).
For models with multiple outputs, use output_name to specify which output contains embeddings.
The model is loaded lazily on first use.
Requires onnxruntime or onnxruntime-gpu to be installed.
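The shape rule described for the flatten parameter can be illustrated without running a model. The helper below is a hypothetical sketch of that rule only (`output_shape` is not part of dataeval), assuming the leading dimension is the batch dimension N and all remaining dimensions are collapsed into D.

```python
from math import prod
from typing import Tuple


def output_shape(shape: Tuple[int, ...], flatten: bool = True) -> Tuple[int, ...]:
    """Return the shape after applying the documented flatten rule."""
    # Only outputs with more than 2 dimensions are reshaped to (N, D).
    if flatten and len(shape) > 2:
        return (shape[0], prod(shape[1:]))
    return shape


print(output_shape((8, 512, 7, 7)))                 # (8, 25088)
print(output_shape((8, 512)))                       # (8, 512)
print(output_shape((8, 512, 7, 7), flatten=False))  # (8, 512, 7, 7)
```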