dataeval.extractors.OnnxExtractor

class dataeval.extractors.OnnxExtractor(model, transforms=None, output_name=None, flatten=True, batch_size=None, image_size=None)

Extracts embeddings via ONNX Runtime with lazy model loading.

Encapsulates ONNX-specific logic for feature extraction:

Model loading from ONNX files or in-memory bytes
Automatic GPU/CPU provider selection with fallback
Transform pipeline
Output layer selection for multi-output models
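
The GPU/CPU provider selection with fallback listed above can be pictured roughly as follows. This is a minimal sketch, not DataEval's actual implementation: the helper name `pick_providers` and the preference order are assumptions made for illustration. The provider names `CUDAExecutionProvider` and `CPUExecutionProvider` are standard ONNX Runtime identifiers, and in practice the list of available providers would come from `onnxruntime.get_available_providers()`.

```python
from typing import Sequence

# Hypothetical preference order: GPU first, CPU as the fallback.
PREFERRED = ("CUDAExecutionProvider", "CPUExecutionProvider")


def pick_providers(available: Sequence[str]) -> list[str]:
    """Return the preferred providers that are actually available, in order."""
    chosen = [name for name in PREFERRED if name in available]
    # Fall back to CPU execution if nothing in the preference list matched.
    return chosen or ["CPUExecutionProvider"]


# With onnxruntime installed, the list would typically be obtained via:
#   import onnxruntime as ort
#   providers = pick_providers(ort.get_available_providers())
#   session = ort.InferenceSession("model.onnx", providers=providers)
gpu_machine = pick_providers(["CUDAExecutionProvider", "CPUExecutionProvider"])
cpu_machine = pick_providers(["CPUExecutionProvider"])
```

Passing an ordered provider list lets ONNX Runtime try the GPU provider first and use the CPU provider when the GPU one is not available.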

Implements the FeatureExtractor protocol.

Parameters:

model : str, Path, or bytes: Path to the ONNX model file, or serialized model bytes from to_encoding_model().
transforms : Transform or Sequence[Transform] or None, default None: Preprocessing transforms to apply before encoding. When None, uses raw images.
output_name : str or None, default None: Name of the output to extract embeddings from. When None, uses the first output. Required for models with multiple outputs.
flatten : bool, default True: If True, flattens outputs with more than 2 dimensions to (N, D) shape. If False, preserves the original output shape.
batch_size : int or None, default None: Forward-pass (compute) batch size: how many images go through the model at once. None runs a single inference pass over all inputs. When wrapped by Embeddings, Embeddings loads images in its own (I/O) chunks and this extractor sub-batches each chunk by this value, so the smaller of the two bounds the forward pass.
image_size : tuple[int, int] or None, default None: Optional user-imposed model input size as (height, width). When set, each CHW image is bilinearly resized to this size before inference, overriding the model’s native expected input size. The configured size is preferred over any size declared in a model metadata file. None leaves images at their incoming size.
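
The effect of `flatten` on output shapes can be illustrated with plain shape arithmetic. The helper `flattened_shape` below is hypothetical and only mirrors the rule stated for the parameter (more than 2 dimensions collapses to (N, D)); it is a sketch under that assumption, not the library's code.

```python
import math


def flattened_shape(shape: tuple[int, ...]) -> tuple[int, ...]:
    """Shape after flattening: outputs with more than 2 dims become (N, D)."""
    if len(shape) <= 2:
        return shape  # 2 dimensions or fewer: left untouched
    # Keep the batch dimension and collapse all remaining dimensions into D.
    return (shape[0], math.prod(shape[1:]))


# A batch of 8 feature maps with 512 channels on a 7x7 grid.
conv_output = flattened_shape((8, 512, 7, 7))
# An output that is already (N, D) is unchanged.
dense_output = flattened_shape((8, 128))
```

On a NumPy array the same collapse would usually be written as `array.reshape(array.shape[0], -1)`.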

Example

Basic usage with a model file:

>>> from dataeval import Embeddings
>>> from dataeval.extractors import OnnxExtractor
>>>
>>> extractor = OnnxExtractor("model.onnx")
>>> embeddings = Embeddings(dataset, extractor=extractor, batch_size=32)

Notes

The extractor expects images in CHW format (channels, height, width).
For models with multiple outputs, use output_name to specify which output contains embeddings.
The model is loaded lazily on first use.
Requires onnxruntime or onnxruntime-gpu to be installed.
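
Lazy loading, as mentioned in the notes above, generally means that the expensive session creation is deferred until the first inference call. The sketch below shows the general pattern only. The class `LazySession` and its attributes are hypothetical names, and nothing here is taken from DataEval's source.

```python
from typing import Any, Callable, Optional


class LazySession:
    """Defers an expensive load until the session is first requested."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader  # e.g. lambda: ort.InferenceSession(path)
        self._session: Optional[Any] = None
        self.load_count = 0  # only here to make the behavior visible

    @property
    def session(self) -> Any:
        if self._session is None:  # first access triggers the load
            self._session = self._loader()
            self.load_count += 1
        return self._session


lazy = LazySession(loader=lambda: object())  # stand-in for a real model load
loads_before_use = lazy.load_count  # nothing has been loaded at construction
first = lazy.session
second = lazy.session  # reuses the cached object, no second load
```

With this pattern, constructing the object is cheap, and problems with the model file may only surface on first use rather than at construction time.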

property batch_size : int | None

Return the default batch size for inference, if set.
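
The interaction between this compute batch size and the I/O chunk size of Embeddings, described under the batch_size parameter, can be sketched with simple arithmetic. The function `forward_pass_sizes` and its argument names are hypothetical. It only models the rule that each loaded chunk is sub-batched by the extractor's batch size, and it assumes both sizes are positive.

```python
from typing import Iterator, Optional


def forward_pass_sizes(
    n_images: int, io_chunk: int, compute_batch: Optional[int]
) -> Iterator[int]:
    """Yield the number of images in each forward pass."""
    for start in range(0, n_images, io_chunk):
        chunk = min(io_chunk, n_images - start)  # images in this I/O chunk
        # None means the whole chunk goes through the model in one pass.
        step = chunk if compute_batch is None else compute_batch
        for sub in range(0, chunk, step):
            yield min(step, chunk - sub)  # images in one forward pass


sizes_small_compute = list(forward_pass_sizes(100, io_chunk=32, compute_batch=8))
sizes_small_io = list(forward_pass_sizes(100, io_chunk=32, compute_batch=64))
sizes_no_limit = list(forward_pass_sizes(100, io_chunk=32, compute_batch=None))
```

In the first case no forward pass exceeds 8 images. In the second and third cases the I/O chunk of 32 is the limit, which is what "the smaller of the two bounds the forward pass" means.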

property output_name : str | None

Return the output name for extraction, if set.
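
To decide which output_name to pass for a model with multiple outputs, it can help to list the names the model exposes. The sketch below uses ONNX Runtime's `InferenceSession.get_outputs()`, whose entries carry a `name` attribute. The helper `list_output_names` and the output name `"embedding"` are hypothetical and used only for illustration.

```python
from typing import Any


def list_output_names(session: Any) -> list[str]:
    """Return the output names of an onnxruntime.InferenceSession-like object."""
    return [output.name for output in session.get_outputs()]


# With onnxruntime installed and a real model file:
#   import onnxruntime as ort
#   names = list_output_names(ort.InferenceSession("model.onnx"))
#   extractor = OnnxExtractor("model.onnx", output_name="embedding")  # hypothetical name
```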