dataeval.extractors.OnnxExtractor

class dataeval.extractors.OnnxExtractor(model, transforms=None, output_name=None, flatten=True)

Extracts embeddings via ONNX Runtime with lazy model loading.
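
Lazy loading means the expensive model load is deferred until the session is first needed. The sketch below illustrates that idea only and is not OnnxExtractor's actual implementation. `LazySession` and its `loader` argument are hypothetical names.

```python
from functools import cached_property
from typing import Any, Callable


class LazySession:
    """Hypothetical wrapper that defers an expensive load until first access."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        # `loader` stands in for something like
        # lambda: onnxruntime.InferenceSession("model.onnx")
        self._loader = loader

    @cached_property
    def session(self) -> Any:
        # Runs once, on first attribute access. The result is cached on
        # the instance, so later accesses reuse the same session.
        return self._loader()
```

Constructing the wrapper is cheap. The loader runs only when `.session` is first read.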

Encapsulates ONNX-specific logic for feature extraction:

  • Model loading from ONNX files or in-memory bytes

  • Automatic GPU/CPU provider selection with fallback

  • Optional preprocessing transform pipeline

  • Output layer selection for multi-output models
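
GPU/CPU provider selection with fallback might look roughly like the sketch below. It assumes CUDA is the GPU provider of interest. `select_providers` is a hypothetical helper, and the provider strings are the standard onnxruntime execution provider names.

```python
from typing import List, Sequence

# Provider names as reported by onnxruntime.get_available_providers()
GPU_PROVIDER = "CUDAExecutionProvider"
CPU_PROVIDER = "CPUExecutionProvider"


def select_providers(available: Sequence[str]) -> List[str]:
    """Prefer CUDA when it is available; always keep CPU as the fallback."""
    if GPU_PROVIDER in available:
        # Providers are listed in decreasing priority; operators the GPU
        # provider cannot run fall back to the CPU provider.
        return [GPU_PROVIDER, CPU_PROVIDER]
    return [CPU_PROVIDER]
```

With onnxruntime installed, the result would be passed as `ort.InferenceSession("model.onnx", providers=select_providers(ort.get_available_providers()))`.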

Implements the FeatureExtractor protocol.

Parameters:
model : str, Path, or bytes

Path to the ONNX model file, or serialized model bytes from to_encoding_model().
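
The three accepted input types can all be reduced to what `onnxruntime.InferenceSession` accepts as its first argument (a path string or serialized bytes). `normalize_model_source` is a hypothetical helper sketching that normalization.

```python
from pathlib import Path
from typing import Union

ModelSource = Union[str, Path, bytes]


def normalize_model_source(model: ModelSource) -> Union[str, bytes]:
    """Return a value usable as InferenceSession's path_or_bytes argument."""
    if isinstance(model, bytes):
        # Serialized model already in memory: pass it through unchanged
        return model
    if isinstance(model, (str, Path)):
        # Normalize filesystem paths to plain strings
        return str(model)
    raise TypeError(f"expected str, Path, or bytes, got {type(model).__name__}")
```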

transforms : Transform or Sequence[Transform] or None, default None

Preprocessing transforms to apply before encoding. When None, the raw images are passed to the model unchanged.
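
Accepting either a single transform or a sequence usually means normalizing to a list and applying each step in order. This is a minimal sketch that assumes a transform is any callable mapping an array to an array. dataeval's own `Transform` type may differ, and `apply_transforms` is a hypothetical name.

```python
from typing import Callable, Optional, Sequence, Union

import numpy as np

# Assumed stand-in for a transform: any callable array -> array
Transform = Callable[[np.ndarray], np.ndarray]


def apply_transforms(
    image: np.ndarray,
    transforms: Optional[Union[Transform, Sequence[Transform]]] = None,
) -> np.ndarray:
    """Apply a single transform or a sequence of transforms, in order."""
    if transforms is None:
        # No preprocessing: the raw image is used as-is
        return image
    if callable(transforms):
        # Wrap a single transform so both cases share one loop
        transforms = [transforms]
    for transform in transforms:
        image = transform(image)
    return image
```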

output_name : str or None, default None

Name of the output to extract embeddings from. When None, uses the first output. Required for models with multiple outputs.
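
The default-to-first-output behavior can be sketched as below. `resolve_output_name` is a hypothetical helper, and raising `ValueError` for an unknown name is an assumption, not documented behavior.

```python
from typing import Optional, Sequence


def resolve_output_name(output_names: Sequence[str], output_name: Optional[str] = None) -> str:
    """Pick which model output to read embeddings from."""
    if not output_names:
        raise ValueError("model has no outputs")
    if output_name is None:
        # Default: the first declared output
        return output_names[0]
    if output_name not in output_names:
        raise ValueError(f"unknown output {output_name!r}; available: {list(output_names)}")
    return output_name
```

With onnxruntime, `output_names` would come from `[o.name for o in session.get_outputs()]`, and the resolved name could be passed to `session.run([name], feed)`.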

flatten : bool, default True

If True, flattens outputs with more than two dimensions to shape (N, D). If False, preserves the original output shape.
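
Flattening to (N, D) amounts to collapsing every non-batch axis into one. This sketch assumes the first axis is the batch dimension N. `maybe_flatten` is a hypothetical name.

```python
import numpy as np


def maybe_flatten(output: np.ndarray, flatten: bool = True) -> np.ndarray:
    """Collapse all non-batch axes into one when output has more than 2 dims."""
    if flatten and output.ndim > 2:
        # e.g. (N, C, H, W) -> (N, C * H * W); axis 0 is assumed to be N
        return output.reshape(output.shape[0], -1)
    return output
```

For example, an output of shape (4, 8, 2, 2) becomes (4, 32), while a (4, 16) output is returned unchanged.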

Example

Basic usage with a model file:

>>> from dataeval import Embeddings
>>> from dataeval.extractors import OnnxExtractor
>>>
>>> # `dataset` is assumed to be an image dataset accepted by Embeddings
>>> extractor = OnnxExtractor("model.onnx")
>>> embeddings = Embeddings(dataset, extractor=extractor, batch_size=32)

Notes

  • The extractor expects images in CHW format (channels, height, width).

  • For models with multiple outputs, use output_name to specify which output contains embeddings.

  • The model is loaded lazily on first use.

  • Requires onnxruntime or onnxruntime-gpu to be installed.
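
Images decoded with libraries such as Pillow are typically laid out as (height, width, channels), so they need reordering before they match the CHW expectation noted above. A minimal sketch, with `hwc_to_chw` as a hypothetical helper:

```python
import numpy as np


def hwc_to_chw(image: np.ndarray) -> np.ndarray:
    """Reorder an (H, W, C) image to (C, H, W)."""
    if image.ndim != 3:
        raise ValueError(f"expected a 3-D image, got shape {image.shape}")
    # Move the channel axis (last) to the front
    return np.transpose(image, (2, 0, 1))
```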

property output_name : str | None

Return the name of the output used for extraction, or None if not set.