dataeval.encoders.OnnxEncoder

class dataeval.encoders.OnnxEncoder(model, batch_size=None, transforms=None, output_name=None, flatten=True)
ONNX Runtime-based embedding encoder.
Encapsulates ONNX-specific logic for embedding extraction:
- Model loading from ONNX files or in-memory bytes
- Automatic GPU/CPU provider selection with fallback
- Transform pipeline
- Batch processing
- Output layer selection for multi-output models
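The provider selection with fallback mentioned above can be pictured roughly as follows. This is a minimal sketch, not DataEval's actual logic: the helper `pick_providers` and the preference order in `PREFERRED_PROVIDERS` are assumptions made for illustration.

```python
from collections.abc import Sequence

# Assumed preference order (GPU first, then CPU). The real selection logic
# inside OnnxEncoder is not documented here.
PREFERRED_PROVIDERS = ("CUDAExecutionProvider", "CPUExecutionProvider")


def pick_providers(available: Sequence[str]) -> list[str]:
    """Return the preferred providers that are available, in priority order."""
    chosen = [p for p in PREFERRED_PROVIDERS if p in available]
    # Fall back to CPU when none of the preferred providers is reported.
    return chosen or ["CPUExecutionProvider"]


gpu_host = pick_providers(
    ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]
)
cpu_host = pick_providers(["CPUExecutionProvider"])
```

With ONNX Runtime installed, the `available` list would typically come from `onnxruntime.get_available_providers()`, and the result would be passed as the `providers` argument of `onnxruntime.InferenceSession`.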
- Parameters:
- model : str, Path, or bytes
Path to the ONNX model file, or serialized model bytes from to_encoding_model().
- batch_size : int or None, default None
Number of samples per batch. When None, uses DataEval’s configured batch size.
- transforms : Transform or Sequence[Transform] or None, default None
Preprocessing transforms to apply before encoding. When None, uses raw images.
- output_name : str or None, default None
Name of the output to extract embeddings from. When None, uses the first output. Required for models with multiple outputs.
- flatten : bool, default True
If True, flattens outputs with more than 2 dimensions to (N, D) shape. If False, preserves the original output shape.
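The effect of flatten on output shapes can be illustrated with plain NumPy. This is a sketch under the assumption that flattening is a row-major reshape of all trailing dimensions; `flatten_outputs` is a hypothetical helper, not part of DataEval.

```python
import numpy as np


def flatten_outputs(outputs: np.ndarray, flatten: bool = True) -> np.ndarray:
    """Collapse trailing dimensions to (N, D) when there are more than 2."""
    if flatten and outputs.ndim > 2:
        return outputs.reshape(outputs.shape[0], -1)
    return outputs


# A batch of 4 feature maps shaped (N, C, H, W).
feature_maps = np.zeros((4, 8, 2, 2), dtype=np.float32)

flat = flatten_outputs(feature_maps)                 # (4, 32)
kept = flatten_outputs(feature_maps, flatten=False)  # (4, 8, 2, 2)
already_2d = flatten_outputs(np.zeros((4, 16)))      # unchanged: (4, 16)
```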
Example
Basic usage with a model file:
>>> from dataeval.encoders import OnnxEncoder
>>> from dataeval import Embeddings
>>>
>>> encoder = OnnxEncoder("model.onnx", batch_size=32)
>>> embeddings = Embeddings(dataset, encoder=encoder)
Notes
The encoder expects images in CHW format (channels, height, width).
For models with multiple outputs, use output_name to specify which output contains embeddings.
The model is loaded lazily on first use.
Requires onnxruntime or onnxruntime-gpu to be installed.
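Because the encoder expects CHW input, images stored as HWC (the layout many image loaders produce) need their axes reordered first. A minimal sketch, assuming a NumPy array input; `hwc_to_chw` is a hypothetical helper, not a DataEval function.

```python
import numpy as np


def hwc_to_chw(image: np.ndarray) -> np.ndarray:
    """Reorder (height, width, channels) to (channels, height, width)."""
    return np.transpose(image, (2, 0, 1))


hwc = np.zeros((224, 320, 3), dtype=np.uint8)
hwc[1, 2, 0] = 7  # mark one pixel in channel 0

chw = hwc_to_chw(hwc)  # shape (3, 224, 320)
```

A conversion like this could be applied when building the dataset or, if it fits the Transform protocol, supplied through transforms.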
- encode(dataset: dataeval.protocols.Dataset[tuple[dataeval.protocols.ArrayLike, Any, Any]] | dataeval.protocols.Dataset[dataeval.protocols.ArrayLike], indices: collections.abc.Sequence[int], stream: True) -> collections.abc.Iterator[tuple[collections.abc.Sequence[int], numpy.typing.NDArray[Any]]]
- encode(dataset: dataeval.protocols.Dataset[tuple[dataeval.protocols.ArrayLike, Any, Any]] | dataeval.protocols.Dataset[dataeval.protocols.ArrayLike], indices: collections.abc.Sequence[int], stream: False = ...) -> numpy.typing.NDArray[Any]
Encode images at specified indices to embeddings.
- Parameters:
- Returns:
Embeddings array or iterator of batches.
- Return type:
NDArray[Any] or Iterator[tuple[Sequence[int], NDArray[Any]]]
- Raises:
IndexError – If any indices are out of range for the dataset.
FileNotFoundError – If the model file does not exist.
ImportError – If onnxruntime is not installed.
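The two return shapes of encode can be pictured with a stand-in that needs no model. This is a sketch only: `iter_batches` and `fake_embed` are hypothetical, and it assumes that the non-streaming result matches the concatenation of the streamed batches, which this reference does not state explicitly.

```python
from collections.abc import Callable, Iterator, Sequence

import numpy as np


def iter_batches(
    indices: Sequence[int],
    batch_size: int,
    embed: Callable[[Sequence[int]], np.ndarray],
) -> Iterator[tuple[Sequence[int], np.ndarray]]:
    """Yield (batch_indices, batch_embeddings) pairs, as with stream=True."""
    for start in range(0, len(indices), batch_size):
        batch = indices[start : start + batch_size]
        yield batch, embed(batch)


def fake_embed(batch: Sequence[int]) -> np.ndarray:
    # Placeholder for model inference: one 3-dimensional vector per index.
    return np.array([[i, i, i] for i in batch], dtype=np.float32)


indices = [0, 1, 2, 3, 4]

# Streaming form: an iterator of (indices, embeddings) tuples.
streamed = list(iter_batches(indices, batch_size=2, embed=fake_embed))

# Non-streaming form: a single array covering all requested indices.
full = np.concatenate([emb for _, emb in streamed])
```

Streaming keeps only one batch of embeddings in memory at a time, which may matter for large datasets; the single-array form is simpler when everything fits in memory.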
- property batch_size : int
Return the batch size used for encoding.
- property output_name : str | None
Return the output name for extraction, if set.