dataeval.encoders.OnnxEncoder

class dataeval.encoders.OnnxEncoder(model, batch_size=None, transforms=None, output_name=None, flatten=True)

ONNX Runtime-based embedding encoder.

Encapsulates ONNX-specific logic for embedding extraction:

  • Model loading from ONNX files or in-memory bytes

  • Automatic GPU/CPU provider selection with fallback

  • Transform pipeline

  • Batch processing

  • Output layer selection for multi-output models
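GPU/CPU provider selection with fallback can be pictured with the minimal sketch below. It is not DataEval's code: `select_providers` is a hypothetical helper, and it assumes a simple "CUDA first, CPU always last" policy. The provider names are real ONNX Runtime identifiers, and the `available` list would normally come from `onnxruntime.get_available_providers()`.

```python
# Hypothetical helper, not part of DataEval: GPU-first provider selection
# with a guaranteed CPU fallback, using ONNX Runtime provider names.
GPU_PROVIDER = "CUDAExecutionProvider"
CPU_PROVIDER = "CPUExecutionProvider"


def select_providers(available: list[str]) -> list[str]:
    """Return a provider priority list, preferring CUDA when it is available.

    CPU is always appended last so a session can fall back to it.
    """
    providers = []
    if GPU_PROVIDER in available:
        providers.append(GPU_PROVIDER)
    providers.append(CPU_PROVIDER)
    return providers


print(select_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
print(select_providers(["CPUExecutionProvider"]))
```

ONNX Runtime treats the `providers` argument of `onnxruntime.InferenceSession` as a priority order, so a list built this way lets the session use the GPU when present and the CPU otherwise.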

Parameters:
model : str, Path, or bytes

Path to the ONNX model file, or serialized model bytes from to_encoding_model().

batch_size : int or None, default None

Number of samples per batch. When None, uses DataEval’s configured batch size.

transforms : Transform or Sequence[Transform] or None, default None

Preprocessing transforms to apply before encoding. When None, uses raw images.

output_name : str or None, default None

Name of the output to extract embeddings from. When None, uses the first output. Required for models with multiple outputs.

flatten : bool, default True

If True, flattens outputs with more than 2 dimensions to (N, D) shape. If False, preserves the original output shape.
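A minimal NumPy sketch of the behavior described for `flatten`, assuming the first axis is the batch axis N and D is the product of all remaining axes. `maybe_flatten` is a hypothetical name used only for illustration, not a DataEval function.

```python
import numpy as np


def maybe_flatten(outputs: np.ndarray, flatten: bool = True) -> np.ndarray:
    """Hypothetical illustration of the `flatten` option.

    Outputs with more than 2 dimensions are reshaped to (N, D): the batch
    axis N is kept and every remaining axis is collapsed into D.
    """
    if flatten and outputs.ndim > 2:
        return outputs.reshape(outputs.shape[0], -1)
    return outputs


# A convolutional feature map: 4 samples, 8 channels, 2x2 spatial grid.
features = np.zeros((4, 8, 2, 2), dtype=np.float32)
print(maybe_flatten(features).shape)                 # (4, 32)
print(maybe_flatten(features, flatten=False).shape)  # (4, 8, 2, 2)
```

Outputs that are already 2-D pass through unchanged either way.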

Example

Basic usage with a model file:

>>> from dataeval.encoders import OnnxEncoder
>>> from dataeval import Embeddings
>>> # "model.onnx" and `dataset` are placeholders for your own model file and dataset
>>> encoder = OnnxEncoder("model.onnx", batch_size=32)
>>> embeddings = Embeddings(dataset, encoder=encoder)

Notes

  • The encoder expects images in CHW format (channels, height, width).

  • For models with multiple outputs, use output_name to specify which output contains embeddings.

  • The model is loaded lazily on first use.

  • Requires onnxruntime or onnxruntime-gpu to be installed.
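Since the encoder expects CHW images, data stored as HWC (height, width, channels) needs its channel axis moved first. The sketch below shows that conversion together with an ordered transform pipeline. It assumes transforms are plain callables mapping an array to an array; DataEval's `Transform` type may differ, and `hwc_to_chw`, `scale_to_unit`, and `apply_transforms` are hypothetical names.

```python
from typing import Callable, Sequence

import numpy as np

Transform = Callable[[np.ndarray], np.ndarray]  # assumption: array -> array


def hwc_to_chw(image: np.ndarray) -> np.ndarray:
    """Move the channel axis from last (H, W, C) to first (C, H, W)."""
    return np.transpose(image, (2, 0, 1))


def scale_to_unit(image: np.ndarray) -> np.ndarray:
    """Scale 8-bit pixel values into [0, 1] as float32."""
    return image.astype(np.float32) / 255.0


def apply_transforms(image: np.ndarray, transforms: Sequence[Transform]) -> np.ndarray:
    """Apply each transform in order, feeding each output into the next."""
    for transform in transforms:
        image = transform(image)
    return image


hwc_image = np.full((32, 48, 3), 255, dtype=np.uint8)  # height 32, width 48, RGB
chw_image = apply_transforms(hwc_image, [hwc_to_chw, scale_to_unit])
print(chw_image.shape, chw_image.dtype, chw_image.max())  # (3, 32, 48) float32 1.0
```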

encode(dataset: dataeval.protocols.Dataset[tuple[dataeval.protocols.ArrayLike, Any, Any]] | dataeval.protocols.Dataset[dataeval.protocols.ArrayLike], indices: collections.abc.Sequence[int], stream: Literal[True]) -> collections.abc.Iterator[tuple[collections.abc.Sequence[int], numpy.typing.NDArray[Any]]]
encode(dataset: dataeval.protocols.Dataset[tuple[dataeval.protocols.ArrayLike, Any, Any]] | dataeval.protocols.Dataset[dataeval.protocols.ArrayLike], indices: collections.abc.Sequence[int], stream: Literal[False] = ...) -> numpy.typing.NDArray[Any]

Encode images at specified indices to embeddings.

Parameters:
dataset : Dataset

Dataset providing images to encode.

indices : Sequence[int]

Indices of images to encode from the dataset.

stream : bool, default False

If True, yields (batch_indices, batch_embeddings) tuples. If False, returns all embeddings as a single array.

Returns:

A single embeddings array when stream is False, or an iterator of (batch_indices, batch_embeddings) tuples when stream is True.

Return type:

NDArray[Any] or Iterator[tuple[Sequence[int], NDArray[Any]]]
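The two return modes can be illustrated with a minimal sketch, assuming indices are split into consecutive chunks of `batch_size` and that non-streaming mode simply concatenates the streamed batches. This is not DataEval's implementation: `stream_batches`, `encode_all`, and `fake_embed` are hypothetical, and `fake_embed` stands in for running the ONNX model on one batch.

```python
from typing import Callable, Iterator, Sequence

import numpy as np

EmbedFn = Callable[[Sequence[int]], np.ndarray]  # stand-in for a model call


def stream_batches(
    embed: EmbedFn, indices: Sequence[int], batch_size: int
) -> Iterator[tuple[Sequence[int], np.ndarray]]:
    """Streaming mode: yield (batch_indices, batch_embeddings) one batch at a time."""
    for start in range(0, len(indices), batch_size):
        batch_indices = indices[start : start + batch_size]
        yield batch_indices, embed(batch_indices)


def encode_all(embed: EmbedFn, indices: Sequence[int], batch_size: int) -> np.ndarray:
    """Non-streaming mode: stack every batch into one (len(indices), D) array."""
    return np.concatenate([emb for _, emb in stream_batches(embed, indices, batch_size)])


def fake_embed(batch_indices: Sequence[int]) -> np.ndarray:
    # Hypothetical "model": index i becomes the 2-D embedding [i, 10 * i].
    return np.array([[i, i * 10] for i in batch_indices], dtype=np.float32)


for batch_indices, batch in stream_batches(fake_embed, [0, 1, 2, 3, 4], batch_size=2):
    print(list(batch_indices), batch.shape)
# [0, 1] (2, 2)
# [2, 3] (2, 2)
# [4] (1, 2)
print(encode_all(fake_embed, [0, 1, 2, 3, 4], batch_size=2).shape)  # (5, 2)
```

A streaming generator like this lets a caller handle each batch as it is produced instead of waiting for, and holding, the full array.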

Raises:
  • IndexError – If any indices are out of range for the dataset.

  • FileNotFoundError – If the model file does not exist.

  • ImportError – If onnxruntime is not installed.
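Because the model is loaded lazily on first use, a missing model file would plausibly surface when encoding starts rather than when the encoder is constructed. The sketch below illustrates that pattern with `functools.cached_property`. `LazyModel` is hypothetical, and reading raw bytes stands in for creating an ONNX Runtime session.

```python
import tempfile
from functools import cached_property
from pathlib import Path


class LazyModel:
    """Hypothetical illustration of lazy loading; not DataEval's OnnxEncoder."""

    def __init__(self, path: str) -> None:
        # Construction only records the path; the file is not opened here.
        self.path = Path(path)
        self.loads = 0  # how many times the model was actually loaded

    @cached_property
    def session(self) -> bytes:
        # Stand-in for building an onnxruntime.InferenceSession on first access.
        # cached_property stores the result, so later accesses skip this body;
        # if this raises, nothing is cached and the next access retries.
        if not self.path.exists():
            raise FileNotFoundError(f"Model file not found: {self.path}")
        self.loads += 1
        return self.path.read_bytes()


missing = LazyModel("does_not_exist.onnx")  # no error at construction time
try:
    missing.session
except FileNotFoundError as err:
    print("raised on first use:", err)

with tempfile.NamedTemporaryFile(suffix=".onnx", delete=False) as tmp:
    tmp.write(b"fake model bytes")
present = LazyModel(tmp.name)
present.session
present.session  # served from the cache, no second load
print(present.loads)  # 1
Path(tmp.name).unlink()  # clean up the temporary file
```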

property batch_size : int

Return the batch size used for encoding.

property output_name : str | None

Return the output name for extraction, if set.
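Output selection for multi-output models can be sketched as a lookup in an ordered mapping of named outputs, falling back to the first output when no name is given. This is an illustration under assumptions, not DataEval's code: `select_output` and the output names `logits` and `embeddings` are hypothetical, and the `KeyError` for an unknown name is an assumed behavior. With ONNX Runtime, such a mapping could be built by zipping the names from `session.get_outputs()` with the arrays returned by `session.run(None, feed)`.

```python
from typing import Mapping, Optional

import numpy as np


def select_output(outputs: Mapping[str, np.ndarray], output_name: Optional[str] = None) -> np.ndarray:
    """Pick one named output, defaulting to the first when no name is given."""
    if output_name is None:
        # Dicts preserve insertion order, which here mirrors the model's output order.
        return next(iter(outputs.values()))
    if output_name not in outputs:
        # Assumed error type for an unknown name; DataEval may behave differently.
        raise KeyError(f"Unknown output {output_name!r}; available: {list(outputs)}")
    return outputs[output_name]


# Hypothetical two-output model: class logits first, embeddings second.
outputs = {
    "logits": np.zeros((2, 10), dtype=np.float32),
    "embeddings": np.zeros((2, 512), dtype=np.float32),
}
print(select_output(outputs).shape)                # (2, 10)
print(select_output(outputs, "embeddings").shape)  # (2, 512)
```

In a layout like this the default would return the logits, which is why naming the embedding output explicitly matters for multi-output models.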