How to encode images with ONNX models

Problem Statement

When working with image datasets, generating embeddings is a common first step for many analysis tasks like clustering, duplicate detection, and coverage analysis. While PyTorch models are widely used, ONNX (Open Neural Network Exchange) provides a framework-agnostic format that offers portability and often better inference performance.

DataEval’s OnnxEncoder allows you to use any ONNX model to generate embeddings from your image datasets.

When to use

Use the OnnxEncoder when you want to:

  • Generate embeddings using a pre-trained ONNX model

  • Work with models exported from various frameworks (PyTorch, TensorFlow, etc.)

  • Leverage optimized inference without depending on the framework the model was trained in

What you will need

  1. An image dataset (we’ll use VOC2012)

  2. An ONNX model that outputs embeddings

  3. A Python environment with the following packages installed:

    • dataeval

    • onnxruntime (or onnxruntime-gpu for GPU support)

    • onnx (for model preparation utilities)

    • maite-datasets

Getting Started

Let’s import the libraries needed to set up a minimal working example.

try:
    import google.colab  # noqa: F401

    %pip install -q dataeval[onnx] maite-datasets opencv-python-headless
except Exception:
    pass

import os

import cv2
import numpy as np
import requests
from maite_datasets.object_detection import VOCDetection

from dataeval import Embeddings
from dataeval.encoders import OnnxEncoder
from dataeval.selection import Limit, Select
from dataeval.utils.onnx import to_encoding_model

Preparing an ONNX model for embeddings

Most pre-trained ONNX models output classification logits rather than embeddings. To extract embeddings, we need to modify the model to output the features from an intermediate layer (typically before the final classification layer).
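
To make the distinction between logits and embeddings concrete, here is a minimal NumPy sketch of a toy classifier. It is not an ONNX model and not DataEval code; the weights, dimensions, and the names classify and encode are assumptions chosen purely for illustration. The "embedding" is simply the activation that feeds the final classification layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a classifier: a feature extractor followed by a classification head
W_features = rng.standard_normal((12, 8)).astype(np.float32)  # input dim 12 -> embedding dim 8
W_head = rng.standard_normal((8, 3)).astype(np.float32)  # embedding dim 8 -> 3 classes


def classify(x: np.ndarray) -> np.ndarray:
    """Full model: returns class logits of shape (N, 3)."""
    features = np.maximum(x @ W_features, 0.0)  # ReLU activations
    return features @ W_head


def encode(x: np.ndarray) -> np.ndarray:
    """Truncated model: returns the intermediate features of shape (N, 8)."""
    return np.maximum(x @ W_features, 0.0)


batch = rng.standard_normal((4, 12)).astype(np.float32)
logits = classify(batch)
embeddings_toy = encode(batch)
print(logits.shape, embeddings_toy.shape)
```

Exposing an intermediate output of an ONNX graph follows the same idea: the model is evaluated only up to the layer before the classification head, and that layer's activations are returned.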

DataEval provides utility functions to help with this:

  • find_embedding_layer: Identifies the embedding layer in a classification model

  • to_encoding_model: Returns a modified in-memory model that outputs the embedding layer, together with that layer’s name

We’ll download a ResNet50 model and use these utilities to prepare it for embedding extraction.

def download_onnx_model(url, save_path):
    """Downloads the ONNX model if it doesn't exist locally."""
    if os.path.exists(save_path):
        print(f"Model already exists at {save_path}")
        return

    # Create the target directory (e.g. "data/") if it is missing
    os.makedirs(os.path.dirname(save_path) or ".", exist_ok=True)

    print(f"Downloading model from {url}...")
    response = requests.get(url, stream=True, timeout=60)
    if response.status_code != 200:
        raise RuntimeError(f"Failed to download model. Status code: {response.status_code}")

    # Stream to disk in 1 MiB chunks instead of loading the whole file into memory
    with open(save_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=1024 * 1024):
            f.write(chunk)
    print("Download complete.")


# Download and prepare the model
model_url = "https://github.com/onnx/models/raw/main/validated/vision/classification/resnet/model/resnet50-v2-7.onnx"
model_path = "data/resnet50-v2-7.onnx"

download_onnx_model(model_url, model_path)

# Find the embedding layer and create an in-memory model that outputs it

encoding_model, embedding_layer = to_encoding_model(model_path)
print(f"Embedding layer: {embedding_layer}")
print(f"In-memory encoding model: ({len(encoding_model):,} bytes)")
Downloading model from https://github.com/onnx/models/raw/main/validated/vision/classification/resnet/model/resnet50-v2-7.onnx...
Download complete.
Embedding layer: resnetv24_flatten0_reshape0
In-memory encoding model: (102,452,438 bytes)
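
Because the model file is fetched over the network, one optional safeguard is to compare its SHA-256 digest against a known value before using it. The following is a minimal sketch using only the standard library. The helper file_sha256 is hypothetical (it is not part of DataEval), and the demonstration hashes a small temporary file rather than the real model, since no reference digest is given here.

```python
import hashlib
import os
import tempfile


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a small temporary file rather than the real model
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example bytes")
    tmp_path = tmp.name

actual = file_sha256(tmp_path)
# In practice, this would be a digest published by a source you trust
expected = hashlib.sha256(b"example bytes").hexdigest()
os.remove(tmp_path)

print(actual == expected)
```

With a real reference digest, the same comparison could be run on model_path right after the download step.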

Loading the dataset

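We’ll use the validation split of the VOC2012 dataset for this demonstration. First, we define a preprocessing function that converts each image into the 224×224, ImageNet-normalized CHW input that ResNet50 expects.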

# Define transforms for ResNet50 input requirements
def preprocess(image: np.ndarray) -> np.ndarray:
    """Preprocess image for ResNet50: CHW->HWC, resize, normalize, HWC->CHW."""
    hwc = image.transpose(1, 2, 0)  # Transpose to HWC
    resized = cv2.resize(hwc, (224, 224))  # Resize to 224x224 (cv2 defaults to bilinear interpolation)
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # ImageNet channel means
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)  # ImageNet channel standard deviations
    normalized = (resized.astype(np.float32) / 255.0 - mean) / std  # Scale to [0, 1], then normalize per channel
    chw = normalized.transpose(2, 0, 1)  # Transpose back to CHW
    return chw


# Load VOC dataset
dataset = VOCDetection(root="./data", image_set="val", year="2012", download=True)
print(f"Dataset size: {len(dataset)} images")
Dataset size: 5823 images
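
The layout and normalization steps in preprocess can be checked in isolation on a synthetic image. The sketch below uses NumPy only and skips the cv2.resize call by assuming the input is already 224×224. The all-red test image is an assumption made for illustration and is not taken from VOC2012.

```python
import numpy as np

# ImageNet channel statistics, matching the values used in preprocess
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

# Synthetic CHW uint8 image that is already 224x224, so no resize is needed
chw_in = np.zeros((3, 224, 224), dtype=np.uint8)
chw_in[0] = 255  # red channel fully saturated, green and blue at zero

hwc = chw_in.transpose(1, 2, 0)  # (224, 224, 3)
normalized = (hwc.astype(np.float32) / 255.0 - mean) / std  # broadcasts over the channel axis
chw_out = normalized.transpose(2, 0, 1)  # (3, 224, 224)

print(chw_out.shape, chw_out.dtype)
print(chw_out[:, 0, 0])  # normalized values of the top-left pixel, one per channel
```

The mean and std arrays broadcast correctly only while the channels are on the last axis, which is why the function transposes to HWC before normalizing and back to CHW afterwards.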

Using OnnxEncoder to generate embeddings

Now we can use the OnnxEncoder to generate embeddings from our dataset.

# Create the encoder with our in-memory embedding model
encoder = OnnxEncoder(
    model=encoding_model,
    batch_size=16,
    transforms=preprocess,
    output_name=embedding_layer,  # Specify which output to use
)

print(encoder)
OnnxEncoder(model=<102452438 bytes>, batch_size=16, output_name='resnetv24_flatten0_reshape0')
# Generate embeddings using the Embeddings class
# We'll use a subset for demonstration
subset = Select(dataset, Limit(100))
embeddings = Embeddings(subset, encoder=encoder)

print(f"Embeddings shape: {embeddings.shape}")
Embeddings shape: (100, 2048)

The embeddings have shape (N, D) where:

  • N is the number of images (100 in our subset)

  • D is the embedding dimension (2048 for ResNet50)
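
As a sketch of how an (N, D) array like this feeds into tasks such as duplicate detection, the example below computes pairwise cosine similarity with NumPy. It uses a random stand-in array with the same shape rather than real model output, and it assumes the real embeddings can be converted to a NumPy array first. This is a generic illustration, not a DataEval API.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random stand-in with the same shape as the embeddings above; not real model output
emb = rng.standard_normal((100, 2048)).astype(np.float32)

# L2-normalize each row so that a dot product equals cosine similarity
unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
similarity = unit @ unit.T  # shape (100, 100)

# For each image, find the most similar *other* image by masking self-matches
np.fill_diagonal(similarity, -np.inf)
nearest = similarity.argmax(axis=1)

print(similarity.shape, nearest.shape)
```

With real embeddings, pairs whose similarity is close to 1 would be candidates for near-duplicates.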

# Sanity check: 100 images, each with a 2048-dimensional embedding
assert embeddings.shape[0] == 100
assert embeddings.shape[1] == 2048