dataeval.core.rank_knn

dataeval.core.rank_knn(embeddings, k=None, reference=None)

Rank samples using k-nearest neighbors distance.

Computes each sample's mean distance to its k nearest neighbors and ranks the samples in easy-first order: a low mean distance marks a prototypical sample, so those samples come first.
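
The scoring idea can be sketched in plain NumPy. This is an illustration only, not the library's implementation: the helper name `mean_knn_distance_rank`, the Euclidean metric, and the brute-force distance matrix are all assumptions made for the example.

```python
import numpy as np


def mean_knn_distance_rank(embeddings, k):
    """Hypothetical helper: rank samples by mean distance to their k nearest neighbors."""
    x = np.asarray(embeddings, dtype=np.float64)
    # Pairwise Euclidean distances (assumed metric), shape (n_samples, n_samples).
    diffs = x[:, None, :] - x[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Exclude each sample's zero distance to itself.
    np.fill_diagonal(dists, np.inf)
    # Keep the k smallest distances per row and average them.
    nearest = np.sort(dists, axis=1)[:, :k]
    scores = nearest.mean(axis=1)
    # Low score first: prototypical ("easy") samples lead the ranking.
    indices = np.argsort(scores, kind="stable")
    return indices, scores


# A tight cluster of 20 points plus one distant point (index 20).
rng = np.random.default_rng(0)
cluster = rng.normal(0.0, 0.1, size=(20, 4))
outlier = np.full((1, 4), 10.0)
indices, scores = mean_knn_distance_rank(np.vstack([cluster, outlier]), k=3)
```

In this toy data the distant point has by far the largest mean neighbor distance, so it lands at the end of the easy-first ordering.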

Parameters:
embeddings : NDArray[np.floating]

Embedding vectors to rank, shape (n_samples, n_features).

k : int | None, default None

Number of nearest neighbors. If None, uses sqrt(n_samples).

reference : NDArray[np.floating] | None, default None

Reference embeddings for comparative ranking. If provided, samples are ranked by distance to the reference set rather than to each other.
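
A rough sketch of how the `k` default and the `reference` mode might behave, again in plain NumPy. Both helpers are hypothetical: the source states only that the default is sqrt(n_samples), so truncating to an integer (with a floor of 1) is an assumption, as is the Euclidean metric.

```python
import numpy as np


def default_k(n_samples):
    """Hypothetical default: sqrt(n_samples) truncated to an int (rounding is an assumption)."""
    return max(1, int(np.sqrt(n_samples)))


def mean_knn_distance_to_reference(embeddings, reference, k):
    """Hypothetical helper: mean distance from each sample to its k nearest reference points."""
    x = np.asarray(embeddings, dtype=np.float64)
    r = np.asarray(reference, dtype=np.float64)
    # Distances from every sample to every reference point, shape (n_samples, n_reference).
    dists = np.sqrt(((x[:, None, :] - r[None, :, :]) ** 2).sum(axis=-1))
    # No self-match to exclude: neighbors come only from the reference set.
    return np.sort(dists, axis=1)[:, :k].mean(axis=1)


rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(50, 8))
near = rng.normal(0.0, 1.0, size=(5, 8))  # drawn from the reference distribution
far = rng.normal(6.0, 1.0, size=(5, 8))   # shifted away from it
samples = np.vstack([near, far])

k = default_k(len(samples))  # 10 samples give k = 3 under the truncation assumption
scores = mean_knn_distance_to_reference(samples, reference, k)
```

The shifted samples score higher than the ones drawn from the reference distribution, which is the behavior comparative ranking relies on.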

Returns:

  • indices: NDArray[np.intp] - Indices sorted in easy-first order

  • scores: NDArray[np.float32] - KNN distance scores (in index order)

Return type:

RankResult
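
The two arrays are typically used together. The sketch below uses made-up values in place of a real result and assumes that "in index order" means `scores[i]` is the score of sample `i`; how the fields are accessed on `RankResult` is not shown here because the source does not specify it.

```python
import numpy as np

# Hypothetical values standing in for the two returned arrays.
scores = np.array([0.9, 0.2, 0.5, 1.4, 0.3], dtype=np.float32)
indices = np.argsort(scores, kind="stable")  # easy-first order

# Under the per-sample assumption, reordering scores by `indices` sorts them ascending.
ranked_scores = scores[indices]

# The first entries are the most prototypical samples, the last the most atypical.
easiest = indices[:2]
hardest = indices[-2:][::-1]
```

Slicing `indices` from the front or the back is a simple way to pick a prototypical subset or to surface candidates for manual review.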

Raises:

ValueError – If k is invalid (negative, or greater than or equal to the dataset size).

Examples

Rank a set of embeddings against itself:

>>> from dataeval.core import rank_knn
>>> import numpy as np
>>> embeddings = np.random.rand(100, 64).astype(np.float32)
>>> result = rank_knn(embeddings, k=5)

With reference embeddings:

>>> reference = np.random.rand(50, 64).astype(np.float32)
>>> result = rank_knn(embeddings, k=5, reference=reference)