dataeval.core.rank_knn
- dataeval.core.rank_knn(embeddings, k=None, reference=None)
Rank samples using k-nearest neighbors distance.
Computes the mean distance from each sample to its k nearest neighbors and ranks the samples in easy-first order (low distance = prototypical samples).
- Parameters:
- embeddings : NDArray[np.floating]
Embedding vectors to rank, shape (n_samples, n_features).
- k : int | None, default None
Number of nearest neighbors. If None, uses sqrt(n_samples).
- reference : NDArray[np.floating] | None, default None
Reference embeddings for comparative ranking. If provided, samples are ranked by distance to the reference set rather than to each other.
- Returns:
indices: NDArray[np.intp] - Indices sorted in easy-first order
scores: NDArray[np.float32] - KNN distance scores (in index order)
- Return type:
RankResult
- Raises:
ValueError – If k is invalid (>= dataset size or negative).
Examples
>>> from dataeval.core import rank_knn
>>> import numpy as np
>>> embeddings = np.random.rand(100, 64).astype(np.float32)
>>> result = rank_knn(embeddings, k=5)

With reference embeddings:
>>> reference = np.random.rand(50, 64).astype(np.float32)
>>> result = rank_knn(embeddings, k=5, reference=reference)
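
To make the scoring rule concrete, the following is a minimal NumPy sketch of the idea described above: score each sample by its mean distance to its k nearest neighbors, then sort ascending so that low-distance samples come first. It is not the library's implementation. The function name `knn_rank_sketch` is hypothetical, and the Euclidean metric, the rounding of the default k, the exclusion of each sample's self-distance, and checking k against the reference set size are all assumptions made for illustration.

```python
import numpy as np


def knn_rank_sketch(embeddings, k=None, reference=None):
    """Illustrative sketch only (hypothetical helper, not part of dataeval)."""
    x = np.asarray(embeddings, dtype=np.float64)
    # Neighbors come from the reference set if given, otherwise from the samples themselves.
    pool = x if reference is None else np.asarray(reference, dtype=np.float64)

    if k is None:
        # Assumption: sqrt(n_samples) is truncated to an integer, with a floor of 1.
        k = max(1, int(np.sqrt(x.shape[0])))
    # Assumption: k is validated against the size of the neighbor pool.
    if k <= 0 or k >= pool.shape[0]:
        raise ValueError("k must be positive and smaller than the neighbor pool size")

    # Pairwise Euclidean distances, shape (n_samples, n_pool). Assumption: Euclidean metric.
    diff = x[:, None, :] - pool[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    if reference is None:
        # Assumption: a sample is not counted as its own neighbor.
        np.fill_diagonal(dists, np.inf)

    # After partitioning, the first k columns hold the k smallest distances per row.
    nearest = np.partition(dists, k - 1, axis=1)[:, :k]
    scores = nearest.mean(axis=1).astype(np.float32)

    # Easy-first order: smallest mean neighbor distance first.
    indices = np.argsort(scores, kind="stable")
    return indices, scores


# Three clustered points and one isolated point on a line.
points = np.array([[0.0], [1.0], [2.0], [10.0]])
indices, scores = knn_rank_sketch(points, k=1)
# The isolated point (row 3) has the largest score, so it is ranked last.
```

The sketch builds the full distance matrix, which is only practical for small inputs; it is meant to show the ranking logic, not to be efficient.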