dataeval.core.rank_knn
dataeval.core.rank_knn(embeddings, k=None, reference=None)
Rank samples using k-nearest neighbors distance.
Returns samples in easy-first order (low distance = prototypical samples). Use rerank_hard_first() to reverse the order, or other rerank_* functions to apply different selection policies.
- Parameters:
- embeddings : NDArray[np.floating]
Embedding vectors to rank, shape (n_samples, n_features).
- k : int | None, default None
Number of nearest neighbors. If None, uses sqrt(n_samples).
- reference : NDArray[np.floating] | None, default None
Reference embeddings for comparative ranking. If provided, samples are ranked relative to the reference set rather than themselves.
- Returns:
Dictionary containing:
indices: NDArray[np.intp] - Ranked indices in easy-first order
scores: NDArray[np.float32] | None - KNN distance scores for each sample
method: str - "knn"
policy: str - "easy_first"
- Return type:
RankResult
- Raises:
ValueError – If k is invalid (>= dataset size or negative).
Examples
Basic ranking:
>>> from dataeval.core import rank_knn
>>> import numpy as np
>>> embeddings = np.random.rand(100, 64).astype(np.float32)
>>> result = rank_knn(embeddings, k=5)

Hard-first order:
>>> from dataeval.core import rank_knn, rerank_hard_first
>>> result = rank_knn(embeddings, k=5)
>>> result = rerank_hard_first(result)

Rank relative to reference:
>>> reference = np.random.rand(50, 64).astype(np.float32)
>>> result = rank_knn(embeddings, k=5, reference=reference)
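
Conceptual sketch:
To make the easy-first ordering concrete, the following NumPy-only sketch shows one way a k-nearest-neighbors distance ranking could be computed. It is not DataEval's implementation. The function name knn_rank_sketch is hypothetical, and the Euclidean metric, the use of the mean over the k nearest distances, and the exact bounds check on k are all assumptions made for illustration. Only the sqrt(n_samples) default, the optional reference set, and the shape of the returned dictionary follow the description above.

```python
import numpy as np


def knn_rank_sketch(embeddings, k=None, reference=None):
    """Hypothetical easy-first ranking by mean distance to the k nearest neighbors."""
    emb = np.asarray(embeddings, dtype=np.float64)
    ref = emb if reference is None else np.asarray(reference, dtype=np.float64)

    # Documented default: sqrt(n_samples). Truncating to an int is an assumption.
    if k is None:
        k = max(1, int(np.sqrt(len(emb))))
    # Assumed validity check; the library's exact bounds may differ.
    if k < 1 or k >= len(ref):
        raise ValueError(f"k must satisfy 1 <= k < {len(ref)}, got {k}")

    # Pairwise Euclidean distances, shape (n_samples, n_reference).
    diff = emb[:, None, :] - ref[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    if reference is None:
        np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbor

    # Mean of the k smallest distances in each row (assumed aggregate).
    nearest = np.partition(dists, k - 1, axis=1)[:, :k]
    scores = nearest.mean(axis=1).astype(np.float32)

    # Ascending order: low distance (prototypical) samples come first.
    indices = np.argsort(scores, kind="stable")
    return {"indices": indices, "scores": scores, "method": "knn", "policy": "easy_first"}


# Toy data: a tight cluster of 20 points plus one far-away outlier at index 20.
rng = np.random.default_rng(0)
cluster = rng.normal(0.0, 0.1, size=(20, 4))
outlier = np.full((1, 4), 5.0)
data = np.vstack([cluster, outlier])

result = knn_rank_sketch(data, k=3)
hard_first = result["indices"][::-1]  # reversing easy-first puts the outlier first
```

In this toy data the outlier has by far the largest neighbor distance, so it is ranked last in easy-first order and first once the order is reversed. The sketch builds the full pairwise distance matrix, which is fine for small arrays but would need a nearest-neighbor index for large embedding sets.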