dataeval.core.rerank_stratified
dataeval.core.rerank_stratified(result, num_bins=50)
Rerank by stratified sampling across score bins.
Takes a RankResult (expected to be in easy_first order) and applies stratified sampling to balance selection across score bins. This encourages diversity by de-weighting samples with similar scores.
The output is in hard_first order to maintain priority while balancing.
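- Parameters:
result: RankResult - Ranking to rerank, expected in easy_first order; must contain scores
num_bins - Number of score bins to stratify across (default 50)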
- Returns:
Dictionary containing:
indices: NDArray[np.intp] - Reranked indices in hard_first order
scores: NDArray[np.float32] | None - Scores in original order (unchanged)
method: str - Same as input
policy: str - “stratified”
- Return type:
RankResult
- Raises:
ValueError – If result does not contain scores (e.g., from rank_kmeans_complexity).
Examples
>>> from dataeval.core import rank_knn, rerank_stratified
>>> import numpy as np
>>> embeddings = np.random.rand(100, 64).astype(np.float32)
>>> result = rank_knn(embeddings, k=5)
>>> result = rerank_stratified(result, num_bins=20)
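To build intuition for what stratifying across score bins can look like, here is a minimal NumPy sketch. It is not DataEval's implementation. The helper name stratified_rerank_sketch is hypothetical, and the sketch assumes that higher scores mean harder samples, that the bins are equal-width, and that bins are visited round-robin from the hardest bin down.

```python
import numpy as np


def stratified_rerank_sketch(scores, num_bins=50):
    """Illustrative only (hypothetical helper, not the DataEval API).

    Returns sample indices ordered by round-robin selection across
    equal-width score bins, visiting the highest-score bin first.
    """
    scores = np.asarray(scores, dtype=np.float32)

    # Assumption: equal-width bins spanning the observed score range.
    edges = np.linspace(scores.min(), scores.max(), num_bins + 1)
    # Using only interior edges keeps bin ids within [0, num_bins - 1].
    bin_ids = np.digitize(scores, edges[1:-1])

    # Collect non-empty bins from hardest to easiest; inside each bin,
    # order members from highest to lowest score.
    bins = []
    for b in range(num_bins - 1, -1, -1):
        members = np.flatnonzero(bin_ids == b)
        if members.size:
            ranked = members[np.argsort(-scores[members], kind="stable")]
            bins.append(ranked.tolist())

    # Round-robin: take one sample per bin per pass, so neighbouring
    # picks come from different score ranges.
    order = []
    depth = 0
    while len(order) < scores.size:
        for members in bins:
            if depth < len(members):
                order.append(members[depth])
        depth += 1
    return np.asarray(order, dtype=np.intp)


example_scores = [0.1, 0.2, 0.3, 0.9, 0.95, 0.5]
example_order = stratified_rerank_sketch(example_scores, num_bins=2)
print(example_order)
```

With two bins, the sketch alternates between the high-score and low-score bin until one runs out, so the front of the ordering mixes score ranges instead of clustering near the top score. The hardest sample still comes first.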