dataeval.core.rerank_class_balance

dataeval.core.rerank_class_balance(result, class_labels)

Rerank to balance selection across class labels.

Takes a RankResult (expected to be in easy_first order) and reranks it so that classes are represented evenly, while preserving the priority order within each class.

The output is in hard_first order to maintain priority while balancing.
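
The behavior described above can be illustrated with a small NumPy sketch. This is not the library's implementation. It assumes the input is reversed to hard_first order and that classes then take turns in round-robin fashion, visited in sorted label order. The helper name `round_robin_by_class` is hypothetical.

```python
import numpy as np


def round_robin_by_class(easy_first_indices, class_labels):
    """Hypothetical helper: interleave indices so classes take turns.

    Assumption (not confirmed by the library): the easy_first ranking is
    reversed to hard_first, then classes are visited round-robin in sorted
    label order, keeping the hard_first order inside each class.
    """
    # Reverse easy_first -> hard_first.
    ranked = np.asarray(easy_first_indices, dtype=np.intp)[::-1]
    # Look up the label of each ranked sample in the original dataset.
    labels = np.asarray(class_labels)[ranked]

    # One queue per class, each still in hard_first order.
    classes = np.unique(labels)
    queues = {c: ranked[labels == c] for c in classes}

    out = []
    longest = max((len(q) for q in queues.values()), default=0)
    for pos in range(longest):
        for c in classes:
            # Classes that have run out of samples are skipped.
            if pos < len(queues[c]):
                out.append(queues[c][pos])
    return np.array(out, dtype=np.intp)


easy_first = [0, 1, 2, 3, 4, 5]
labels = [0, 0, 0, 1, 1, 2]
print(round_robin_by_class(easy_first, labels).tolist())
# -> [2, 4, 5, 1, 3, 0]
```

In this toy case the first three positions hold one sample from each class, and within class 0 the order 2, 1, 0 matches the reversed input.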

Parameters:

result : RankResult

Ranking result in any order.

class_labels : NDArray[np.integer]

Class label for each sample in the original dataset.

Returns:

Dictionary containing:

  • indices: NDArray[np.intp] - Reranked indices in hard_first order with class balance

  • scores: NDArray[np.float32] | None - Scores in original order (unchanged if present)

  • method: str - Same as input

  • policy: str - "class_balance"

Return type:

RankResult

Examples

>>> import numpy as np
>>> from dataeval.core import rank_knn, rerank_class_balance
>>> rng = np.random.default_rng(0)  # seeded so the example is reproducible
>>> embeddings = rng.random((100, 64), dtype=np.float32)
>>> labels = rng.integers(0, 3, size=100)
>>> result = rank_knn(embeddings, k=5)
>>> result = rerank_class_balance(result, class_labels=labels)
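
One way to check that a reranked ordering is balanced is to count class labels among its first k indices. The sketch below uses only NumPy and a hand-written index array in place of a real result. It assumes labels are non-negative integers, and the helper name `top_k_class_counts` is hypothetical. With a real result, the `indices` entry described under Returns would presumably be passed as the first argument.

```python
import numpy as np


def top_k_class_counts(indices, class_labels, k):
    """Hypothetical helper: count samples per class among the first k indices."""
    indices = np.asarray(indices, dtype=np.intp)
    class_labels = np.asarray(class_labels)
    # Assumes labels are non-negative integers starting at 0.
    n_classes = int(class_labels.max()) + 1
    return np.bincount(class_labels[indices[:k]], minlength=n_classes)


labels = np.array([0, 0, 0, 1, 1, 2])
reranked = np.array([2, 4, 5, 1, 3, 0])  # stand-in for reranked indices
print(top_k_class_counts(reranked, labels, k=3).tolist())
# -> [1, 1, 1]
```

Roughly equal counts for small k suggest the head of the ranking is balanced. Counts diverge for larger k once smaller classes are exhausted.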