dataeval.scope.PrioritizeOutput

class dataeval.scope.PrioritizeOutput(rank_result, method, order='easy_first', policy='difficulty', num_bins=50, class_labels=None)

Ranking result with lazy index computation based on order and policy.

Stores the source ranking (always in easy_first order) and computes the final indices lazily from the configured order and policy. All transformation methods return new PrioritizeOutput instances that operate on the same source data.
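The design described above (store the ranking once, compute indices on first access, return a new instance from every transformation) can be pictured with a small sketch. `LazyRanking` is a hypothetical stand-in, not dataeval code. It also assumes that hard_first is simply the reverse of the stored easy_first ranking, which may not match the library for every policy.

```python
from dataclasses import dataclass, replace
from functools import cached_property


@dataclass(frozen=True)
class LazyRanking:
    """Hypothetical illustration of the pattern; not the dataeval class."""

    source: tuple  # ranking stored once, always in easy_first order
    order: str = "easy_first"

    @cached_property
    def indices(self) -> tuple:
        # Computed on first access only, then cached on this instance.
        # Assumption: hard_first is the plain reverse of easy_first.
        if self.order == "hard_first":
            return tuple(reversed(self.source))
        return self.source

    def easy_first(self) -> "LazyRanking":
        # Returns a new instance; the original is left unchanged.
        return replace(self, order="easy_first")

    def hard_first(self) -> "LazyRanking":
        return replace(self, order="hard_first")


ranking = LazyRanking(source=(3, 0, 2, 1))
flipped = ranking.hard_first()
print(ranking.indices)  # (3, 0, 2, 1)
print(flipped.indices)  # (1, 2, 0, 3)
```

Because each transformation builds a fresh object around the same `source`, chaining calls such as `ranking.hard_first().easy_first()` never mutates an earlier result.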

class_balanced(class_labels=None)

Return a new PrioritizeOutput with class-balanced sampling policy.

Reorders to ensure balanced representation across classes while maintaining priority order within each class.

Parameters:
class_labels : NDArray[np.integer] | None, default None

Class label for each sample in the original dataset.

Returns:

New result with class_balanced policy.

Return type:

PrioritizeOutput

Examples

>>> # Get result from Prioritize
>>> result = Prioritize.knn(extractor, k=5).evaluate(unlabeled_data)
>>> # Rebalance by class (class_labels typically come from metadata)
>>> balanced = result.class_balanced(class_labels)
>>> balanced.policy
'class_balanced'
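
One plausible reading of "balanced representation across classes while maintaining priority order within each class" is a round-robin over classes. The sketch below uses a hypothetical helper, `class_balanced_order`, written in plain Python. It is an illustration under that assumption, and dataeval's implementation may differ.

```python
from collections import defaultdict
from itertools import zip_longest


def class_balanced_order(ranked_indices, class_labels):
    """Interleave classes round-robin while keeping each class's rank order."""
    per_class = defaultdict(list)
    for idx in ranked_indices:  # ranked_indices is already priority-sorted
        per_class[class_labels[idx]].append(idx)

    # In each round, take the best remaining sample from every class.
    missing = object()
    result = []
    for round_ in zip_longest(*per_class.values(), fillvalue=missing):
        result.extend(i for i in round_ if i is not missing)
    return result


ranked = [4, 0, 2, 5, 1, 3]  # priority order over six samples
labels = [0, 0, 0, 1, 0, 1]  # class label for each original sample index
balanced_order = class_balanced_order(ranked, labels)
print(balanced_order)  # [4, 5, 0, 3, 2, 1]
```

Note that `labels` is indexed by original sample position, matching the parameter description above, not by rank.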

data()

Return the ranked indices as the output data.

easy_first()

Return a new PrioritizeOutput with easy_first sort order.

Easy samples (prototypical, close to cluster centers) come first. Preserves the current policy. Idempotent if already easy_first.

Returns:

New result with easy_first order.

Return type:

PrioritizeOutput

Examples

>>> # Get result from Prioritize
>>> result = Prioritize.knn(extractor, k=5).evaluate(unlabeled_data)
>>> # Transform to easy_first
>>> easy_result = result.easy_first()
>>> easy_result.order
'easy_first'
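
To make "easy" and "hard" concrete, the sketch below ranks a few 2-D points by distance to a single center. This is only an illustration of the idea with made-up data and one cluster; it is not how any particular dataeval ranking method computes its scores.

```python
import math

# Three points near the origin and one far-away outlier (index 3).
points = [(0.0, 0.0), (0.2, -0.1), (-0.1, 0.1), (3.0, 3.0)]

# Assumption for this sketch: a single cluster whose center is the mean point.
center = tuple(sum(coord) / len(points) for coord in zip(*points))
distance = [math.dist(p, center) for p in points]

# easy_first: smallest distance first; hard_first: the same ranking reversed.
easy_order = sorted(range(len(points)), key=distance.__getitem__)
hard_order = easy_order[::-1]

print(easy_order[-1])  # 3, the outlier is ranked last
print(hard_order[0])   # 3, the outlier is ranked first
```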

hard_first()

Return a new PrioritizeOutput with hard_first sort order.

Hard samples (outliers, far from cluster centers) come first. Preserves the current policy. Idempotent if already hard_first.

Returns:

New result with hard_first order.

Return type:

PrioritizeOutput

Examples

>>> # Get result from Prioritize
>>> result = Prioritize.knn(extractor, k=5).evaluate(unlabeled_data)
>>> # Transform to hard_first
>>> hard_result = result.hard_first()
>>> hard_result.order
'hard_first'

stratified(num_bins=50)

Return a new PrioritizeOutput with stratified sampling policy.

Applies stratified sampling to balance selection across score bins. This encourages diversity by de-weighting samples with similar scores.

Parameters:
num_bins : int, default 50

Number of bins for stratification.

Returns:

New result with stratified policy.

Return type:

PrioritizeOutput

Raises:

ValueError – If scores are not available. Because indices are computed lazily, the error is raised when indices is accessed.

Examples

>>> # Get result from Prioritize
>>> result = Prioritize.knn(extractor, k=5).evaluate(unlabeled_data)
>>> # Apply stratification to the result
>>> strat_result = result.stratified(num_bins=10)
>>> strat_result.policy
'stratified'
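
As a rough picture of balancing selection across score bins, the sketch below places scores into equal-width bins and then cycles through the bins, so consecutive picks come from different score ranges. The helper `stratified_order`, the equal-width binning, and the round-robin draw are all assumptions made for illustration; dataeval's actual procedure may differ.

```python
def stratified_order(scores, num_bins=50):
    """Order sample indices by cycling through equal-width score bins."""
    lo, hi = min(scores), max(scores)
    # Fall back to a width of 1.0 when all scores are equal.
    width = (hi - lo) / num_bins or 1.0

    bins = [[] for _ in range(num_bins)]
    # Visit samples from lowest to highest score so each bin stays sorted.
    for idx in sorted(range(len(scores)), key=scores.__getitem__):
        b = min(int((scores[idx] - lo) / width), num_bins - 1)
        bins[b].append(idx)

    # Round-robin: one sample per non-exhausted bin in each pass.
    result = []
    depth = 0
    while len(result) < len(scores):
        for bucket in bins:
            if depth < len(bucket):
                result.append(bucket[depth])
        depth += 1
    return result


scores = [1.0, 1.5, 2.0, 9.0, 5.0, 1.2]
strat_order = stratified_order(scores, num_bins=3)
print(strat_order)  # [0, 4, 3, 5, 1, 2]
```

Four of the six scores fall in the lowest bin, yet the first three picks cover all three bins, which is the de-weighting of similar scores that the description refers to.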

property indices : numpy.typing.NDArray[numpy.intp]

Indices sorted according to configured order and policy (lazily computed).

Type:

NDArray[np.intp]

property method : MethodType

The ranking method that was used.

Type:

{"knn", "kmeans_distance", "kmeans_complexity", "hdbscan_distance", "hdbscan_complexity"}

property order : OrderType

Sort direction.

Type:

{"easy_first", "hard_first"}

property policy : PolicyType

Selection policy.

Type:

{"difficulty", "stratified", "class_balanced"}

property scores : numpy.typing.NDArray[numpy.float32] | None

Ranking scores in the configured order, or None if scores are not available.

Type:

NDArray[np.float32] | None