dataeval.quality.PrioritizeOutput
- class dataeval.quality.PrioritizeOutput(rank_result, method, order='easy_first', policy='difficulty', num_bins=50, class_labels=None)
Ranking result with lazy index computation based on order and policy.
Stores the source ranking (always in easy_first order) and computes the final indices lazily from the configured order and policy. All transformation methods return new PrioritizeOutput instances that operate on the same source data.
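The lazy, share-the-source behavior described above can be pictured with a small stand-in class. This is a minimal sketch under stated assumptions, not the dataeval implementation: `LazyRanking` and its private attributes are hypothetical names, and only the order transformation is modeled (no policy).

```python
import numpy as np


class LazyRanking:
    """Hypothetical stand-in for illustration; not the dataeval class."""

    def __init__(self, source, order="easy_first"):
        # The source ranking is always kept in easy_first order.
        self._source = np.asarray(source, dtype=np.intp)
        self.order = order
        self._indices = None  # filled in on first access

    @property
    def indices(self):
        # Compute the final ordering only when it is first requested.
        if self._indices is None:
            src = self._source
            self._indices = src if self.order == "easy_first" else src[::-1]
        return self._indices

    def easy_first(self):
        # New object, same underlying source array.
        return LazyRanking(self._source, order="easy_first")

    def hard_first(self):
        # New object, same underlying source array.
        return LazyRanking(self._source, order="hard_first")


base = LazyRanking([2, 0, 3, 1])
hard = base.hard_first()
print(hard.indices.tolist())  # [1, 3, 0, 2]
print(base.indices.tolist())  # [2, 0, 3, 1]
```

Because each transformation builds a new object around the same source array, the original result is never mutated and no ranking work is repeated.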
- class_balanced(class_labels=None)
Return a new PrioritizeOutput with class-balanced sampling policy.
Reorders to ensure balanced representation across classes while maintaining priority order within each class.
- Parameters:
- class_labels : NDArray[np.integer]
Class label for each sample in the original dataset.
- Returns:
New result with class_balanced policy.
- Return type:
PrioritizeOutput
Examples
>>> # Get result
>>> result = Prioritize.knn(extractor, k=5).evaluate(unlabeled_data)
>>> # Rebucket based on classes (class_labels typically from metadata)
>>> balanced = result.class_balanced(class_labels)
>>> balanced.policy
'class_balanced'
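One way to picture "balanced representation across classes while maintaining priority order within each class" is a round-robin interleave over classes. The sketch below is an assumption about how such a reordering could work, not the library's algorithm; `class_balanced_order` is a hypothetical helper written with plain NumPy.

```python
import numpy as np


def class_balanced_order(ranked, class_labels):
    """Interleave a ranking round-robin across classes (illustrative only)."""
    ranked = np.asarray(ranked, dtype=np.intp)
    labels = np.asarray(class_labels)[ranked]  # label of each ranked sample
    # Position of each sample within its own class, in ranking order.
    within = np.zeros(len(ranked), dtype=np.intp)
    for c in np.unique(labels):
        mask = labels == c
        within[mask] = np.arange(mask.sum())
    # Stable sort by within-class position: one sample per class per round.
    return ranked[np.argsort(within, kind="stable")]


ranked = np.array([0, 1, 2, 3, 4, 5])        # a ranking in priority order
class_labels = np.array([0, 0, 0, 1, 1, 2])  # label per dataset sample
balanced = class_balanced_order(ranked, class_labels)
print(balanced.tolist())  # [0, 3, 5, 1, 4, 2]
```

Samples of class 0 still appear in their original relative order (0, 1, 2), but each round now draws from every class that has samples left.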
- easy_first()
Return a new PrioritizeOutput with easy_first sort order.
Easy samples (prototypical, close to cluster centers) come first. Preserves the current policy. Idempotent if already easy_first.
Examples
>>> # Get result from Prioritize
>>> result = Prioritize.knn(extractor, k=5).evaluate(unlabeled_data)
>>> # Transform to easy_first
>>> easy_result = result.easy_first()
>>> easy_result.order
'easy_first'
- hard_first()
Return a new PrioritizeOutput with hard_first sort order.
Hard samples (outliers, far from cluster centers) come first. Preserves the current policy. Idempotent if already hard_first.
Examples
>>> # Get result from Prioritize
>>> result = Prioritize.knn(extractor, k=5).evaluate(unlabeled_data)
>>> # Transform to hard_first
>>> hard_result = result.hard_first()
>>> hard_result.order
'hard_first'
- stratified(num_bins=50)
Return a new PrioritizeOutput with stratified sampling policy.
Applies stratified sampling to balance selection across score bins. This encourages diversity by de-weighting samples with similar scores.
- Parameters:
- num_bins : int, default 50
Number of bins for stratification.
- Returns:
New result with stratified policy.
- Return type:
PrioritizeOutput
- Raises:
ValueError – If scores are not available. Because indices are computed lazily, the error is raised when indices is accessed.
Examples
>>> # Get result from Prioritize
>>> result = Prioritize.knn(extractor, k=5).evaluate(unlabeled_data)
>>> # Apply stratification to the result
>>> strat_result = result.stratified(num_bins=10)
>>> strat_result.policy
'stratified'
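To make "balance selection across score bins" concrete, the sketch below bins scores into equal-width intervals and then cycles through the bins, so samples with near-identical scores are spread out instead of being selected back to back. The equal-width binning, the round-robin traversal, and the helper name `stratified_order` are all assumptions for illustration; the library's actual stratification may differ.

```python
import numpy as np


def stratified_order(scores, num_bins=50):
    """Round-robin over equal-width score bins (illustrative only)."""
    scores = np.asarray(scores, dtype=float)
    ranked = np.argsort(scores, kind="stable")  # ascending score
    edges = np.linspace(scores.min(), scores.max(), num_bins + 1)
    # Use interior edges only, so bin ids fall in [0, num_bins - 1].
    bins = np.digitize(scores[ranked], edges[1:-1])
    # Position of each sample within its own bin, in ranking order.
    within = np.zeros(len(ranked), dtype=np.intp)
    for b in np.unique(bins):
        mask = bins == b
        within[mask] = np.arange(mask.sum())
    # Stable sort by within-bin position: one sample per bin per round.
    return ranked[np.argsort(within, kind="stable")]


scores = np.array([0.1, 0.15, 0.2, 0.9, 0.6, 0.12])
order = stratified_order(scores, num_bins=2)
print(order.tolist())  # [0, 4, 5, 3, 1, 2]
```

With two bins, the four low-score samples no longer occupy the first four positions; the two high-score samples are interleaved with them.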
- property indices : numpy.typing.NDArray[numpy.intp]
Indices sorted according to the configured order and policy (lazily computed).
- Type:
NDArray[np.intp]
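A typical use of an index array like this is to pick the first k samples in priority order. The snippet below is a minimal sketch with stand-in data: `data` and the hard-coded `indices` array are hypothetical and take the place of a real dataset and of `result.indices`.

```python
import numpy as np

# Hypothetical stand-ins: `indices` plays the role of `result.indices`.
data = np.array(["a", "b", "c", "d", "e"])
indices = np.array([3, 0, 4, 1, 2], dtype=np.intp)

k = 2
selected = data[indices[:k]]   # first k samples in priority order
remaining = data[indices[k:]]  # everything else, still in priority order
print(selected.tolist())   # ['d', 'a']
print(remaining.tolist())  # ['e', 'b', 'c']
```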
- property method : MethodType
The ranking method that was used. One of:
{"knn", "kmeans_distance", "kmeans_complexity", "hdbscan_distance", "hdbscan_complexity"}