dataeval.data.selections.Prioritize

class Prioritize(model, batch_size, device, method, policy, *, c=None, class_label=None)

class Prioritize(model, batch_size, device, method, policy, *, k=None, c=None, class_label)

class Prioritize(model, batch_size, device, method, policy, *, k=None, c=None, class_label=None)

Sort the dataset indices in order of highest priority data in the embedding space.

Parameters:

model : torch.nn.Module | None: Model to use for encoding images.
batch_size : int: Batch size to use when encoding images.
device : DeviceLike or None: Device to use for encoding images.
method : Literal["knn", "kmeans_distance", "kmeans_complexity"]: Method to use for prioritization.
k : int or None, default None: Number of nearest neighbors to use for prioritization. If None, uses the square root of the number of samples. Only used for method="knn"; ignored otherwise.
c : int or None, default None: Number of clusters to use for prioritization. If None, uses the square root of the number of samples. Only used for method="kmeans_*"; ignored otherwise.
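
To make the role of k and of the "hard_first" and "easy_first" policies concrete, here is a minimal NumPy sketch of kNN-based prioritization. It is an illustration only, not DataEval's implementation: the helper names `knn_scores` and `order_by_policy` are hypothetical, and the choices of Euclidean distance, the distance to the k-th nearest neighbor as the score, and `int(sqrt(n))` as the default k are assumptions.

```python
import math
from typing import Optional

import numpy as np


def knn_scores(embeddings: np.ndarray, k: Optional[int] = None) -> np.ndarray:
    """Score each sample by the distance to its k-th nearest neighbor (assumed scoring rule)."""
    n = embeddings.shape[0]
    if n < 2:
        raise ValueError("need at least two samples")
    if k is None:
        # Assumed default: square root of the number of samples, rounded down.
        k = max(1, int(math.sqrt(n)))
    k = min(k, n - 1)

    # Pairwise Euclidean distances, shape (n, n).
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # A sample is not its own neighbor.
    np.fill_diagonal(dists, np.inf)

    # Distance to the k-th nearest neighbor for every sample.
    return np.partition(dists, k - 1, axis=1)[:, k - 1]


def order_by_policy(scores: np.ndarray, policy: str = "hard_first") -> np.ndarray:
    """Return dataset indices sorted by score under a simple policy (hypothetical helper)."""
    ascending = np.argsort(scores, kind="stable")
    if policy == "easy_first":
        return ascending  # smallest kNN distance (densest regions) first
    if policy == "hard_first":
        return ascending[::-1]  # largest kNN distance (most isolated) first
    raise ValueError(f"unsupported policy in this sketch: {policy}")


# Three clustered points and one outlier.
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0]])
scores = knn_scores(points, k=1)
hard_first = order_by_policy(scores, "hard_first")
easy_first = order_by_policy(scores, "easy_first")
```

In this sketch the isolated point at (10, 10) receives the largest score, so it is ranked first under "hard_first" and last under "easy_first". The "stratified" and "class_balance" policies are not covered here.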

Notes

Use precalculated embeddings to sort the dataset indices in order of highest priority data in the embedding space.

Parameters:

method : Literal["knn", "kmeans_distance", "kmeans_complexity"]: Method to use for sample scoring during prioritization.
policy : Literal["hard_first", "easy_first", "stratified", "class_balance"]: Selection policy for prioritizing scored samples.
embeddings : Embeddings or None, default None: Embeddings to use during prioritization. If None, reference must be set.
reference : Embeddings or None, default None: Reference embeddings against which the dataset embeddings are prioritized. If embeddings is None, the reference embeddings are used instead.
k : int or None, default None: Number of nearest neighbors to use for prioritization. If None, uses the square root of the number of samples. Only used for method="knn"; ignored otherwise.
c : int or None, default None: Number of clusters to use for prioritization. If None, uses the square root of the number of samples. Only used for method="kmeans_*"; ignored otherwise.
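
The idea of prioritizing dataset embeddings relative to a reference set can be sketched as follows. This is a hedged illustration using plain NumPy arrays rather than DataEval's Embeddings class. The function name `knn_scores_vs_reference` is hypothetical, and taking the default k from the size of the reference set is an assumption, not documented behavior.

```python
import math
from typing import Optional

import numpy as np


def knn_scores_vs_reference(
    embeddings: np.ndarray,
    reference: np.ndarray,
    k: Optional[int] = None,
) -> np.ndarray:
    """Score each dataset embedding by its distance to the k-th nearest reference embedding."""
    m = reference.shape[0]
    if k is None:
        # Assumed default: square root of the number of reference samples, rounded down.
        k = max(1, int(math.sqrt(m)))
    if not 1 <= k <= m:
        raise ValueError("k must be between 1 and the number of reference samples")

    # Euclidean distances from every dataset embedding to every reference embedding.
    diffs = embeddings[:, None, :] - reference[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))

    # Distance to the k-th nearest reference embedding.
    return np.partition(dists, k - 1, axis=1)[:, k - 1]


# Reference set: the corners of the unit square.
reference = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# Dataset: one point inside the reference region, one far outside it.
dataset = np.array([[0.5, 0.5], [5.0, 5.0]])

ref_scores = knn_scores_vs_reference(dataset, reference, k=1)
# Largest distance first, analogous to a "hard_first" ordering.
ref_order = np.argsort(ref_scores, kind="stable")[::-1]
```

Here the point far from the reference set scores highest, so it comes first in the ordering. Samples that the reference data covers poorly are the ones a "hard_first" style policy would surface.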

Notes