dataeval.data.selections.Prioritize

class dataeval.data.selections.Prioritize(model: torch.nn.Module, batch_size: int, device: dataeval.config.DeviceLike | None, method: 'knn', *, k: int | None = None)
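class dataeval.data.selections.Prioritize(model: torch.nn.Module, batch_size: int, device: dataeval.config.DeviceLike | None, method: 'kmeans_distance' | 'kmeans_complexity', *, c: int | None = None)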

Prioritizes the dataset by sorting its samples according to their position in the embedding space, using embeddings computed with the given model.

Parameters:
model : torch.nn.Module

Model to use for encoding images

batch_size : int

Batch size to use when encoding images

device : DeviceLike or None

Device to use for encoding images

method : Literal["knn", "kmeans_distance", "kmeans_complexity"]

Method to use for prioritization

k : int or None, default None

Number of nearest neighbors to use for prioritization (knn only)

c : int or None, default None

Number of clusters to use for prioritization (kmeans_distance and kmeans_complexity only)
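
The role of k in the knn method can be illustrated with a small standalone sketch. This is not dataeval's implementation: the scoring rule (mean Euclidean distance to the k nearest neighbors), the sort direction (most isolated samples first), and the helper names knn_scores and prioritize_knn are all assumptions made for illustration, and plain tuples stand in for real embeddings.

```python
import math


def knn_scores(embeddings, k):
    """Mean Euclidean distance from each point to its k nearest neighbors."""
    scores = []
    for i, p in enumerate(embeddings):
        # Distances to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(embeddings) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def prioritize_knn(embeddings, k):
    """Return sample indices ordered from most isolated to most densely surrounded.

    Assumption: isolated samples are ranked first. The library may order differently.
    """
    if not 1 <= k < len(embeddings):
        raise ValueError("k must be between 1 and len(embeddings) - 1")
    scores = knn_scores(embeddings, k)
    return sorted(range(len(embeddings)), key=lambda i: scores[i], reverse=True)


# Three tightly grouped points and one outlier.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
order = prioritize_knn(embeddings, k=2)
# The outlier at index 3 is ranked first under this scoring rule.
```

A larger k averages over a wider neighborhood, so the resulting order reflects coarser density differences rather than distance to a single closest sample.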

classmethod using(method: 'knn', *, k: int | None = None, embeddings: dataeval.data.Embeddings | None = None, reference: dataeval.data.Embeddings | None = None) Prioritize
classmethod using(method: 'kmeans_distance' | 'kmeans_complexity', *, c: int | None = None, embeddings: dataeval.data.Embeddings | None = None, reference: dataeval.data.Embeddings | None = None) Prioritize

Prioritizes the dataset by sort order in the embedding space using existing embeddings and/or reference dataset embeddings.

Parameters:
method : Literal["knn", "kmeans_distance", "kmeans_complexity"]

Method to use for prioritization

embeddings : Embeddings or None, default None

Embeddings to use for prioritization

reference : Embeddings or None, default None

Reference embeddings to prioritize relative to

k : int or None, default None

Number of nearest neighbors to use for prioritization (knn only)

c : int or None, default None

Number of clusters to use for prioritization (kmeans_distance and kmeans_complexity only)

Notes

At least one of embeddings or reference must be provided.
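
The argument rules documented for using can be summarized in a short sketch. The helper check_using_args below is hypothetical and not part of dataeval; it only restates the rules on this page (k applies to knn, c applies to the kmeans methods, and at least one of embeddings or reference is required). Whether the library raises ValueError, or rejects a mismatched k or c at all, is an assumption.

```python
KNN_METHODS = {"knn"}
KMEANS_METHODS = {"kmeans_distance", "kmeans_complexity"}


def check_using_args(method, *, k=None, c=None, embeddings=None, reference=None):
    """Hypothetical validator mirroring the documented rules; not part of dataeval."""
    if method not in KNN_METHODS | KMEANS_METHODS:
        raise ValueError(f"unknown method: {method!r}")
    if embeddings is None and reference is None:
        raise ValueError("at least one of embeddings or reference must be provided")
    if method in KNN_METHODS and c is not None:
        raise ValueError("c applies only to the kmeans methods")
    if method in KMEANS_METHODS and k is not None:
        raise ValueError("k applies only to knn")


# Placeholder objects stand in for dataeval.data.Embeddings instances.
emb, ref = object(), object()

check_using_args("knn", k=5, embeddings=emb)               # accepted
check_using_args("kmeans_distance", c=10, reference=ref)   # accepted

try:
    check_using_args("knn", k=5)  # neither embeddings nor reference given
except ValueError as exc:
    message = str(exc)
```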