Sufficiency

class dataeval.workflows.Sufficiency(model: Module, train_ds: Dataset, test_ds: Dataset, train_fn: Callable[[Module, Dataset, Sequence[int]], None], eval_fn: Callable[[Module, Dataset], dict[str, float] | dict[str, ndarray[Any, dtype[_ScalarType_co]]]], runs: int = 1, substeps: int = 5, train_kwargs: dict[str, Any] | None = None, eval_kwargs: dict[str, Any] | None = None)

Project dataset sufficiency using a given model and evaluation criteria

Parameters:
  • model (nn.Module) – Model that will be trained for each subset of data

  • train_ds (torch.utils.data.Dataset) – Full training data that will be split for each run

  • test_ds (torch.utils.data.Dataset) – Data that will be used for every run’s evaluation

  • train_fn (Callable[[nn.Module, Dataset, Sequence[int]], None]) – Function that takes a model (torch.nn.Module), a dataset (torch.utils.data.Dataset), and the indices to train on, and executes model training against the data.

  • eval_fn (Callable[[nn.Module, Dataset], Dict[str, float | NDArray]]) – Function that takes a model (torch.nn.Module) and a dataset (torch.utils.data.Dataset) and returns a dictionary of metric values (Dict[str, float | NDArray]) used to assess model performance on that data.

  • runs (int, default 1) – Number of models to train over all subsets

  • substeps (int, default 5) – Total number of dataset partitions that each model will train on

  • train_kwargs (Dict | None, default None) – Additional arguments required for custom training function

  • eval_kwargs (Dict | None, default None) – Additional arguments required for custom evaluation function
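As an illustration of the two callables described above, here is a minimal sketch of a train_fn and an eval_fn that match the documented signatures. The toy dataset, the linear model, the optimizer settings, and the "accuracy" metric name are all assumptions made for this example; they are not taken from the library.

```python
from typing import Sequence

import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset, Subset, TensorDataset


def train_fn(model: nn.Module, dataset: Dataset, indices: Sequence[int]) -> None:
    """Train `model` in place on the subset of `dataset` selected by `indices`."""
    loader = DataLoader(Subset(dataset, list(indices)), batch_size=16, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # assumed settings
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for inputs, labels in loader:  # a single epoch, for brevity
        optimizer.zero_grad()
        loss_fn(model(inputs), labels).backward()
        optimizer.step()


def eval_fn(model: nn.Module, dataset: Dataset) -> dict[str, float]:
    """Return a dictionary of metric values; here, plain accuracy."""
    loader = DataLoader(dataset, batch_size=16)
    correct = 0
    total = 0
    model.eval()
    with torch.no_grad():
        for inputs, labels in loader:
            predictions = model(inputs).argmax(dim=1)
            correct += int((predictions == labels).sum())
            total += len(labels)
    return {"accuracy": correct / total}


# Hypothetical toy data: 64 random 4-feature samples with binary labels.
features = torch.randn(64, 4)
labels = (features[:, 0] > 0).long()
toy_ds = TensorDataset(features, labels)
toy_model = nn.Linear(4, 2)

train_fn(toy_model, toy_ds, range(32))  # train on the first 32 samples
metrics = eval_fn(toy_model, toy_ds)    # evaluate on the full toy dataset
```

Callables shaped like these could then be passed as train_fn and eval_fn. Any extra keyword arguments they need would presumably go through train_kwargs and eval_kwargs.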

evaluate(eval_at: ndarray[Any, dtype[_ScalarType_co]] | None = None, niter: int = 1000) -> SufficiencyOutput

Creates data indices, trains models, and returns plotting data

Parameters:
  • eval_at (NDArray | None, default None) – Specify this to collect measures at a specific set of dataset lengths, rather than letting Sufficiency create the lengths to evaluate at internally.

  • niter (int, default 1000) – Iterations to perform when using the basin-hopping method to curve-fit measure(s).

Returns:

Dataclass containing the average of each measure per substep

Return type:

SufficiencyOutput

Examples

>>> suff = Sufficiency(
...     model=model, train_ds=train_ds, test_ds=test_ds, train_fn=train_fn, eval_fn=eval_fn, runs=3, substeps=5
... )
>>> suff.evaluate()
SufficiencyOutput(steps=array([  1,   3,  10,  31, 100], dtype=uint32), params={'test': array([ 0., 42.,  0.])}, measures={'test': array([1., 1., 1., 1., 1.])})

class dataeval.workflows.SufficiencyOutput(steps: ndarray[Any, dtype[uint32]], params: dict[str, ndarray[Any, dtype[float64]]], measures: dict[str, ndarray[Any, dtype[float64]]])
steps

Array of sample sizes

Type:

NDArray
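The steps reported in the example output for Sufficiency.evaluate (1, 3, 10, 31, 100 with substeps=5) are consistent with geometrically spaced sample sizes running from 1% to 100% of a 100-sample training set. The sketch below reproduces that spacing in plain Python. The 1% starting fraction and the truncation to integers are assumptions inferred from that single example, not documented behavior, and geometric_steps is a hypothetical helper.

```python
def geometric_steps(length: int, substeps: int, start_fraction: float = 0.01) -> list[int]:
    """Return `substeps` geometrically spaced sample sizes ending at `length`.

    Assumption: the first step is `start_fraction * length` and fractional
    sizes are truncated toward zero.
    """
    if substeps < 2:
        raise ValueError("substeps must be at least 2")
    low = start_fraction * length
    ratio = length / low
    # The small epsilon guards against values such as 9.999999999 truncating to 9.
    return [int(low * ratio ** (i / (substeps - 1)) + 1e-9) for i in range(substeps)]


steps = geometric_steps(length=100, substeps=5)
print(steps)  # [1, 3, 10, 31, 100]
```

Geometric spacing puts more evaluation points at small sample sizes, where a learning curve typically changes fastest.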

params

Inverse power curve coefficients for the line of best fit for each measure

Type:

Dict[str, NDArray]
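To make the idea of inverse power curve coefficients concrete, the sketch below evaluates a three-coefficient curve of the form a * n**(-b) + c at a few sample sizes. This parameterization is an assumption based on the three-element params array in the example output. This page does not state the exact functional form or whether the curve is fit to the metric itself or to its complement. The inverse_power helper and the coefficient values are hypothetical.

```python
def inverse_power(n: float, coeffs: tuple[float, float, float]) -> float:
    """Evaluate a * n**(-b) + c (assumed parameterization)."""
    a, b, c = coeffs
    return a * n ** (-b) + c


# Hypothetical coefficients: the curve decays from a + c toward the floor c.
coeffs = (0.9, 0.5, 0.05)
curve = [inverse_power(n, coeffs) for n in (1, 4, 100)]
print(curve)  # approximately 0.95, 0.5, 0.14
```

With b > 0 the curve decreases as n grows and levels off at c, the shape usually expected of an error rate as training data is added.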

measures

Average of values observed for each sample size step for each measure

Type:

Dict[str, NDArray]

inv_project(targets: dict[str, ndarray[Any, dtype[_ScalarType_co]]]) -> dict[str, ndarray[Any, dtype[_ScalarType_co]]]

Calculate training samples needed to achieve target model metric values.

Parameters:

targets (Dict[str, NDArray]) – Dictionary of target metric scores (from 0.0 to 1.0) that we want to achieve, where the key is the name of the metric.

Returns:

Dictionary mapping each metric name to the number of training samples needed to achieve each corresponding target value in targets

Return type:

Dict[str, NDArray]
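The calculation behind an inverse projection can be sketched by solving an inverse power curve for the sample size. The example assumes the curve has the form a * n**(-b) + c and models the complement of the metric (1 - score), so a target score of 0.86 corresponds to a curve value of 0.14. Both assumptions, the samples_needed helper, and the coefficients are illustrative. This is not the library's implementation.

```python
def samples_needed(target_score: float, coeffs: tuple[float, float, float]) -> float:
    """Solve a * n**(-b) + c = 1 - target_score for n (assumed curve form)."""
    a, b, c = coeffs
    if a <= 0 or b <= 0:
        raise ValueError("expected a > 0 and b > 0 for a decaying curve")
    target_error = 1.0 - target_score
    if target_error <= c:
        # The curve never drops to or below its asymptote c.
        raise ValueError("target is not reachable under this curve")
    return ((target_error - c) / a) ** (-1.0 / b)


# Hypothetical coefficients, with c = 0.05 as the error floor.
coeffs = (0.9, 0.5, 0.05)
n_required = samples_needed(0.86, coeffs)
print(round(n_required))  # 100
```

A target above 1 - c (here, above 0.95) is unreachable under such a curve regardless of sample count, which is why the sketch raises an error in that case.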

plot(class_names: Sequence[str] | None = None) -> list[Figure]

Plotting function for data sufficiency tasks

Parameters:

class_names (Sequence[str] | None, default None) – List of class names

Returns:

List of Figures for each measure

Return type:

List[plt.Figure]

Raises:

ValueError – If the lengths of the data points in the measures do not match

project(projection: int | Sequence[int] | ndarray[Any, dtype[uint64]]) -> SufficiencyOutput

Projects the measures for each step in projection

Parameters:

projection (int | Sequence[int] | NDArray[np.uint]) – Step or steps to project

Returns:

Dataclass containing the projected measures per projection

Return type:

SufficiencyOutput

Raises:

ValueError – If the lengths of the data points in the measures do not match

ValueError – If the steps are not an int, Sequence[int], or ndarray