dataeval.workflows.Sufficiency

class dataeval.workflows.Sufficiency(model, train_ds, test_ds, train_fn, eval_fn, runs=1, substeps=5, train_kwargs=None, eval_kwargs=None, unit_interval=True)

Project dataset sufficiency using a given model and evaluation criteria.

Parameters:

model : nn.Module – Model that will be trained for each subset of data.
train_ds : torch.Dataset – Full training data that will be split for each run.
test_ds : torch.Dataset – Data that will be used for every run's evaluation.
train_fn : Callable[[nn.Module, Dataset, Sequence[int]], None] – Function that takes a model, a dataset, and the indices to train on, then trains the model on the selected data.
eval_fn : Callable[[nn.Module, Dataset], Mapping[str, float | ArrayLike]] – Function that takes a model and a dataset and returns a dictionary of metric values used to assess model performance on that data.
runs : int, default 1 – Number of models to train over the entire dataset.
substeps : int, default 5 – Number of steps at which each model is trained and evaluated.
train_kwargs : Mapping | None, default None – Additional arguments required for a custom training function.
eval_kwargs : Mapping | None, default None – Additional arguments required for a custom evaluation function.
unit_interval : bool, default True – Constrains the power law to the interval [0, 1]. Set to True (default) for metrics such as accuracy, precision, and recall, which are defined to take values on [0, 1]. Set to False for metrics not on the unit interval.

Warning

Since each run is trained sequentially, increasing the parameter runs can significantly increase runtime.

Note

The substeps parameter is overridden by the eval_at parameter of Sufficiency.evaluate().
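The callback signatures described for train_fn and eval_fn can be illustrated with a minimal, framework-free sketch. MajorityClassModel and the toy (feature, label) lists are hypothetical stand-ins for an nn.Module and torch datasets; they are not part of dataeval and only show the shape of the contract: train_fn trains in place on the given indices and returns nothing, while eval_fn returns a dictionary of named metrics.

```python
from collections import Counter
from typing import Mapping, Sequence


class MajorityClassModel:
    """Hypothetical stand-in for an nn.Module: predicts the most common training label."""

    def __init__(self) -> None:
        self.prediction = None

    def predict(self, sample):
        return self.prediction


def train_fn(model: MajorityClassModel, dataset: Sequence[tuple], indices: Sequence[int]) -> None:
    """Train the model in place on the samples selected by `indices`."""
    labels = [dataset[i][1] for i in indices]
    model.prediction = Counter(labels).most_common(1)[0][0]


def eval_fn(model: MajorityClassModel, dataset: Sequence[tuple]) -> Mapping[str, float]:
    """Return metric name -> value; accuracy lies on [0, 1], so unit_interval=True fits."""
    correct = sum(model.predict(x) == y for x, y in dataset)
    return {"accuracy": correct / len(dataset)}


# Toy (feature, label) pairs standing in for train_ds and test_ds.
train_ds = [(0.1, 0), (0.2, 0), (0.9, 1), (0.3, 0)]
test_ds = [(0.0, 0), (1.0, 1), (0.4, 0), (0.5, 0)]

model = MajorityClassModel()
train_fn(model, train_ds, [0, 1, 2])  # train on the first three samples only
metrics = eval_fn(model, test_ds)
print(metrics)
```

In real use the same two functions would wrap a PyTorch training loop and a metric computation; the key names in the returned dictionary become the measure names in the output.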

evaluate(eval_at=None)

Train and evaluate a model over multiple substeps.

This function trains a model up to each step calculated from substeps. The model is then evaluated at that step and trained from 0 to the next step. This repeats for all substeps. Once a model has been trained and evaluated at all substeps, if runs is greater than one, the model weights are reset and the process is repeated.

During each evaluation, the metrics returned as a dictionary by the given evaluation function are stored. Once all runs are complete, the stored metrics are averaged across runs.
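The procedure described above can be sketched as a pair of nested loops. This is one reading of the description, not the library's source: the function name sufficiency_loop, the use of copy.deepcopy to reset the model between runs, and training on the first `step` indices in order are all assumptions made for illustration.

```python
import copy


def sufficiency_loop(model, train_ds, test_ds, train_fn, eval_fn, steps, runs=1):
    """Hypothetical sketch: train and evaluate at each step, then average over runs."""
    initial = copy.deepcopy(model)  # kept so every run starts from the same state
    measures = {}  # metric name -> list of per-run lists, one value per step
    for _ in range(runs):
        current = copy.deepcopy(initial)  # reset the model for this run
        per_run = {}
        for step in steps:
            # Train from index 0 up to the current step, then evaluate.
            train_fn(current, train_ds, range(step))
            for name, value in eval_fn(current, test_ds).items():
                per_run.setdefault(name, []).append(value)
        for name, values in per_run.items():
            measures.setdefault(name, []).append(values)
    # Average each metric across runs, step by step.
    averaged = {
        name: [sum(col) / len(col) for col in zip(*rows)]
        for name, rows in measures.items()
    }
    return measures, averaged


class CountingModel:
    """Hypothetical stand-in model that only records how many samples it saw."""

    def __init__(self):
        self.seen = 0


def train_fn(model, dataset, indices):
    model.seen = len(indices)


def eval_fn(model, dataset):
    return {"coverage": model.seen / 100}


measures, averaged = sufficiency_loop(
    CountingModel(), list(range(100)), [], train_fn, eval_fn, steps=[10, 50, 100], runs=2
)
print(averaged)
```

The per-run results have one row per run and one column per step, which mirrors the shape of the measures arrays shown in the examples for this method.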

Parameters:

eval_at : int | Iterable[int] | None, default None – Specify this to collect metrics over a specific set of dataset lengths. If None, the evaluation steps are calculated by np.geomspace over the length of the dataset, using self.substeps steps.

Returns:

Dataclass containing the average of each measure per substep

Return type:

SufficiencyOutput

Raises:

ValueError – If eval_at is not numeric
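The default step spacing used when eval_at is None can be illustrated with np.geomspace directly. The values here are assumptions chosen to line up with the first example in the Examples section: a training set of 100 samples, substeps=5, a start point of 1, and truncation to unsigned integers. The exact endpoints the library passes to np.geomspace are not stated in this page.

```python
import numpy as np

# Assumed values: 100 training samples and the default substeps=5.
length = 100
substeps = 5

# Geometric spacing places more evaluation points at small dataset sizes,
# where model performance typically changes fastest.
raw_steps = np.geomspace(1, length, substeps)

# Truncate to whole sample counts.
steps = raw_steps.astype(np.uint32)
print(steps.tolist())
```

Passing an explicit eval_at, such as a single integer or np.arange(0, 101, 20), replaces this geometric spacing with the lengths given.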

Examples

Three runs with the default number of substeps

>>> suff = Sufficiency(
...     model=model,
...     train_ds=train_ds,
...     test_ds=test_ds,
...     train_fn=train_fn,
...     eval_fn=eval_fn,
...     runs=3,
...     substeps=5,
... )
>>> suff.evaluate()
SufficiencyOutput(steps=array([  1,   3,  10,  31, 100], dtype=uint32), measures={'test': array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])}, averaged_measures={'test': array([1., 1., 1., 1., 1.])}, n_iter=1000, unit_interval=True)

Evaluate at a single value

>>> suff = Sufficiency(
...     model=model,
...     train_ds=train_ds,
...     test_ds=test_ds,
...     train_fn=train_fn,
...     eval_fn=eval_fn,
... )
>>> suff.evaluate(eval_at=50)
SufficiencyOutput(steps=array([50]), measures={'test': array([[1.]])}, averaged_measures={'test': array([1.])}, n_iter=1000, unit_interval=True)

Evaluate at linear steps from 0 to 100 inclusive

>>> suff = Sufficiency(
...     model=model,
...     train_ds=train_ds,
...     test_ds=test_ds,
...     train_fn=train_fn,
...     eval_fn=eval_fn,
... )
>>> suff.evaluate(eval_at=np.arange(0, 101, 20))
SufficiencyOutput(steps=array([  0,  20,  40,  60,  80, 100]), measures={'test': array([[1., 1., 1., 1., 1., 1.]])}, averaged_measures={'test': array([1., 1., 1., 1., 1., 1.])}, n_iter=1000, unit_interval=True)