dataeval.workflows.Sufficiency
class dataeval.workflows.Sufficiency(model, train_ds, test_ds, train_fn, eval_fn, runs=1, substeps=5, train_kwargs=None, eval_kwargs=None, unit_interval=True)
Project dataset sufficiency using a given model and evaluation criteria.
- Parameters:
- model : nn.Module
Model that will be trained for each subset of data.
- train_ds : torch.Dataset
Full training data that will be split for each run.
- test_ds : torch.Dataset
Data that will be used for every run’s evaluation.
- train_fn : Callable[[nn.Module, Dataset, Sequence[int]], None]
Function that takes a model, a dataset, and the indices to train on, and then trains the model on those samples.
- eval_fn : Callable[[nn.Module, Dataset], Mapping[str, float | ArrayLike]]
Function that takes a model and a dataset and returns a dictionary of metric values used to assess the model's performance on that data.
- runs : int, default 1
Number of models to train over the entire dataset.
- substeps : int, default 5
Number of steps at which each model is trained and evaluated.
- train_kwargs : Mapping | None, default None
Additional keyword arguments for the custom training function.
- eval_kwargs : Mapping | None, default None
Additional keyword arguments for the custom evaluation function.
- unit_interval : bool, default True
Constrains the power law to the interval [0, 1]. Set True (default) for metrics such as accuracy, precision, and recall, which are defined on [0, 1]. Set False for metrics that are not on the unit interval.
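The train_fn and eval_fn contracts above can be illustrated with a minimal PyTorch sketch. The optimizer, learning rate, batch sizes, epoch count, and the metric name "accuracy" are assumptions chosen for illustration, and train_kwargs and eval_kwargs are not used here. The early return for an empty index list is a defensive assumption for cases such as evaluating at a dataset length of 0.

```python
from typing import Mapping, Sequence

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset, Subset, TensorDataset


def train_fn(model: nn.Module, dataset: Dataset, indices: Sequence[int]) -> None:
    """Train `model` in place on the samples of `dataset` selected by `indices`."""
    indices = list(indices)
    if not indices:
        return  # nothing to train on (assumption: a length of 0 may be requested)
    # Restrict training to the requested subset of the full training data
    loader = DataLoader(Subset(dataset, indices), batch_size=16, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # illustrative choice
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(5):  # a few epochs; tune for real workloads
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()


def eval_fn(model: nn.Module, dataset: Dataset) -> Mapping[str, float]:
    """Return a dictionary of metrics for `model` evaluated on all of `dataset`."""
    loader = DataLoader(dataset, batch_size=64)
    correct = total = 0
    model.eval()
    with torch.no_grad():
        for x, y in loader:
            preds = model(x).argmax(dim=1)
            correct += (preds == y).sum().item()
            total += y.numel()
    # Accuracy lies on [0, 1], which suits the default unit_interval=True
    return {"accuracy": correct / total}


# Toy usage: a linear classifier on synthetic two-class data
torch.manual_seed(0)
x = torch.randn(64, 4)
y = (x[:, 0] > 0).long()
ds = TensorDataset(x, y)
model = nn.Linear(4, 2)
train_fn(model, ds, range(32))
metrics = eval_fn(model, ds)
```

Each key in the returned dictionary is tracked as a separate measure, so an eval_fn could report several metrics at once.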
Warning
Since each run is trained sequentially, increasing the parameter runs can significantly increase runtime.
Note
The substeps parameter is overridden by the eval_at parameter of Sufficiency.evaluate().
evaluate(eval_at=None)
Train and evaluate a model over multiple substeps.
This function trains a model up to each step calculated from substeps. The model is evaluated at that step, then trained on samples 0 through the next step. This repeats for all substeps. Once a model has been trained and evaluated at every substep, if runs is greater than one, the model weights are reset and the process repeats.
During each evaluation, the metrics returned as a dictionary by the evaluation function are stored and, once all runs are complete, averaged across runs.
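The procedure described above can be sketched in plain NumPy. This is not the library's implementation: sufficiency_loop, reset_model, the toy train and eval callables, and the per-run shuffling of sample order are assumptions, and the datasets are bound inside the callables rather than passed as arguments.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Mapping, Sequence, Tuple

import numpy as np


def sufficiency_loop(
    reset_model: Callable[[], object],
    train_fn: Callable[[object, Sequence[int]], None],
    eval_fn: Callable[[object], Mapping[str, float]],
    n_train: int,
    steps: Sequence[int],
    runs: int = 1,
    seed: int = 0,
) -> Tuple[Dict[str, np.ndarray], Dict[str, np.ndarray]]:
    """Sketch of evaluate(): train and evaluate at each step, repeated per run."""
    rng = np.random.default_rng(seed)
    measures: Dict[str, List[List[float]]] = defaultdict(list)
    for _ in range(runs):
        model = reset_model()  # fresh weights at the start of every run
        order = rng.permutation(n_train)  # assumption: shuffle sample order per run
        run_values: Dict[str, List[float]] = defaultdict(list)
        for step in steps:
            # Train on the first `step` samples of this run's order, then evaluate
            train_fn(model, order[:step].tolist())
            for name, value in eval_fn(model).items():
                run_values[name].append(float(value))
        for name, values in run_values.items():
            measures[name].append(values)
    # One row per run and one column per step; average over runs (axis 0)
    stacked = {name: np.asarray(rows) for name, rows in measures.items()}
    averaged = {name: rows.mean(axis=0) for name, rows in stacked.items()}
    return stacked, averaged


def reset_model() -> dict:
    return {"seen": set()}  # toy "model" that records which samples it saw


def toy_train(model: dict, indices: Sequence[int]) -> None:
    model["seen"].update(indices)


def toy_eval(model: dict) -> Mapping[str, float]:
    return {"test": len(model["seen"]) / 100}


measures, averaged = sufficiency_loop(
    reset_model, toy_train, toy_eval, n_train=100, steps=[1, 3, 10, 31, 100], runs=3
)
```

Because the inner loop calls the training function once per step for every run, the sketch makes runs × len(steps) training calls, which is why increasing runs increases runtime.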
- Parameters:
- eval_at : int | Iterable[int] | None, default None
Specify this to collect metrics over a specific set of dataset lengths. If None, evaluation occurs at self.substeps steps calculated by np.geomspace over the length of the dataset.
- Returns:
SufficiencyOutput dataclass containing the evaluated steps, the measures from each run, and the average of each measure per substep.
- Return type:
SufficiencyOutput
- Raises:
ValueError – If eval_at is not numerical.
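The eval_at handling and the ValueError described above can be sketched as a small helper. resolve_steps is hypothetical and not part of dataeval; it assumes the default steps are np.geomspace(1, n_train, substeps) truncated to uint32, which would reproduce the steps in the first example below if the training set has 100 samples.

```python
from collections.abc import Iterable
from typing import Optional, Union

import numpy as np


def resolve_steps(
    n_train: int,
    substeps: int,
    eval_at: Optional[Union[int, Iterable]] = None,
) -> np.ndarray:
    """Hypothetical helper: pick the dataset lengths at which to evaluate."""
    if eval_at is None:
        # Geometrically spaced lengths from 1 up to the full training set size
        return np.geomspace(1, n_train, substeps).astype(np.uint32)
    if isinstance(eval_at, Iterable) and not isinstance(eval_at, str):
        steps = np.array(list(eval_at))  # lists, ranges, generators, arrays
    else:
        steps = np.array([eval_at])  # a single value becomes a one-element array
    # eval_at overrides substeps, but must contain only numbers
    if not np.issubdtype(steps.dtype, np.number):
        raise ValueError("eval_at must contain only numerical values")
    return steps


default_steps = resolve_steps(100, 5)
single_step = resolve_steps(100, 5, 50)
linear_steps = resolve_steps(100, 5, np.arange(0, 101, 20))
```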
Examples
Evaluate with three runs and five substeps
>>> suff = Sufficiency(
...     model=model,
...     train_ds=train_ds,
...     test_ds=test_ds,
...     train_fn=train_fn,
...     eval_fn=eval_fn,
...     runs=3,
...     substeps=5,
... )
>>> suff.evaluate()
SufficiencyOutput(steps=array([  1,   3,  10,  31, 100], dtype=uint32), measures={'test': array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])}, averaged_measures={'test': array([1., 1., 1., 1., 1.])}, n_iter=1000, unit_interval=True)
Evaluate at a single value
>>> suff = Sufficiency(
...     model=model,
...     train_ds=train_ds,
...     test_ds=test_ds,
...     train_fn=train_fn,
...     eval_fn=eval_fn,
... )
>>> suff.evaluate(eval_at=50)
SufficiencyOutput(steps=array([50]), measures={'test': array([[1.]])}, averaged_measures={'test': array([1.])}, n_iter=1000, unit_interval=True)
Evaluate at linear steps from 0 to 100 inclusive
>>> suff = Sufficiency(
...     model=model,
...     train_ds=train_ds,
...     test_ds=test_ds,
...     train_fn=train_fn,
...     eval_fn=eval_fn,
... )
>>> suff.evaluate(eval_at=np.arange(0, 101, 20))
SufficiencyOutput(steps=array([  0,  20,  40,  60,  80, 100]), measures={'test': array([[1., 1., 1., 1., 1., 1.]])}, averaged_measures={'test': array([1., 1., 1., 1., 1., 1.])}, n_iter=1000, unit_interval=True)