dataeval.performance.Sufficiency

class dataeval.performance.Sufficiency(model, train_ds, test_ds, config)

Analyze how much training data is needed for target model performance.

Trains models on progressively larger data subsets, evaluates at each step, and fits power law curves to predict performance on larger datasets.
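As a rough illustration of the power law idea, the sketch below fits error measurements to a curve of the form `a * n**(-b)` by linear regression in log-log space and extrapolates to a larger dataset size. It is a simplified stand-in, not the fitting procedure used by Sufficiency: the curve form, the helper names `fit_power_law` and `project_error`, and the sample data are all assumptions made for the example.

```python
import numpy as np


def fit_power_law(steps, errors):
    """Fit errors ~ a * steps**(-b) with a straight-line fit in log-log space.

    Hypothetical helper for illustration; not part of dataeval.
    """
    log_n = np.log(np.asarray(steps, dtype=float))
    log_e = np.log(np.asarray(errors, dtype=float))
    # log(e) = log(a) - b * log(n), so the slope is -b and the intercept is log(a)
    slope, intercept = np.polyfit(log_n, log_e, 1)
    return float(np.exp(intercept)), float(-slope)


def project_error(a, b, n):
    """Evaluate the fitted curve at dataset size(s) n."""
    return a * np.asarray(n, dtype=float) ** (-b)


# Synthetic measurements that follow an exact power law (assumed data)
steps = np.array([100, 200, 400, 800, 1600], dtype=float)
errors = 2.0 * steps ** -0.5

a, b = fit_power_law(steps, errors)
projected = float(project_error(a, b, 6400))
```

A fit like this only extrapolates well when the measured points actually follow the assumed curve, which is one reason to evaluate at several dataset sizes rather than two or three.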

Parameters:
model : nn.Module

Model to train. Its weights are reset for each run.

train_ds : torch.Dataset

Full training data

test_ds : torch.Dataset

Test/validation data

config : SufficiencyConfig

Training/evaluation strategies and run parameters.

Warning

Because runs are trained sequentially, increasing runs can significantly increase total runtime.

Notes

Datasets are immutable after construction. To use different data, create a new instance.

Multiple runs average results to reduce variance.

See also

SufficiencyConfig

Configuration object

SufficiencyOutput

Results with measures and projections

evaluate(schedule=None)

Train and evaluate model across multiple dataset sizes.

This function trains a model up to each step calculated from substeps. The model is evaluated at that step, then trained from 0 up to the next step, and so on for all substeps. Once the model has been trained and evaluated at every substep, if runs is greater than one, the model weights are reset and the process is repeated.

During each evaluation, the metrics returned as a dictionary by the given evaluation function are stored. Once all runs are complete, the stored metrics are averaged across runs.
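The procedure described above can be sketched as a pair of nested loops. This is a minimal illustration under stated assumptions, not the library's implementation: `sufficiency_loop`, `make_model`, `train`, and `evaluate` are hypothetical names, the "model" is a toy dictionary, and shuffling the sample order once per run is an assumption.

```python
import numpy as np


def sufficiency_loop(n_samples, steps, runs, make_model, train, evaluate, seed=0):
    """Hypothetical sketch: train/evaluate at each step, repeat per run, then average."""
    rng = np.random.default_rng(seed)
    per_run = {}  # metric name -> list (one entry per run) of per-step values
    for _ in range(runs):
        model = make_model()               # fresh weights for every run
        order = rng.permutation(n_samples) # assumed: new sample order each run
        run_metrics = {}
        for step in steps:
            train(model, order[:step])     # train on samples 0..step
            for name, value in evaluate(model).items():
                run_metrics.setdefault(name, []).append(value)
        for name, values in run_metrics.items():
            per_run.setdefault(name, []).append(values)
    # Each array has shape (runs, len(steps)); averaging collapses the run axis
    measures = {k: np.asarray(v, dtype=float) for k, v in per_run.items()}
    averaged = {k: v.mean(axis=0) for k, v in measures.items()}
    return measures, averaged


# Toy stand-ins so the sketch runs without a deep learning framework
def make_model():
    return {"seen": 0}


def train(model, indices):
    model["seen"] = len(indices)


def evaluate(model):
    return {"accuracy": 1.0 - 1.0 / (1 + model["seen"])}


measures, averaged = sufficiency_loop(
    1000, [10, 100, 1000], runs=3,
    make_model=make_model, train=train, evaluate=evaluate,
)
```

Because the runs execute one after another, total cost in this sketch grows with `runs * len(steps)` training calls.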

Parameters:
schedule : EvaluationStrategy or int or Iterable[int] or None, default None

Specify this to collect metrics over a specific set of dataset lengths. If None, evaluates at each step calculated by np.geomspace over the length of the dataset.
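To give a feel for what geometrically spaced steps look like, the sketch below builds a schedule with `np.geomspace`. The starting point (1% of the dataset) and the helper name `geometric_steps` are assumptions for illustration; the exact endpoints used by the default schedule are not stated here.

```python
import numpy as np


def geometric_steps(dataset_length, substeps=5, start_fraction=0.01):
    """Hypothetical helper: geometrically spaced dataset sizes up to the full length.

    start_fraction is an assumed lower bound, not a documented default.
    """
    steps = np.geomspace(start_fraction * dataset_length, dataset_length, substeps)
    # Truncate to integer sample counts
    return steps.astype(np.intp)


steps = geometric_steps(10_000, substeps=5)
print(steps)
```

Geometric spacing places more evaluation points at small dataset sizes, where performance typically changes fastest, and fewer near the full dataset.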

Returns:

Output containing steps, measures, averaged_measures, and params.

Return type:

SufficiencyOutput

Examples

>>> config = SufficiencyConfig(
...     CustomTrainingStrategy(),
...     CustomEvaluationStrategy(),
... )
>>> sufficiency = Sufficiency(
...     model=model,
...     train_ds=train_ds,
...     test_ds=test_ds,
...     config=config,
... )

Evaluate with the default runs and substeps:

>>> output = sufficiency.evaluate()

Evaluate at specific points:

>>> output = sufficiency.evaluate(schedule=[100, 500, 1000])

Evaluate at a custom geometric spacing:

>>> from dataeval.performance.schedules import GeometricSchedule
>>> output = sufficiency.evaluate(schedule=GeometricSchedule(substeps=20))

Evaluate at custom linear steps from 0 to 100 inclusive:

>>> import numpy as np
>>> class LinearSchedule:
...     def get_steps(self, dataset_length):
...         return np.arange(0, 101, 20)
>>> output = sufficiency.evaluate(schedule=LinearSchedule())

property runs : int

Number of independent runs

Return type:

int

property substeps : int

Number of evaluation steps per run

Return type:

int

property test_ds : dataeval.protocols.Dataset[T]

Test dataset (read-only)

Notes

This property is read-only. To use a different test dataset, create a new Sufficiency instance.

Return type:

dataeval.protocols.Dataset[T]

property train_ds : dataeval.protocols.Dataset[T]

Training dataset (read-only)

Notes

This property is read-only. To use a different training dataset, create a new Sufficiency instance.

Return type:

dataeval.protocols.Dataset[T]

property unit_interval : bool

Whether metrics are constrained to [0, 1]

Return type:

bool