dataeval.performance.Sufficiency

class dataeval.performance.Sufficiency(model, train_ds, test_ds, training_strategy=None, evaluation_strategy=None, reset_strategy=None, runs=None, substeps=None, unit_interval=None, config=None)

Analyze how much training data is needed for target model performance.

Trains models on progressively larger data subsets, evaluates at each step, and fits power law curves to predict performance on larger datasets.
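
The curve-fitting idea can be illustrated with a small, dependency-free sketch. It assumes the error follows error = a * n**(-b) and fits that form by least squares in log-log space; the functional form, the helper names fit_power_law and predict_error, and the synthetic data are assumptions for illustration and are not taken from the library.

```python
import math


def fit_power_law(sizes, errors):
    """Least-squares fit of error ~= a * n**(-b), done in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log(error) = log(a) - b * log(n), so a = exp(intercept) and b = -slope
    return math.exp(intercept), -slope


def predict_error(a, b, n):
    """Extrapolate the fitted curve to a dataset of n samples."""
    return a * n ** (-b)


# Synthetic, noise-free measurements (assumed data, for illustration only)
sizes = [100, 200, 400, 800, 1600]
errors = [2.0 * n ** -0.5 for n in sizes]

a, b = fit_power_law(sizes, errors)
projected_accuracy = 1.0 - predict_error(a, b, 10_000)
```

With real measurements the points will not lie exactly on the curve, so the projection is an estimate whose quality depends on how well the assumed form matches the model's learning behavior.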

This class is backend-agnostic and supports any ML framework (PyTorch, TensorFlow, JAX, etc.) through configurable strategies.

Parameters:
model : Any

Model to train (reset for each run). Can be any model type supported by your training and evaluation strategies.

train_ds : Dataset

Full training dataset.

test_ds : Dataset

Test/validation dataset.

training_strategy : TrainingStrategy or None, default None

Strategy for training models. If None, uses config.training_strategy.

evaluation_strategy : EvaluationStrategy or None, default None

Strategy for evaluating models. If None, uses config.evaluation_strategy.

reset_strategy : Callable[[Any], Any]

Strategy for resetting model parameters between runs. Required, despite the None default shown in the signature. Must be a callable that takes the model and returns a reset model (e.g., one with re-initialized weights).

runs : int or None, default None

Number of independent training runs. If None, uses config.runs (default 1).

substeps : int or None, default None

Number of evaluation steps per run. If None, uses config.substeps (default 5).

unit_interval : bool or None, default None

Whether metrics are constrained to [0, 1]. If None, uses config.unit_interval (default True).

config : Sufficiency.Config or None, default None

Optional configuration object. Parameters passed directly to __init__ override config values.
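
The override rule can be sketched as follows. StandInConfig is a hypothetical stand-in that only mirrors the defaults documented above (runs 1, substeps 5, unit_interval True); it is not the real Sufficiency.Config, and resolve and effective_settings are illustrative helper names.

```python
from dataclasses import dataclass


@dataclass
class StandInConfig:
    """Hypothetical stand-in for Sufficiency.Config with the documented defaults."""

    runs: int = 1
    substeps: int = 5
    unit_interval: bool = True


def resolve(explicit, configured):
    """Use the explicit argument unless it is None, else fall back to the config."""
    return configured if explicit is None else explicit


def effective_settings(runs=None, substeps=None, unit_interval=None, config=None):
    """Combine explicit arguments with a config, explicit values taking priority."""
    config = config or StandInConfig()
    return {
        "runs": resolve(runs, config.runs),
        "substeps": resolve(substeps, config.substeps),
        "unit_interval": resolve(unit_interval, config.unit_interval),
    }


# runs is passed explicitly, so it wins over the config's value of 10
settings = effective_settings(runs=3, config=StandInConfig(runs=10, substeps=8))
```

Comparing against None rather than testing truthiness matters here: an explicit unit_interval=False must still override a config value of True.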

Warning

Runs are trained sequentially, so total runtime grows roughly in proportion to runs.

Notes

Datasets are immutable after construction. To use different data, create a new instance.

Results are averaged across runs to reduce variance.

Parameters passed directly to __init__ override config defaults.

You must provide a reset_strategy that knows how to reset your model to its initial state between runs.
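
One framework-agnostic way to satisfy this requirement is to snapshot the untrained model and hand back a fresh copy on every call. The sketch below is a minimal illustration; the class name SnapshotReset and the dictionary used as a toy model are hypothetical, and it assumes the model can be deep-copied.

```python
import copy


class SnapshotReset:
    """Hypothetical reset strategy: return a fresh copy of the initial model."""

    def __init__(self, model):
        # Snapshot the model before any training has touched it
        self._initial = copy.deepcopy(model)

    def __call__(self, model):
        # Ignore the trained model and return a new copy of the snapshot
        return copy.deepcopy(self._initial)


# Toy "model" standing in for a real network (assumption for illustration)
model = {"weights": [0.0, 0.0]}
reset = SnapshotReset(model)

model["weights"][0] = 5.0  # simulate training changing the weights
fresh = reset(model)
```

Copying a snapshot keeps every run starting from identical weights. If each run should instead start from a new random initialization, the callable would need to re-run the framework's own initializer.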

See also

Sufficiency.Config

Configuration object

SufficiencyOutput

Results with measures and projections

ModelResetStrategy

Protocol for reset strategies

evaluate(schedule=None)

Train and evaluate model across multiple dataset sizes.

This method trains the model up to each step calculated from substeps. The model is evaluated at that step, then trained on the data from 0 up to the next step, and so on for all substeps. Once the model has been trained and evaluated at every substep, if runs is greater than one, the model weights are reset and the process is repeated.

At each evaluation, the metrics returned as a dictionary by the evaluation strategy are stored; once all runs are complete, they are averaged across runs.
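
The loop described above can be sketched roughly as follows. This is a simplified illustration and not the library's implementation: the function run_sufficiency, the train, evaluate, and reset callables and their signatures, and the toy model are all hypothetical, and the sketch assumes the model is reset only between runs.

```python
from statistics import mean


def run_sufficiency(model, train_data, test_data, steps, runs, train, evaluate, reset):
    """Train and evaluate at each step for every run, then average across runs."""
    per_run = []  # one {metric name: [value per step]} dict per run
    for _ in range(runs):
        model = reset(model)  # start each run from the initial state
        results = {}
        for step in steps:
            train(model, train_data[:step])  # train on samples 0..step
            for name, value in evaluate(model, test_data).items():
                results.setdefault(name, []).append(value)
        per_run.append(results)
    # Average each metric across runs, step by step
    return {
        name: [mean(run[name][i] for run in per_run) for i in range(len(steps))]
        for name in per_run[0]
    }


# Toy stand-ins (assumptions): the "model" only remembers how much data it saw
def toy_train(model, subset):
    model["seen"] = len(subset)


def toy_evaluate(model, test_data):
    return {"accuracy": 1.0 - 1.0 / (1 + model["seen"])}


def toy_reset(model):
    return {"seen": 0}


averaged = run_sufficiency(
    {"seen": 0}, list(range(100)), [], [10, 50, 100], 2,
    toy_train, toy_evaluate, toy_reset,
)
```

Because the toy model is deterministic, both runs produce identical values; with a real model the per-run values differ and averaging smooths that variation.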

Parameters:
schedule : EvaluationStrategy or int or Iterable[int] or None, default None

Dataset lengths at which to collect metrics. If None, evaluates at the steps calculated by np.geomspace over the length of the dataset.

Returns:

Contains steps, measures, averaged_measures, and params

Return type:

SufficiencyOutput
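
To give a feel for the default geometric spacing, here is a pure-Python stand-in for np.geomspace that produces integer dataset sizes. The starting point of 1% of the dataset, the rounding to the nearest integer, and the helper name geometric_steps are assumptions for illustration; the library's exact endpoints may differ.

```python
def geometric_steps(dataset_length, substeps=5, start_fraction=0.01):
    """Return substeps dataset sizes spaced geometrically up to dataset_length."""
    if substeps == 1:
        return [dataset_length]
    # Assumed starting point: a small fraction of the data, at least one sample
    start = max(1.0, start_fraction * dataset_length)
    # A constant ratio between consecutive steps gives geometric spacing
    ratio = (dataset_length / start) ** (1.0 / (substeps - 1))
    return [min(dataset_length, round(start * ratio**i)) for i in range(substeps)]


steps = geometric_steps(10_000, substeps=5)
```

Geometric spacing places more evaluation points at small dataset sizes, where performance typically changes fastest, and fewer near the full dataset.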

Examples

>>> sufficiency = Sufficiency(
...     model=model,
...     train_ds=train_ds,
...     test_ds=test_ds,
...     training_strategy=CustomTrainingStrategy(),
...     evaluation_strategy=CustomEvaluationStrategy(),
...     reset_strategy=CustomResetStrategy(),
... )

Default runs and substeps:

>>> output = sufficiency.evaluate()

Evaluate at specific points:

>>> output = sufficiency.evaluate(schedule=[100, 500, 1000])

Evaluate at a custom geometric spacing:

>>> from dataeval.performance.schedules import GeometricSchedule
>>> output = sufficiency.evaluate(schedule=GeometricSchedule(substeps=20))

Evaluate at custom linear steps from 0 to 100 inclusive:

>>> import numpy as np
>>> class LinearSchedule:
...     def get_steps(self, dataset_length):
...         # Fixed steps 0, 20, ..., 100; dataset_length is unused here
...         return np.arange(0, 101, 20)
>>> output = sufficiency.evaluate(schedule=LinearSchedule())

property test_ds : dataeval.protocols.Dataset[T]

Test dataset (read-only).

Notes

This property is read-only. To use a different test dataset, create a new Sufficiency instance.

property train_ds : dataeval.protocols.Dataset[T]

Training dataset (read-only).

Notes

This property is read-only. To use a different training dataset, create a new Sufficiency instance.

Classes

Config

Configuration for sufficiency analysis execution.