dataeval.performance.Sufficiency

class dataeval.performance.Sufficiency(model, training_strategy=None, evaluation_strategy=None, reset_strategy=None, runs=None, substeps=None, unit_interval=None, config=None)

Estimate how much training data is needed to reach a target model performance.

Trains models on progressively larger data subsets, evaluates at each step, and fits power law curves to predict performance on larger datasets.
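The curve-fitting idea can be illustrated with a minimal sketch. The functional form used here, error ≈ a · n^(-b) fitted by least squares in log-log space, is an assumption chosen for illustration; the form and fitting procedure that Sufficiency actually uses are not stated on this page. The names fit_power_law and predict_error and the sample measurements are hypothetical.

```python
import math


def fit_power_law(sizes, errors):
    """Least-squares fit of error ~ a * n**(-b), done as a line fit in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log(error) = intercept + slope * log(n)  =>  a = exp(intercept), b = -slope
    return math.exp(intercept), -slope


def predict_error(a, b, n):
    """Project the fitted curve to a dataset size n."""
    return a * n ** (-b)


# Hypothetical measurements: error rate (1 - accuracy) at each training set size.
sizes = [100, 200, 400, 800]
errors = [0.40, 0.30, 0.225, 0.16875]

a, b = fit_power_law(sizes, errors)
projected = predict_error(a, b, 3200)  # extrapolate beyond the measured sizes
```

Fitting in log-log space keeps the sketch dependency-free; a nonlinear fit with an added asymptote term would be a reasonable alternative when the metric plateaus.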

This class is backend-agnostic and supports any ML framework (PyTorch, TensorFlow, JAX, etc.) through configurable strategies.

Parameters:
model : Any

Model to train (reset for each run). Can be any model type supported by your training and evaluation strategies.

training_strategy : TrainingStrategy or None, default None

Strategy for training models. If None, uses config.training_strategy.

evaluation_strategy : EvaluationStrategy or None, default None

Strategy for evaluating models. If None, uses config.evaluation_strategy.

reset_strategy : Callable[[Any], Any] or None, default None

Strategy for resetting model parameters between runs. Must be a callable that takes the model and returns a reset model (e.g., with re-initialized weights). If None, defaults to copy.deepcopy of the model captured at construction time.

runs : int or None, default None

Number of independent training runs. If None, uses config.runs (default 1).

substeps : int or None, default None

Number of evaluation steps per run. If None, uses config.substeps (default 5).

unit_interval : bool or None, default None

Whether metrics are constrained to [0, 1]. If None, uses config.unit_interval (default True).

config : Sufficiency.Config or None, default None

Optional configuration object. Parameters passed directly to __init__ override config values.
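As an illustration of the reset_strategy contract described above (a callable that takes the model and returns a reset model), here is a minimal sketch built on copy.deepcopy. TinyModel and make_snapshot_reset are hypothetical names introduced for this example and are not part of dataeval; the sketch mirrors the documented default behavior only in spirit.

```python
import copy


class TinyModel:
    """Hypothetical stand-in for a real model; holds a list of weights."""

    def __init__(self, weights):
        self.weights = list(weights)


def make_snapshot_reset(model):
    """Capture the model's current state and return a callable that restores it."""
    pristine = copy.deepcopy(model)

    def reset(_trained_model):
        # Ignore the trained model and hand back a fresh copy of the snapshot.
        return copy.deepcopy(pristine)

    return reset


model = TinyModel([0.0, 0.0])
reset = make_snapshot_reset(model)

model.weights[0] = 1.5  # simulate a training update
fresh = reset(model)    # fresh copy with the original weights
```

A callable like reset could then be passed as reset_strategy. A framework-specific strategy might re-initialize weights in place instead of copying, which avoids holding a second copy of a large model in memory.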

Warning

Because runs are trained sequentially, increasing runs can significantly increase total runtime.

Notes

When runs is greater than 1, results are averaged across runs to reduce variance.

Parameters passed directly to __init__ override config defaults.

See also

Sufficiency.Config

Configuration object

SufficiencyOutput

Results with measures and projections

ModelResetStrategy

Protocol for reset strategies

evaluate(train_dataset, test_dataset, schedule=None)

Train and evaluate model across multiple dataset sizes.

This method trains the model on progressively larger subsets of train_dataset, with subset sizes (steps) calculated from substeps. At each step, the model is trained on the samples from index 0 up to that step and then evaluated; it is then trained on the samples from index 0 up to the next step. This repeats for all steps. Once the model has been trained and evaluated at every step, if runs is greater than one, the model weights are reset and the process is repeated.

At each evaluation, the metrics dictionary returned by the evaluation strategy is stored. Once all runs are complete, the stored metrics are averaged across runs.
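The control flow described above can be sketched as a nested loop over runs and steps, followed by a per-step average across runs. This is a simplified illustration and not the library's implementation: run_sufficiency_loop and fake_train_and_evaluate are hypothetical names, and the training, evaluation, and reset calls are collapsed into a single stand-in function.

```python
def run_sufficiency_loop(steps, runs, train_and_evaluate):
    """Collect a metrics dict at every step of every run, then average across runs."""
    per_run = []
    for run in range(runs):
        # A real implementation would reset the model here, before each run.
        run_metrics = {}
        for step in steps:
            # Stand-in for: train on samples [0, step), then evaluate on the test set.
            for name, value in train_and_evaluate(run, step).items():
                run_metrics.setdefault(name, []).append(value)
        per_run.append(run_metrics)

    # Average each metric across runs, separately for every step.
    averaged = {
        name: [sum(r[name][i] for r in per_run) / runs for i in range(len(steps))]
        for name in per_run[0]
    }
    return per_run, averaged


def fake_train_and_evaluate(run, step):
    """Hypothetical stand-in that returns a metrics dict, as an evaluation strategy would."""
    return {"accuracy": step / 2000 + 0.1 * run}


per_run, averaged = run_sufficiency_loop([100, 500, 1000], 2, fake_train_and_evaluate)
```

Keeping the per-run values alongside the averages makes it possible to inspect run-to-run variance as well as the mean.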

Parameters:
train_dataset : Dataset

Full training data

test_dataset : Dataset

Test/validation data

schedule : EvaluationSchedule or int or Iterable[int] or None, default None

Dataset lengths at which to collect metrics. If None, evaluates at steps computed with np.geomspace over the length of the dataset.

Returns:

Output containing steps, measures, averaged_measures, and params.

Return type:

SufficiencyOutput
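To make the default geometric spacing concrete, the following sketch computes geometrically spaced subset sizes in plain Python, in the spirit of np.geomspace. The starting size (1% of the dataset), the rounding rule, and the name geometric_steps are assumptions made for illustration; the exact endpoints and integer conversion used by the library are not given on this page.

```python
def geometric_steps(dataset_length, substeps, start=None):
    """Return increasing integer subset sizes, geometrically spaced up to dataset_length."""
    if substeps < 2:
        return [dataset_length]
    if start is None:
        # Assumption: begin at roughly 1% of the dataset, but never below one sample.
        start = max(1, dataset_length // 100)
    # Constant ratio between consecutive steps, as in a geometric progression.
    ratio = (dataset_length / start) ** (1 / (substeps - 1))
    raw = [start * ratio**i for i in range(substeps)]
    # Round to integers, clamp to the valid range, and drop duplicates.
    return sorted({min(dataset_length, max(1, round(v))) for v in raw})


steps = geometric_steps(1000, 5, start=10)
```

Geometric spacing places more evaluation points at small dataset sizes, where performance typically changes fastest, than a linear schedule with the same number of steps.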

Examples

>>> sufficiency = Sufficiency(
...     model=model,
...     training_strategy=CustomTrainingStrategy(),
...     evaluation_strategy=CustomEvaluationStrategy(),
...     reset_strategy=CustomResetStrategy(),
... )

Default runs and substeps:

>>> output = sufficiency.evaluate(train_dataset, test_dataset)

Evaluate at specific points:

>>> output = sufficiency.evaluate(train_dataset, test_dataset, schedule=[100, 500, 1000])

Evaluate at a custom geometric spacing:

>>> from dataeval.performance.schedules import GeometricSchedule
>>> output = sufficiency.evaluate(train_dataset, test_dataset, schedule=GeometricSchedule(substeps=20))

Evaluate at custom linear steps from 0 to 100 inclusive:

>>> import numpy as np
>>> class LinearSchedule:
...     def get_steps(self, dataset_length):
...         # Fixed steps 0, 20, ..., 100; dataset_length is not used here
...         return np.arange(0, 101, 20)
>>> output = sufficiency.evaluate(train_dataset, test_dataset, schedule=LinearSchedule())

Classes

Config

Configuration for sufficiency analysis execution.