dataeval.performance.Sufficiency¶
-
class dataeval.performance.Sufficiency(model, training_strategy=None, evaluation_strategy=None, reset_strategy=None, runs=None, substeps=None, unit_interval=None, config=None)¶
Analyze how much training data is needed for target model performance.
Trains models on progressively larger data subsets, evaluates at each step, and fits power law curves to predict performance on larger datasets.
This class is backend-agnostic and supports any ML framework (PyTorch, TensorFlow, JAX, etc.) through configurable strategies.
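To make the power-law projection concrete, here is a minimal sketch that fits a pure power law `error(n) = a * n**(-b)` to made-up (dataset size, error) measurements and extrapolates to a larger size. It is an illustration only: the functional form, the log-log least-squares fit, and the names `fit_power_law` and `predict_error` are assumptions and not DataEval's own fitting routine.

```python
import math


def fit_power_law(sizes, errors):
    """Fit errors ~ a * sizes**(-b) by least squares on log-transformed data."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log(error) = log(a) - b * log(n), so a = exp(intercept) and b = -slope
    return math.exp(intercept), -slope


def predict_error(a, b, n):
    """Project the error at dataset size n from the fitted coefficients."""
    return a * n ** (-b)


# Hypothetical measurements (error = 1 - accuracy), generated from a known law
sizes = [100, 200, 400, 800]
errors = [2.0 * n ** -0.5 for n in sizes]

a, b = fit_power_law(sizes, errors)
projected = predict_error(a, b, 3200)
```

Because the toy data follow an exact power law, the fit recovers the generating coefficients. Real measurements are noisy, which is one reason to average over several runs.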
- Parameters:¶
- model : Any¶
Model to train (reset for each run). Can be any model type supported by your training and evaluation strategies.
- training_strategy : TrainingStrategy or None, default None¶
Strategy for training models. If None, uses config.training_strategy.
- evaluation_strategy : EvaluationStrategy or None, default None¶
Strategy for evaluating models. If None, uses config.evaluation_strategy.
- reset_strategy : Callable[[Any], Any] or None, default None¶
Strategy for resetting model parameters between runs. Must be a callable that takes the model and returns a reset model (e.g., with re-initialized weights). If None, defaults to copy.deepcopy of the model captured at construction time.
- runs : int or None, default None¶
Number of independent training runs. If None, uses config.runs (default 1).
- substeps : int or None, default None¶
Number of evaluation steps per run. If None, uses config.substeps (default 5).
- unit_interval : bool or None, default None¶
Whether metrics are constrained to [0, 1]. If None, uses config.unit_interval (default True).
- config : Sufficiency.Config or None, default None¶
Optional configuration object. Parameters passed directly to __init__ override config values.
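The sketch below shows what the three strategy arguments might look like for a toy model in plain Python. The method names and signatures `train(model, dataset, indices)` and `evaluate(model, dataset)` are assumptions made for illustration, so check the TrainingStrategy and EvaluationStrategy protocol definitions before relying on them. Only the reset contract (a callable that takes the model and returns a reset model) is stated on this page. All class and function names here are hypothetical.

```python
import copy


class MeanModel:
    """Toy model that predicts the mean of the targets it was trained on."""

    def __init__(self):
        self.mean = 0.0


class ToyTrainingStrategy:
    # Assumed method name and signature; not confirmed by this page.
    def train(self, model, dataset, indices):
        targets = [dataset[i] for i in indices]
        model.mean = sum(targets) / len(targets)


class ToyEvaluationStrategy:
    # Assumed to return a dictionary mapping metric names to values.
    def evaluate(self, model, dataset):
        mse = sum((y - model.mean) ** 2 for y in dataset) / len(dataset)
        return {"mse": mse}


def make_reset_strategy(model):
    """Build a reset callable from a snapshot of `model` taken right now."""
    pristine = copy.deepcopy(model)

    def reset(_current_model):
        # Ignore the trained model and hand back a fresh copy of the snapshot
        return copy.deepcopy(pristine)

    return reset


model = MeanModel()
reset = make_reset_strategy(model)

ToyTrainingStrategy().train(model, [1.0, 2.0, 3.0, 4.0], [0, 1, 2, 3])
metrics = ToyEvaluationStrategy().evaluate(model, [2.0, 3.0])
fresh = reset(model)
```

The snapshot-based reset mirrors the documented default of deep-copying the model captured at construction time. A custom reset strategy is mainly useful when a deep copy is expensive or the framework offers its own way to re-initialize weights.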
Warning
Since each run is trained sequentially, increasing the parameter runs can significantly increase runtime.
See also
Sufficiency.Config: Configuration object
SufficiencyOutput: Results with measures and projections
ModelResetStrategy: Protocol for reset strategies
Notes
When runs is greater than one, results are averaged across runs to reduce variance.
Parameters passed directly to __init__ override config defaults.
-
evaluate(train_dataset, test_dataset, schedule=None)¶
Train and evaluate model across multiple dataset sizes.
This method trains the model up to each step calculated from substeps. At each step the model is evaluated and then trained from 0 to the next step; this repeats for all substeps. Once the model has been trained and evaluated at every substep, if runs is greater than one, the model weights are reset and the process is repeated.
During each evaluation, the metrics that the evaluation strategy returns as a dictionary are stored. Once all runs are complete, the stored metrics are averaged across runs.
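The loop described above can be sketched in plain Python as follows. This is a simplified illustration and not the library's implementation: the helper `run_sufficiency_loop`, the callables it takes, and the choice to train on the first `step` sample indices are all assumptions.

```python
def run_sufficiency_loop(model, train, evaluate, reset, steps, runs):
    """Return {metric: [value per step]} averaged over all runs."""
    totals = {}
    for _ in range(runs):
        model = reset(model)  # start every run from fresh weights
        for i, step in enumerate(steps):
            # Assumption: train on the first `step` samples of the training set
            train(model, list(range(step)))
            for name, value in evaluate(model).items():
                totals.setdefault(name, [0.0] * len(steps))[i] += value
    # Average the accumulated metrics once all runs are complete
    return {name: [v / runs for v in values] for name, values in totals.items()}


# Toy stand-ins so the sketch runs without any ML framework
def toy_train(model, indices):
    model["seen"] = len(indices)


def toy_evaluate(model):
    return {"coverage": model["seen"] / 1000}


averaged = run_sufficiency_loop(
    model={"seen": 0},
    train=toy_train,
    evaluate=toy_evaluate,
    reset=lambda _model: {"seen": 0},
    steps=[10, 100, 1000],
    runs=2,
)
```

Each run walks through every step before the model is reset, so total cost grows with both the number of steps and the number of runs.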
- Parameters:¶
- train_dataset : Dataset¶
Full training data
- test_dataset : Dataset¶
Test/validation data
- schedule : EvaluationSchedule or int or Iterable[int] or None, default None¶
Specify this to collect metrics over a specific set of dataset lengths. If None, evaluates at each step calculated by np.geomspace over the length of the dataset.
- Returns:¶
Contains steps, measures, averaged_measures, and params
- Return type:¶
SufficiencyOutput
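For intuition about the default schedule, np.geomspace spaces points evenly on a logarithmic scale, so early steps are close together and later steps are far apart. The pure-Python sketch below produces such a schedule. The 1% starting fraction, the rounding rule, and the name `geometric_steps` are assumptions for illustration, so the library's exact step values may differ.

```python
def geometric_steps(dataset_length, substeps, start_fraction=0.01):
    """Return `substeps` integer sizes spaced geometrically up to dataset_length."""
    if substeps == 1:
        return [dataset_length]
    # Assumed starting point: a small fraction of the dataset, at least 1 sample
    start = max(1.0, start_fraction * dataset_length)
    ratio = (dataset_length / start) ** (1.0 / (substeps - 1))
    steps = [int(round(start * ratio ** i)) for i in range(substeps)]
    steps[-1] = dataset_length  # guard against floating-point drift at the end
    return steps


steps = geometric_steps(10_000, 5)
```

Consecutive steps grow by a constant factor instead of a constant increment, which suits a quantity that is expected to follow a power law in dataset size.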
Examples
>>> sufficiency = Sufficiency(
...     model=model,
...     training_strategy=CustomTrainingStrategy(),
...     evaluation_strategy=CustomEvaluationStrategy(),
...     reset_strategy=CustomResetStrategy(),
... )
Default runs and substeps:
>>> output = sufficiency.evaluate(train_dataset, test_dataset)
Evaluate at specific points:
>>> output = sufficiency.evaluate(train_dataset, test_dataset, schedule=[100, 500, 1000])
Evaluate at a custom geometric spacing:
>>> from dataeval.performance.schedules import GeometricSchedule
>>> output = sufficiency.evaluate(train_dataset, test_dataset, schedule=GeometricSchedule(substeps=20))
Evaluate at custom linear steps from 0-100 inclusive:
>>> import numpy as np
>>> class LinearSchedule:
...     def get_steps(self, dataset_length):
...         return np.arange(0, 101, 20)
>>> output = sufficiency.evaluate(train_dataset, test_dataset, schedule=LinearSchedule())
Classes¶
Sufficiency.Config | Configuration for sufficiency analysis execution.