Sufficiency

How-To Guides

Check out this how-to guide to begin using the Sufficiency class.

DataEval API

class dataeval.workflows.Sufficiency(model: Module, train_ds: Dataset, test_ds: Dataset, train_fn: Callable[[Module, Dataset, Sequence[int]], None], eval_fn: Callable[[Module, Dataset], Dict[str, float] | Dict[str, ndarray[Any, dtype[_ScalarType_co]]]], runs: int = 1, substeps: int = 5, train_kwargs: Dict[str, Any] | None = None, eval_kwargs: Dict[str, Any] | None = None)

Project dataset sufficiency using a given model and evaluation criteria

Parameters:

model (nn.Module) – Model that will be trained for each subset of data
train_ds (Dataset) – Full training data that will be split for each run
test_ds (Dataset) – Data that will be used for every run’s evaluation
train_fn (Callable[[nn.Module, Dataset, Sequence[int]], None]) – Function which takes a model (torch.nn.Module), a dataset (torch.utils.data.Dataset) and the indices to train on, and executes model training against that subset of the data.
eval_fn (Callable[[nn.Module, Dataset], Dict[str, float] | Dict[str, NDArray]]) – Function which takes a model (torch.nn.Module) and a dataset (torch.utils.data.Dataset) and returns a dictionary of metric values (Dict[str, float] or Dict[str, NDArray]) used to assess model performance on that data.
runs (int, default 1) – Number of models to run over all subsets
substeps (int, default 5) – Total number of dataset partitions that each model will train on
train_kwargs (Dict[str, Any] | None, default None) – Additional arguments required for custom training function
eval_kwargs (Dict[str, Any] | None, default None) – Additional arguments required for custom evaluation function

evaluate(eval_at: ndarray[Any, dtype[_ScalarType_co]] | None = None, niter: int = 1000) → SufficiencyOutput

Creates data indices, trains models, and returns plotting data

Parameters:

eval_at (Optional[NDArray]) – Specify this to collect accuracies over a specific set of dataset lengths, rather than letting Sufficiency internally create the lengths to evaluate at.
niter (int, default 1000) – Iterations to perform when using the basin-hopping method to curve-fit measure(s).

Returns:

Dataclass containing the average of each measure per substep

Return type:

SufficiencyOutput
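When the internally generated lengths are not suitable, eval_at can be built by hand. The sketch below spaces the lengths geometrically so that small subsets are sampled more densely. The helper name make_eval_at, the 1% starting fraction, and the geometric spacing are illustrative assumptions, not a description of what Sufficiency does internally.

```python
import numpy as np


def make_eval_at(n_train: int, substeps: int, start_frac: float = 0.01) -> np.ndarray:
    """Return increasing dataset lengths between start_frac * n_train and n_train."""
    start = max(1.0, start_frac * n_train)
    lengths = np.geomspace(start, n_train, substeps)
    # Round to whole samples and drop duplicates that rounding may create
    return np.unique(np.round(lengths).astype(np.int64))


eval_at = make_eval_at(n_train=1000, substeps=5)
# Hypothetical usage with an existing Sufficiency instance `suff`:
# output = suff.evaluate(eval_at=eval_at)
```

Every length must be no larger than the training set, since each one selects a subset of train_ds.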

classmethod inv_project(targets: Dict[str, ndarray[Any, dtype[_ScalarType_co]]], data: SufficiencyOutput) → Dict[str, ndarray[Any, dtype[_ScalarType_co]]]

Calculate training samples needed to achieve target model metric values.

Parameters:

targets (Dict[str, NDArray]) – Dictionary of target metric scores (from 0.0 to 1.0) that we want to achieve, where the key is the name of the metric.
data (SufficiencyOutput) – Dataclass containing the average of each measure per substep

Returns:

Dictionary mapping each metric name to the number of training samples needed to achieve each corresponding entry in targets

Return type:

Dict[str, NDArray]
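Conceptually, inv_project inverts a fitted learning curve. As a minimal sketch, assume the error (1 - metric) follows a power law a * n**(-b) + c in the number of training samples n. The functional form, the function names, and the coefficient values are assumptions chosen for illustration, not values taken from DataEval.

```python
import numpy as np


def power_law_error(n, a, b, c):
    """Assumed learning curve: error after training on n samples."""
    return a * np.power(n, -b) + c


def samples_for_target(target_metric, a, b, c):
    """Invert the assumed curve: samples needed to reach each target metric."""
    target_error = 1.0 - np.asarray(target_metric, dtype=float)
    # Targets at or above the asymptote 1 - c are unreachable under this curve
    if np.any(target_error <= c):
        raise ValueError("target exceeds the asymptotic performance of the curve")
    return np.power((target_error - c) / a, -1.0 / b)


a, b, c = 2.0, 0.5, 0.02  # made-up coefficients
needed = samples_for_target([0.88, 0.93], a, b, c)
```

With a real SufficiencyOutput, the documented call takes the form Sufficiency.inv_project({"Accuracy": np.array([0.9])}, output), where "Accuracy" stands for whichever metric name eval_fn returned.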

classmethod plot(data: SufficiencyOutput, class_names: Sequence[str] | None = None) → List[Figure]

Plotting function for data sufficiency tasks

Parameters:

data (SufficiencyOutput) – Dataclass containing the average of each measure per substep

Returns:

List of Figures for each measure

Return type:

List[plt.Figure]

Raises:

ValueError – If the length of data points in the measures do not match

classmethod project(data: SufficiencyOutput, projection: int | Sequence[int] | ndarray[Any, dtype[uint64]]) → SufficiencyOutput

Projects the measures for each value of X

Parameters:

data (SufficiencyOutput) – Dataclass containing the average of each measure per substep
projection (Union[int, Sequence[int], NDArray[np.uint]]) – Step or steps to project

Returns:

Dataclass containing the projected measures per projection

Return type:

SufficiencyOutput

Raises:

ValueError – If the length of data points in the measures do not match, or if the steps are not an int, a Sequence[int] or an ndarray
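To illustrate what projecting a measure means, the sketch below fits a curve to measured points and evaluates it at larger dataset sizes. It deliberately swaps in a simpler technique than the basin-hopping fit mentioned under evaluate: an ordinary least-squares fit in log-log space, which assumes the error decays to zero as a pure power law. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_power_law(steps, metric):
    """Fit 1 - metric = a * steps**(-b) by linear least squares in log-log space."""
    log_n = np.log(np.asarray(steps, dtype=float))
    log_err = np.log(1.0 - np.asarray(metric, dtype=float))
    slope, intercept = np.polyfit(log_n, log_err, deg=1)
    return float(np.exp(intercept)), float(-slope)


def project_metric(projection, a, b):
    """Evaluate the fitted curve at one or more dataset sizes."""
    n = np.atleast_1d(np.asarray(projection, dtype=float))
    return 1.0 - a * np.power(n, -b)


# Synthetic measurements that follow the assumed curve exactly
steps = np.array([100.0, 400.0, 1600.0])
accuracy = 1.0 - 2.0 * steps ** -0.5

a_fit, b_fit = fit_power_law(steps, accuracy)
projected = project_metric([6400, 10000], a_fit, b_fit)
```

With a real SufficiencyOutput, the documented call takes the form Sufficiency.project(output, [6400, 10000]).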

Initializing Sufficiency

Defining a Custom Training Function

Use a small step size and around 50 epochs per step on the curve. The example below uses only 10 epochs because it is defined for a testing scenario.

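from typing import Sequence

import torch
from torch import nn, optim
from torch.utils.data import DataLoader, Dataset, Subset

# Train on the GPU when one is available; the model is expected to be on this device already
device = "cuda" if torch.cuda.is_available() else "cpu"


def custom_train(model: nn.Module, dataset: Dataset, indices: Sequence[int]) -> None:
    # Defined only for this testing scenario
    criterion = torch.nn.CrossEntropyLoss().to(device)
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    epochs = 10

    # Define the dataloader for training on the selected indices only
    dataloader = DataLoader(Subset(dataset, indices), batch_size=16)

    # Put the model in training mode (enables dropout, batch norm updates)
    model.train()
    for epoch in range(epochs):
        for batch in dataloader:
            # Load data/images to device
            X = batch[0].to(device)
            # Load targets/labels to device, keeping their dtype so integer class labels stay integer
            y = batch[1].to(device)
            # Zero out gradients
            optimizer.zero_grad()
            # Forward propagation
            outputs = model(X)
            # Compute loss
            loss = criterion(outputs, y)
            # Back prop
            loss.backward()
            # Update weights/parameters
            optimizer.step()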
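Sufficiency also needs an evaluation function matching the eval_fn signature described above. The following is a minimal sketch, assuming a classification model and a dataset that yields (input, integer label) pairs. The name custom_eval, the batch size, and the "Accuracy" key are illustrative choices, not part of DataEval.

```python
from typing import Dict

import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset, TensorDataset


def custom_eval(model: nn.Module, dataset: Dataset) -> Dict[str, float]:
    """Return the top-1 accuracy of `model` on `dataset` as {"Accuracy": value}."""
    # Evaluate on whichever device the model already lives on
    device = next(model.parameters()).device
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for X, y in DataLoader(dataset, batch_size=16):
            preds = model(X.to(device)).argmax(dim=1)
            correct += int((preds == y.to(device)).sum())
            total += len(y)
    return {"Accuracy": correct / total}


# Tiny synthetic check: 8 samples, 4 features, 3 classes
demo_ds = TensorDataset(torch.randn(8, 4), torch.randint(0, 3, (8,)))
demo_metrics = custom_eval(nn.Linear(4, 3), demo_ds)
```

A function like this could then be passed as eval_fn=custom_eval when constructing Sufficiency.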

Recommended parameters for Sufficiency

We recommend at least 5 bootstrap samples (runs) and 10 steps along the training curve per model (substeps).

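from dataeval.workflows import Sufficiency

# Instantiate Sufficiency with the recommended settings
suff = Sufficiency(
    model=model,
    train_ds=train_ds,
    test_ds=test_ds,
    train_fn=custom_train,  # training function defined above
    eval_fn=eval_fn,  # user-defined function matching the eval_fn signature
    runs=5,
    substeps=10,
)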

# Train & test model
output = suff.evaluate()