Sufficiency

What is it

When to use it

Use the Sufficiency class when you would like to extrapolate hypothetical performance. For example, if you have a small dataset, it can help you decide whether collecting more data is worthwhile.
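
To make "extrapolating hypothetical performance" concrete, here is a minimal sketch of one common approach: fit a power law to a measured learning curve and project it to a larger dataset size. It illustrates the idea only and is not necessarily how Sufficiency works internally. The helper names `fit_power_law` and `project_error` are hypothetical, and the measurements are synthetic values generated from an exact power law purely for illustration.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[int], errors: Sequence[float]) -> Tuple[float, float]:
    """Fit error ~ a * n**(-b) by ordinary least squares on log-log values."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Slope and intercept of the best-fit line in log-log space
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


def project_error(a: float, b: float, n: int) -> float:
    """Evaluate the fitted curve at a (possibly unseen) dataset size n."""
    return a * n ** (-b)


# Synthetic learning curve: each doubling of the data multiplies the error by 0.75
sizes = [100, 200, 400, 800]
errors = [0.40, 0.30, 0.225, 0.16875]

a, b = fit_power_law(sizes, errors)
projected = project_error(a, b, 3200)  # two further doublings of the data
print(f"fitted exponent b = {b:.3f}, projected error at n=3200: {projected:.4f}")
```

If the projected error at the larger size is only marginally better than what you already measure, collecting more data may not be worth the cost.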

Theory behind it

Tips and Tricks

Defining a Custom Training Function

Use a small step size and around 50 epochs per step on the curve. The example below trains for only 10 epochs to keep the test scenario short.

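from typing import Sequence

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset, Subset

# Assumed here so the example is self-contained; use the device your model lives on
device = "cuda" if torch.cuda.is_available() else "cpu"


def custom_train(model: nn.Module, dataset: Dataset, indices: Sequence[int]):
    # Defined only for this testing scenario
    criterion = torch.nn.CrossEntropyLoss().to(device)
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    epochs = 10

    # Define the dataloader for training on the selected subset of the data
    dataloader = DataLoader(Subset(dataset, indices), batch_size=16)

    # Put the model in training mode (enables dropout, batch norm updates)
    model.train()
    for epoch in range(epochs):
        for batch in dataloader:
            # Load data/images to device
            X = torch.as_tensor(batch[0]).to(device)
            # Load targets/labels to device; assumes integer class labels,
            # which CrossEntropyLoss expects as a long tensor
            y = torch.as_tensor(batch[1]).long().to(device)
            # Zero out gradients
            optimizer.zero_grad()
            # Forward propagation
            outputs = model(X)
            # Compute loss
            loss = criterion(outputs, y)
            # Back prop
            loss.backward()
            # Update weights/parameters
            optimizer.step()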
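
The Sufficiency call in the next section also takes an `eval_fn`, which this page does not define. A minimal sketch of a matching evaluation function for a classifier might look like the following. The name `custom_eval`, the `(model, dataset)` signature, and the dictionary return value are assumptions made for illustration; check the Sufficiency API reference for the exact contract it expects.

```python
from typing import Dict

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset


def custom_eval(model: nn.Module, dataset: Dataset) -> Dict[str, float]:
    """Hypothetical helper: classification accuracy of `model` on `dataset`."""
    # Evaluate on whichever device the model's parameters already live on
    device = next(model.parameters()).device
    dataloader = DataLoader(dataset, batch_size=16)

    correct, total = 0, 0
    model.eval()  # disable dropout and freeze batch norm statistics
    with torch.no_grad():  # no gradients are needed for evaluation
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            # The predicted class is the index of the largest logit
            preds = model(X).argmax(dim=1)
            correct += (preds == y).sum().item()
            total += y.size(0)

    # Assumed return format: metric name mapped to its value
    return {"Accuracy": correct / max(total, 1)}
```

Returning a named metric keeps the output readable if more metrics are added later.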

Recommended parameters for Sufficiency

We recommend at least 5 bootstrap samples (runs) and 10 steps along the training curve per model (substeps).

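# Instantiate Sufficiency with the recommended settings.
# train_fn and eval_fn are user-defined callables (for example, custom_train above).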
suff = Sufficiency(
    model=model,
    train_ds=train_ds,
    test_ds=test_ds,
    train_fn=train_fn,
    eval_fn=eval_fn,
    runs=5,
    substeps=10)

# Train & test model
output = suff.evaluate()
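
To give a feel for what `runs` and `substeps` control, the sketch below builds a sampling plan: for each run it draws a fresh random ordering of the dataset and takes nested subsets at `substeps` geometrically spaced sizes. This is an illustration under stated assumptions, not the library's internal logic. The helpers `subset_sizes` and `sampling_plan`, the geometric spacing, the `smallest` starting size, and the reshuffle per run (rather than sampling with replacement) are all hypothetical choices.

```python
import random
from typing import List


def subset_sizes(n_samples: int, substeps: int, smallest: int = 10) -> List[int]:
    """Geometrically spaced training-set sizes, ending at the full dataset."""
    if substeps < 2:
        return [n_samples]
    ratio = (n_samples / smallest) ** (1 / (substeps - 1))
    sizes = [round(smallest * ratio**i) for i in range(substeps)]
    sizes[-1] = n_samples  # guard against rounding drift on the last step
    return sizes


def sampling_plan(
    n_samples: int, runs: int, substeps: int, seed: int = 0
) -> List[List[List[int]]]:
    """For each run, return one list of training indices per substep."""
    rng = random.Random(seed)
    sizes = subset_sizes(n_samples, substeps)
    plan = []
    for _ in range(runs):
        order = list(range(n_samples))
        rng.shuffle(order)  # a new random ordering for every run
        # Nested subsets: each step extends the previous one
        plan.append([order[:size] for size in sizes])
    return plan


plan = sampling_plan(n_samples=1000, runs=5, substeps=10)
print([len(indices) for indices in plan[0]])
```

Under this scheme the training function is called once per subset, so the recommended settings imply on the order of `runs * substeps` = 50 training calls. That cost is worth keeping in mind when choosing the number of epochs per step.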