How to configure global hardware defaults in DataEval¶

Problem statement¶

DataEval provides global configuration settings to control computational resources and hardware acceleration. This guide shows how to configure the default PyTorch device, batch size, and the maximum number of worker processes.

When to use¶

You need to specify GPU or CPU execution for PyTorch-based operations
You want to set a global default batch size for data processing operations
You want to control the number of parallel worker processes
You need to optimize performance for your hardware configuration

What you will need¶

A Python environment with dataeval installed

Getting started¶

import dataeval

Configuring the PyTorch device¶

DataEval lets you set the default PyTorch device used by its operations. See torch.device for more information.

Set the default device to CPU¶

dataeval.config.set_device("cpu")

print(f"Current device for DataEval: {dataeval.config.get_device()}")

Current device for DataEval: cpu

Set the default device to CUDA GPU¶

dataeval.config.set_device("cuda")

print(f"Current device for DataEval: {dataeval.config.get_device()}")

Current device for DataEval: cuda

Set the default device to a specific CUDA GPU¶

dataeval.config.set_device("cuda:1")

print(f"Current device for DataEval: {dataeval.config.get_device()}")

Current device for DataEval: cuda:1

Reset the device to use PyTorch’s default device¶

dataeval.config.set_device(None)

print(f"Current device for DataEval: {dataeval.config.get_device()}")

Current device for DataEval: cpu
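A hard-coded device string such as "cuda:1" may not be usable on a machine with fewer GPUs or none at all. The sketch below is one possible way to choose a device string before passing it to dataeval.config.set_device(). The helper pick_device is hypothetical and not part of DataEval; torch.cuda.is_available() and torch.cuda.device_count() are standard PyTorch calls.

```python
def pick_device(cuda_available: bool, gpu_count: int, preferred_index: int = 0) -> str:
    """Hypothetical helper: return a device string for dataeval.config.set_device()."""
    if not cuda_available or gpu_count < 1:
        return "cpu"
    # Fall back to the first GPU if the preferred index does not exist.
    index = preferred_index if 0 <= preferred_index < gpu_count else 0
    return f"cuda:{index}"


try:
    import torch

    cuda_available = torch.cuda.is_available()
    gpu_count = torch.cuda.device_count()
except ImportError:
    # Without PyTorch installed, treat the machine as CPU-only.
    cuda_available, gpu_count = False, 0

device = pick_device(cuda_available, gpu_count, preferred_index=1)
print(f"Selected device: {device}")

# The result can then be passed on, for example:
# dataeval.config.set_device(device)
```

Keeping the selection logic in a pure function makes it easy to check without GPU hardware.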

Configuring the default batch size¶

DataEval allows setting a global default batch size for operations that process data in batches. The batch size must be a positive integer.

Note that functions and methods that require a batch_size will fail if no batch_size argument is passed and no global batch size is set.
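The fallback described in the note can be pictured with a small sketch. The helper resolve_batch_size is hypothetical and is not DataEval's implementation. It also assumes that an explicitly passed value takes precedence over the global default, which this guide does not state.

```python
from typing import Optional


def resolve_batch_size(explicit: Optional[int], global_default: Optional[int]) -> int:
    """Hypothetical helper: pick the batch size a batched operation would use."""
    # Assumption: an explicit argument wins over the global default.
    chosen = explicit if explicit is not None else global_default
    if chosen is None:
        raise ValueError("No batch_size argument given and no global batch size set")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(chosen, bool) or not isinstance(chosen, int) or chosen < 1:
        raise ValueError(f"batch_size must be a positive integer, got {chosen!r}")
    return chosen


print(resolve_batch_size(32, 64))    # explicit value is used
print(resolve_batch_size(None, 64))  # falls back to the global default

try:
    resolve_batch_size(None, None)
except ValueError as err:
    print(f"Error: {err}")
```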

Set the default batch size¶

dataeval.config.set_batch_size(64)

print(f"Current batch size: {dataeval.config.get_batch_size()}")

Current batch size: 64

Unset the default batch size¶

dataeval.config.set_batch_size(None)

# When no batch size is set, get_batch_size() requires an explicit value
print("Batch size has been unset")

Batch size has been unset

Configuring maximum worker processes¶

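DataEval follows the n_jobs conventions used by scikit-learn and joblib: a positive integer sets the maximum number of worker processes, -1 uses all visible CPU cores, and None leaves the value unset.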

Set the maximum number of worker processes¶

dataeval.config.set_max_processes(4)
print(f"Max processes: {dataeval.config.get_max_processes()}")

Max processes: 4

Set the maximum number of workers to all visible CPU cores¶

dataeval.config.set_max_processes(-1)
print(f"Max processes: {dataeval.config.get_max_processes()}")

Max processes: -1

Unset the maximum number of workers¶

dataeval.config.set_max_processes(None)
print(f"Max processes: {dataeval.config.get_max_processes()}")

Max processes: None
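As the outputs above show, get_max_processes() returns the stored value (for example -1), not a resolved worker count. The sketch below shows how joblib-style values are commonly turned into an actual count. The helper effective_workers is hypothetical. Treating None as a single worker mirrors joblib's default outside a parallel context and is an assumption here, since DataEval may resolve an unset value differently.

```python
import os
from typing import Optional


def effective_workers(max_processes: Optional[int], cpu_count: Optional[int] = None) -> int:
    """Hypothetical helper: resolve a joblib-style setting to a worker count."""
    # os.cpu_count() reports logical CPUs and may exceed the cores this process may use.
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    if max_processes is None:
        return 1  # assumption: joblib-style default, not confirmed for DataEval
    if max_processes == 0:
        raise ValueError("max_processes must not be 0")
    if max_processes < 0:
        # joblib convention: -1 means all cores, -2 means all but one, and so on.
        return max(cores + 1 + max_processes, 1)
    return max_processes


for setting in (4, -1, -2, None):
    print(f"{setting!r} -> {effective_workers(setting, cpu_count=8)}")
```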

Using temporary context managers¶

Context managers temporarily override a configuration setting; the previous value is restored when the block exits:

dataeval.config.set_batch_size(64)
print(f"Before context: {dataeval.config.get_batch_size()}")

with dataeval.config.use_batch_size(16):
    print(f"Inside context: {dataeval.config.get_batch_size()}")
    # Perform operations with batch_size=16

print(f"After context: {dataeval.config.get_batch_size()}")

Before context: 64
Inside context: 16
After context: 64

dataeval.config.set_max_processes(8)
print(f"Before context: {dataeval.config.get_max_processes()}")

with dataeval.config.use_max_processes(2):
    print(f"Inside context: {dataeval.config.get_max_processes()}")
    # Perform operations with max_processes=2

print(f"After context: {dataeval.config.get_max_processes()}")

Before context: 8
Inside context: 2
After context: 8