Configuring hardware: PyTorch devices and CPU processes

Problem Statement

DataEval provides global configuration settings to control computational resources and hardware acceleration. This guide shows how to configure the default PyTorch device and the maximum number of worker processes.

When to use

  • You need to specify GPU or CPU execution for PyTorch-based operations

  • You want to control the number of parallel worker processes

  • You need to optimize performance for your hardware configuration

What you will need

  1. A Python environment with dataeval installed

Getting Started

import dataeval

Configuring the PyTorch device

DataEval lets you set the default PyTorch device used by its PyTorch-based operations. The setting accepts device strings such as "cpu", "cuda", or "cuda:1". See torch.device for more information.

Set the default device to CPU

dataeval.config.set_device("cpu")

print(f"Current device for DataEval: {dataeval.config.get_device()}")
Current device for DataEval: cpu

Set the default device to CUDA GPU

dataeval.config.set_device("cuda")

print(f"Current device for DataEval: {dataeval.config.get_device()}")
Current device for DataEval: cuda

Set the default device to a specific CUDA GPU

dataeval.config.set_device("cuda:1")

print(f"Current device for DataEval: {dataeval.config.get_device()}")
Current device for DataEval: cuda:1
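
Requesting "cuda" or "cuda:1" on a machine without that GPU can cause errors later, when tensors are actually moved to the device. The sketch below checks availability first and falls back to "cpu". The helper name resolve_device is hypothetical and not part of DataEval; it relies only on torch.cuda.is_available() and torch.cuda.device_count(), and it assumes the device string has the form "cuda" or "cuda:N".

```python
def resolve_device(requested: str = "cuda") -> str:
    """Return `requested` if it looks usable on this machine, else "cpu".

    Hypothetical helper for illustration; not part of DataEval.
    """
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no CUDA support to query
        return "cpu"

    if not requested.startswith("cuda"):
        return requested  # e.g. "cpu" is passed through unchanged

    if not torch.cuda.is_available():
        return "cpu"

    # "cuda" refers to index 0; "cuda:1" refers to index 1
    index = int(requested.split(":", 1)[1]) if ":" in requested else 0
    return requested if index < torch.cuda.device_count() else "cpu"
```

The result can then be passed to the configuration call shown above, for example dataeval.config.set_device(resolve_device("cuda:1")).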

Reset the device to use PyTorch’s default device

dataeval.config.set_device(None)

print(f"Current device for DataEval: {dataeval.config.get_device()}")
Current device for DataEval: cpu

Configuring maximum worker processes

DataEval follows the max-worker conventions used by scikit-learn and joblib: a positive integer sets an explicit limit, -1 uses all visible CPU cores, and None leaves the limit unset.

Set the maximum number of worker processes

dataeval.config.set_max_processes(4)
print(f"Max processes: {dataeval.config.get_max_processes()}")
Max processes: 4

Set the maximum number of workers to all visible CPU cores

dataeval.config.set_max_processes(-1)
print(f"Max processes: {dataeval.config.get_max_processes()}")
Max processes: -1

Unset the maximum number of workers

dataeval.config.set_max_processes(None)
print(f"Max processes: {dataeval.config.get_max_processes()}")
Max processes: None
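
Note that get_max_processes() returns the stored setting (for example -1), not a concrete worker count. As a rough illustration of the joblib convention, the sketch below translates such a setting into an actual number of workers. The function resolve_worker_count is hypothetical and is not DataEval's implementation. Treating None as a single worker mirrors joblib's default and is an assumption here, and os.cpu_count() may differ from the cores actually visible to the process.

```python
import os
from typing import Optional


def resolve_worker_count(max_processes: Optional[int]) -> int:
    """Translate a joblib-style setting into a concrete worker count.

    Hypothetical helper for illustration; not part of DataEval.
    """
    cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None

    if max_processes is None:
        return 1  # assumption: unset means a single worker, as in joblib
    if max_processes == 0:
        raise ValueError("max_processes must not be 0")
    if max_processes < 0:
        # -1 -> all cores, -2 -> all but one, and so on (never below 1)
        return max(cpu_count + 1 + max_processes, 1)
    return max_processes
```

Under this convention, -2 leaves one core free for other work, which can be useful on a shared machine.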

Using temporary context managers

Temporarily override the max processes setting using a context manager. The previous value is restored when the with block exits:

dataeval.config.set_max_processes(8)
print(f"Before context: {dataeval.config.get_max_processes()}")

with dataeval.config.use_max_processes(2):
    print(f"Inside context: {dataeval.config.get_max_processes()}")
    # Perform operations with max_processes=2

print(f"After context: {dataeval.config.get_max_processes()}")
Before context: 8
Inside context: 2
After context: 8