How to configure logging with DataEval

Problem statement

DataEval uses Python’s standard logging module to provide visibility into operations and debugging information. This guide demonstrates how to configure logging to display messages in the console or save them to disk when using DataEval functions.

When to use

You want to see detailed information about DataEval operations
You need to debug issues or understand internal processing
You want to save logs to a file for later analysis
You need different logging levels for different parts of your code

What you will need

A Python environment with dataeval installed
Basic understanding of Python’s logging module

Getting started

# Google Colab Only
try:
    import google.colab  # noqa: F401

    # specify the version of DataEval (==X.XX.X) for versions other than the latest
    %pip install -q dataeval
except Exception:
    pass

import logging
import os

import sklearn.datasets as dsets

from dataeval.core._ber import ber_knn, ber_mst

Understanding logging levels

Python’s logging module supports several severity levels:

DEBUG: Detailed information, typically for diagnosing problems
INFO: Confirmation that things are working as expected
WARNING: An indication that something unexpected happened
ERROR: A serious problem that prevented a function from completing
CRITICAL: A very serious error

DataEval primarily uses DEBUG, INFO, and WARNING levels for normal operation.
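
As a quick illustration of how these levels act as thresholds, the sketch below uses only the standard library. The logger name `demo_levels` is a hypothetical stand-in and is not part of DataEval.

```python
import logging

# Standard level names and their numeric values; a higher value means higher severity.
level_names = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]
level_values = {name: getattr(logging, name) for name in level_names}

# "demo_levels" is a hypothetical logger name used only for this illustration.
demo_logger = logging.getLogger("demo_levels")
demo_logger.setLevel(logging.INFO)

# A logger emits a record only if the record's level is at or above the logger's level.
enabled = {name: demo_logger.isEnabledFor(value) for name, value in level_values.items()}

for name in level_names:
    print(f"{name:<8} value={level_values[name]:<3} emitted at INFO threshold: {enabled[name]}")
```

With the threshold set to INFO, only DEBUG records are dropped; everything from INFO upward passes through.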

Logging to console

This example demonstrates how to configure logging to display DataEval messages in the console.

Basic console logging (INFO level)

# Create console handler with formatting
console_handler = logging.StreamHandler()
console_handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))

# Configure logging to show INFO level messages to console
dataeval_logger = logging.getLogger("dataeval")
dataeval_logger.setLevel(logging.INFO)
dataeval_logger.addHandler(console_handler)

# Create sample dataset
blobs = dsets.make_blobs(n_samples=100, centers=3, n_features=5, random_state=42)
embeddings, labels = blobs[0], blobs[1]

print("Running ber_knn with INFO logging:\n")
result = ber_knn(embeddings, labels, k=3)
print(f"\nResult: {result}")

dataeval.core._ber - INFO - Starting ber_knn calculation with k=3

dataeval.core._ber - INFO - BER_knn complete: upper_bound=0.0000, lower_bound=0.0000, misclassified=0

Running ber_knn with INFO logging:

Result: {'upper_bound': 0.0, 'lower_bound': 0.0}
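
The messages above come from the logger named dataeval.core._ber even though the handler was attached to the dataeval logger. This is standard `logging` behavior: records propagate from child loggers to the handlers of their ancestors. A minimal sketch of that mechanism follows, using the hypothetical stand-in name `demo_pkg` and an in-memory stream so it runs without DataEval.

```python
import io
import logging

# Capture output in memory instead of writing to the console.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))

# "demo_pkg" is a hypothetical stand-in for a package-level logger such as "dataeval".
parent = logging.getLogger("demo_pkg")
parent.setLevel(logging.INFO)
parent.addHandler(handler)

# The child has no level or handler of its own: it inherits the parent's
# effective level and its records propagate up to the parent's handler.
child = logging.getLogger("demo_pkg.core._ber")
child.info("Starting calculation")
child.debug("This DEBUG record is below the inherited INFO threshold")

captured = stream.getvalue()
print(captured, end="")
```

Only the INFO record reaches the handler, and it is labeled with the child logger's name.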

Detailed console logging (DEBUG level)

For more detailed information, you can enable DEBUG level logging:

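# Configure logging to show DEBUG level messages to console
dataeval_logger = logging.getLogger("dataeval")
dataeval_logger.setLevel(logging.DEBUG)
# console_handler is already attached from the previous example;
# addHandler ignores a handler object that is already present, so no duplicates occur
dataeval_logger.addHandler(console_handler)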

print("Running ber_mst with DEBUG logging:\n")
result = ber_mst(embeddings, labels)
print(f"\nResult: {result}")

dataeval.core._ber - INFO - Starting ber_mst calculation

dataeval.core._ber - DEBUG - Number of classes: 3, Number of samples: 100

dataeval.core._mst - INFO - Starting minimum_spanning_tree calculation with k=15

dataeval.core._mst - DEBUG - Embeddings shape: (100, 5)

dataeval.core._mst - DEBUG - Computing neighbor distances with k=15

Running ber_mst with DEBUG logging:

dataeval.core._mst - DEBUG - Exhausted k-nearest neighbors (k=15) before finding connected spanning tree. Computing cluster nearest neighbors.

dataeval.core._mst - INFO - MST calculation complete: 99 edges computed

dataeval.core._ber - INFO - BER_mst complete: upper_bound=0.0000, lower_bound=0.0000, mismatches=0

Result: {'upper_bound': 0.0, 'lower_bound': 0.0}
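
The output above shows records from two module loggers (dataeval.core._ber and dataeval.core._mst). If DEBUG detail is wanted from only one of them, a level can be set on that child logger alone. The sketch below demonstrates the idea with hypothetical stand-in names (`demo_pkg2...`); with DataEval, the module logger names seen in the output would be used instead, on the assumption that those internal names remain stable.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))

# Hypothetical package logger standing in for "dataeval": INFO by default.
package_logger = logging.getLogger("demo_pkg2")
package_logger.setLevel(logging.INFO)
package_logger.addHandler(handler)
package_logger.propagate = False  # keep this demo's records away from the root logger

# Raise verbosity for a single submodule only.
logging.getLogger("demo_pkg2.core._mst").setLevel(logging.DEBUG)

# Dropped: this child inherits the package-level INFO threshold.
logging.getLogger("demo_pkg2.core._ber").debug("ber detail")
# Kept: this child has its own DEBUG threshold.
logging.getLogger("demo_pkg2.core._mst").debug("mst detail")

captured = stream.getvalue()
print(captured, end="")
```

The handler itself has no level set here, so the per-logger thresholds alone decide which records are emitted.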

Logging to disk

This example demonstrates how to save DataEval logs to a file for later analysis.

Basic file logging

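Attach a logging.FileHandler to the dataeval logger. The mode argument controls whether the log file is overwritten ("w") or appended to ("a") each time the handler is created.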

# Clear previous handlers
for handler in dataeval_logger.handlers[:]:
    dataeval_logger.removeHandler(handler)

# Configure logging to write to a file
log_file = "dataeval_operations.log"

# Create file handler with formatting
file_handler = logging.FileHandler(log_file, mode="w")  # 'w' to overwrite, 'a' to append
file_handler.setFormatter(logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s"))

dataeval_logger = logging.getLogger("dataeval")
dataeval_logger.setLevel(logging.INFO)
dataeval_logger.addHandler(file_handler)

print(f"Running operations with logging to {log_file}...\n")

# Run multiple operations
result1 = ber_mst(embeddings, labels)
result2 = ber_knn(embeddings, labels, k=5)

print(f"ber_mst result: {result1}")
print(f"ber_knn result: {result2}")
print(f"\nLogs have been saved to '{log_file}'")

# Display the log file contents
if os.path.exists(log_file):
    print("\n--- Log File Contents ---")
    with open(log_file) as f:
        print(f.read())

Running operations with logging to dataeval_operations.log...

ber_mst result: {'upper_bound': 0.0, 'lower_bound': 0.0}
ber_knn result: {'upper_bound': 0.0, 'lower_bound': 0.0}

Logs have been saved to 'dataeval_operations.log'

--- Log File Contents ---
2026-06-03 23:17:27,828 - dataeval.core._ber - INFO - Starting ber_mst calculation
2026-06-03 23:17:27,828 - dataeval.core._mst - INFO - Starting minimum_spanning_tree calculation with k=15
2026-06-03 23:17:27,832 - dataeval.core._mst - INFO - MST calculation complete: 99 edges computed
2026-06-03 23:17:27,832 - dataeval.core._ber - INFO - BER_mst complete: upper_bound=0.0000, lower_bound=0.0000, mismatches=0
2026-06-03 23:17:27,832 - dataeval.core._ber - INFO - Starting ber_knn calculation with k=5
2026-06-03 23:17:27,835 - dataeval.core._ber - INFO - BER_knn complete: upper_bound=0.0000, lower_bound=0.0000, misclassified=0
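
To make the difference between the "w" and "a" file modes concrete, here is a small standard-library sketch. The helper `write_once` and the logger name `demo_file` are hypothetical and exist only for this illustration; the file is written to a temporary directory.

```python
import logging
import os
import tempfile


def write_once(path, mode, message):
    """Open a FileHandler in the given mode, log one message, then close it."""
    logger = logging.getLogger("demo_file")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.FileHandler(path, mode=mode)
    handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))
    logger.addHandler(handler)
    try:
        logger.info(message)
    finally:
        # Detach and close so the file is flushed and released.
        logger.removeHandler(handler)
        handler.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.log")

    write_once(path, "w", "first run")
    write_once(path, "a", "second run")  # append keeps the earlier line
    with open(path) as f:
        appended = f.read().splitlines()

    write_once(path, "w", "third run")  # overwrite discards both earlier lines
    with open(path) as f:
        overwritten = f.read().splitlines()

print(appended)
print(overwritten)
```

Append mode is usually the safer choice when logs from several runs should accumulate in one file.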

Combined console and file logging

You can log to both console and file simultaneously:

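# Create logger and remove handlers left over from earlier examples,
# so records are not also written to the previous log file
logger = logging.getLogger("dataeval")
logger.setLevel(logging.DEBUG)
for handler in logger.handlers[:]:
    logger.removeHandler(handler)
    handler.close()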

# Create file handler (DEBUG level)
log_file = "dataeval_detailed.log"
file_handler = logging.FileHandler(log_file, mode="w")
file_handler.setLevel(logging.DEBUG)
file_formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
file_handler.setFormatter(file_formatter)

# Create console handler (INFO level only)
console_handler = logging.StreamHandler()
console_handler.setLevel(logging.INFO)
console_formatter = logging.Formatter("%(levelname)s - %(message)s")
console_handler.setFormatter(console_formatter)

# Add handlers to logger
logger.addHandler(file_handler)
logger.addHandler(console_handler)

print("Running with dual logging (INFO to console, DEBUG to file):\n")
result = ber_knn(embeddings, labels, k=7)
print(f"\nResult: {result}")
print("\nNote: Console shows only INFO messages, but file contains DEBUG details too.")

# Display the log file contents
if os.path.exists(log_file):
    print("\n--- Log File Contents ---")
    with open(log_file) as f:
        print(f.read())

INFO - Starting ber_knn calculation with k=7

INFO - BER_knn complete: upper_bound=0.0000, lower_bound=0.0000, misclassified=0

Running with dual logging (INFO to console, DEBUG to file):


Result: {'upper_bound': 0.0, 'lower_bound': 0.0}

Note: Console shows only INFO messages, but file contains DEBUG details too.

--- Log File Contents ---
2026-06-03 23:17:27,848 - dataeval.core._ber - INFO - Starting ber_knn calculation with k=7
2026-06-03 23:17:27,848 - dataeval.core._ber - DEBUG - Number of classes: 3, Number of samples: 100
2026-06-03 23:17:27,851 - dataeval.core._ber - INFO - BER_knn complete: upper_bound=0.0000, lower_bound=0.0000, misclassified=0
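
The dual setup works because a record is filtered twice: first by the logger's level, then by each handler's level. The sketch below reproduces that behavior with two in-memory streams standing in for the file and the console; the logger name `demo_dual` and the messages are hypothetical.

```python
import io
import logging

file_like = io.StringIO()     # stands in for the log file
console_like = io.StringIO()  # stands in for the console

logger = logging.getLogger("demo_dual")  # hypothetical logger name
logger.setLevel(logging.DEBUG)           # the logger lets everything through
logger.propagate = False

formatter = logging.Formatter("%(levelname)s - %(message)s")

detailed = logging.StreamHandler(file_like)
detailed.setLevel(logging.DEBUG)  # this handler accepts DEBUG and above
detailed.setFormatter(formatter)

brief = logging.StreamHandler(console_like)
brief.setLevel(logging.INFO)      # this handler drops DEBUG records
brief.setFormatter(formatter)

logger.addHandler(detailed)
logger.addHandler(brief)

logger.info("Starting calculation")
logger.debug("Number of samples: 100")

detailed_lines = file_like.getvalue().splitlines()
brief_lines = console_like.getvalue().splitlines()
print(detailed_lines)
print(brief_lines)
```

If the logger's own level were left at INFO, the DEBUG record would be discarded before reaching either handler, regardless of the handler levels.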

Temporarily disabling logs

# Suppress every message at CRITICAL level and below, for all loggers in the process
logging.disable(logging.CRITICAL)

print("Running with logging disabled:\n")
result = ber_mst(embeddings, labels)
print(f"Result: {result}")
print("(No log messages should appear above)\n")

Running with logging disabled:

Result: {'upper_bound': 0.0, 'lower_bound': 0.0}
(No log messages should appear above)

# Re-enable logging
logging.disable(logging.NOTSET)

print("Running with logging re-enabled:\n")
result = ber_mst(embeddings, labels)
print(f"Result: {result}")

INFO - Starting ber_mst calculation

INFO - Starting minimum_spanning_tree calculation with k=15

INFO - MST calculation complete: 99 edges computed

INFO - BER_mst complete: upper_bound=0.0000, lower_bound=0.0000, mismatches=0

Running with logging re-enabled:

Result: {'upper_bound': 0.0, 'lower_bound': 0.0}
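
Because `logging.disable` affects every logger in the process, it is easy to forget to turn logging back on. One possible pattern is to wrap the two calls in a context manager so logging is restored even if the wrapped code raises. The helper `logging_disabled` and the logger name `demo_disable` below are hypothetical and not part of DataEval or the standard library.

```python
import contextlib
import io
import logging


@contextlib.contextmanager
def logging_disabled(level=logging.CRITICAL):
    """Suppress records at `level` and below, then re-enable logging on exit."""
    logging.disable(level)
    try:
        yield
    finally:
        # Mirrors the re-enable step above; this resets any earlier disable level too.
        logging.disable(logging.NOTSET)


stream = io.StringIO()
logger = logging.getLogger("demo_disable")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))

with logging_disabled():
    logger.info("suppressed")  # dropped while logging is disabled

logger.info("visible")  # emitted once the block exits

captured = stream.getvalue()
print(captured, end="")
```

In a notebook, the DataEval calls that should run quietly would go inside the `with` block.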

Best practices

Configure logging early: Set up logging configuration at the start of your script or notebook, before calling any DataEval functions
Use file logging for production: Console logging suits interactive development, while log files persist after the session ends and can be analyzed later
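
One way to follow both practices is to collect the setup in a single function that is called once at the top of a script or notebook. The sketch below is a suggestion only: `configure_logging` is a hypothetical helper, not a DataEval API, and it replaces any handlers already attached so that re-running a notebook cell does not produce duplicate output.

```python
import logging


def configure_logging(name="dataeval", level=logging.INFO, log_file=None):
    """Attach exactly one handler to the named logger and return the logger.

    Hypothetical helper: logs to `log_file` if given, otherwise to the console.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)

    # Remove handlers from earlier calls so messages are not duplicated.
    for old_handler in logger.handlers[:]:
        logger.removeHandler(old_handler)
        old_handler.close()

    if log_file:
        handler = logging.FileHandler(log_file, mode="a")
    else:
        handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    )
    logger.addHandler(handler)
    return logger


# Calling the helper repeatedly still leaves a single handler attached.
logger = configure_logging()
logger = configure_logging(level=logging.DEBUG)
handler_count = len(logger.handlers)
print(handler_count)
```

Passing a `log_file` path switches the same setup to file output, which matches the production recommendation above.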