How to configure logging with DataEval

Problem statement

DataEval uses Python’s standard logging module to provide visibility into operations and debugging information. This guide demonstrates how to configure logging to display messages in the console or save them to disk when using DataEval functions.

When to use

  • You want to see detailed information about DataEval operations

  • You need to debug issues or understand internal processing

  • You want to save logs to a file for later analysis

  • You need different logging levels for different parts of your code

What you will need

  1. A Python environment with dataeval installed

  2. Basic understanding of Python’s logging module

Getting started

# Google Colab Only
try:
    import google.colab  # noqa: F401

    # specify the version of DataEval (==X.XX.X) for versions other than the latest
    %pip install -q dataeval
except Exception:
    pass
import logging
import os

import sklearn.datasets as dsets

from dataeval.core._ber import ber_knn, ber_mst

Understanding logging levels

Python’s logging module supports several severity levels:

  • DEBUG: Detailed information, typically for diagnosing problems

  • INFO: Confirmation that things are working as expected

  • WARNING: An indication that something unexpected happened

  • ERROR: A serious problem that prevented a function from completing

  • CRITICAL: A very serious error

DataEval primarily uses DEBUG, INFO, and WARNING levels for normal operation.
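
Each level name corresponds to an integer, and a logger processes only records at or above its configured threshold. The sketch below illustrates this using only the standard library. The logger name "levels_demo" is a hypothetical name chosen for this illustration and is not part of DataEval.

```python
import logging

# Each named level is an integer; a higher value means a more severe record.
level_names = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")
levels = {name: getattr(logging, name) for name in level_names}
print(levels)

# "levels_demo" is a hypothetical logger name used only for this illustration.
demo_logger = logging.getLogger("levels_demo")
demo_logger.setLevel(logging.WARNING)

# isEnabledFor reports whether a record at that level would be processed.
enabled = {name: demo_logger.isEnabledFor(value) for name, value in levels.items()}
print(enabled)
```

With the threshold set to WARNING, DEBUG and INFO records are dropped while WARNING, ERROR, and CRITICAL records pass through.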

Logging to console

This example demonstrates how to configure logging to display DataEval messages in the console.

Basic console logging (INFO level)

# Create console handler with formatting
console_handler = logging.StreamHandler()
console_handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))
# Configure logging to show INFO level messages to console
dataeval_logger = logging.getLogger("dataeval")
dataeval_logger.setLevel(logging.INFO)
dataeval_logger.addHandler(console_handler)
# Create sample dataset
embeddings, labels = dsets.make_blobs(n_samples=100, centers=3, n_features=5, random_state=42)

print("Running ber_knn with INFO logging:\n")
result = ber_knn(embeddings, labels, k=3)
print(f"\nResult: {result}")
dataeval.core._ber - INFO - Starting ber_knn calculation with k=3
dataeval.core._ber - INFO - BER_knn complete: upper_bound=0.0000, lower_bound=0.0000, misclassified=0
Running ber_knn with INFO logging:


Result: {'upper_bound': 0.0, 'lower_bound': 0.0}
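
To see how the format string shapes each line independently of DataEval, the following sketch writes one record to an in-memory stream using the same formatter as above. The logger name "demo.module" and the message text are hypothetical stand-ins, not DataEval output.

```python
import io
import logging

# An in-memory stream stands in for the console so the result can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))

# "demo.module" is a hypothetical logger name used only for this illustration.
demo_logger = logging.getLogger("demo.module")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False  # keep the record away from any root handlers
demo_logger.addHandler(handler)

demo_logger.info("Starting calculation with k=%d", 3)
formatted = stream.getvalue()
print(formatted, end="")  # demo.module - INFO - Starting calculation with k=3
```

The %(name)s field is filled with the name of the logger that emitted the record, which is why the DataEval output above begins with the emitting module's path.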

Detailed console logging (DEBUG level)

For more detailed information, you can enable DEBUG level logging:

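# Raise the verbosity of the "dataeval" logger to DEBUG
dataeval_logger = logging.getLogger("dataeval")
dataeval_logger.setLevel(logging.DEBUG)
# console_handler is already attached from the previous example;
# addHandler ignores a handler that is already present, so no duplicate output occurs
dataeval_logger.addHandler(console_handler)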
print("Running ber_mst with DEBUG logging:\n")
result = ber_mst(embeddings, labels)
print(f"\nResult: {result}")
dataeval.core._ber - INFO - Starting ber_mst calculation
dataeval.core._ber - DEBUG - Number of classes: 3, Number of samples: 100
dataeval.core._mst - INFO - Starting minimum_spanning_tree calculation with k=15
dataeval.core._mst - DEBUG - Embeddings shape: (100, 5)
dataeval.core._mst - DEBUG - Computing neighbor distances with k=15
Running ber_mst with DEBUG logging:
dataeval.core._mst - DEBUG - Exhausted k-nearest neighbors (k=15) before finding connected spanning tree. Computing cluster nearest neighbors.
dataeval.core._mst - INFO - MST calculation complete: 99 edges computed
dataeval.core._ber - INFO - BER_mst complete: upper_bound=0.0000, lower_bound=0.0000, mismatches=0
Result: {'upper_bound': 0.0, 'lower_bound': 0.0}
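
The output above suggests that DataEval names its loggers after module paths (dataeval.core._ber, dataeval.core._mst). Assuming that holds and that the library does not set explicit levels on those child loggers, Python's logger hierarchy lets you choose a different level for one part of the package. This is a minimal sketch of that idea, not a documented DataEval feature:

```python
import logging

# Package-wide default: INFO and above.
logging.getLogger("dataeval").setLevel(logging.INFO)

# More detail for a single submodule only (name taken from the output above).
logging.getLogger("dataeval.core._mst").setLevel(logging.DEBUG)

ber_logger = logging.getLogger("dataeval.core._ber")
mst_logger = logging.getLogger("dataeval.core._mst")

# A logger without its own level inherits the level of its nearest configured ancestor.
ber_level = logging.getLevelName(ber_logger.getEffectiveLevel())
mst_level = logging.getLevelName(mst_logger.getEffectiveLevel())
print(ber_level, mst_level)  # INFO DEBUG
```

Handlers attached to the "dataeval" logger still receive the DEBUG records from the submodule, because propagated records are filtered by handler levels rather than by the ancestor logger's level.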

Logging to disk

This example demonstrates how to save DataEval logs to a file for later analysis.

Basic file logging

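Attach a logging.FileHandler to the dataeval logger instead of a console handler. The mode argument selects whether the file is overwritten ("w") or appended to ("a").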

# Clear previous handlers
for handler in dataeval_logger.handlers[:]:
    dataeval_logger.removeHandler(handler)
# Configure logging to write to a file
log_file = "dataeval_operations.log"

# Create file handler with formatting
file_handler = logging.FileHandler(log_file, mode="w")  # 'w' to overwrite, 'a' to append
file_handler.setFormatter(logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s"))

dataeval_logger = logging.getLogger("dataeval")
dataeval_logger.setLevel(logging.INFO)
dataeval_logger.addHandler(file_handler)
print(f"Running operations with logging to {log_file}...\n")

# Run multiple operations
result1 = ber_mst(embeddings, labels)
result2 = ber_knn(embeddings, labels, k=5)

print(f"ber_mst result: {result1}")
print(f"ber_knn result: {result2}")
print(f"\nLogs have been saved to '{log_file}'")

# Display the log file contents
if os.path.exists(log_file):
    print("\n--- Log File Contents ---")
    with open(log_file) as f:
        print(f.read())
Running operations with logging to dataeval_operations.log...

ber_mst result: {'upper_bound': 0.0, 'lower_bound': 0.0}
ber_knn result: {'upper_bound': 0.0, 'lower_bound': 0.0}

Logs have been saved to 'dataeval_operations.log'

--- Log File Contents ---
2026-02-09 19:20:38,360 - dataeval.core._ber - INFO - Starting ber_mst calculation
2026-02-09 19:20:38,360 - dataeval.core._mst - INFO - Starting minimum_spanning_tree calculation with k=15
2026-02-09 19:20:38,363 - dataeval.core._mst - INFO - MST calculation complete: 99 edges computed
2026-02-09 19:20:38,363 - dataeval.core._ber - INFO - BER_mst complete: upper_bound=0.0000, lower_bound=0.0000, mismatches=0
2026-02-09 19:20:38,363 - dataeval.core._ber - INFO - Starting ber_knn calculation with k=5
2026-02-09 19:20:38,365 - dataeval.core._ber - INFO - BER_knn complete: upper_bound=0.0000, lower_bound=0.0000, misclassified=0

Combined console and file logging

You can log to both console and file simultaneously:

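# Get the logger and remove handlers left over from earlier examples,
# so records are not also written to the previous log file
logger = logging.getLogger("dataeval")
for handler in logger.handlers[:]:
    logger.removeHandler(handler)
    handler.close()
logger.setLevel(logging.DEBUG)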

# Create file handler (DEBUG level)
log_file = "dataeval_detailed.log"
file_handler = logging.FileHandler(log_file, mode="w")
file_handler.setLevel(logging.DEBUG)
file_formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
file_handler.setFormatter(file_formatter)

# Create console handler (INFO level only)
console_handler = logging.StreamHandler()
console_handler.setLevel(logging.INFO)
console_formatter = logging.Formatter("%(levelname)s - %(message)s")
console_handler.setFormatter(console_formatter)

# Add handlers to logger
logger.addHandler(file_handler)
logger.addHandler(console_handler)
print("Running with dual logging (INFO to console, DEBUG to file):\n")
result = ber_knn(embeddings, labels, k=7)
print(f"\nResult: {result}")
print("\nNote: Console shows only INFO messages, but file contains DEBUG details too.")

# Display the log file contents
if os.path.exists(log_file):
    print("\n--- Log File Contents ---")
    with open(log_file) as f:
        print(f.read())
INFO - Starting ber_knn calculation with k=7
INFO - BER_knn complete: upper_bound=0.0000, lower_bound=0.0000, misclassified=0
Running with dual logging (INFO to console, DEBUG to file):


Result: {'upper_bound': 0.0, 'lower_bound': 0.0}

Note: Console shows only INFO messages, but file contains DEBUG details too.

--- Log File Contents ---
2026-02-09 19:20:38,380 - dataeval.core._ber - INFO - Starting ber_knn calculation with k=7
2026-02-09 19:20:38,380 - dataeval.core._ber - DEBUG - Number of classes: 3, Number of samples: 100
2026-02-09 19:20:38,383 - dataeval.core._ber - INFO - BER_knn complete: upper_bound=0.0000, lower_bound=0.0000, misclassified=0
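
The split between console and file output comes from the level set on each handler, not on the logger. The following self-contained sketch reproduces the effect with two in-memory streams standing in for the console and the file. The logger name "dual_demo" and the messages are hypothetical.

```python
import io
import logging

# In-memory streams stand in for the console and the log file.
console_stream = io.StringIO()
file_stream = io.StringIO()

console = logging.StreamHandler(console_stream)
console.setLevel(logging.INFO)  # drops DEBUG records
detailed = logging.StreamHandler(file_stream)
detailed.setLevel(logging.DEBUG)  # keeps everything

# "dual_demo" is a hypothetical logger name used only for this illustration.
demo_logger = logging.getLogger("dual_demo")
demo_logger.setLevel(logging.DEBUG)  # the logger must let DEBUG through first
demo_logger.propagate = False
demo_logger.addHandler(console)
demo_logger.addHandler(detailed)

demo_logger.debug("Number of classes: 3")
demo_logger.info("Calculation complete")

console_lines = console_stream.getvalue().splitlines()
file_lines = file_stream.getvalue().splitlines()
print(console_lines)  # ['Calculation complete']
print(file_lines)     # ['Number of classes: 3', 'Calculation complete']
```

The logger's own level acts as the first gate: if it were set to INFO, the DEBUG record would never reach either handler.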

Temporarily disabling logs

logging.disable(level) suppresses every record at or below the given level for all loggers in the process, until it is reset with logging.disable(logging.NOTSET).

# Disable all logging at CRITICAL level and below
logging.disable(logging.CRITICAL)
print("Running with logging disabled:\n")
result = ber_mst(embeddings, labels)
print(f"Result: {result}")
print("(No log messages should appear above)\n")
Running with logging disabled:

Result: {'upper_bound': 0.0, 'lower_bound': 0.0}
(No log messages should appear above)
# Re-enable logging
logging.disable(logging.NOTSET)
print("Running with logging re-enabled:\n")
result = ber_mst(embeddings, labels)
print(f"Result: {result}")
INFO - Starting ber_mst calculation
INFO - Starting minimum_spanning_tree calculation with k=15
INFO - MST calculation complete: 99 edges computed
INFO - BER_mst complete: upper_bound=0.0000, lower_bound=0.0000, mismatches=0
Running with logging re-enabled:

Result: {'upper_bound': 0.0, 'lower_bound': 0.0}
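
If you silence logs often, the disable and re-enable calls can be wrapped in a context manager so that logging is restored even when the wrapped code raises. This is a sketch under stated assumptions: logging_disabled is a hypothetical helper (not part of DataEval or the standard library), and it resets to NOTSET on exit rather than to any previously active disable level.

```python
import io
import logging
from contextlib import contextmanager


@contextmanager
def logging_disabled(level=logging.CRITICAL):
    """Hypothetical helper: suppress records at `level` and below inside the block."""
    logging.disable(level)
    try:
        yield
    finally:
        # Re-enable logging, mirroring logging.disable(logging.NOTSET) above.
        logging.disable(logging.NOTSET)


# "disable_demo" is a hypothetical logger name used only for this illustration.
stream = io.StringIO()
demo_logger = logging.getLogger("disable_demo")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False
demo_logger.addHandler(logging.StreamHandler(stream))

with logging_disabled():
    demo_logger.info("hidden")  # suppressed
demo_logger.info("visible")     # emitted after the block exits

captured = stream.getvalue().splitlines()
print(captured)  # ['visible']
```

In the notebook above, the same pattern would wrap a call such as ber_mst(embeddings, labels).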

Best practices

  1. Configure logging early: Set up logging configuration at the start of your script or notebook

  2. Use file logging for production: Console logging works well during development, but file logs persist after the session ends and can be analyzed later
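
Both practices can be combined in a single setup function that is called once at the top of a script or notebook. The sketch below is one possible arrangement. configure_dataeval_logging is a hypothetical helper, and its parameter names and defaults are assumptions chosen to match the examples in this guide.

```python
import logging
import os
import tempfile


def configure_dataeval_logging(log_path, console_level=logging.INFO, file_level=logging.DEBUG):
    """Hypothetical helper: attach one console and one file handler to the "dataeval" logger."""
    logger = logging.getLogger("dataeval")
    # The logger must pass the most verbose level that any handler wants.
    logger.setLevel(min(console_level, file_level))

    # Remove existing handlers so that repeated calls do not duplicate output.
    for handler in logger.handlers[:]:
        logger.removeHandler(handler)
        handler.close()

    console = logging.StreamHandler()
    console.setLevel(console_level)
    console.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))

    file_handler = logging.FileHandler(log_path, mode="a")  # append across runs
    file_handler.setLevel(file_level)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    )

    logger.addHandler(console)
    logger.addHandler(file_handler)
    return logger


# A temporary directory keeps this illustration from writing into the working directory.
log_path = os.path.join(tempfile.mkdtemp(), "dataeval.log")
configure_dataeval_logging(log_path)
logger = configure_dataeval_logging(log_path)  # calling it again is safe
print(len(logger.handlers))  # 2
```

Note that the helper removes every handler already attached to the "dataeval" logger, including any you added yourself, so call it before attaching custom handlers.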