dataeval.core.minimum_spanning_tree

dataeval.core.minimum_spanning_tree(data, k=15)

Compute the minimum spanning tree of a dataset.

This is a high-level interface that computes k-nearest neighbors and then constructs the minimum spanning tree from the resulting graph.

Parameters:
data : NDArray

Input data with shape (n_samples, n_features) or can be flattened

k : int, default=15

Number of nearest neighbors to use for building the k-NN graph. Higher values increase connectivity but add computational cost. Should be large enough to ensure graph connectivity.

Returns:

  • rows (NDArray[np.intp]) – Source node indices for each edge in the MST with shape (n_samples - 1,)

  • cols (NDArray[np.intp]) – Target node indices for each edge in the MST with shape (n_samples - 1,)

Return type:

tuple[numpy.typing.NDArray[numpy.intp], numpy.typing.NDArray[numpy.intp]]

Notes

The MST is represented as two arrays (rows, cols) defining edges. Together they form n_samples - 1 edges connecting all points.

Examples

>>> import numpy as np
>>> from dataeval.core._mst import minimum_spanning_tree
>>> data = np.random.rand(100, 10)
>>> rows, cols = minimum_spanning_tree(data, k=15)
>>> len(rows)  # Should be n_samples - 1
99

See also

minimum_spanning_tree_edges

Lower-level function that returns edge weights

compute_neighbor_distances

Computes the k-NN graph