dataeval.core.minimum_spanning_tree

dataeval.core.minimum_spanning_tree(embeddings, k=15)

Compute the minimum spanning tree of a dataset.

This is a high-level interface that computes the k-nearest-neighbor (k-NN) graph of the embeddings and then constructs the minimum spanning tree (MST) from that graph.
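The two-step idea can be illustrated with a minimal, standard-library-only sketch. It is not dataeval's implementation: the helper names `knn_edges` and `kruskal` are hypothetical, Euclidean distance and Kruskal's algorithm are assumptions made for illustration, and the brute-force neighbor search is only suitable for tiny inputs.

```python
import math


def knn_edges(points, k):
    """Return sorted, undirected (weight, i, j) edges linking each point to its k nearest neighbors."""
    edges = set()
    for i, p in enumerate(points):
        # Distances from point i to every other point, nearest first.
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in nearest[:k]:
            edges.add((d, min(i, j), max(i, j)))  # store each undirected edge once
    return sorted(edges)


def kruskal(n, edges):
    """Kruskal's algorithm: take edges by increasing weight, skipping any that would close a cycle."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    source, target = [], []
    for _, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            source.append(i)
            target.append(j)
    return source, target


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (5.0, 5.0)]
source, target = kruskal(len(points), knn_edges(points, k=2))
print(list(zip(source, target)))  # [(0, 1), (1, 2), (2, 3), (3, 4)]
```

With five points the sketch yields four edges, matching the n_samples - 1 edge count described for the return value.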

Parameters:
embeddings : Array2D[float]

Input data with shape (n_samples, n_features). Can be a 2D list, array-like object, or tensor that will be flattened if necessary.

k : int, default=15

Number of nearest neighbors to use for building the k-NN graph. Higher values increase connectivity but add computational cost. Should be large enough to ensure graph connectivity.
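To see why k affects connectivity, the sketch below counts the connected components of a brute-force k-NN graph for a toy dataset made of two well-separated clusters. The helper `knn_component_count` is hypothetical and uses only the standard library with Euclidean distance; how dataeval itself behaves when the k-NN graph is disconnected is not covered here.

```python
import math


def knn_component_count(points, k):
    """Count connected components of the undirected k-NN graph over points."""
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, p in enumerate(points):
        # The k nearest neighbors of point i (brute force).
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in nearest[:k]:
            parent[find(i)] = find(j)  # merge the components of i and j
    return len({find(i) for i in range(len(points))})


# Two tight clusters of three points each, far apart.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(knn_component_count(points, k=2))  # 2: every neighbor lies in the same cluster
print(knn_component_count(points, k=3))  # 1: the third neighbor bridges the clusters
```

In this toy case k=2 leaves the two clusters unlinked, so no single tree over all six points can be built from those edges alone, while k=3 produces a connected graph.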

Returns:

Mapping with keys:

- source : NDArray[np.int64] - Source node indices for each edge in the MST, with shape (n_samples - 1,)
- target : NDArray[np.int64] - Target node indices for each edge in the MST, with shape (n_samples - 1,)

Return type:

MSTResult

Notes

The MST is represented as two parallel arrays (source, target) that define its edges: the i-th edge connects node source[i] to node target[i]. Together they form n_samples - 1 edges connecting all points.
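As a rough illustration of working with this edge representation, the sketch below builds an adjacency list from source and target sequences and checks that they describe a spanning tree. The helper `is_spanning_tree` is hypothetical, and plain Python lists stand in for the `mst["source"]` and `mst["target"]` arrays.

```python
from collections import deque


def is_spanning_tree(source, target, n_samples):
    """Return True if the edges (source[i], target[i]) form a tree over all n_samples nodes."""
    if len(source) != n_samples - 1 or len(target) != n_samples - 1:
        return False
    # Build an undirected adjacency list from the paired index sequences.
    adjacency = [[] for _ in range(n_samples)]
    for s, t in zip(source, target):
        adjacency[s].append(t)
        adjacency[t].append(s)
    # Breadth-first search from node 0; with n_samples - 1 edges,
    # reaching every node implies the graph is a tree.
    seen = {0}
    queue = deque([0])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == n_samples


print(is_spanning_tree([0, 1, 2, 3], [1, 2, 3, 4], 5))  # True: a path through all five nodes
print(is_spanning_tree([0, 1, 0, 3], [1, 2, 2, 4], 5))  # False: a cycle plus a detached pair
```

The same adjacency list can serve as a starting point for traversals or for passing the tree to a graph library.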

Examples

>>> import numpy as np
>>> from dataeval.core import minimum_spanning_tree
>>> data = np.random.rand(100, 10)
>>> mst = minimum_spanning_tree(data, k=15)
>>> len(mst["source"])  # Should be n_samples - 1
99

See also

minimum_spanning_tree_edges

Lower-level function that returns edge weights

compute_neighbor_distances

Computes the k-NN graph