dataeval.core.minimum_spanning_tree
dataeval.core.minimum_spanning_tree(embeddings, k=15)
Compute the minimum spanning tree of a dataset.
This is a high-level interface that computes k-nearest neighbors and then constructs the minimum spanning tree from the resulting graph.
- Parameters:
- embeddings : Array2D[float]
Input data with shape (n_samples, n_features). Can be a 2D list, array-like object, or tensor that will be flattened if necessary.
- k : int, default=15
Number of nearest neighbors to use for building the k-NN graph. Higher values increase connectivity but add computational cost. The value should be large enough that the k-NN graph is connected.
- Returns:
Mapping with keys:
  - source : NDArray[np.int64] - Source node indices for each edge in the MST, with shape (n_samples - 1,)
  - target : NDArray[np.int64] - Target node indices for each edge in the MST, with shape (n_samples - 1,)
- Return type:
See also
minimum_spanning_tree_edges : Lower-level function that returns edge weights
compute_neighbor_distances : Computes the k-NN graph
Notes
The MST is represented as two parallel arrays, source and target: edge i joins node source[i] to node target[i]. Together they define n_samples - 1 edges connecting all points.
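To make the two-stage idea concrete (build a k-NN graph, then extract a spanning tree from it), here is a minimal NumPy-only sketch. It is not dataeval's implementation: the name `knn_mst_sketch`, the brute-force Euclidean distances, and the use of Kruskal's algorithm are all assumptions chosen for illustration, and it is only practical for small inputs.

```python
import numpy as np


def knn_mst_sketch(embeddings, k=15):
    """Illustrative k-NN graph + Kruskal MST (hypothetical helper, not dataeval's code)."""
    x = np.asarray(embeddings, dtype=float)
    x = x.reshape(len(x), -1)  # flatten each sample to a 1D feature vector
    n = len(x)
    k = min(k, n - 1)  # a point has at most n - 1 neighbors

    # Brute-force pairwise Euclidean distances (assumption; O(n^2) memory).
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor

    # Candidate edges: each point paired with its k nearest neighbors.
    neighbors = np.argsort(dists, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = neighbors.ravel()
    weights = dists[rows, cols]

    # Kruskal's algorithm with union-find over the candidate edges.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    source, target = [], []
    for e in np.argsort(weights):  # visit edges from shortest to longest
        a, b = find(int(rows[e])), find(int(cols[e]))
        if a != b:  # keep the edge only if it joins two components
            parent[a] = b
            source.append(rows[e])
            target.append(cols[e])

    return {
        "source": np.array(source, dtype=np.int64),
        "target": np.array(target, dtype=np.int64),
    }
```

In this sketch, a k that is too small for clustered data leaves the k-NN graph disconnected, so fewer than n_samples - 1 edges come back. For example, two well-separated clusters of five points each yield only 8 edges with k=2, but 9 edges once k is large enough to bridge the clusters.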
Examples
>>> import numpy as np
>>> from dataeval.core import minimum_spanning_tree
>>> data = np.random.rand(100, 10)
>>> mst = minimum_spanning_tree(data, k=15)
>>> len(mst["source"])  # Should be n_samples - 1
99
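Beyond checking the edge count, it can be useful to confirm that a pair of source/target arrays really forms a spanning tree. The helper below is a sketch for that purpose; `is_spanning_tree` is a hypothetical name, not part of dataeval, and it relies only on the documented edge-array layout.

```python
import numpy as np


def is_spanning_tree(source, target, n_samples):
    """Return True if the edges connect all n_samples nodes without cycles."""
    source = np.asarray(source)
    target = np.asarray(target)
    # A spanning tree over n nodes has exactly n - 1 edges.
    if len(source) != len(target) or len(source) != n_samples - 1:
        return False

    parent = list(range(n_samples))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in zip(source.tolist(), target.tolist()):
        ra, rb = find(a), find(b)
        if ra == rb:  # this edge would close a cycle
            return False
        parent[ra] = rb
    # n - 1 edges with no cycle implies a single connected tree.
    return True
```

Assuming the returned mapping is laid out as documented, a call such as `is_spanning_tree(mst["source"], mst["target"], len(data))` would report whether the chosen k produced a fully connected tree.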