dataeval.core.compute_neighbors

dataeval.core.compute_neighbors(data_fit, data_query=None, k=1, algorithm='auto')

For each sample in data_query, compute the k nearest neighbors in data_fit.

Parameters:

data_fit : ArrayND[float]: Reference points to search, with shape (n_samples_fit, n_features). Can be an N-dimensional list or array-like object. This is the dataset that is indexed for the neighbor search.
data_query : ArrayND[float], default=None: Query points with shape (n_samples_query, n_features). Can be an N-dimensional list or array-like object. For each of these points, the k nearest neighbors in data_fit are found.
k : int, default=1: The number of neighbors to find for each query point.
algorithm : {"auto", "ball_tree", "kd_tree"}, default="auto": Tree method used for the nearest neighbor computation.

Returns:

Indices of the k nearest neighbors in data_fit for each point in data_query. Shape is (n_samples_query,) if k=1, otherwise (n_samples_query, k).

Return type:

NDArray[np.int64]

Raises:

ValueError – If k < 1, or if algorithm is not one of “auto”, “ball_tree”, or “kd_tree”.
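
The two documented error conditions can be checked before calling the function, for example when k or algorithm comes from user configuration. The sketch below is an illustration only: check_neighbor_args and is_valid are hypothetical helpers, not part of dataeval, and the error messages are assumptions rather than the library's actual text.

```python
VALID_ALGORITHMS = ("auto", "ball_tree", "kd_tree")


def check_neighbor_args(k, algorithm="auto"):
    """Hypothetical pre-check mirroring the documented ValueError conditions."""
    if k < 1:
        # Message text is an assumption, not dataeval's wording
        raise ValueError(f"k must be >= 1, got {k}")
    if algorithm not in VALID_ALGORITHMS:
        raise ValueError(
            f"algorithm must be one of {VALID_ALGORITHMS}, got {algorithm!r}"
        )


def is_valid(k, algorithm="auto"):
    """Return True when the arguments pass the documented checks."""
    try:
        check_neighbor_args(k, algorithm)
    except ValueError:
        return False
    return True
```

With these helpers, is_valid(3, "kd_tree") passes, while is_valid(0) and is_valid(1, "brute") both fail.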

See also

sklearn.neighbors.NearestNeighbors: Similar sklearn interface
compute_neighbor_distances: For self-query (single dataset)

Notes

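Avoid kd_tree when n_features > 20. KD-tree search generally becomes inefficient as the number of dimensions grows, so ball_tree tends to be the better tree-based choice for high-dimensional data.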
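
One way to apply this note in calling code is to guard the algorithm argument by feature count. This is a minimal sketch under stated assumptions: safe_algorithm is a hypothetical helper (not part of dataeval), and substituting ball_tree is an illustrative choice; the threshold of 20 is the only value taken from the note.

```python
KD_TREE_MAX_FEATURES = 20  # threshold taken from the note above


def safe_algorithm(requested, n_features):
    """Hypothetical helper: swap out kd_tree when the data has too many features."""
    if requested == "kd_tree" and n_features > KD_TREE_MAX_FEATURES:
        # Fallback choice is an assumption made for this sketch
        return "ball_tree"
    # Any other request is passed through unchanged
    return requested
```

The result can then be passed as the algorithm argument, with n_features read from the second dimension of data_fit.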

Examples

>>> import numpy as np
>>> from dataeval.core import compute_neighbors
>>> reference_data = np.random.rand(100, 5)  # 100 reference points
>>> query_data = np.random.rand(10, 5)  # 10 query points

>>> neighbors = compute_neighbors(reference_data, query_data, k=3)
>>> neighbors.shape
(10, 3)
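
To make the documented semantics and return shapes concrete, here is a small brute-force sketch in NumPy. It is not how compute_neighbors is implemented, since the library uses a tree-based search selected by algorithm. The helper name brute_force_neighbors is hypothetical, and Euclidean distance is an assumption because the reference above does not state the metric.

```python
import numpy as np


def brute_force_neighbors(data_fit, data_query, k=1):
    """Hypothetical reference helper, not part of dataeval.

    Assumes Euclidean distance; returns indices into data_fit.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    fit = np.asarray(data_fit, dtype=float)
    query = np.asarray(data_query, dtype=float)
    # Pairwise squared distances, shape (n_samples_query, n_samples_fit)
    diff = query[:, None, :] - fit[None, :, :]
    dist_sq = np.einsum("qfd,qfd->qf", diff, diff)
    # Sort each row by distance and keep the k closest indices
    idx = np.argsort(dist_sq, axis=1, kind="stable")[:, :k].astype(np.int64)
    # Mirror the documented return shape: 1-D when k == 1
    return idx[:, 0] if k == 1 else idx


fit = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
query = np.array([[0.1, 0.0], [4.0, 4.0]])
nearest = brute_force_neighbors(fit, query, k=1)      # shape (2,)
two_nearest = brute_force_neighbors(fit, query, k=2)  # shape (2, 2)
```

A helper like this scales poorly because it materializes every pairwise distance, but on small inputs it can serve as a sanity check against the tree-based results, provided the metric assumption holds.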