dataeval.core.compute_neighbors

dataeval.core.compute_neighbors(data_fit, data_query=None, k=1, algorithm='auto')

For each sample in data_query, compute the k nearest neighbors in data_fit.

Parameters:
data_fit : ArrayND[float]

Reference points to search, with shape (n_samples_fit, n_features). Can be an N-dimensional list or array-like object. This is the dataset that is indexed for the neighbor search.

data_query : ArrayND[float] or None, default=None

Query points, with shape (n_samples_query, n_features). Can be an N-dimensional list or array-like object. For each of these points, the k nearest neighbors in data_fit are found.

k : int, default=1

The number of neighbors to find. Must be at least 1.

algorithm : {"auto", "ball_tree", "kd_tree"}, default="auto"

Tree method used for the nearest neighbor computation.

Returns:

Indices of the k nearest neighbors in data_fit for each point in data_query. Shape is (n_samples_query,) if k=1, otherwise (n_samples_query, k).

Return type:

NDArray[np.int64]

Raises:

ValueError – If k < 1 or if algorithm is not "auto", "ball_tree", or "kd_tree".

See also

sklearn.neighbors.NearestNeighbors

Similar sklearn interface

compute_neighbor_distances

For self-query (single dataset)

Notes

Avoid algorithm="kd_tree" when n_features > 20. KD-trees generally lose their efficiency advantage on high-dimensional data.
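
One way to apply the note above is to choose the algorithm value from the feature count before calling compute_neighbors. The following is a minimal sketch under stated assumptions: pick_algorithm and KD_TREE_MAX_FEATURES are hypothetical names that are not part of dataeval, and falling back to "ball_tree" is only one reasonable choice (leaving the default "auto" is another).

```python
# Hypothetical helper, not part of dataeval: picks a value for the
# `algorithm` argument based on the number of features.
KD_TREE_MAX_FEATURES = 20  # threshold taken from the note above


def pick_algorithm(n_features: int) -> str:
    """Return "kd_tree" for low-dimensional data, else "ball_tree"."""
    if n_features < 1:
        raise ValueError("n_features must be at least 1")
    # Assumption: "ball_tree" is used as the fallback for high dimensions.
    if n_features <= KD_TREE_MAX_FEATURES:
        return "kd_tree"
    return "ball_tree"


low_dim_choice = pick_algorithm(5)
high_dim_choice = pick_algorithm(64)
```

The returned string could then be passed as algorithm=pick_algorithm(data_fit.shape[1]) for a 2-D data_fit array.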

Examples

>>> import numpy as np
>>> from dataeval.core import compute_neighbors
>>> reference_data = np.random.rand(100, 5)  # 100 reference points
>>> query_data = np.random.rand(10, 5)  # 10 query points
>>> neighbors = compute_neighbors(reference_data, query_data, k=3)
>>> neighbors.shape
(10, 3)
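
To make the meaning of the returned indices and the k=1 shape rule concrete, here is a brute-force sketch written with NumPy only. It is an illustration, not dataeval's implementation: brute_force_neighbors is a hypothetical name, and the Euclidean metric and nearest-first ordering are assumptions, since this page does not state either.

```python
import numpy as np


def brute_force_neighbors(data_fit, data_query, k=1):
    """Illustrative reference only; not how dataeval computes neighbors."""
    if k < 1:
        raise ValueError("k must be at least 1")
    fit = np.asarray(data_fit, dtype=float)
    query = np.asarray(data_query, dtype=float)
    # Pairwise Euclidean distances (assumed metric),
    # shape (n_samples_query, n_samples_fit).
    diffs = query[:, None, :] - fit[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Indices of the k smallest distances in each row, nearest first.
    idx = np.argsort(dists, axis=1, kind="stable")[:, :k].astype(np.int64)
    # Mirror the documented shape rule: 1-D output when k=1.
    return idx[:, 0] if k == 1 else idx


fit = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
query = np.array([[0.9, 0.1], [4.0, 4.0]])

nearest = brute_force_neighbors(fit, query, k=1)      # shape (2,)
two_nearest = brute_force_neighbors(fit, query, k=2)  # shape (2, 2)
```

Each entry is a row index into fit: the first query point is closest to fit[1], and the second is closest to fit[2].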