Skip to content

BallTree query match time is O(n) not O(log(n)) #19066

@simsim314

Description

@simsim314

I've run performance analysis on matching NN with BallTree (same with KDTree), and the matching time is linear to number of elements, and should be O(log(n)).

Here are the result of benchmark:
num_elements, match_time
10000 0.09097146987915039
20000 0.18194293975830078
40000 0.3668830394744873
80000 0.7527577877044678

Here is my code:


from sklearn.neighbors import BallTree
import numpy as np 
import time 


def tree_perf(tree_size):
    X = np.random.rand(tree_size, 512)
    Y1 = np.random.rand(1, 512)
    Y2 = np.random.rand(10, 512)

    ts = time.time()
    kdt = BallTree(X, leaf_size=30, metric='euclidean')
    load_tree = time.time() - ts
    num_nn = 1
    ts = time.time()
    vs = kdt.query(Y1, k=num_nn, return_distance=True)
    match1 = time.time() - ts
    ts = time.time()
    vs = kdt.query(Y2, k=num_nn, return_distance=True)
    match10 = time.time() - ts
    print(tree_size, load_tree, match1, match10)

print("num_elements", "load_tree", "match_1", "match_10")

for i in range(100):
    tree_size = 10000 + i * 10000
    tree_perf(tree_size)

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions