Cover Tree Search

Nearest neighbor search in metric space using hierarchical covers with exponential scale separation.

Cover Tree Search

Cover tree search is a metric space search structure designed for nearest neighbor queries. It builds a hierarchy of points where each level covers the level below with exponentially shrinking distance bounds.

It provides strong theoretical guarantees in spaces with bounded intrinsic dimension.

Problem

Given a set of points $S$ in a metric space and a query point $q$, find:

$$ \arg\min_{p \in S} d(p, q) $$

Structure

Each level $i$ of the tree has a scale:

$$ 2^i $$

Each node satisfies:

property meaning
nesting every point appears in all lower levels
covering children are within bounded distance
separation nodes at same level are well separated

Each node stores a point and pointers to children at the next finer level.

Algorithm

Search maintains a candidate best distance and prunes using scale bounds.

cover_tree_search(node, q, best):
    if node is null:
        return best

    d = distance(q, node.point)
    best = min(best, d)

    if node.is_leaf:
        return best

    for child in node.children:
        if distance(q, child.point) <= best + scale(child.level):
            best = cover_tree_search(child, q, best)

    return best

Example

Points in 2D:

point
(1,1)
(2,2)
(4,4)
(8,8)

Hierarchy:

level scale
3 8
2 4
1 2
0 1

Query:

$$ q = (3,3) $$

Nearest neighbor:

$$ (2,2) $$

Correctness

The cover tree enforces two invariants:

  • Covering: every point at level $i$ is within distance $2^i$ of its parent
  • Separation: distinct nodes at level $i$ are at least $2^i$ apart

These properties guarantee that any point closer than the current best candidate must lie within a reachable subtree under the pruning condition.

Since all candidates that can improve the result are explored, and all pruned subtrees are provably too far to contain a closer point, the algorithm returns the true nearest neighbor.

Complexity

Let $n$ be number of points and intrinsic dimension be bounded.

case time
average nearest neighbor $O(\log n)$
worst case $O(n)$

Space complexity:

$$ O(n) $$

When to Use

Cover trees are useful when:

  • data lies in general metric spaces
  • intrinsic dimension is low or moderate
  • nearest neighbor queries dominate workload
  • triangle inequality holds but coordinates are not available

Compared to k-d trees and ball trees, cover trees provide stronger theoretical guarantees in high dimensional metric spaces.

Implementation

import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class Node:
    def __init__(self, point, level):
        self.point = point
        self.level = level
        self.children = []

def scale(level):
    return 2 ** level

def cover_tree_search(node, q, best=float("inf")):
    if node is None:
        return best

    d = dist(q, node.point)
    if d < best:
        best = d

    for child in node.children:
        if dist(q, child.point) <= best + scale(child.level):
            best = cover_tree_search(child, q, best)

    return best
import "math"

type Point []float64

func dist(a, b Point) float64 {
	sum := 0.0
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

type Node struct {
	Point    Point
	Level    int
	Children []*Node
}

func scale(level int) float64 {
	return math.Pow(2, float64(level))
}

func CoverTreeSearch(node *Node, q Point, best float64) float64 {
	if node == nil {
		return best
	}

	d := dist(q, node.Point)
	if d < best {
		best = d
	}

	for _, child := range node.Children {
		if dist(q, child.Point) <= best+scale(child.Level) {
			best = CoverTreeSearch(child, q, best)
		}
	}

	return best
}