Heap Select

Select the k-th smallest or largest element by maintaining a heap of candidate values.

Heap Select

Heap Select finds an order statistic by using a heap to keep only the part of the input that can still contain the answer.

For the k-th smallest element, it is common to keep a max heap of size $k + 1$. The heap stores the smallest values seen so far. Whenever a new value is smaller than the heap maximum, the maximum is removed and the new value is inserted. After the scan finishes, the heap maximum is the k-th smallest value.

Problem

Given an array $A$ of length $n$ and an integer $k$ with $0 \le k < n$, return the k-th smallest element.

Algorithm

Maintain a max heap of size at most $k + 1$.

heap_select(A, k):
    H = empty max heap

    for x in A:
        push x into H

        if size(H) > k + 1:
            pop maximum from H

    return maximum(H)

This keeps the smallest $k + 1$ elements seen so far. The largest among them has rank $k$.

Example

Let

$$ A = [7, 2, 9, 4, 3] $$

and

$$ k = 2 $$

The three smallest values are:

$$ [2, 3, 4] $$

The largest of these is $4$, so the answer is:

$$ 4 $$

Correctness

After processing each prefix of the array, the heap contains the smallest $k + 1$ elements from that prefix, or all elements if fewer than $k + 1$ have been seen. This invariant holds because every time the heap grows too large, the largest candidate is removed.

At the end, the heap contains the smallest $k + 1$ elements of the whole array. The maximum of these elements is exactly the k-th smallest element.

Complexity

operation cost
heap size $k + 1$
each insertion $O(\log k)$
total time $O(n \log k)$
space $O(k)$

For selecting the k-th largest, use a min heap of size $k + 1$ instead.

When to Use

Use Heap Select when:

  • $k$ is much smaller than $n$
  • the input is streamed
  • you need top-k values, not just one element
  • predictable memory usage matters

Quickselect is usually faster for in-memory selection when only one rank is needed.

Implementation

Python has a min heap, so this implementation stores negative values to simulate a max heap.

import heapq

def heap_select(a, k):
    heap = []

    for x in a:
        heapq.heappush(heap, -x)

        if len(heap) > k + 1:
            heapq.heappop(heap)

    return -heap[0]
package main

import "container/heap"

type MaxHeap []int

func (h MaxHeap) Len() int {
	return len(h)
}

func (h MaxHeap) Less(i, j int) bool {
	return h[i] > h[j]
}

func (h MaxHeap) Swap(i, j int) {
	h[i], h[j] = h[j], h[i]
}

func (h *MaxHeap) Push(x any) {
	*h = append(*h, x.(int))
}

func (h *MaxHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

func HeapSelect(a []int, k int) int {
	h := &MaxHeap{}
	heap.Init(h)

	for _, x := range a {
		heap.Push(h, x)

		if h.Len() > k+1 {
			heap.Pop(h)
		}
	}

	return (*h)[0]
}