Heap Select
Select the k-th smallest or largest element by maintaining a heap of candidate values.
Heap Select
Heap Select finds an order statistic by using a heap to keep only the part of the input that can still contain the answer.
For the k-th smallest element, it is common to keep a max heap of size $k + 1$. The heap stores the smallest values seen so far. Whenever a new value is smaller than the heap maximum, the maximum is removed and the new value is inserted. After the scan finishes, the heap maximum is the k-th smallest value.
Problem
Given an array $A$ of length $n$ and an integer $k$ with $0 \le k < n$, return the k-th smallest element.
Algorithm
Maintain a max heap of size at most $k + 1$.
heap_select(A, k):
H = empty max heap
for x in A:
push x into H
if size(H) > k + 1:
pop maximum from H
return maximum(H)
This keeps the smallest $k + 1$ elements seen so far. The largest among them has rank $k$.
Example
Let
$$ A = [7, 2, 9, 4, 3] $$
and
$$ k = 2 $$
The three smallest values are:
$$ [2, 3, 4] $$
The largest of these is $4$, so the answer is:
$$ 4 $$
Correctness
After processing each prefix of the array, the heap contains the smallest $k + 1$ elements from that prefix, or all elements if fewer than $k + 1$ have been seen. This invariant holds because every time the heap grows too large, the largest candidate is removed.
At the end, the heap contains the smallest $k + 1$ elements of the whole array. The maximum of these elements is exactly the k-th smallest element.
Complexity
| operation | cost |
|---|---|
| heap size | $k + 1$ |
| each insertion | $O(\log k)$ |
| total time | $O(n \log k)$ |
| space | $O(k)$ |
For selecting the k-th largest, use a min heap of size $k + 1$ instead.
When to Use
Use Heap Select when:
- $k$ is much smaller than $n$
- the input is streamed
- you need top-k values, not just one element
- predictable memory usage matters
Quickselect is usually faster for in-memory selection when only one rank is needed.
Implementation
Python has a min heap, so this implementation stores negative values to simulate a max heap.
import heapq
def heap_select(a, k):
heap = []
for x in a:
heapq.heappush(heap, -x)
if len(heap) > k + 1:
heapq.heappop(heap)
return -heap[0]
package main
import "container/heap"
type MaxHeap []int
func (h MaxHeap) Len() int {
return len(h)
}
func (h MaxHeap) Less(i, j int) bool {
return h[i] > h[j]
}
func (h MaxHeap) Swap(i, j int) {
h[i], h[j] = h[j], h[i]
}
func (h *MaxHeap) Push(x any) {
*h = append(*h, x.(int))
}
func (h *MaxHeap) Pop() any {
old := *h
n := len(old)
x := old[n-1]
*h = old[:n-1]
return x
}
func HeapSelect(a []int, k int) int {
h := &MaxHeap{}
heap.Init(h)
for _, x := range a {
heap.Push(h, x)
if h.Len() > k+1 {
heap.Pop(h)
}
}
return (*h)[0]
}