Bottom K Selection
Select the k smallest elements using bounded structures or partitioning.
Bottom K Selection
Bottom K Selection returns the $k$ smallest elements of an array. It is symmetric to Top K selection and can be implemented with heaps or partition based methods.
The result may be unordered unless explicitly sorted.
Problem
Given an array $A$ of length $n$ and an integer $k$, return the $k$ smallest elements.
Algorithm
Heap Based
Maintain a max heap of size $k$.
bottom_k_heap(A, k):
H = empty max heap
for x in A:
if size(H) < k:
push x into H
else if x < maximum(H):
pop maximum from H
push x into H
return elements of H
Partition Based
Use selection to isolate the smallest region.
bottom_k_partition(A, k):
nth_element(A, k)
return A[0..k-1]
Example
Let
$$ A = [7, 2, 9, 4, 3] $$
and
$$ k = 3 $$
The smallest elements are:
$$ [2, 3, 4] $$
Partition result may look like:
$$ [2, 3, 4, 9, 7] $$
Return the first $k$ elements.
Correctness
For the heap method, the structure always contains the smallest $k$ elements seen so far. Larger elements are discarded.
For the partition method, Nth Element ensures that all elements before index $k$ are less than or equal to $A[k]$, so they form the correct subset.
Complexity
| method | time |
|---|---|
| heap | $O(n \log k)$ |
| partition average | $O(n)$ |
| partition worst | $O(n^2)$ |
Sorting the result adds:
$$ O(k \log k) $$
Space:
- heap: $O(k)$
- partition: $O(1)$
When to Use
Use Bottom K Selection when:
- you need smallest values
- $k \ll n$
- streaming or memory limits apply
Partition based methods are faster for in-memory data, while heap based methods are more flexible for streaming.
Implementation
import heapq
def bottom_k_heap(a, k):
if k <= 0:
return []
heap = []
for x in a:
if len(heap) < k:
heapq.heappush(heap, -x)
elif x < -heap[0]:
heapq.heapreplace(heap, -x)
return [-x for x in heap]
import random
def bottom_k_partition(a, k):
def partition(left, right, pivot_index):
pivot = a[pivot_index]
a[pivot_index], a[right] = a[right], a[pivot_index]
store = left
for i in range(left, right):
if a[i] < pivot:
a[store], a[i] = a[i], a[store]
store += 1
a[right], a[store] = a[store], a[right]
return store
def nth_element(left, right, k):
while True:
if left == right:
return
pivot_index = random.randint(left, right)
pivot_index = partition(left, right, pivot_index)
if pivot_index == k:
return
elif k < pivot_index:
right = pivot_index - 1
else:
left = pivot_index + 1
nth_element(0, len(a) - 1, k)
return a[:k]