Sparse Key Counting Sort

Sparse key counting sort handles large key ranges by storing counts only for keys that appear in the input. It replaces the dense count array of counting sort with a map from key to frequency.

This is useful when the key universe is huge but the number of distinct keys is small.

Problem

Given an array $A$ of $n$ keys drawn from a large domain, sort $A$ when only $u$ distinct keys appear and $u \ll n$ or $u \ll k$.

Idea

Use a dictionary to count frequencies:

$$ H[x] = \text{number of occurrences of } x $$

Then sort the distinct keys and emit each key according to its count.

Algorithm

sparse_key_counting_sort(A):
    H = empty map

    for x in A:
        H[x] += 1

    keys = sorted keys of H

    i = 0
    for key in keys:
        repeat H[key] times:
            A[i] = key
            i += 1

    return A

Example

Let:

$$ A = [1000000, 5, 1000000, 42, 5] $$

Frequency map:

key	count
5	2
42	1
1000000	2

Sorted keys:

$$ [5, 42, 1000000] $$

Output:

$$ [5, 5, 42, 1000000, 1000000] $$

Correctness

The frequency map records the exact multiplicity of each key. Sorting the distinct keys gives the correct key order. Emitting each key exactly its recorded number of times produces a sorted array containing exactly the original multiset.

Complexity

Let:

$n$ be the number of elements
$u$ be the number of distinct keys

Counting time:

$$ O(n) $$

Sorting distinct keys:

$$ O(u \log u) $$

Total time:

$$ O(n + u \log u) $$

Space:

$$ O(u) $$

Comparison with Dense Counting Sort

method	best domain	time	space
dense counting sort	small contiguous range	$O(n + k)$	$O(k)$
sparse key counting sort	large sparse range	$O(n + u \log u)$	$O(u)$

Sparse key counting sort is better when $k$ is large and $u$ is small.

Properties

property	value
stable	no
in place	mostly, excluding map
comparison based	only among distinct keys
handles sparse keys	yes

When to Use

Use sparse key counting sort when:

the key range is large
few distinct keys appear
counting frequencies is natural
dense counting arrays would waste memory

Avoid it when all or most keys are distinct, since sorting distinct keys gives the same asymptotic cost as ordinary comparison sort.

Implementation

from collections import Counter

def sparse_key_counting_sort(a):
    count = Counter(a)
    i = 0

    for key in sorted(count):
        for _ in range(count[key]):
            a[i] = key
            i += 1

    return a