Sparse Key Counting Sort

Counting sort variant that stores counts only for keys that appear, avoiding dense arrays over large key ranges.

Sparse Key Counting Sort

Sparse key counting sort handles large key ranges by storing counts only for keys that appear in the input. It replaces the dense count array of counting sort with a map from key to frequency.

This is useful when the key universe is huge but the number of distinct keys is small.

Problem

Given an array $A$ of $n$ keys drawn from a large domain, sort $A$ when only $u$ distinct keys appear and $u \ll n$ or $u \ll k$.

Idea

Use a dictionary to count frequencies:

$$ H[x] = \text{number of occurrences of } x $$

Then sort the distinct keys and emit each key according to its count.

Algorithm

sparse_key_counting_sort(A):
    H = empty map

    for x in A:
        H[x] += 1

    keys = sorted keys of H

    i = 0
    for key in keys:
        repeat H[key] times:
            A[i] = key
            i += 1

    return A

Example

Let:

$$ A = [1000000, 5, 1000000, 42, 5] $$

Frequency map:

key count
5 2
42 1
1000000 2

Sorted keys:

$$ [5, 42, 1000000] $$

Output:

$$ [5, 5, 42, 1000000, 1000000] $$

Correctness

The frequency map records the exact multiplicity of each key. Sorting the distinct keys gives the correct key order. Emitting each key exactly its recorded number of times produces a sorted array containing exactly the original multiset.

Complexity

Let:

  • $n$ be the number of elements
  • $u$ be the number of distinct keys

Counting time:

$$ O(n) $$

Sorting distinct keys:

$$ O(u \log u) $$

Total time:

$$ O(n + u \log u) $$

Space:

$$ O(u) $$

Comparison with Dense Counting Sort

method best domain time space
dense counting sort small contiguous range $O(n + k)$ $O(k)$
sparse key counting sort large sparse range $O(n + u \log u)$ $O(u)$

Sparse key counting sort is better when $k$ is large and $u$ is small.

Properties

property value
stable no
in place mostly, excluding map
comparison based only among distinct keys
handles sparse keys yes

When to Use

Use sparse key counting sort when:

  • the key range is large
  • few distinct keys appear
  • counting frequencies is natural
  • dense counting arrays would waste memory

Avoid it when all or most keys are distinct, since sorting distinct keys gives the same asymptotic cost as ordinary comparison sort.

Implementation

from collections import Counter

def sparse_key_counting_sort(a):
    count = Counter(a)
    i = 0

    for key in sorted(count):
        for _ in range(count[key]):
            a[i] = key
            i += 1

    return a