Integer Sample Sort

Sample sort specialized for integer keys using range-based partitioning and optional radix-style bucket classification.

Integer Sample Sort

Integer sample sort is a specialization of sample sort for integer keys. It selects sample elements to estimate the distribution, chooses splitters, and partitions the array into buckets. Each bucket is then sorted recursively or with a suitable subroutine.

Because keys are integers, classification can use arithmetic or bit operations, and buckets can be balanced more precisely than in general comparison sorting.

Problem

Given an array $A$ of $n$ integers, sort the array efficiently using sampling to guide partitioning.

Idea

  1. Select a random or stratified sample from $A$
  2. Sort the sample
  3. Choose splitters from the sample
  4. Partition elements into buckets using splitters
  5. Recursively sort each bucket

For integers, the partition step can be optimized with branchless comparisons or radix-like classification.

Splitters

If $k$ buckets are used, choose $k - 1$ splitters:

$$ s_1 < s_2 < \cdots < s_{k-1} $$

Buckets are defined as:

$$ B_0 = {x < s_1},\quad B_i = {s_i \le x < s_{i+1}},\quad B_{k-1} = {x \ge s_{k-1}} $$

Algorithm

integer_sample_sort(A):
    if len(A) <= threshold:
        insertion_sort(A)
        return A

    sample = pick_sample(A)
    sort(sample)

    splitters = choose_splitters(sample)

    buckets = [empty list for _ in range(k)]

    for x in A:
        i = bucket_index(x, splitters)
        buckets[i].append(x)

    result = []
    for b in buckets:
        result.extend(integer_sample_sort(b))

    return result

Example

Sort:

$$ A = [91, 12, 43, 27, 65, 8, 50, 34] $$

Sample:

$$ [12, 43, 65] $$

Splitters:

$$ [30, 60] $$

Buckets:

bucket range values
0 $x < 30$ 12, 27, 8
1 $30 \le x < 60$ 43, 50, 34
2 $x \ge 60$ 91, 65

Recursively sort each bucket and concatenate:

$$ [8, 12, 27, 34, 43, 50, 65, 91] $$

Correctness

Splitters partition the domain into ordered intervals. Every element belongs to exactly one interval. All elements in earlier buckets are less than those in later buckets. Recursive sorting ensures order within each bucket, so concatenation yields a sorted array.

Complexity

Let:

  • $n$ be the number of elements
  • $k$ be the number of buckets

Expected time:

$$ O(n \log n) $$

With well balanced buckets, depth is $O(\log_k n)$.

Worst case:

$$ O(n^2) $$

if partitioning is highly unbalanced.

Space:

$$ O(n) $$

for bucket storage.

Properties

property value
stable no
in place no
comparison based yes
adaptive yes
parallel friendly yes

When to Use

Use integer sample sort when:

  • sorting large integer arrays
  • parallelization is desired
  • distribution is unknown but sampling can estimate it
  • radix sort is not ideal due to key structure or constraints

Avoid when:

  • small input sizes
  • strict worst case guarantees are required
  • memory usage must be minimal

Notes

  • Often combined with block partitioning for performance
  • Can switch to radix or insertion sort for small buckets
  • Splitter quality strongly affects performance