Bucket Sort

Distribute elements into buckets based on value ranges and sort each bucket independently.

Bucket Sort

Bucket sort partitions elements into a set of buckets based on their value, then sorts each bucket separately and concatenates the results.

It is effective when input values are uniformly distributed over a known range.

Problem

Given an array $A$ of $n$ elements drawn from a range $[a, b)$, sort the elements.

Idea

  1. Divide the range into $k$ equal sized intervals
  2. Place each element into its corresponding bucket
  3. Sort each bucket
  4. Concatenate buckets in order

Each bucket holds elements that fall within a subrange.

Algorithm

bucket_sort(A, k):
    buckets = [empty list for _ in range(k)]

    for x in A:
        i = bucket_index(x, k)
        buckets[i].append(x)

    for i from 0 to k - 1:
        sort(buckets[i])

    return concatenate(buckets)

A common mapping for normalized values $x \in [0,1)$ is:

$$ i = \lfloor k \cdot x \rfloor $$

Example

Sort:

$$ A = [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68] $$

Using $k = 5$ buckets:

bucket range elements
0 [0.0, 0.2) 0.17, 0.12
1 [0.2, 0.4) 0.39, 0.26, 0.21, 0.23
2 [0.4, 0.6)
3 [0.6, 0.8) 0.78, 0.72, 0.68
4 [0.8, 1.0) 0.94

Sort each bucket:

$$ [0.12, 0.17],\ [0.21, 0.23, 0.26, 0.39],\ [],\ [0.68, 0.72, 0.78],\ [0.94] $$

Concatenate:

$$ [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94] $$

Correctness

Elements in earlier buckets are smaller than elements in later buckets by construction. Sorting each bucket ensures internal order. Concatenation yields a globally sorted array.

Complexity

Let:

  • $n$ be number of elements
  • $k$ be number of buckets

Expected time:

$$ O(n + k) $$

if elements distribute evenly.

Worst case:

$$ O(n^2) $$

when all elements fall into one bucket and the inner sort dominates.

Space:

$$ O(n + k) $$

Properties

property value
stable depends on inner sort
in place no
comparison based partly
adaptive yes, with distribution

When to Use

Use bucket sort when:

  • input is uniformly distributed
  • range is known
  • approximate distribution is predictable
  • expected linear time is desired

Avoid when data is highly skewed or clustered.

Implementation (Python)

def bucket_sort(a, k=10):
    buckets = [[] for _ in range(k)]

    for x in a:
        i = int(k * x)
        if i == k:
            i = k - 1
        buckets[i].append(x)

    result = []
    for b in buckets:
        result.extend(sorted(b))

    return result