Bucket Sort

Bucket sort partitions elements into a set of buckets based on their value, then sorts each bucket separately and concatenates the results.

It is effective when input values are uniformly distributed over a known range.

Problem

Given an array $A$ of $n$ elements drawn from a range $[a, b)$, sort the elements.

Idea

Divide the range into $k$ equal sized intervals
Place each element into its corresponding bucket
Sort each bucket
Concatenate buckets in order

Each bucket holds elements that fall within a subrange.

Algorithm

bucket_sort(A, k):
    buckets = [empty list for _ in range(k)]

    for x in A:
        i = bucket_index(x, k)
        buckets[i].append(x)

    for i from 0 to k - 1:
        sort(buckets[i])

    return concatenate(buckets)

A common mapping for normalized values $x \in [0,1)$ is:

$$ i = \lfloor k \cdot x \rfloor $$

Example

Sort:

$$ A = [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68] $$

Using $k = 5$ buckets:

bucket	range	elements
0	[0.0, 0.2)	0.17, 0.12
1	[0.2, 0.4)	0.39, 0.26, 0.21, 0.23
2	[0.4, 0.6)	—
3	[0.6, 0.8)	0.78, 0.72, 0.68
4	[0.8, 1.0)	0.94

Sort each bucket:

$$ [0.12, 0.17],\ [0.21, 0.23, 0.26, 0.39],\ [],\ [0.68, 0.72, 0.78],\ [0.94] $$

Concatenate:

$$ [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94] $$

Correctness

Elements in earlier buckets are smaller than elements in later buckets by construction. Sorting each bucket ensures internal order. Concatenation yields a globally sorted array.

Complexity

Let:

$n$ be number of elements
$k$ be number of buckets

Expected time:

$$ O(n + k) $$

if elements distribute evenly.

Worst case:

$$ O(n^2) $$

when all elements fall into one bucket and the inner sort dominates.

Space:

$$ O(n + k) $$

Properties

property	value
stable	depends on inner sort
in place	no
comparison based	partly
adaptive	yes, with distribution

When to Use

Use bucket sort when:

input is uniformly distributed
range is known
approximate distribution is predictable
expected linear time is desired

Avoid when data is highly skewed or clustered.

Implementation (Python)

def bucket_sort(a, k=10):
    buckets = [[] for _ in range(k)]

    for x in a:
        i = int(k * x)
        if i == k:
            i = k - 1
        buckets[i].append(x)

    result = []
    for b in buckets:
        result.extend(sorted(b))

    return result