Multikey Quicksort

String sorting algorithm that partitions strings by the character at the current depth using three-way quicksort.

Multikey Quicksort

Multikey quicksort is a string sorting algorithm that combines quicksort partitioning with radix-style character inspection. Instead of comparing whole strings each time, it partitions strings by the character at a current depth.

It is especially effective for strings with shared prefixes because it avoids repeatedly comparing characters that are already known to be equal.

Problem

Given an array $A$ of strings, sort the strings in lexicographic order.

For a string $s$, let:

$$ s[d] $$

be the character at position $d$. If $d$ reaches the end of the string, use a sentinel value smaller than every real character.

Idea

At depth $d$, choose a pivot character from one string:

$$ v = s[d] $$

Partition the array into three parts:

region condition
left character at $d$ is less than $v$
middle character at $d$ equals $v$
right character at $d$ is greater than $v$

Then:

  • sort the left region at the same depth
  • sort the middle region at depth $d + 1$
  • sort the right region at the same depth

Algorithm

multikey_quicksort(A, lo, hi, d):
    if hi - lo <= 1:
        return

    pivot = char_at(A[lo], d)

    lt = lo
    i = lo + 1
    gt = hi - 1

    while i <= gt:
        c = char_at(A[i], d)

        if c < pivot:
            swap A[lt], A[i]
            lt += 1
            i += 1
        else if c > pivot:
            swap A[i], A[gt]
            gt -= 1
        else:
            i += 1

    multikey_quicksort(A, lo, lt, d)

    if pivot is not sentinel:
        multikey_quicksort(A, lt, gt + 1, d + 1)

    multikey_quicksort(A, gt + 1, hi, d)

Example

Sort:

$$ ["car", "cat", "dog", "cart"] $$

At depth 0, pivot character is c.

region strings
equal to c car, cat, cart
greater than c dog

The middle region is recursively sorted at depth 1. Later, at depth 2, car and cart share the same prefix. The sentinel places car before cart.

Final result:

$$ ["car", "cart", "cat", "dog"] $$

Correctness

At each recursive call, the partition step separates strings by their character at depth $d$. Strings in the left region are lexicographically smaller than strings in the middle and right regions at the first differing character. Strings in the right region are lexicographically larger.

Strings in the middle region share the same character at depth $d$, so their order depends on later characters. Recursing on depth $d + 1$ correctly sorts that region. By induction over subarray size and depth, the full array becomes lexicographically sorted.

Complexity

Let:

  • $n$ be the number of strings
  • $L$ be the average string length

Expected time:

$$ O(nL) $$

Worst case:

$$ O(n^2 + nL) $$

with poor pivot choices.

Auxiliary space:

$$ O(\log n + L) $$

expected recursion stack, depending on partition balance.

Properties

property value
stable no
in place yes
comparison based hybrid
direction left to right
best for string keys

When to Use

Use multikey quicksort when sorting strings with common prefixes and when in-place operation is useful. It is a strong general-purpose string sort because it combines radix awareness with quicksort-style partitioning.

Avoid it when stability is required or when adversarial inputs can cause poor pivot behavior unless randomized or median-based pivots are used.

Implementation

def multikey_quicksort(a):
    def char_at(s, d):
        if d == len(s):
            return -1
        return ord(s[d])

    def sort(lo, hi, d):
        if hi - lo <= 1:
            return

        pivot = char_at(a[lo], d)
        lt = lo
        i = lo + 1
        gt = hi - 1

        while i <= gt:
            c = char_at(a[i], d)

            if c < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif c > pivot:
                a[i], a[gt] = a[gt], a[i]
                gt -= 1
            else:
                i += 1

        sort(lo, lt, d)

        if pivot >= 0:
            sort(lt, gt + 1, d + 1)

        sort(gt + 1, hi, d)

    sort(0, len(a), 0)
    return a