Three Way String Quicksort

String sorting algorithm that uses three-way partitioning on characters to efficiently handle repeated prefixes.

Three Way String Quicksort

Three way string quicksort is a refinement of multikey quicksort that uses three-way partitioning to efficiently handle repeated characters. It splits the array into less than, equal to, and greater than partitions based on the character at a given position.

It is one of the most practical general-purpose string sorting algorithms.

Problem

Given an array $A$ of strings, sort them in lexicographic order.

For string $s$, define:

$$ s[d] $$

as the character at position $d$. If $d$ exceeds the string length, use a sentinel value smaller than all characters.

Idea

At depth $d$:

  • pick pivot character $v = A[lo][d]$
  • partition into three regions:
region condition
left character < $v$
middle character = $v$
right character > $v$

Then recursively:

  • sort left region at same depth
  • sort middle region at depth $d + 1$
  • sort right region at same depth

This reduces redundant comparisons when many strings share prefixes.

Algorithm

three_way_string_quicksort(A, lo, hi, d):
    if hi - lo <= 1:
        return

    pivot = char_at(A[lo], d)

    lt = lo
    i = lo + 1
    gt = hi - 1

    while i <= gt:
        c = char_at(A[i], d)

        if c < pivot:
            swap A[lt], A[i]
            lt += 1
            i += 1
        else if c > pivot:
            swap A[i], A[gt]
            gt -= 1
        else:
            i += 1

    three_way_string_quicksort(A, lo, lt, d)

    if pivot is not sentinel:
        three_way_string_quicksort(A, lt, gt + 1, d + 1)

    three_way_string_quicksort(A, gt + 1, hi, d)

Example

Sort:

$$ ["she", "sells", "sea", "shells", "by", "the"] $$

At depth 0:

region strings
b by
s she, sells, sea, shells
t the

The s group is recursively sorted at depth 1. Shared prefixes like sh and se are handled efficiently.

Final result:

$$ ["by", "sea", "sells", "she", "shells", "the"] $$

Correctness

At each recursion level, partitioning ensures that all strings in the left region are lexicographically smaller than those in the middle and right regions at the current character position.

Strings in the middle region share the same character, so their order depends on the next position. Recursive sorting at depth $d + 1$ resolves this. By induction on subarray size and depth, the algorithm produces a lexicographically sorted array.

Complexity

Let:

  • $n$ be the number of strings
  • $L$ be the average string length

Expected time:

$$ O(nL) $$

Worst case:

$$ O(n^2) $$

for poor pivot choices.

Space:

$$ O(\log n + L) $$

due to recursion.

Properties

property value
stable no
in place yes
comparison based hybrid
efficient for prefixes yes

When to Use

Use three way string quicksort when:

  • sorting strings with many shared prefixes
  • memory usage must remain low
  • performance should be close to linear

Avoid when:

  • stability is required
  • worst case guarantees are critical
  • input is adversarial without randomization

Implementation

def three_way_string_quicksort(a):
    def char_at(s, d):
        if d == len(s):
            return -1
        return ord(s[d])

    def sort(lo, hi, d):
        if hi - lo <= 1:
            return

        pivot = char_at(a[lo], d)
        lt, i, gt = lo, lo + 1, hi - 1

        while i <= gt:
            c = char_at(a[i], d)

            if c < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif c > pivot:
                a[i], a[gt] = a[gt], a[i]
                gt -= 1
            else:
                i += 1

        sort(lo, lt, d)

        if pivot >= 0:
            sort(lt, gt + 1, d + 1)

        sort(gt + 1, hi, d)

    sort(0, len(a), 0)
    return a