Suffix Array Doubling Sort

Construct a suffix array by iteratively sorting suffixes using doubling technique with rank pairs.

Suffix Array Doubling Sort

Suffix array doubling sort builds a suffix array by repeatedly sorting suffixes based on prefixes of length (2^k). Each step uses previously computed ranks to compare longer prefixes efficiently.

This method avoids direct string comparison and reduces the problem to sorting integer pairs.

Problem

Given a string (S) of length (n), construct the suffix array:

$$ SA = \text{sorted indices of all suffixes of } S $$

Each suffix is:

$$ S[i:] = S[i] S[i+1] \cdots S[n-1] $$

Idea

Instead of comparing suffixes directly, assign each suffix a rank.

At step (k), sort suffixes using the pair:

$$ (\text{rank}[i], \text{rank}[i + 2^k]) $$

This represents the first (2^{k+1}) characters.

After sorting, assign new ranks. Repeat until all suffixes have unique ranks.

Algorithm

suffix_array_doubling(S):
    n = length(S)
    SA = [0, 1, ..., n-1]
    rank = initial ranks from characters
    temp = array of size n

    k = 0
    while (1 << k) < n:
        sort SA by (rank[i], rank[i + 2^k])

        temp[SA[0]] = 0
        for i from 1 to n-1:
            prev = SA[i-1]
            curr = SA[i]

            if (rank[prev], rank[prev + 2^k]) < (rank[curr], rank[curr + 2^k]):
                temp[curr] = temp[prev] + 1
            else:
                temp[curr] = temp[prev]

        rank = temp
        k += 1

    return SA

Example

Let:

$$ S = "banana" $$

Suffixes:

index suffix
0 banana
1 anana
2 nana
3 ana
4 na
5 a

Sorted suffixes:

$$ ["a", "ana", "anana", "banana", "na", "nana"] $$

Suffix array:

$$ [5, 3, 1, 0, 4, 2] $$

Correctness

At step (k), suffixes are correctly ordered by their first (2^k) characters. Sorting by rank pairs extends this to (2^{k+1}) characters. Since ranks uniquely represent prefix order, the process converges when all suffixes have distinct ranks.

By induction, the final order corresponds to full lexicographic ordering of suffixes.

Complexity

Let:

  • (n) be the string length

Each iteration sorts (n) elements.

Using comparison sort:

$$ O(n \log n) \text{ per iteration} $$

Number of iterations:

$$ O(\log n) $$

Total:

$$ O(n \log^2 n) $$

With counting or radix sort on ranks:

$$ O(n \log n) $$

Space:

$$ O(n) $$

Properties

property value
stable depends on sorting method
in place no
comparison based partly
domain strings
output suffix array

When to Use

Use suffix array doubling when:

  • constructing suffix arrays for moderate sized strings
  • needing a conceptually simple and implementable method
  • memory usage should remain linear

Avoid when:

  • linear time algorithms such as SA-IS are required
  • input size is extremely large and constant factors matter

Implementation (Python)

def suffix_array_doubling(s):
    n = len(s)
    sa = list(range(n))
    rank = [ord(c) for c in s]
    tmp = [0] * n

    k = 1
    while k < n:
        sa.sort(key=lambda i: (rank[i], rank[i + k] if i + k < n else -1))

        tmp[sa[0]] = 0
        for i in range(1, n):
            prev = sa[i - 1]
            curr = sa[i]

            prev_key = (rank[prev], rank[prev + k] if prev + k < n else -1)
            curr_key = (rank[curr], rank[curr + k] if curr + k < n else -1)

            if prev_key < curr_key:
                tmp[curr] = tmp[prev] + 1
            else:
                tmp[curr] = tmp[prev]

        rank = tmp[:]
        k <<= 1

    return sa