Suffix Array Doubling Sort

Build a suffix array using the prefix doubling technique with iterative ranking.

Suffix Array Doubling Sort

Suffix array doubling sort constructs a suffix array by iteratively sorting prefixes of length $1, 2, 4, 8, \dots$ until all suffixes are uniquely ranked.

It replaces full suffix comparisons with rank comparisons that double in length each round.

Problem

Given a string $T$ of length $n$, construct a suffix array:

$$ SA[0 \dots n-1] $$

such that suffixes of $T$ are sorted lexicographically.

Idea

Each suffix is represented by a pair of ranks:

$$ (rank[i], rank[i + k]) $$

where $k$ doubles each iteration:

$$ k = 1, 2, 4, 8, \dots $$

Initially, ranks are based on single characters.

Algorithm

Step 1: Initialization

Assign rank based on characters.

Step 2: Iterative sorting

Sort suffixes using rank pairs.

suffix_array_doubling(T):
    n = length(T)
    SA = [0..n-1]
    rank = initial_character_ranks(T)

    k = 1

    while k < n:
        sort SA by (rank[i], rank[i + k])

        new_rank = array of size n
        new_rank[SA[0]] = 0

        for i from 1 to n-1:
            prev = SA[i-1]
            curr = SA[i]

            if (rank[curr], rank[curr+k]) == (rank[prev], rank[prev+k]):
                new_rank[curr] = new_rank[prev]
            else:
                new_rank[curr] = new_rank[prev] + 1

        rank = new_rank
        k = k * 2

    return SA

Example

Text:

$$ T = "banana" $$

Initial ranks (single characters):

index char rank
0 b 1
1 a 0
2 n 13
3 a 0
4 n 13
5 a 0

First iteration ($k = 1$):

Sort by pairs:

suffix pair
a (0,13)
a (0,0)
a (0,-)
banana (1,0)
na (13,0)
nana (13,0)

After repeated doubling, final suffix array becomes:

SA index suffix
0 a
1 ana
2 anana
3 banana
4 na
5 nana

Correctness

At iteration $k$, each suffix is identified by the first $k$ characters via its rank pair:

$$ (rank[i], rank[i+k]) $$

If two suffixes have identical rank pairs, their first $2k$ characters are identical. Otherwise, lexicographic ordering is preserved by comparing rank pairs.

Since $k$ doubles each iteration, eventually $k \ge n$, and all suffixes are uniquely ranked, giving a correct lexicographic ordering.

Thus the final array is a valid suffix array.

Complexity

Let $n = |T|$.

operation time
each sorting step $O(n \log n)$
number of steps $O(\log n)$
total naive implementation $O(n \log^2 n)$

With radix or counting sort optimization:

$$ O(n \log n) $$

Space complexity:

$$ O(n) $$

When to Use

Suffix array doubling is useful when:

  • implementing suffix arrays from scratch
  • no advanced linear time construction is required
  • clarity and simplicity are preferred over optimal asymptotic performance
  • text indexing systems are being prototyped

It is often the baseline method before moving to SA-IS or DC3 algorithms.

Implementation

def suffix_array_doubling(s):
    n = len(s)
    sa = list(range(n))
    rank = [ord(c) for c in s]

    k = 1

    while k < n:
        sa.sort(key=lambda i: (rank[i], rank[i+k] if i+k < n else -1))

        new_rank = [0] * n
        new_rank[sa[0]] = 0

        for i in range(1, n):
            prev, curr = sa[i-1], sa[i]

            prev_key = (rank[prev], rank[prev+k] if prev+k < n else -1)
            curr_key = (rank[curr], rank[curr+k] if curr+k < n else -1)

            if curr_key == prev_key:
                new_rank[curr] = new_rank[prev]
            else:
                new_rank[curr] = new_rank[prev] + 1

        rank = new_rank
        k *= 2

    return sa
import "sort"

func SuffixArrayDoubling(s string) []int {
	n := len(s)
	sa := make([]int, n)
	rank := make([]int, n)

	for i := 0; i < n; i++ {
		sa[i] = i
		rank[i] = int(s[i])
	}

	k := 1

	getRank := func(i int) int {
		if i < n {
			return rank[i]
		}
		return -1
	}

	for k < n {
		sort.Slice(sa, func(i, j int) bool {
			a, b := sa[i], sa[j]
			if rank[a] != rank[b] {
				return rank[a] < rank[b]
			}
			return getRank(a+k) < getRank(b+k)
		})

		newRank := make([]int, n)
		newRank[sa[0]] = 0

		for i := 1; i < n; i++ {
			prev, curr := sa[i-1], sa[i]

			prevKey := [2]int{rank[prev], getRank(prev + k)}
			currKey := [2]int{rank[curr], getRank(curr + k)}

			if currKey == prevKey {
				newRank[curr] = newRank[prev]
			} else {
				newRank[curr] = newRank[prev] + 1
			}
		}

		rank = newRank
		k *= 2
	}

	return sa
}