Suffix Array Doubling Sort
Build a suffix array using the prefix doubling technique with iterative ranking.
Suffix Array Doubling Sort
Suffix array doubling sort constructs a suffix array by iteratively sorting prefixes of length $1, 2, 4, 8, \dots$ until all suffixes are uniquely ranked.
It replaces full suffix comparisons with rank comparisons that double in length each round.
Problem
Given a string $T$ of length $n$, construct a suffix array:
$$ SA[0 \dots n-1] $$
such that suffixes of $T$ are sorted lexicographically.
Idea
Each suffix is represented by a pair of ranks:
$$ (rank[i], rank[i + k]) $$
where $k$ doubles each iteration:
$$ k = 1, 2, 4, 8, \dots $$
Initially, ranks are based on single characters.
Algorithm
Step 1: Initialization
Assign rank based on characters.
Step 2: Iterative sorting
Sort suffixes using rank pairs.
suffix_array_doubling(T):
n = length(T)
SA = [0..n-1]
rank = initial_character_ranks(T)
k = 1
while k < n:
sort SA by (rank[i], rank[i + k])
new_rank = array of size n
new_rank[SA[0]] = 0
for i from 1 to n-1:
prev = SA[i-1]
curr = SA[i]
if (rank[curr], rank[curr+k]) == (rank[prev], rank[prev+k]):
new_rank[curr] = new_rank[prev]
else:
new_rank[curr] = new_rank[prev] + 1
rank = new_rank
k = k * 2
return SA
Example
Text:
$$ T = "banana" $$
Initial ranks (single characters):
| index | char | rank |
|---|---|---|
| 0 | b | 1 |
| 1 | a | 0 |
| 2 | n | 13 |
| 3 | a | 0 |
| 4 | n | 13 |
| 5 | a | 0 |
First iteration ($k = 1$):
Sort by pairs:
| suffix | pair |
|---|---|
| a | (0,13) |
| a | (0,0) |
| a | (0,-) |
| banana | (1,0) |
| na | (13,0) |
| nana | (13,0) |
After repeated doubling, final suffix array becomes:
| SA index | suffix |
|---|---|
| 0 | a |
| 1 | ana |
| 2 | anana |
| 3 | banana |
| 4 | na |
| 5 | nana |
Correctness
At iteration $k$, each suffix is identified by the first $k$ characters via its rank pair:
$$ (rank[i], rank[i+k]) $$
If two suffixes have identical rank pairs, their first $2k$ characters are identical. Otherwise, lexicographic ordering is preserved by comparing rank pairs.
Since $k$ doubles each iteration, eventually $k \ge n$, and all suffixes are uniquely ranked, giving a correct lexicographic ordering.
Thus the final array is a valid suffix array.
Complexity
Let $n = |T|$.
| operation | time |
|---|---|
| each sorting step | $O(n \log n)$ |
| number of steps | $O(\log n)$ |
| total naive implementation | $O(n \log^2 n)$ |
With radix or counting sort optimization:
$$ O(n \log n) $$
Space complexity:
$$ O(n) $$
When to Use
Suffix array doubling is useful when:
- implementing suffix arrays from scratch
- no advanced linear time construction is required
- clarity and simplicity are preferred over optimal asymptotic performance
- text indexing systems are being prototyped
It is often the baseline method before moving to SA-IS or DC3 algorithms.
Implementation
def suffix_array_doubling(s):
n = len(s)
sa = list(range(n))
rank = [ord(c) for c in s]
k = 1
while k < n:
sa.sort(key=lambda i: (rank[i], rank[i+k] if i+k < n else -1))
new_rank = [0] * n
new_rank[sa[0]] = 0
for i in range(1, n):
prev, curr = sa[i-1], sa[i]
prev_key = (rank[prev], rank[prev+k] if prev+k < n else -1)
curr_key = (rank[curr], rank[curr+k] if curr+k < n else -1)
if curr_key == prev_key:
new_rank[curr] = new_rank[prev]
else:
new_rank[curr] = new_rank[prev] + 1
rank = new_rank
k *= 2
return sa
import "sort"
func SuffixArrayDoubling(s string) []int {
n := len(s)
sa := make([]int, n)
rank := make([]int, n)
for i := 0; i < n; i++ {
sa[i] = i
rank[i] = int(s[i])
}
k := 1
getRank := func(i int) int {
if i < n {
return rank[i]
}
return -1
}
for k < n {
sort.Slice(sa, func(i, j int) bool {
a, b := sa[i], sa[j]
if rank[a] != rank[b] {
return rank[a] < rank[b]
}
return getRank(a+k) < getRank(b+k)
})
newRank := make([]int, n)
newRank[sa[0]] = 0
for i := 1; i < n; i++ {
prev, curr := sa[i-1], sa[i]
prevKey := [2]int{rank[prev], getRank(prev + k)}
currKey := [2]int{rank[curr], getRank(curr + k)}
if currKey == prevKey {
newRank[curr] = newRank[prev]
} else {
newRank[curr] = newRank[prev] + 1
}
}
rank = newRank
k *= 2
}
return sa
}