Skew Suffix Array Sort (DC3)
Linear time suffix array construction using the DC3 divide and conquer strategy.
Skew Suffix Array Sort (DC3)
DC3, also called skew algorithm, builds a suffix array in linear time by splitting suffix positions into residue classes modulo 3 and recursively sorting a subset.
It reduces the full suffix ordering problem into a smaller problem, then merges results using rank comparisons.
Problem
Given a string $T$ of length $n$, construct its suffix array:
$$ SA[0 \dots n-1] $$
such that all suffixes are sorted lexicographically.
Key Idea
Split suffixes into three groups:
| group | positions |
|---|---|
| $S_0$ | i mod 3 = 0 |
| $S_1$ | i mod 3 = 1 |
| $S_2$ | i mod 3 = 2 |
Sort $S_1 \cup S_2$ recursively, then induce order for $S_0$.
Algorithm
Step 1: Form triplets
For each position $i$:
$$ (T[i], T[i+1], T[i+2]) $$
Step 2: Sort triplets
Sort all positions where $i \mod 3 \neq 0$.
Step 3: Assign ranks
Assign lexicographic ranks to triplets.
Step 4: Recursive step
If ranks are not unique, recurse on reduced string.
Step 5: Sort $S_0$
Use sorted $S_1$ and $S_2$ to sort $S_0$ using rank comparisons.
Step 6: Merge
Merge sorted $S_0$ with sorted $S_1 \cup S_2$.
dc3(T):
split positions into S1 and S2
sort S1 ∪ S2 by triplets
assign ranks
if ranks not unique:
recurse
sort S0 using ranks
merge S0 and S1∪S2
return SA
Example
Text:
$$ T = "banana" $$
Positions:
| i | char |
|---|---|
| 0 | b |
| 1 | a |
| 2 | n |
| 3 | a |
| 4 | n |
| 5 | a |
Groups:
| group | positions |
|---|---|
| S0 | 0, 3 |
| S1 | 1, 4 |
| S2 | 2, 5 |
After sorting and ranking, DC3 merges all suffixes into:
| SA index | suffix |
|---|---|
| 0 | a |
| 1 | ana |
| 2 | anana |
| 3 | banana |
| 4 | na |
| 5 | nana |
Correctness
DC3 works by reducing suffix comparison to fixed length tuples and ensuring lexicographic order through recursive ranking.
Suffixes in $S_1 \cup S_2$ are sorted recursively, giving stable ranks. These ranks act as compressed keys for comparing suffixes in $S_0$.
The final merge step preserves ordering because each suffix comparison is reduced to rank comparison or direct character comparison in a bounded window.
Thus the output is a correct suffix array.
Complexity
Let $n = |T|$.
| phase | cost |
|---|---|
| triplet formation | $O(n)$ |
| recursive sorting | $O(n)$ |
| merging | $O(n)$ |
Total time:
$$ O(n) $$
Space complexity:
$$ O(n) $$
When to Use
DC3 is useful when:
- linear time suffix array construction is required
- SA-IS is not preferred due to implementation complexity
- theoretical guarantees are important
- working with academic or algorithmic research systems
In practice, SA-IS is often faster and simpler, but DC3 remains a classic linear construction method.
Implementation
def dc3(T):
# Simplified structure of DC3 algorithm
def sort_triplets(T, positions):
pass
def assign_ranks(sorted_positions):
pass
def merge(sa0, sa12):
pass
S1S2 = [i for i in range(len(T)) if i % 3 != 0]
SA12 = sort_triplets(T, S1S2)
ranks = assign_ranks(SA12)
if len(set(ranks)) != len(ranks):
SA12 = dc3(ranks)
SA0 = []
SA = merge(SA0, SA12)
return SA
func DC3(text []int) []int {
// Simplified skeleton
sortTriplets := func() []int { return nil }
assignRanks := func([]int) []int { return nil }
merge := func([]int, []int) []int { return nil }
s1s2 := []int{}
for i := 0; i < len(text); i++ {
if i%3 != 0 {
s1s2 = append(s1s2, i)
}
}
sa12 := sortTriplets()
ranks := assignRanks(sa12)
_ = ranks
sa0 := []int{}
sa := merge(sa0, sa12)
return sa
}