Skew Suffix Array Sort (DC3)

Linear time suffix array construction using the DC3 divide and conquer strategy.

Skew Suffix Array Sort (DC3)

DC3, also called skew algorithm, builds a suffix array in linear time by splitting suffix positions into residue classes modulo 3 and recursively sorting a subset.

It reduces the full suffix ordering problem into a smaller problem, then merges results using rank comparisons.

Problem

Given a string $T$ of length $n$, construct its suffix array:

$$ SA[0 \dots n-1] $$

such that all suffixes are sorted lexicographically.

Key Idea

Split suffixes into three groups:

group positions
$S_0$ i mod 3 = 0
$S_1$ i mod 3 = 1
$S_2$ i mod 3 = 2

Sort $S_1 \cup S_2$ recursively, then induce order for $S_0$.

Algorithm

Step 1: Form triplets

For each position $i$:

$$ (T[i], T[i+1], T[i+2]) $$

Step 2: Sort triplets

Sort all positions where $i \mod 3 \neq 0$.

Step 3: Assign ranks

Assign lexicographic ranks to triplets.

Step 4: Recursive step

If ranks are not unique, recurse on reduced string.

Step 5: Sort $S_0$

Use sorted $S_1$ and $S_2$ to sort $S_0$ using rank comparisons.

Step 6: Merge

Merge sorted $S_0$ with sorted $S_1 \cup S_2$.

dc3(T):
    split positions into S1 and S2

    sort S1 ∪ S2 by triplets

    assign ranks

    if ranks not unique:
        recurse

    sort S0 using ranks

    merge S0 and S1∪S2

    return SA

Example

Text:

$$ T = "banana" $$

Positions:

i char
0 b
1 a
2 n
3 a
4 n
5 a

Groups:

group positions
S0 0, 3
S1 1, 4
S2 2, 5

After sorting and ranking, DC3 merges all suffixes into:

SA index suffix
0 a
1 ana
2 anana
3 banana
4 na
5 nana

Correctness

DC3 works by reducing suffix comparison to fixed length tuples and ensuring lexicographic order through recursive ranking.

Suffixes in $S_1 \cup S_2$ are sorted recursively, giving stable ranks. These ranks act as compressed keys for comparing suffixes in $S_0$.

The final merge step preserves ordering because each suffix comparison is reduced to rank comparison or direct character comparison in a bounded window.

Thus the output is a correct suffix array.

Complexity

Let $n = |T|$.

phase cost
triplet formation $O(n)$
recursive sorting $O(n)$
merging $O(n)$

Total time:

$$ O(n) $$

Space complexity:

$$ O(n) $$

When to Use

DC3 is useful when:

  • linear time suffix array construction is required
  • SA-IS is not preferred due to implementation complexity
  • theoretical guarantees are important
  • working with academic or algorithmic research systems

In practice, SA-IS is often faster and simpler, but DC3 remains a classic linear construction method.

Implementation

def dc3(T):
    # Simplified structure of DC3 algorithm

    def sort_triplets(T, positions):
        pass

    def assign_ranks(sorted_positions):
        pass

    def merge(sa0, sa12):
        pass

    S1S2 = [i for i in range(len(T)) if i % 3 != 0]
    SA12 = sort_triplets(T, S1S2)
    ranks = assign_ranks(SA12)

    if len(set(ranks)) != len(ranks):
        SA12 = dc3(ranks)

    SA0 = []

    SA = merge(SA0, SA12)

    return SA
func DC3(text []int) []int {
	// Simplified skeleton

	sortTriplets := func() []int { return nil }
	assignRanks := func([]int) []int { return nil }
	merge := func([]int, []int) []int { return nil }

	s1s2 := []int{}

	for i := 0; i < len(text); i++ {
		if i%3 != 0 {
			s1s2 = append(s1s2, i)
		}
	}

	sa12 := sortTriplets()
	ranks := assignRanks(sa12)

	_ = ranks

	sa0 := []int{}

	sa := merge(sa0, sa12)

	return sa
}