Suffix Tree Search

Search for a pattern in a text using a suffix tree in time proportional to the pattern length.

Suffix Tree Search

Suffix tree search finds whether a pattern occurs in a text by traversing a compressed trie of all suffixes of the text.

A suffix tree stores every suffix of a string. Each edge represents a substring of the original text. This allows pattern matching in time proportional to the pattern length, independent of the text size.

Problem

Given a suffix tree built from a text $T$ and a pattern string $P$, determine whether $P$ occurs in $T$.

Return true if $P$ is a substring of $T$, otherwise false.

Structure

A suffix tree is a compressed trie of all suffixes of $T$.

Each edge stores:

field meaning
start start index in text
end end index in text
child next node

Leaves represent suffixes of $T$.

Algorithm

Start at the root and match the pattern against edge labels. Each edge label corresponds to a substring of the text.

suffix_tree_search(root, T, P):
    node = root
    i = 0

    while i < length(P):
        c = P[i]

        if c not in node.edges:
            return false

        edge = node.edges[c]
        j = edge.start

        while j <= edge.end and i < length(P):
            if T[j] != P[i]:
                return false
            j = j + 1
            i = i + 1

        node = edge.child

    return true

Example

Text:

$$ T = "banana" $$

Suffixes:

suffix
banana
anana
nana
ana
na
a

Search for pattern "ana":

step pattern part action
1 ana match edge starting with a
2 na continue along edge
end - pattern consumed

Since the pattern is fully matched, it exists in the text.

Correctness

The suffix tree contains all suffixes of the text. Therefore every substring of the text corresponds to a prefix of some suffix.

Searching for a pattern is equivalent to checking whether the pattern matches a path from the root. If the traversal consumes all characters of the pattern, then the pattern is a prefix of some suffix, hence a substring of the text.

If at any point no matching edge exists or characters mismatch, then no suffix has that prefix, so the pattern does not occur.

Complexity

Let $m = |P|$.

operation time
search $O(m)$

The algorithm compares each character of the pattern at most once.

Space complexity:

Suffix tree size is:

$$ O(n) $$

where $n = |T|$.

Search uses:

$$ O(1) $$

additional space.

When to Use

Suffix tree search is useful when:

  • many substring queries are performed on the same text
  • preprocessing cost is acceptable
  • linear time pattern matching is required

It is commonly used in string matching, bioinformatics, and text indexing.

Implementation

class Edge:
    def __init__(self, start, end, child):
        self.start = start
        self.end = end
        self.child = child

class Node:
    def __init__(self):
        self.edges = {}

def suffix_tree_search(root, text, pattern):
    node = root
    i = 0

    while i < len(pattern):
        c = pattern[i]

        if c not in node.edges:
            return False

        edge = node.edges[c]
        j = edge.start

        while j <= edge.end and i < len(pattern):
            if text[j] != pattern[i]:
                return False
            j += 1
            i += 1

        node = edge.child

    return True
type Edge struct {
	Start int
	End   int
	Child *Node
}

type Node struct {
	Edges map[rune]*Edge
}

func SuffixTreeSearch(root *Node, text string, pattern string) bool {
	node := root
	i := 0

	for i < len(pattern) {
		c := rune(pattern[i])

		edge, ok := node.Edges[c]
		if !ok {
			return false
		}

		j := edge.Start

		for j <= edge.End && i < len(pattern) {
			if text[j] != pattern[i] {
				return false
			}
			j++
			i++
		}

		node = edge.Child
	}

	return true
}