Suffix Tree Search
Search for a pattern in a text using a suffix tree in time proportional to the pattern length.
Suffix Tree Search
Suffix tree search finds whether a pattern occurs in a text by traversing a compressed trie of all suffixes of the text.
A suffix tree stores every suffix of a string. Each edge represents a substring of the original text. This allows pattern matching in time proportional to the pattern length, independent of the text size.
Problem
Given a suffix tree built from a text $T$ and a pattern string $P$, determine whether $P$ occurs in $T$.
Return true if $P$ is a substring of $T$, otherwise false.
Structure
A suffix tree is a compressed trie of all suffixes of $T$.
Each edge stores:
| field | meaning |
|---|---|
start |
start index in text |
end |
end index in text |
child |
next node |
Leaves represent suffixes of $T$.
Algorithm
Start at the root and match the pattern against edge labels. Each edge label corresponds to a substring of the text.
suffix_tree_search(root, T, P):
node = root
i = 0
while i < length(P):
c = P[i]
if c not in node.edges:
return false
edge = node.edges[c]
j = edge.start
while j <= edge.end and i < length(P):
if T[j] != P[i]:
return false
j = j + 1
i = i + 1
node = edge.child
return true
Example
Text:
$$ T = "banana" $$
Suffixes:
| suffix |
|---|
| banana |
| anana |
| nana |
| ana |
| na |
| a |
Search for pattern "ana":
| step | pattern part | action |
|---|---|---|
| 1 | ana | match edge starting with a |
| 2 | na | continue along edge |
| end | - | pattern consumed |
Since the pattern is fully matched, it exists in the text.
Correctness
The suffix tree contains all suffixes of the text. Therefore every substring of the text corresponds to a prefix of some suffix.
Searching for a pattern is equivalent to checking whether the pattern matches a path from the root. If the traversal consumes all characters of the pattern, then the pattern is a prefix of some suffix, hence a substring of the text.
If at any point no matching edge exists or characters mismatch, then no suffix has that prefix, so the pattern does not occur.
Complexity
Let $m = |P|$.
| operation | time |
|---|---|
| search | $O(m)$ |
The algorithm compares each character of the pattern at most once.
Space complexity:
Suffix tree size is:
$$ O(n) $$
where $n = |T|$.
Search uses:
$$ O(1) $$
additional space.
When to Use
Suffix tree search is useful when:
- many substring queries are performed on the same text
- preprocessing cost is acceptable
- linear time pattern matching is required
It is commonly used in string matching, bioinformatics, and text indexing.
Implementation
class Edge:
def __init__(self, start, end, child):
self.start = start
self.end = end
self.child = child
class Node:
def __init__(self):
self.edges = {}
def suffix_tree_search(root, text, pattern):
node = root
i = 0
while i < len(pattern):
c = pattern[i]
if c not in node.edges:
return False
edge = node.edges[c]
j = edge.start
while j <= edge.end and i < len(pattern):
if text[j] != pattern[i]:
return False
j += 1
i += 1
node = edge.child
return True
type Edge struct {
Start int
End int
Child *Node
}
type Node struct {
Edges map[rune]*Edge
}
func SuffixTreeSearch(root *Node, text string, pattern string) bool {
node := root
i := 0
for i < len(pattern) {
c := rune(pattern[i])
edge, ok := node.Edges[c]
if !ok {
return false
}
j := edge.Start
for j <= edge.End && i < len(pattern) {
if text[j] != pattern[i] {
return false
}
j++
i++
}
node = edge.Child
}
return true
}