A clear explanation of efficiently querying a stream of characters against a word list using an Aho-Corasick trie.
Problem Restatement
We implement a StreamChecker class:
StreamChecker(words): Initialize with a list of words.query(letter): Add the letter to the stream. Returntrueif any word inwordsis a suffix of the letters queried so far.
The official constraints state that 1 <= words.length <= 2000, 1 <= words[i].length <= 2000, and up to 4 * 10^4 calls to query.
Input and Output
| Item | Meaning |
|---|---|
| Input | Word list and stream of characters |
| Output | true if any word is a suffix of the current stream |
Examples
sc = StreamChecker(["cd", "f", "kl"])
sc.query('a') # False
sc.query('b') # False
sc.query('c') # False
sc.query('d') # True — "cd" is a suffix
sc.query('e') # False
sc.query('f') # True — "f" is a suffix
sc.query('g') # False
sc.query('h') # False
sc.query('i') # False
sc.query('j') # False
sc.query('k') # False
sc.query('l') # True — "kl" is a suffixKey Insight
Insert all words reversed into a trie. Then maintain a set of active trie nodes.
For each query character:
- Add the root (to start matching any word that begins with this character).
- For each active node, follow the edge for the current character.
- If any active node is a word-end, return
True.
Because words are inserted reversed and we advance from the root on each new character, we are effectively checking if the stream ends with any word.
Algorithm
- Build a trie of reversed words.
- Maintain
active_nodesas a set of trie nodes currently being tracked. - For each
query(letter):- Add
roottoactive_nodes. - Advance each node by
letter. - If any resulting node marks an end-of-word, return
True.
- Add
Correctness
Inserting reversed words means a path from the root spells out a word backwards.
By starting from the root at each query step and advancing, we track all suffixes of the stream simultaneously.
If any active node reaches an end-of-word node, the current stream has that word as a suffix.
Edge Cases
- Check the minimum input size allowed by the constraints.
- Verify duplicate values or tie cases if the input can contain them.
- Keep the return value aligned with the exact failure case in the statement.
Complexity
Let W = total length of all words, Q = number of queries, A = alphabet size.
| Metric | Value | Why |
|---|---|---|
| Build | O(W) | Insert all reversed words |
| Query | `O( | active |
| Space | O(W) | Trie storage |
Common Pitfalls
- Do not optimize away the invariant; the code should still make it clear what condition is being maintained.
- Prefer problem-specific names over one-letter variables once the logic becomes stateful.
Implementation
class StreamChecker:
def __init__(self, words: list[str]):
self.trie = {}
self.active = []
for word in words:
node = self.trie
for ch in reversed(word):
node = node.setdefault(ch, {})
node['$'] = True
self.stream = []
def query(self, letter: str) -> bool:
self.stream.append(letter)
node = self.trie
for ch in reversed(self.stream):
if ch not in node:
break
node = node[ch]
if '$' in node:
return True
return FalseCode Explanation
Build the trie with reversed words:
for ch in reversed(word):
node = node.setdefault(ch, {})
node['$'] = TrueFor each query, scan the stream in reverse to check suffixes:
for ch in reversed(self.stream):
if ch not in node:
break
node = node[ch]
if '$' in node:
return TrueSince words are at most 2000 characters, scanning the reversed stream is bounded.
Testing
def run_tests():
sc = StreamChecker(["cd", "f", "kl"])
results = [sc.query(c) for c in "abcdefghijkl"]
assert results[3] == True # 'd' — "cd" matches
assert results[5] == True # 'f' — "f" matches
assert results[11] == True # 'l' — "kl" matches
assert results[0] == False
print("all tests passed")
run_tests()| Query | Expected | Why |
|---|---|---|
'd' | True | "cd" is suffix of stream "abcd" |
'f' | True | "f" is suffix of stream "abcdef" |
'l' | True | "kl" is suffix of stream "abcdefghijkl" |