Rolling Hash Search

Rolling hash search computes a hash over a fixed size window and updates it efficiently as the window moves. It allows substring search in linear time with low constant factors.

You use it when scanning large text for patterns or when building string matching algorithms such as Rabin Karp.

Problem

Given a text $T$ of length $n$ and a pattern $P$ of length $m$, find all indices $i$ such that:

$$ T[i \dots i + m - 1] = P $$

Model

Define a polynomial hash for a string:

$$ H(s_0 s_1 \dots s_{m-1}) = (s_0 \cdot B^{m-1} + s_1 \cdot B^{m-2} + \dots + s_{m-1}) \bmod M $$

For a sliding window, the hash updates as:

$$ H_{i+1} = \big((H_i - s_i \cdot B^{m-1}) \cdot B + s_{i+m}\big) \bmod M $$

This avoids recomputing the hash from scratch.

Algorithm

rolling_hash_search(T, P):
    n = length(T)
    m = length(P)

    target = hash(P)
    current = hash(T[0..m-1])

    for i from 0 to n - m:
        if current == target:
            if T[i..i+m-1] == P:
                report match at i

        if i < n - m:
            current = update_hash(current, T[i], T[i+m])

    return all matches

Example

Let:

$$ T = "abracadabra" $$

$$ P = "abra" $$

Sliding windows:

index	substring	hash match
0	abra	yes
1	brac	no
2	raca	no
3	acad	no
4	cada	no
5	adab	no
6	dabr	no
7	abra	yes

Matches at indices 0 and 7.

Correctness

The rolling hash ensures that every substring of length $m$ is assigned a hash value. When the hash matches the pattern hash, a direct comparison verifies the match.

The direct comparison eliminates false positives caused by hash collisions.

Complexity

operation	time
preprocessing	$O(m)$
search	$O(n)$
verification	depends on matches

Total time is $O(n + m)$ in expectation.

Space

$$ O(1) $$

Only a few integer values are stored.

When to Use

Rolling hash search is appropriate when:

substring matching is required
repeated hash computation must be avoided
large texts are processed
approximate matching is combined with verification
multiple pattern checks are needed

It is widely used in string matching, plagiarism detection, deduplication, and substring indexing.

Implementation

def rolling_hash_search(text, pattern):
    n, m = len(text), len(pattern)
    if m > n:
        return []

    B = 256
    M = 10**9 + 7

    def hash_str(s):
        h = 0
        for c in s:
            h = (h * B + ord(c)) % M
        return h

    target = hash_str(pattern)
    current = hash_str(text[:m])

    power = pow(B, m - 1, M)
    result = []

    for i in range(n - m + 1):
        if current == target:
            if text[i:i+m] == pattern:
                result.append(i)

        if i < n - m:
            current = (current - ord(text[i]) * power) % M
            current = (current * B + ord(text[i + m])) % M

    return result

func RollingHashSearch(text, pattern string) []int {
	n, m := len(text), len(pattern)
	if m > n {
		return nil
	}

	const B = 256
	const M = 1000000007

	hashStr := func(s string) int {
		h := 0
		for i := 0; i < len(s); i++ {
			h = (h*B + int(s[i])) % M
		}
		return h
	}

	target := hashStr(pattern)
	current := hashStr(text[:m])

	power := 1
	for i := 0; i < m-1; i++ {
		power = (power * B) % M
	}

	var result []int

	for i := 0; i <= n-m; i++ {
		if current == target && text[i:i+m] == pattern {
			result = append(result, i)
		}

		if i < n-m {
			current = (current - int(text[i])*power%M + M) % M
			current = (current*B + int(text[i+m])) % M
		}
	}

	return result
}