LeetCode 1044: Longest Duplicate Substring

Problem Restatement

We are given a string s.

We need to return the longest substring that appears at least twice.

If no such substring exists, return an empty string.

The official constraints state that 2 <= s.length <= 3 * 10^4 and s consists only of lowercase English letters.

Input and Output

Item	Meaning
Input	String `s`
Output	Longest duplicate substring (any valid answer accepted)

Function shape:

def longestDupSubstring(s: str) -> str:
    ...

Examples

Example 1:

s = "banana"

"ana" appears at indices 1 and 3.

Answer:

"ana"

Example 2:

s = "abcd"

No duplicate substring of length ≥ 1.

Answer:

""

Key Insight

Use binary search on the length of the duplicate substring.

For a fixed length L, use Rabin-Karp rolling hash to check if any substring of length L appears at least twice.

Since “if length L has a duplicate then any shorter length also has a duplicate”, the problem is monotone and binary search applies.

Algorithm

Binary search on length L from 0 to n-1.
For each L, compute rolling hashes of all substrings of length L.
If any hash appears twice, record the substring as a candidate.
Return the longest found candidate.

Correctness

Rolling hash detects duplicate substrings in O(n) per check (with small probability of false positives, but we verify matches). Binary search reduces the search space logarithmically.

Edge Cases

Check the smallest and largest valid search values; off-by-one errors usually appear at the boundaries.
Decide whether the predicate is looking for the first true value, the last true value, or an exact match.
When the target is absent, the loop should still terminate and return the problem-specific failure value.

Complexity

Metric	Value	Why
Time	`O(n log n)`	Binary search × rolling hash check
Space	`O(n)`	Hash set storage

Common Pitfalls

Do not mix inclusive and half-open bounds inside the same loop.
Make sure each branch strictly reduces the search interval.

Implementation

class Solution:
    def longestDupSubstring(self, s: str) -> str:
        n = len(s)
        nums = [ord(c) - ord('a') for c in s]
        MOD = (1 << 61) - 1
        BASE = 31

        def check(L):
            if L == 0:
                return ""
            h = 0
            power = pow(BASE, L, MOD)
            for i in range(L):
                h = (h * BASE + nums[i]) % MOD
            seen = {h: [0]}
            for i in range(1, n - L + 1):
                h = (h * BASE - nums[i - 1] * power + nums[i + L - 1]) % MOD
                if h in seen:
                    for j in seen[h]:
                        if s[i:i+L] == s[j:j+L]:
                            return s[i:i+L]
                    seen[h].append(i)
                else:
                    seen[h] = [i]
            return ""

        lo, hi = 0, n - 1
        result = ""
        while lo <= hi:
            mid = (lo + hi) // 2
            found = check(mid)
            if found:
                result = found
                lo = mid + 1
            else:
                hi = mid - 1

        return result

Testing

def run_tests():
    s = Solution()

    assert s.longestDupSubstring("banana") == "ana"
    assert s.longestDupSubstring("abcd") == ""
    assert s.longestDupSubstring("aa") == "a"

    print("all tests passed")

run_tests()

Test	Expected	Why
`"banana"`	`"ana"`	`"ana"` appears at two positions
`"abcd"`	`""`	No duplicates
`"aa"`	`"a"`	Single char duplicate