# LeetCode 1044: Longest Duplicate Substring

## Problem Restatement

We are given a string `s`.

We need to return the longest substring that appears at least twice.

If no such substring exists, return an empty string.

The official constraints state that `2 <= s.length <= 3 * 10^4` and `s` consists only of lowercase English letters.

## Input and Output

| Item | Meaning |
|---|---|
| Input | String `s` |
| Output | Longest duplicate substring (any valid answer accepted) |

Function shape:

```python
def longestDupSubstring(s: str) -> str:
    ...
```

## Examples

Example 1:

```python
s = "banana"
```

`"ana"` appears at indices 1 and 3.

Answer:

```python
"ana"
```

Example 2:

```python
s = "abcd"
```

No duplicate substring of length ≥ 1.

Answer:

```python
""
```

## Key Insight

Use binary search on the length of the duplicate substring.

For a fixed length `L`, use Rabin-Karp rolling hash to check if any substring of length `L` appears at least twice.

Since "if length `L` has a duplicate then any shorter length also has a duplicate", the problem is monotone and binary search applies.

## Algorithm

1. Binary search on length `L` from `0` to `n-1`.
2. For each `L`, compute rolling hashes of all substrings of length `L`.
3. If any hash appears twice, record the substring as a candidate.
4. Return the longest found candidate.

## Correctness

Rolling hash detects duplicate substrings in O(n) per check (with small probability of false positives, but we verify matches). Binary search reduces the search space logarithmically.

## Edge Cases

- Check the smallest and largest valid search values; off-by-one errors usually appear at the boundaries.
- Decide whether the predicate is looking for the first true value, the last true value, or an exact match.
- When the target is absent, the loop should still terminate and return the problem-specific failure value.

## Complexity

| Metric | Value | Why |
|---|---:|---|
| Time | `O(n log n)` | Binary search × rolling hash check |
| Space | `O(n)` | Hash set storage |

## Common Pitfalls

- Do not mix inclusive and half-open bounds inside the same loop.
- Make sure each branch strictly reduces the search interval.

## Implementation

```python
class Solution:
    def longestDupSubstring(self, s: str) -> str:
        n = len(s)
        nums = [ord(c) - ord('a') for c in s]
        MOD = (1 << 61) - 1
        BASE = 31

        def check(L):
            if L == 0:
                return ""
            h = 0
            power = pow(BASE, L, MOD)
            for i in range(L):
                h = (h * BASE + nums[i]) % MOD
            seen = {h: [0]}
            for i in range(1, n - L + 1):
                h = (h * BASE - nums[i - 1] * power + nums[i + L - 1]) % MOD
                if h in seen:
                    for j in seen[h]:
                        if s[i:i+L] == s[j:j+L]:
                            return s[i:i+L]
                    seen[h].append(i)
                else:
                    seen[h] = [i]
            return ""

        lo, hi = 0, n - 1
        result = ""
        while lo <= hi:
            mid = (lo + hi) // 2
            found = check(mid)
            if found:
                result = found
                lo = mid + 1
            else:
                hi = mid - 1

        return result
```

## Testing

```python
def run_tests():
    s = Solution()

    assert s.longestDupSubstring("banana") == "ana"
    assert s.longestDupSubstring("abcd") == ""
    assert s.longestDupSubstring("aa") == "a"

    print("all tests passed")

run_tests()
```

| Test | Expected | Why |
|---|---|---|
| `"banana"` | `"ana"` | `"ana"` appears at two positions |
| `"abcd"` | `""` | No duplicates |
| `"aa"` | `"a"` | Single char duplicate |

