A clear explanation of finding the shortest string containing both input strings as subsequences using LCS dynamic programming.
Problem Restatement
We are given two strings str1 and str2.
We need to return the shortest string such that both str1 and str2 are subsequences of it.
If there are multiple answers, return any.
The official constraints state that 1 <= str1.length, str2.length <= 1000.
Input and Output
| Item | Meaning |
|---|---|
| Input | Strings str1 and str2 |
| Output | Shortest common supersequence |
Function shape:
def shortestCommonSupersequence(str1: str, str2: str) -> str:
...Examples
Example 1:
str1 = "abac", str2 = "cab"SCS: "cabac". Length 5.
Answer:
"cabac"Key Insight
The shortest common supersequence has length len(str1) + len(str2) - LCS(str1, str2).
Build the LCS DP table, then reconstruct the SCS by tracing back through the table.
When characters match (from LCS), include the character once. When they don’t match, include the character from whichever string gives the longer LCS at that point, then move that pointer.
Algorithm
- Compute the LCS DP table.
- Trace back from
dp[m][n]:- If
str1[i-1] == str2[j-1]: add the character, move both pointers. - Elif
dp[i-1][j] > dp[i][j-1]: addstr1[i-1], movei. - Else: add
str2[j-1], movej.
- If
- Append remaining characters from either string.
- Reverse the result.
Edge Cases
- Initialize the base states explicitly; most wrong answers come from an implicit zero that should not be allowed.
- Confirm the iteration order matches the recurrence dependencies.
- For optimization DP, separate impossible states from valid states with value
0.
Complexity
| Metric | Value | Why |
|---|---|---|
| Time | O(m * n) | LCS DP + traceback |
| Space | O(m * n) | DP table |
Common Pitfalls
- Do not overwrite a state before all transitions that still need the old value have used it.
- Use the problem constraints to choose between full tables and compressed state.
Implementation
class Solution:
def shortestCommonSupersequence(self, str1: str, str2: str) -> str:
m, n = len(str1), len(str2)
dp = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(1, m + 1):
for j in range(1, n + 1):
if str1[i-1] == str2[j-1]:
dp[i][j] = dp[i-1][j-1] + 1
else:
dp[i][j] = max(dp[i-1][j], dp[i][j-1])
result = []
i, j = m, n
while i > 0 and j > 0:
if str1[i-1] == str2[j-1]:
result.append(str1[i-1])
i -= 1
j -= 1
elif dp[i-1][j] > dp[i][j-1]:
result.append(str1[i-1])
i -= 1
else:
result.append(str2[j-1])
j -= 1
result.extend(reversed(str1[:i]))
result.extend(reversed(str2[:j]))
return "".join(reversed(result))Testing
def run_tests():
s = Solution()
def is_subseq(t, s):
it = iter(s)
return all(c in it for c in t)
result = s.shortestCommonSupersequence("abac", "cab")
assert is_subseq("abac", result) and is_subseq("cab", result)
result = s.shortestCommonSupersequence("aaaaaaaa", "aaaaaaaa")
assert result == "aaaaaaaa"
print("all tests passed")
run_tests()| Test | Verification | Why |
|---|---|---|
"abac","cab" | Both are subsequences | SCS = "cabac" |
| Identical strings | Result equals either string | LCS = entire string |