A clear explanation of finding the lexicographically smallest equivalent string using Union-Find with canonical representatives.
Problem Restatement
We are given two strings s1 and s2 of the same length, and a string baseStr.
From s1 and s2, we define equivalence classes: s1[i] and s2[i] are equivalent for each i.
The equivalence relation is reflexive, symmetric, and transitive.
We need to return the lexicographically smallest equivalent string of baseStr using the equivalence classes.
The official constraints state that 1 <= s1.length == s2.length <= 1000.
Input and Output
| Item | Meaning |
|---|---|
| Input | Strings s1, s2, baseStr |
| Output | Lexicographically smallest equivalent of baseStr |
Function shape:
def smallestEquivalentString(s1: str, s2: str, baseStr: str) -> str:
...Examples
Example 1:
s1 = "parker", s2 = "morris", baseStr = "parser"Equivalences: p=m, a=o, r=r, k=r, e=i, r=s.
Transitive: k, r, s are all equivalent. Smallest is k.
Answer:
"makkek"Key Insight
Use Union-Find where the canonical representative of each group is the lexicographically smallest character.
For each pair (s1[i], s2[i]), union the two characters.
When finding the representative, always prefer the smaller character.
Algorithm
- Initialize
parent[c] = cfor all 26 letters. - For each pair
(a, b)froms1ands2, unionaandb, keeping the smaller as root. - For each character in
baseStr, replace it with its root (smallest equivalent).
Correctness
Union-Find with path compression and rank correctly finds all transitive equivalences. By always choosing the smaller character as the root, find(c) returns the lexicographically smallest equivalent of c.
Edge Cases
- Check the minimum input size allowed by the constraints.
- Verify duplicate values or tie cases if the input can contain them.
- Keep the return value aligned with the exact failure case in the statement.
Complexity
| Metric | Value | Why |
|---|---|---|
| Time | O((n + m) * α) | Union-Find operations are near-constant |
| Space | O(1) | Fixed-size array for 26 letters |
Common Pitfalls
- Do not optimize away the invariant; the code should still make it clear what condition is being maintained.
- Prefer problem-specific names over one-letter variables once the logic becomes stateful.
Implementation
class Solution:
def smallestEquivalentString(self, s1: str, s2: str, baseStr: str) -> str:
parent = list(range(26))
def find(x):
while parent[x] != x:
parent[x] = parent[parent[x]]
x = parent[x]
return x
def union(a, b):
ra, rb = find(a), find(b)
if ra == rb:
return
if ra < rb:
parent[rb] = ra
else:
parent[ra] = rb
for a, b in zip(s1, s2):
union(ord(a) - 97, ord(b) - 97)
return "".join(chr(find(ord(c) - 97) + 97) for c in baseStr)Testing
def run_tests():
s = Solution()
assert s.smallestEquivalentString("parker", "morris", "parser") == "makkek"
assert s.smallestEquivalentString("hello", "world", "hold") == "hdld"
assert s.smallestEquivalentString("leetcode", "programs", "sourcecode") == "aauaaaaada"
print("all tests passed")
run_tests()| Test | Expected | Why |
|---|---|---|
"parker","morris" | "makkek" | Equivalence classes resolved |
"hello","world" | "hdld" | Each char replaced by smallest equivalent |