A clear explanation of finding the lexicographically smallest subsequence with all distinct characters using a greedy stack approach.
Problem Restatement
We are given a string s.
We need to return the lexicographically smallest subsequence of s that contains all distinct characters of s exactly once.
The official constraints state that 1 <= s.length <= 1000 and s consists of lowercase English letters.
Note: This problem is identical to LeetCode 316 (Remove Duplicate Letters).
Input and Output
| Item | Meaning |
|---|---|
| Input | String s |
| Output | Lexicographically smallest subsequence with each character appearing exactly once |
Function shape:
def smallestSubsequence(s: str) -> str:
...Examples
Example 1:
s = "bcabc"Answer:
"abc"Example 2:
s = "cbacdcbc"Answer:
"acdb"Key Insight
Use a monotonic stack. For each character:
- If it is already in the stack, skip it.
- Otherwise, pop characters from the top of the stack that are greater than the current character and appear later in the string (so they can be added again).
- Push the current character.
Algorithm
- Count last occurrence of each character.
- Maintain a stack and an
in_stackboolean set. - For each character:
- Skip if already in stack.
- Pop from stack while top > current char and top’s last occurrence is later.
- Push current char.
Correctness
By popping larger characters that will appear again, we defer them to a later (possibly better) position. The resulting stack is always lexicographically smallest while including all characters exactly once.
Edge Cases
- The stack should store exactly the unresolved items needed by the invariant.
- Process equal values deliberately; many monotonic-stack problems differ only in
<versus<=. - Flush or inspect the remaining stack after the scan if unresolved items still contribute to the answer.
Complexity
| Metric | Value | Why |
|---|---|---|
| Time | O(n) | Each character is pushed and popped at most once |
| Space | O(1) | At most 26 characters in stack |
Common Pitfalls
- Do not optimize away the invariant; the code should still make it clear what condition is being maintained.
- Prefer problem-specific names over one-letter variables once the logic becomes stateful.
Implementation
class Solution:
def smallestSubsequence(self, s: str) -> str:
last = {c: i for i, c in enumerate(s)}
stack = []
in_stack = set()
for i, c in enumerate(s):
if c in in_stack:
continue
while stack and stack[-1] > c and last[stack[-1]] > i:
in_stack.discard(stack.pop())
stack.append(c)
in_stack.add(c)
return "".join(stack)Testing
def run_tests():
s = Solution()
assert s.smallestSubsequence("bcabc") == "abc"
assert s.smallestSubsequence("cbacdcbc") == "acdb"
assert s.smallestSubsequence("a") == "a"
assert s.smallestSubsequence("abcd") == "abcd"
print("all tests passed")
run_tests()| Test | Expected | Why |
|---|---|---|
"bcabc" | "abc" | All three chars, lexicographic minimum |
"cbacdcbc" | "acdb" | Complex greedy choices |
"a" | "a" | Single character |
| Already distinct | "abcd" | Already optimal |