Array Deduplication

Remove duplicate values from an array while keeping one representative of each value.

Array Deduplication

Array deduplication removes repeated values from an array. The usual method keeps a set of values already seen and writes each new value once.

You use it when the array may contain repeated elements and downstream work needs only distinct values.

Problem

Given an array $A$ of length $n$, produce an array containing each distinct value once.

For stable deduplication, preserve the first occurrence order.

Algorithm

Use a set and a write pointer.

deduplicate(A):
    seen = empty set
    k = 0

    for i from 0 to length(A) - 1:
        if A[i] not in seen:
            add A[i] to seen
            A[k] = A[i]
            k += 1

    return k

Example

Let

$$ A = [8, 3, 5, 3, 8, 1] $$

step value seen after step array prefix
1 8 {8} [8]
2 3 {8, 3} [8, 3]
3 5 {8, 3, 5} [8, 3, 5]
4 3 {8, 3, 5} [8, 3, 5]
5 8 {8, 3, 5} [8, 3, 5]
6 1 {8, 3, 5, 1} [8, 3, 5, 1]

Return new length $4$.

The deduplicated prefix is:

$$ [8, 3, 5, 1] $$

Correctness

The set seen contains exactly the values already written into the output prefix. When the scan reaches a new value, the algorithm writes it once and records it. When the scan reaches a value already in seen, the value has already been written, so skipping it preserves uniqueness.

Since the scan proceeds left to right, the first occurrence order is preserved.

Complexity

operation expected time
deduplicate $O(n)$

Space usage:

$$ O(m) $$

where $m$ is the number of distinct values.

When to Use

Array deduplication is appropriate when:

  • repeated values should be removed
  • first occurrence order matters
  • extra memory for a set is acceptable
  • expected linear time is preferred

It is less suitable when:

  • values cannot be hashed
  • strict constant extra space is required

For sorted arrays, deduplication can be done in-place with constant extra space.

Implementation

def deduplicate(a):
    seen = set()
    k = 0

    for x in a:
        if x not in seen:
            seen.add(x)
            a[k] = x
            k += 1

    return k
func Deduplicate[T comparable](a []T) int {
    seen := make(map[T]struct{})
    k := 0

    for _, x := range a {
        if _, ok := seen[x]; ok {
            continue
        }

        seen[x] = struct{}{}
        a[k] = x
        k++
    }

    return k
}