Parallel Binary Search with DSU

Answer offline connectivity queries by combining parallel binary search with a disjoint set union.

Parallel Binary Search with DSU

Parallel binary search with DSU answers many offline connectivity queries over a sequence of edge insertions. It finds the earliest time when each pair of vertices becomes connected.

The technique combines two ideas:

  • parallel binary search over answer times
  • disjoint set union for fast connectivity checks

It applies when edges are only added. Once two vertices become connected, they remain connected, so the predicate is monotone.

Problem

Given:

  • vertices $0$ to $n - 1$
  • edge insertions $E_0, E_1, \dots, E_{T-1}$
  • queries $(u, v)$

For each query, find the smallest time $t$ such that $u$ and $v$ are connected after applying edges:

$$ E_0, E_1, \dots, E_t $$

If they never become connected, return $-1$.

Key Idea

Each query has an answer interval $[l, r]$ over time.

During each round:

  1. Put each active query into a bucket by midpoint
  2. Reset the DSU
  3. Replay edge insertions from left to right
  4. At each time $t$, answer all queries whose midpoint is $t$
  5. Shrink each query interval

The DSU is rebuilt once per round, not once per query.

Algorithm

parallel_binary_search_dsu(n, edges, queries):
    T = length(edges)
    Q = length(queries)

    low[q] = 0 for every query q
    high[q] = T for every query q

    while some query has low[q] < high[q]:
        buckets = array of empty lists of length T

        for q from 0 to Q - 1:
            if low[q] < high[q]:
                mid = (low[q] + high[q]) // 2
                if mid < T:
                    buckets[mid].append(q)

        dsu.reset(n)

        for t from 0 to T - 1:
            dsu.union(edges[t].u, edges[t].v)

            for q in buckets[t]:
                u, v = queries[q]
                if dsu.find(u) == dsu.find(v):
                    high[q] = t
                else:
                    low[q] = t + 1

    answer[q] = low[q] if low[q] < T else -1
    return answer

Example

Edges are inserted over time:

time edge
0 $(0, 1)$
1 $(2, 3)$
2 $(1, 2)$
3 $(4, 5)$

Query:

$$ (0, 3) $$

Connectivity changes:

time connected?
0 no
1 no
2 yes
3 yes

The earliest connected time is $2$.

Correctness

For each query $(u, v)$, define:

$$ F(t) = \text{$u$ and $v$ are connected after edges } 0..t $$

Because edges are only added, connectivity can change only from false to true. It never changes from true back to false.

Therefore $F(t)$ is monotone.

Each query maintains an interval containing its earliest true time. If the DSU says the endpoints are connected at midpoint $m$, then the earliest true time is at most $m$, so the algorithm moves the right boundary to $m$.

If they are not connected at $m$, monotonicity implies they are not connected at any earlier time, so the algorithm moves the left boundary to $m + 1$.

When the interval collapses, the remaining time is the earliest connection time, or $T$ if no such time exists.

Complexity

Let:

  • $n$ be the number of vertices
  • $T$ be the number of edge insertions
  • $Q$ be the number of queries
  • $\alpha(n)$ be the inverse Ackermann factor from DSU

Each round replays all edges and checks all active queries.

component cost
rounds $O(\log T)$
DSU unions $O(T \alpha(n) \log T)$
query checks $O(Q \alpha(n) \log T)$
total $O((T + Q)\alpha(n)\log T)$
space $O(n + T + Q)$

DSU Operations

make_set(v):
    parent[v] = v
    size[v] = 1

find(v):
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v

union(a, b):
    ra = find(a)
    rb = find(b)
    if ra == rb:
        return

    if size[ra] < size[rb]:
        swap(ra, rb)

    parent[rb] = ra
    size[ra] += size[rb]

When to Use

Use this method when:

  • all queries are known beforehand
  • updates are edge insertions only
  • each query asks for earliest connectivity
  • the graph grows monotonically
  • rebuilding DSU per round is acceptable

Notes

  • This method does not directly support edge deletions.
  • For deletions, use rollback DSU with divide and conquer on time.
  • If only final connectivity matters, ordinary DSU is enough.
  • If queries arrive online, this offline method is not suitable.

Implementation

class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra = self.find(a)
        rb = self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

def parallel_binary_search_dsu(n, edges, queries):
    t = len(edges)
    q = len(queries)

    low = [0] * q
    high = [t] * q

    while True:
        active = False
        buckets = [[] for _ in range(t)]

        for i in range(q):
            if low[i] < high[i]:
                active = True
                mid = (low[i] + high[i]) // 2
                if mid < t:
                    buckets[mid].append(i)

        if not active:
            break

        dsu = DSU(n)

        for time, (a, b) in enumerate(edges):
            dsu.union(a, b)

            for qi in buckets[time]:
                u, v = queries[qi]
                if dsu.find(u) == dsu.find(v):
                    high[qi] = time
                else:
                    low[qi] = time + 1

    return [low[i] if low[i] < t else -1 for i in range(q)]
type DSU struct {
    parent []int
    size   []int
}

func NewDSU(n int) *DSU {
    parent := make([]int, n)
    size := make([]int, n)
    for i := 0; i < n; i++ {
        parent[i] = i
        size[i] = 1
    }
    return &DSU{parent: parent, size: size}
}

func (d *DSU) Find(x int) int {
    for d.parent[x] != x {
        d.parent[x] = d.parent[d.parent[x]]
        x = d.parent[x]
    }
    return x
}

func (d *DSU) Union(a, b int) {
    ra := d.Find(a)
    rb := d.Find(b)
    if ra == rb {
        return
    }
    if d.size[ra] < d.size[rb] {
        ra, rb = rb, ra
    }
    d.parent[rb] = ra
    d.size[ra] += d.size[rb]
}

type Edge struct {
    U int
    V int
}

type Query struct {
    U int
    V int
}

func ParallelBinarySearchDSU(n int, edges []Edge, queries []Query) []int {
    t := len(edges)
    q := len(queries)

    low := make([]int, q)
    high := make([]int, q)
    for i := range high {
        high[i] = t
    }

    for {
        active := false
        buckets := make([][]int, t)

        for i := 0; i < q; i++ {
            if low[i] < high[i] {
                active = true
                mid := (low[i] + high[i]) / 2
                if mid < t {
                    buckets[mid] = append(buckets[mid], i)
                }
            }
        }

        if !active {
            break
        }

        dsu := NewDSU(n)

        for time, e := range edges {
            dsu.Union(e.U, e.V)

            for _, qi := range buckets[time] {
                query := queries[qi]
                if dsu.Find(query.U) == dsu.Find(query.V) {
                    high[qi] = time
                } else {
                    low[qi] = time + 1
                }
            }
        }
    }

    ans := make([]int, q)
    for i := 0; i < q; i++ {
        if low[i] < t {
            ans[i] = low[i]
        } else {
            ans[i] = -1
        }
    }

    return ans
}