Misra Gries

Find frequent items in a stream by maintaining a bounded set of counters.

Misra Gries

Misra Gries is a deterministic streaming algorithm for finding frequent items with bounded memory. It keeps at most $k - 1$ counters and returns a small candidate set that contains every item whose frequency is greater than $n/k$.

The algorithm may return extra candidates. A second pass gives exact frequencies and removes false positives.

Problem

Given a stream

$$ x_1, x_2, \dots, x_n $$

and an integer $k \ge 2$, find all items whose frequency is greater than

$$ n/k $$

Algorithm

Maintain a map from item to counter.

misra_gries(stream, k):
    counters = empty map

    for x in stream:
        if x in counters:
            counters[x] = counters[x] + 1
        else if size(counters) < k - 1:
            counters[x] = 1
        else:
            for each item y in counters:
                counters[y] = counters[y] - 1
                if counters[y] == 0:
                    remove y from counters

    return keys(counters)

The returned keys are candidates. To get the exact heavy hitters, scan the stream again and count only those candidates.

Key Idea

When the table is full and a new item is not tracked, the algorithm subtracts one count from every tracked item. This cancels a group of $k$ distinct items: the new untracked item and the $k - 1$ tracked items.

An item that appears more than $n/k$ times cannot be fully canceled by these groups, so it must remain as a candidate.

Example

Let

$$ stream = [a, b, a, c, a, d, b, a] $$

and

$$ k = 3 $$

The algorithm keeps at most two counters. Since $a$ appears $4$ times and

$$ 4 > 8/3 $$

$a$ must be returned as a candidate.

Correctness

Each full-table decrement can remove one occurrence from at most $k$ distinct items. Think of this as deleting one balanced group.

Suppose an item $x$ appears more than $n/k$ times. Since each deletion group contains at most one copy of $x$, fewer than $\text{count}(x)$ groups can delete all copies of $x$. Therefore some copy of $x$ survives in the counter structure.

Thus every item with frequency greater than $n/k$ appears in the final candidate set.

Complexity

measure cost
counters at most $k - 1$
memory $O(k)$
update, simple version $O(k)$ worst case
total time $O(nk)$ simple version
verification pass $O(n + k)$

The common implementation is simple and effective when $k$ is small.

When to Use

Use Misra Gries when:

  • the input is a stream
  • exact counting for all items is too large
  • deterministic guarantees are preferred
  • false positives are acceptable before verification

For approximate frequency estimates on arbitrary keys, use Count Min Sketch.

Implementation

def misra_gries(stream, k):
    counters = {}

    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            remove = []

            for y in counters:
                counters[y] -= 1
                if counters[y] == 0:
                    remove.append(y)

            for y in remove:
                del counters[y]

    return list(counters.keys())
def misra_gries_exact(stream, k):
    candidates = misra_gries(stream, k)

    counts = {x: 0 for x in candidates}
    for x in stream:
        if x in counts:
            counts[x] += 1

    n = len(stream)
    return [x for x in candidates if counts[x] > n // k]
func MisraGries[T comparable](stream []T, k int) []T {
	counters := map[T]int{}

	for _, x := range stream {
		if _, ok := counters[x]; ok {
			counters[x]++
			continue
		}

		if len(counters) < k-1 {
			counters[x] = 1
			continue
		}

		var remove []T
		for y := range counters {
			counters[y]--
			if counters[y] == 0 {
				remove = append(remove, y)
			}
		}

		for _, y := range remove {
			delete(counters, y)
		}
	}

	out := make([]T, 0, len(counters))
	for x := range counters {
		out = append(out, x)
	}

	return out
}