Floyd Sampling
Select k distinct integers uniformly from a fixed range without shuffling the whole range.
Floyd Sampling
Floyd Sampling selects $k$ distinct integers uniformly from the range $0$ to $n - 1$. It avoids building and shuffling the full range, so it is useful when $k$ is much smaller than $n$.
The algorithm inserts one value per iteration into a set. Each insertion preserves uniformity over all subsets of the processed range.
Problem
Given integers $n$ and $k$ with:
$$ 0 \le k \le n $$
return a uniformly random subset of size $k$ from:
$$ {0, 1, 2, \dots, n - 1} $$
Algorithm
Iterate from $n-k$ to $n-1$. At step $j$, choose a random integer $t$ from $0$ to $j$. If $t$ is already selected, insert $j$ instead. Otherwise insert $t$.
floyd_sampling(n, k):
S = empty set
for j from n - k to n - 1:
t = random integer from 0 to j
if t in S:
insert j into S
else:
insert t into S
return S
Example
Let:
$$ n = 10 $$
and:
$$ k = 3 $$
The loop runs for:
$$ j = 7, 8, 9 $$
After three insertions, the set contains three distinct values from $0$ through $9$. Each 3-element subset has the same probability.
Correctness
After processing step $j$, the set contains exactly:
$$ j - (n-k) + 1 $$
items selected uniformly from:
$$ {0, 1, \dots, j} $$
At step $j$, the random choice $t$ gives every value in the current range an equal chance to be inserted. If $t$ was already selected, inserting $j$ preserves the size and avoids duplicates. This replacement rule maintains uniform probability over all subsets of the required size.
By induction, after the final step $j = n - 1$, the set is a uniform $k$-subset of the full range.
Complexity
| measure | cost |
|---|---|
| time | $O(k)$ |
| memory | $O(k)$ |
| random choices | $k$ |
This is better than shuffling when $k \ll n$.
When to Use
Use Floyd Sampling when:
- you need distinct random indices
- the sample size is small relative to the range
- building the full range would waste memory
- order of the sample does not matter
For sampling actual stream items of unknown length, use Reservoir Sampling instead.
Implementation
import random
def floyd_sampling(n, k):
selected = set()
for j in range(n - k, n):
t = random.randint(0, j)
if t in selected:
selected.add(j)
else:
selected.add(t)
return selected
import "math/rand"
func FloydSampling(n, k int) map[int]struct{} {
selected := make(map[int]struct{}, k)
for j := n - k; j < n; j++ {
t := rand.Intn(j + 1)
if _, ok := selected[t]; ok {
selected[j] = struct{}{}
} else {
selected[t] = struct{}{}
}
}
return selected
}