Scudo & friends: hardened allocators in production
The bag-of-tricks malloc (guard pages, sized buckets, randomisation, double-free detection) that ships on every Android device and most of Chrome.
§1 Provenance
- LLVM, “Scudo Hardened Allocator.” https://llvm.org/docs/ScudoHardenedAllocator.html
- Android Open Source Project, “Scudo.” https://source.android.com/docs/security/test/scudo
- Serebryany et al., “GWP-ASan: Sampling-Based Detection of Memory-Safety Bugs in Production.” ASPLOS 2024. arXiv 2311.09394. https://arxiv.org/html/2311.09394v2
- Chromium Project, “GWP-ASan: Sampling heap memory error detection in-the-wild.” https://www.chromium.org/Home/chromium-security/articles/gwp-asan/
- Google tcmalloc GWP-ASan docs. https://google.github.io/tcmalloc/gwp-asan.html
- Chromium PartitionAlloc docs. https://chromium.googlesource.com/chromium/src.git/+/HEAD/base/allocator/partition_allocator/PartitionAlloc.md
- struct/HardenedPartitionAlloc fork. https://github.com/struct/HardenedPartitionAlloc
- GrapheneOS hardened_malloc. https://github.com/GrapheneOS/hardened_malloc
- mimalloc (Microsoft Research). https://github.com/microsoft/mimalloc
- Trail of Bits, “Use GWP-ASan to detect exploits in production environments.” Dec 2025. https://blog.trailofbits.com/2025/12/16/use-gwp-asan-to-detect-exploits-in-production-environments/
§2 Mechanism
A “hardened allocator” is a userspace malloc/free that trades some throughput and memory for runtime detection and mitigation of heap-corruption bugs. The toolbox:
- Sized buckets / size classes: chunks of the same size class are pooled together. Limits type confusion between objects of wildly different sizes; OOB into a neighbour hits a same-shape neighbour, so corruption is more contained.
- Per-partition isolation: high-value objects (e.g., DOM nodes vs. ArrayBuffer storage in Chrome) live in entirely separate VA partitions. Linear overflow can’t cross partition boundaries.
- Guard pages: every large allocation is `mmap`’d with `PROT_NONE` pages around it; OOB read/write traps. GWP-ASan applies this to a sampled subset of all small allocations too.
- Randomised free-list ordering: instead of returning the most-recently-freed chunk (which is exactly what UAF wants), pick a random chunk from the free list.
- Random allocation offset within a page: GWP-ASan flips a coin to align allocations left or right inside a guarded page, so OOB reads on the wrong side also trap.
- Header checksumming: each chunk has a small inline header (size, state, double-free guard) protected by an XOR-style checksum keyed by a per-process secret + the chunk address. Tampering trips a check on `free`.
- Delayed re-use / quarantine: like MarkUs but bounded. Each freed chunk waits N other frees before becoming eligible again.
- Type-aware free: deallocator must match allocator (`free` vs `delete` vs `delete[]`). Mismatch = abort.
- Pointer poisoning on free: payload `memset` to 0xDE so old contents don’t leak.
- Tag integration (Scudo + MTE): on Pixel 8/9, Scudo uses MTE if available to stamp each chunk with a hardware tag. Free re-tags. Free MTE = free temporal+spatial detection.
- Backtrace logging on abort: shipped crash carries the call stack of the corrupting `free`.
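The quarantine and randomised free-list items above can be sketched together for a single size class. This is a minimal illustration, not code from Scudo or any other allocator named here: the `pool` type, its fields, and the slot-index model are all hypothetical, and `math/rand` stands in for whatever entropy source a real allocator would use.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pool is a hypothetical single-size-class allocator over slot indices.
type pool struct {
	free       []int // slots eligible for reuse
	quarantine []int // FIFO of recently freed slots
	delay      int   // N: number of other frees a slot must wait out
	rng        *rand.Rand
}

func newPool(slots, delay int, seed int64) *pool {
	// Assumption: a seeded math/rand is good enough for a demo; it is not
	// an unpredictable source and a real allocator would not rely on it.
	p := &pool{delay: delay, rng: rand.New(rand.NewSource(seed))}
	for i := 0; i < slots; i++ {
		p.free = append(p.free, i)
	}
	return p
}

// alloc returns a random eligible slot rather than the most recently freed one.
func (p *pool) alloc() (int, bool) {
	if len(p.free) == 0 {
		return 0, false
	}
	i := p.rng.Intn(len(p.free))
	slot := p.free[i]
	p.free[i] = p.free[len(p.free)-1] // swap-remove the chosen slot
	p.free = p.free[:len(p.free)-1]
	return slot, true
}

// release quarantines the slot. The oldest quarantined slot becomes eligible
// again only once more than delay slots are waiting behind the barrier.
func (p *pool) release(slot int) {
	p.quarantine = append(p.quarantine, slot)
	if len(p.quarantine) > p.delay {
		p.free = append(p.free, p.quarantine[0])
		p.quarantine = p.quarantine[1:]
	}
}

func main() {
	p := newPool(8, 3, 1)
	s, _ := p.alloc()
	p.release(s)
	fmt.Println("quarantined:", p.quarantine, "eligible:", len(p.free))
}
```

One consequence visible even in this sketch: quarantined slots are unavailable, so a pool can report exhaustion while memory is still parked in quarantine. That is the memory cost the bounded quarantine trades for delayed reuse.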
§3 Threat model + guarantees
- Heap overflow (linear): stopped at guard pages and partition boundaries; inside a partition it is detected only probabilistically, because a small OOB lands in a same-size-class neighbour rather than on a guard.
- Use-after-free: quarantine delays + randomised reuse make exploitation hard, not impossible. MTE integration (Pixel) raises detection to roughly 15/16 per attempt, since a stale pointer’s 4-bit tag matches the re-tagged chunk only about 1 time in 16.
- Double-free: deterministically detected via state bits in chunk header.
- Free of invalid pointer: header checksum traps with very high probability.
- Heap spray: less effective due to randomised free list and partition separation.
- Type confusion: PartitionAlloc partitions help; Scudo’s size classes help less.
- Side channels: GWP-ASan introduces a tiny timing bias for guarded allocations; not exploitable. Header secrets are not cryptographic; an attacker with a leak primitive can defeat header checks.
- Not protected: deep logic bugs, JIT bugs, anything the allocator isn’t on the path of.
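The double-free and invalid-free guarantees above both reduce to one check at `free` time: validate the header checksum, then validate the state bits. The sketch below assumes a hypothetical header layout and uses CRC-32 from Go’s standard library as a stand-in for the checksum; the `header`, `seal`, and `free` names are illustrative and do not reproduce any allocator’s actual structures.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

type chunkState uint8

const (
	stateFree chunkState = iota
	stateAllocated
)

// header is a hypothetical inline chunk header; the fields are illustrative.
type header struct {
	size     uint32
	state    chunkState
	checksum uint32
}

var (
	errCorrupt    = errors.New("header checksum mismatch")
	errDoubleFree = errors.New("double free")
)

// sum binds the header fields to a per-process secret and the chunk address,
// so a header copied from another chunk, or forged without the secret, fails.
func sum(secret, addr uint64, h header) uint32 {
	var buf [21]byte
	binary.LittleEndian.PutUint64(buf[0:], secret)
	binary.LittleEndian.PutUint64(buf[8:], addr)
	binary.LittleEndian.PutUint32(buf[16:], h.size)
	buf[20] = byte(h.state)
	return crc32.ChecksumIEEE(buf[:]) // assumption: CRC-32 as the checksum
}

// seal recomputes the checksum after any legitimate header update.
func seal(secret, addr uint64, h *header) { h.checksum = sum(secret, addr, *h) }

// free validates the checksum first, then the state bits, then flips state.
func free(secret, addr uint64, h *header) error {
	if sum(secret, addr, *h) != h.checksum {
		return errCorrupt
	}
	if h.state != stateAllocated {
		return errDoubleFree
	}
	h.state = stateFree
	seal(secret, addr, h)
	return nil
}

func main() {
	const secret, addr = 0x5ec2e7, 0x7f001000
	h := header{size: 48, state: stateAllocated}
	seal(secret, addr, &h)
	fmt.Println(free(secret, addr, &h)) // first free is accepted
	fmt.Println(free(secret, addr, &h)) // second free is rejected
}
```

The double-free check is deterministic because the state bit is re-sealed on every transition. The checksum check is only as strong as the secrecy of `secret`, which is why a leak primitive defeats it.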
§4 Production status (May 2026)
- Scudo: default allocator in Android 11+ for all non-low-memory devices. Default in Fuchsia. Available via Clang `-fsanitize=scudo` for any C/C++ project on Linux. Standalone version in `compiler-rt`. Detection events surface in `logcat` and crash reports.
- GWP-ASan: shipping in Android since Android 11 (via Scudo) and in Chrome on all platforms for malloc and PartitionAlloc. Apple’s Probabilistic Guard Malloc (PGM) is a sibling: as of Sept 2023, 3748 PGM bug reports filed, 99% fix rate, only 13 closed without resolution. Trail of Bits Dec 2025 post promotes GWP-ASan for production hardening of arbitrary C/C++ services.
- PartitionAlloc: Chrome’s per-partition allocator, ~150 partitions covering DOM, ArrayBuffer, etc. MiraclePtr (built on PartitionAlloc) protects `raw_ptr<T>` fields from UAF; >90% of UAF exploit attempts now trap on MiraclePtr in production telemetry.
- HardenedPartitionAlloc: third-party fork adding canaries, additional freelist randomisation, delayed-free vectors. Not in mainline Chrome; used by some hardened distros.
- GrapheneOS hardened_malloc: ground-up rewrite emphasising security over throughput. Default on every GrapheneOS install (Pixel-only).
- mimalloc-secure: build flag in Microsoft’s `mimalloc` enabling guards, encoded free list, randomised initial free list. Used in some .NET production deployments.
- tcmalloc (Google server-side): GWP-ASan integration documented; per Google docs, default sampling rate keeps CPU overhead negligible, max RAM ~512 KB on x86_64 / 4 MB on POWER.
- glibc malloc: has the `MALLOC_CHECK_` family of weaker hardenings; full Scudo replacement is per-project opt-in, no distro default change as of May 2026.
CVE-class evidence: Chrome’s MiraclePtr is credited with eliminating a non-trivial fraction of renderer UAFs; PGM’s 99% fix rate (Apple) speaks to detection quality. Per Google Security Blog, MTE+Scudo on Pixel catches an “order of magnitude” more bugs than HWASan-only builds.
§5 Software emulation cost
These are the software-only hardening layer. There is nothing cheaper to emulate; they are the emulation.
Reported numbers:
- Scudo standalone: typically a few percent faster than glibc malloc on multithreaded benchmarks (the size-class design wins on contention), even with security mitigations enabled.
- GWP-ASan in Chrome: amortised near-zero overhead at the default sampling rate (~1/4096 allocations).
- PartitionAlloc: ~5-10% memory overhead vs jemalloc for the partition bookkeeping; CPU comparable.
- Hardened mimalloc-secure: ~5-15% slower than mimalloc-release on alloc-heavy benchmarks.
- GrapheneOS hardened_malloc: ~30-50% slower than glibc; trades throughput for security.
- PGM (Apple): <1% CPU on most workloads.
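The near-zero figures for sampled schemes follow from how little work the unsampled path does. A minimal sketch of a countdown sampler, assuming a hypothetical `sampler` type and a uniform gap distribution (this is not GWP-ASan’s or PGM’s actual code):

```go
package main

import (
	"fmt"
	"math/rand"
)

// sampler decides which allocations take the slow, guarded path.
type sampler struct {
	rate      int // mean number of allocations per sample, e.g. 4096
	countdown int // allocations left until the next sample
	rng       *rand.Rand
}

func newSampler(rate int, seed int64) *sampler {
	s := &sampler{rate: rate, rng: rand.New(rand.NewSource(seed))}
	s.reset()
	return s
}

// reset draws the next gap uniformly from [1, 2*rate-1], whose mean is rate.
// Randomising the gap keeps the sampled allocations from being periodic.
func (s *sampler) reset() { s.countdown = 1 + s.rng.Intn(2*s.rate-1) }

// shouldGuard is the fast path: one decrement and one branch per allocation.
func (s *sampler) shouldGuard() bool {
	s.countdown--
	if s.countdown > 0 {
		return false
	}
	s.reset()
	return true
}

func main() {
	s := newSampler(4096, 42)
	guarded := 0
	const allocs = 1 << 22
	for i := 0; i < allocs; i++ {
		if s.shouldGuard() {
			guarded++
		}
	}
	fmt.Printf("guarded %d of %d allocations\n", guarded, allocs)
}
```

Only the sampled allocation pays for the guard-page setup, so the expensive path is amortised over roughly `rate` cheap ones.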
A managed runtime that uses a hardened C allocator under the hood gets these benefits for the runtime’s own memory (parser tables, GC bookkeeping, etc.) without further work.
§6 Mochi adaptation note
vm3 sits above the underlying Go allocator. Go’s runtime allocator (mheap + mcentral + mcache) already does:
- Size-class buckets (matches Scudo’s pattern).
- Per-P caches (mcache), each holding spans for individual size classes, reduce contention.
- A simple guard via `mprotect` is available but not used by default.
- No GWP-ASan equivalent in Go’s runtime.
The slab arenas that back vm3’s Cells are allocated from Go. So vm3 inherits some of Scudo’s properties (size classes, span isolation) by virtue of being on Go.
What Scudo-style hardening would mean for vm3:
| Scudo trick | vm3 / MEP-40 mapping |
|---|---|
| Size-class buckets | Per-type slabs = one bucket per concrete type |
| Per-partition isolation | Arena tag (4-bit) chooses which arena owns slot |
| Guard pages | Could mprotect boundary pages between slabs |
| Randomised free list | We currently free LIFO; could randomise |
| Header checksum | The 12-bit generation IS our checksum-equivalent |
| Quarantine | Implicit via generation bump (see file 08) |
| Pointer poisoning | Could memclr slot on free |
| GWP-ASan sampling | Could allocate 1/N Cells to a guarded sub-arena |
| Type-aware free | Trivially: free of wrong-arena Cell is a tag mismatch |
vm3 already gets most of these structurally. The biggest additions a hardened-allocator stance would bring:
- `memclr`-on-free of the slot payload. Costs O(slot size) per free; recovers a class of info-leak defences. MEP-40 doesn’t mandate this today.
- GWP-ASan-style guarded sub-arena for low-rate sampling. Useful in CI / canary deployments to detect a use-of-stale-cell that snuck past generation collision. Cheap if 1/4096 of allocations.
- `mprotect` on the metadata page so out-of-process / FFI corruption of slot metadata is detected (already raised in file 05).
- Randomised generation increment (also raised in file 08).
- Slab-boundary guard pages between slabs to convert linear OOB writes into SIGSEGV instead of relying on the next-slot generation to also be stale.
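How the generation check, the arena tag, and clear-on-free would interact can be sketched as follows. Only the 4-bit tag and 12-bit generation widths come from the table above; the rest is assumed for illustration, including the 16-bit slot index, the `handle` packing order, the 64-byte payload, and every type and function name. None of it is MEP-40’s actual encoding.

```go
package main

import "fmt"

const (
	genBits = 12
	genMask = 1<<genBits - 1
	tagBits = 4
	tagMask = 1<<tagBits - 1
)

// handle packs [arena tag:4 | generation:12 | slot index:16].
// Assumption: this bit layout is invented for the sketch.
type handle uint32

func makeHandle(tag uint8, gen, idx uint16) handle {
	return handle(uint32(tag&tagMask)<<28 | uint32(gen&genMask)<<16 | uint32(idx))
}
func (h handle) tag() uint8  { return uint8(h >> 28) }
func (h handle) gen() uint16 { return uint16(h>>16) & genMask }
func (h handle) idx() uint16 { return uint16(h) }

type slot struct {
	gen     uint16
	live    bool
	payload [64]byte
}

type arena struct {
	tag   uint8
	slots []slot
	free  []uint16 // LIFO free list, matching the current behaviour in the table
}

func newArena(tag uint8, n int) *arena {
	a := &arena{tag: tag, slots: make([]slot, n)}
	for i := n - 1; i >= 0; i-- {
		a.free = append(a.free, uint16(i))
	}
	return a
}

func (a *arena) alloc() (handle, bool) {
	if len(a.free) == 0 {
		return 0, false
	}
	idx := a.free[len(a.free)-1]
	a.free = a.free[:len(a.free)-1]
	a.slots[idx].live = true
	return makeHandle(a.tag, a.slots[idx].gen, idx), true
}

// get rejects handles from another arena and handles whose generation is stale.
func (a *arena) get(h handle) (*slot, bool) {
	if h.tag() != a.tag || int(h.idx()) >= len(a.slots) {
		return nil, false
	}
	s := &a.slots[h.idx()]
	if !s.live || s.gen != h.gen() {
		return nil, false
	}
	return s, true
}

// release clears the payload (memclr-on-free) and bumps the generation.
func (a *arena) release(h handle) bool {
	s, ok := a.get(h)
	if !ok {
		return false // stale handle, wrong arena, or double free
	}
	s.payload = [64]byte{}
	s.gen = (s.gen + 1) & genMask
	s.live = false
	a.free = append(a.free, h.idx())
	return true
}

func main() {
	a := newArena(3, 4)
	h, _ := a.alloc()
	a.release(h)
	_, ok := a.get(h)
	fmt.Println("stale handle accepted:", ok)
}
```

With a 12-bit generation, a stale handle becomes valid again after 4096 reuses of the same slot. That wrap-around is the collision case the guarded sub-arena and the randomised generation increment are meant to cover.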
Reference: MEP-40 (arena+slab+gen), MEP-15 (effects partition arenas naturally), MEP-16 (null-safety).
§7 Open questions for MEP-41 design
- Should vm3 maintain its own arena allocator (MEP-40 today) or delegate to Scudo via cgo when available? The latter gives us free GWP-ASan; the former keeps vm3 portable to non-LLVM toolchains and to Go-only builds.
- `memclr` on free: is the perf cost acceptable? For small Cells (slot ≤ 64 B), it’s a single cacheline write. For large object Cells, it’s not.
- Should a fraction of allocations route to a guarded sub-arena (vm3’s GWP-ASan)? What’s the sample rate, and how is the trap surfaced as a Mochi error?
- How do we produce reports as actionable as those behind Apple’s PGM 99% fix rate: a clear, blameable backtrace at the corrupting site, not at the deref site?
- If we ever build a native Mochi production runtime (not Go-hosted), should it use Scudo, mimalloc-secure, hardened_malloc, or its own thing?