Arm MTE | brain

Per-16-byte 4-bit lock-and-key tags on every allocation granule, in Armv8.5-A and shipping on Pixel 8/9.

§1 Provenance

Arm, "Armv8.5 Memory Tagging Extension," whitepaper (2019, rev. 2023). https://developer.arm.com/documentation/108035/latest/Introduction-to-the-Memory-Tagging-Extension
Serebryany et al. (Google), "MTE: The promising path forward for memory safety." Google Security Blog, Nov 2023. https://security.googleblog.com/2023/11/mte-promising-path-forward-for-memory.html
Google Project Zero, "First handset with MTE on the market." November 2023. https://projectzero.google/2023/11/first-handset-with-mte-on-market.html
Android Open Source Project: Arm MTE. https://source.android.com/docs/security/test/memory-safety/arm-mte
Android NDK guide: Arm Memory Tagging Extension. https://developer.android.com/ndk/guides/arm-mte
Kim, Jang, et al. "TIKTAG: Breaking ARM's Memory Tagging Extension with Speculative Execution." USENIX Security 2024. https://www.theregister.com/2024/06/18/arm_memory_tag_extensions_leak/
Blumbergs, "Memory Tagging Extension in 2025 — what actually works." Sept 2025. https://medium.com/@e.blumbergs/memory-tagging-extension-in-2025
Göbel, "Introduction to Arm Memory Tagging Extensions." Sept 2025. https://thore.io/posts/2025/09/introduction-to-arm-memory-tagging-extensions/

§2 Mechanism

MTE is an Armv8.5-A architectural extension (carried into v9 baseline) that introduces:

A 16-byte tag granule of physical memory. Every aligned 16 B chunk has an associated 4-bit allocation tag (the "lock") stored in a separate physical tag region (hidden from data loads/stores).
A 4-bit address tag placed in pointer bits [59:56] (the high byte, leveraging the existing Top-Byte-Ignore convention).
On every load/store, the CPU compares pointer-tag against the granule's allocation-tag. Mismatch raises a synchronous or asynchronous tag check fault depending on TCR_EL1.TCMA* configuration.
Tag-generation instructions IRG (random tag) and ADDG/SUBG (arithmetic with tag) plus STG/STZG (set tag) / STGM (multi-granule set) let the allocator stamp tags efficiently. LDG reads the current tag.
Tag carry-through: MTE is integrated with the data cache. Tags ride alongside cache lines; eviction writes them to a kernel-reserved DRAM region.

Three operating modes:

SYNC — tag mismatch SIGSEGV with SEGV_MTESERR, full fault address. Used in production where the cost is acceptable, used in dev/test always.
ASYNC — mismatch logged in registers but execution continues until the next kernel entry; SIGSEGV SEGV_MTEAERR without fault address. Low overhead (~5-10%).
Asymmetric (Armv8.7-A): sync on loads, async on stores. Currently recommended by Google over plain ASYNC.

Linux kernel KASAN-HW uses MTE for kernel-side detection (since 5.10) and HWASan uses MTE for userspace under Android.

§3 Threat model + guarantees

Spatial safety (probabilistic): a linear OOB into a differently-tagged neighbour traps. Same-tag neighbours collide with probability 1/16.
Temporal safety (probabilistic): on free, the allocator re-tags the granule, so a dangling pointer with the old tag traps on next use. Collision again 1/16 in the worst case; in practice MTE allocators rotate tags to maximise distance.
Type confusion / control-flow / side-channel: not protected by MTE. PAC+BTI cover CFI; MTE covers memory tagging only.
Side channels: TIKTAG (USENIX Sec 2024) shows two speculative-execution gadgets (v1/v2) that derandomise the 4-bit allocation tag at any address in <4 s with >95% success against Chrome processes on Pixel 8, by observing prefetch-induced timing variations after a tag-check fault. Google bug-bountied and patched the userspace impact; the architectural class remains.
Not protected: information disclosure (tag is integrity), uninitialised reads, logic bugs, JIT-spray once an attacker can leak tags.

§4 Production status (May 2026)

Hardware: every Arm v9 Cortex-A core supports MTE (A510, A710, A715, A720, X2-X4, A725, X925, plus Apple's M3/M4 for kernel use; Apple has historically not exposed MTE to userspace).
Pixel 8 (Oct 2023) was the first commercial handset with MTE exposed. Pixel 9 (2024) and Pixel 10 (2025) continue support via Tensor G3/G4/G5. MTE remains opt-in via Developer Options in stock Android 14/15/16 — not on by default for arbitrary apps. The Android Runtime (ART), Bionic libc, and some system services run MTE-on by Google.
GrapheneOS has MTE-by-default for the OS and any opting-in apps since 2023.
Android version support: stack-tagging added in Android 14 QPR3 (mid-2024); heap-tagging on by default for many Google-built apps in Android 15.
Linux kernel: KASAN-HW since 5.10; user-mode HWASan on AOSP toolchains.
Glibc MTE integration: ongoing — Glibc 2.39 (Feb 2024) shipped some support; production-default still off for most distros; Ubuntu 24.04 LTS treats MTE as opt-in via glibc.malloc.mta_* tunables on supported hardware. Android Bionic is the integrated baseline.
Google's "MTE in production" data: not a single canonical paper as of May 2026, but the Project Zero Nov 2023 post and the 2024 security blog give the substantive deployment claims — MTE reduced a class of bugs by an unspecified large factor in Pixel internal fleet tests; specific CVE-elimination percentages are not published openly.
TIKTAG and follow-up SCA work demonstrate that MTE alone cannot defend against an attacker with a JIT/sandbox-side timing oracle; this is the published consensus driving V8's "sandbox + MTE" hybrid (see §5 doc).

§5 Software emulation cost

A pure-software MTE analogue (HWASan in software-only mode, AddressSanitizer, Valgrind, MarkUs-style quarantine):

HWASan software (no MTE hardware): ~2x slowdown, ~2x memory (one shadow byte per 16 B + compiler instrumentation on every load/store).
ASan: ~2x slowdown, ~3x memory, 8x-shadow scheme. Used for testing, not production.
MarkUs quarantine for temporal-only: 1.1x geomean, peak 2x; 16% memory overhead.
FFmalloc (never-reuse-VA, related to MTE temporal goal): ~2.3% CPU but 61% memory on SPEC.

Compared with MTE hardware (typically <5% time, ~3% memory in async / asymmetric mode on Pixel 8), the software gap is roughly 2-3 orders of magnitude on time and 1-2 on memory. This is why MTE is interesting to deploy even though its protection is probabilistic.

§6 Mochi adaptation note

vm3's 12-bit generation in the Cell is essentially MTE-in-software, with 256x more collision resistance (12 bits vs MTE's 4) at the cost of one extra check per dereference. Direct mapping:

MTE	vm3 Cell
4-bit address tag in pointer bits	12-bit generation in Cell
4-bit allocation tag per 16 B granule	12-bit generation in slab slot metadata
Tag-mismatch trap on dereference	Generation-mismatch check in handle resolve
Tag rotation on free	Generation bump on slot reuse
1/16 collision rate	1/4096 collision rate
Hardware-checked, ~<5% overhead	Software-checked, ~one branch per deref

Quantitatively: where MTE allows a UAF on the same slot to succeed with probability ≈ 6.25% (1/16), vm3's 12-bit generation makes that ≈ 0.0244% (1/4096) — about 256x better. The cost is the explicit check; on a JITted hot path that branch is one predictable compare. Where the MEP-40 design already pays off most strongly, MTE provides essentially nothing extra; where MEP-40 might want to drop the check (e.g., behind a "known-fresh" optimisation), MTE-on-native could be a fallback safety net.

MTE is most interesting to us not as something to copy but as something to inherit. If a Mochi binary is JITted to AArch64 v8.5+ and runs under an MTE-enabled allocator, the Go-allocated slab pages themselves gain MTE protection against C-level corruption — useful for the FFI boundary even though vm3 itself is already covered by the generation check.

Where vm3 falls short: we have no protection against bit-flip / Rowhammer attacks on the slab metadata itself. MTE doesn't either, really, but its tag is sideband. Closing this gap would mean storing the slot generation in a separate page guarded by mprotect or in MTE-tagged memory when available.

References: MEP-40 (the 12-bit gen sizing decision), MEP-16 (null-safety which already routes through generation check).

§7 Open questions for MEP-41 design

Do we ever want to shrink the 12-bit generation? MTE got away with 4 bits because the OS rotates tags well; if our scheduler does aggressive arena recycling, can we drop to 8 bits and reclaim 4 bits for permissions?
Should the vm3 generation increment be random (MTE-style IRG) rather than monotonic, to defeat predictability attacks on long-running processes?
If we run on MTE hardware, do we route Go slab allocations through mte_tag_region so the underlying memory also gets hw-tagged?
TIKTAG-style side channels exist when an attacker has a fast timing oracle. vm3 currently has no such oracle exposed — should MEP-41 explicitly forbid features that would create one (high-resolution timers, certain debug effects)?
Is there a meaningful integration story with kernel MTE for the Go runtime page heap that backs our slabs?