Skip to content

Arm MTE

Arm MTE

Per-16-byte 4-bit lock-and-key tags on every allocation granule, in Armv8.5-A and shipping on Pixel 8/9.

§1 Provenance

§2 Mechanism

MTE is an Armv8.5-A architectural extension (carried into v9 baseline) that introduces:

  • A 16-byte tag granule of physical memory. Every aligned 16 B chunk has an associated 4-bit allocation tag (the “lock”) stored in a separate physical tag region (hidden from data loads/stores).
  • A 4-bit address tag placed in pointer bits [59:56] (the high byte, leveraging the existing Top-Byte-Ignore convention).
  • On every load/store, the CPU compares pointer-tag against the granule’s allocation-tag. Mismatch raises a synchronous or asynchronous tag check fault depending on TCR_EL1.TCMA* configuration.
  • Tag-generation instructions IRG (random tag) and ADDG/SUBG (arithmetic with tag) plus STG/STZG (set tag) / STGM (multi-granule set) let the allocator stamp tags efficiently. LDG reads the current tag.
  • Tag carry-through: MTE is integrated with the data cache. Tags ride alongside cache lines; eviction writes them to a kernel-reserved DRAM region.

Three operating modes:

  1. SYNC — tag mismatch SIGSEGV with SEGV_MTESERR, full fault address. Used in production where the cost is acceptable, used in dev/test always.
  2. ASYNC — mismatch logged in registers but execution continues until the next kernel entry; SIGSEGV SEGV_MTEAERR without fault address. Low overhead (~5-10%).
  3. Asymmetric (Armv8.7-A): sync on loads, async on stores. Currently recommended by Google over plain ASYNC.

Linux kernel KASAN-HW uses MTE for kernel-side detection (since 5.10) and HWASan uses MTE for userspace under Android.

§3 Threat model + guarantees

  • Spatial safety (probabilistic): a linear OOB into a differently-tagged neighbour traps. Same-tag neighbours collide with probability 1/16.
  • Temporal safety (probabilistic): on free, the allocator re-tags the granule, so a dangling pointer with the old tag traps on next use. Collision again 1/16 in the worst case; in practice MTE allocators rotate tags to maximise distance.
  • Type confusion / control-flow / side-channel: not protected by MTE. PAC+BTI cover CFI; MTE covers memory tagging only.
  • Side channels: TIKTAG (USENIX Sec 2024) shows two speculative-execution gadgets (v1/v2) that derandomise the 4-bit allocation tag at any address in <4 s with >95% success against Chrome processes on Pixel 8, by observing prefetch-induced timing variations after a tag-check fault. Google bug-bountied and patched the userspace impact; the architectural class remains.
  • Not protected: information disclosure (tag is integrity), uninitialised reads, logic bugs, JIT-spray once an attacker can leak tags.

§4 Production status (May 2026)

  • Hardware: every Arm v9 Cortex-A core supports MTE (A510, A710, A715, A720, X2-X4, A725, X925, plus Apple’s M3/M4 for kernel use; Apple has historically not exposed MTE to userspace).
  • Pixel 8 (Oct 2023) was the first commercial handset with MTE exposed. Pixel 9 (2024) and Pixel 10 (2025) continue support via Tensor G3/G4/G5. MTE remains opt-in via Developer Options in stock Android 14/15/16 — not on by default for arbitrary apps. The Android Runtime (ART), Bionic libc, and some system services run MTE-on by Google.
  • GrapheneOS has MTE-by-default for the OS and any opting-in apps since 2023.
  • Android version support: stack-tagging added in Android 14 QPR3 (mid-2024); heap-tagging on by default for many Google-built apps in Android 15.
  • Linux kernel: KASAN-HW since 5.10; user-mode HWASan on AOSP toolchains.
  • Glibc MTE integration: ongoing — Glibc 2.39 (Feb 2024) shipped some support; production-default still off for most distros; Ubuntu 24.04 LTS treats MTE as opt-in via glibc.malloc.mta_* tunables on supported hardware. Android Bionic is the integrated baseline.
  • Google’s “MTE in production” data: not a single canonical paper as of May 2026, but the Project Zero Nov 2023 post and the 2024 security blog give the substantive deployment claims — MTE reduced a class of bugs by an unspecified large factor in Pixel internal fleet tests; specific CVE-elimination percentages are not published openly.
  • TIKTAG and follow-up SCA work demonstrate that MTE alone cannot defend against an attacker with a JIT/sandbox-side timing oracle; this is the published consensus driving V8’s “sandbox + MTE” hybrid (see §5 doc).

§5 Software emulation cost

A pure-software MTE analogue (HWASan in software-only mode, AddressSanitizer, Valgrind, MarkUs-style quarantine):

  • HWASan software (no MTE hardware): ~2x slowdown, ~2x memory (one shadow byte per 16 B + compiler instrumentation on every load/store).
  • ASan: ~2x slowdown, ~3x memory, 8x-shadow scheme. Used for testing, not production.
  • MarkUs quarantine for temporal-only: 1.1x geomean, peak 2x; 16% memory overhead.
  • FFmalloc (never-reuse-VA, related to MTE temporal goal): ~2.3% CPU but 61% memory on SPEC.

Compared with MTE hardware (typically <5% time, ~3% memory in async / asymmetric mode on Pixel 8), the software gap is roughly 2-3 orders of magnitude on time and 1-2 on memory. This is why MTE is interesting to deploy even though its protection is probabilistic.

§6 Mochi adaptation note

vm3’s 12-bit generation in the Cell is essentially MTE-in-software, with 256x more collision resistance (12 bits vs MTE’s 4) at the cost of one extra check per dereference. Direct mapping:

MTEvm3 Cell
4-bit address tag in pointer bits12-bit generation in Cell
4-bit allocation tag per 16 B granule12-bit generation in slab slot metadata
Tag-mismatch trap on dereferenceGeneration-mismatch check in handle resolve
Tag rotation on freeGeneration bump on slot reuse
1/16 collision rate1/4096 collision rate
Hardware-checked, ~<5% overheadSoftware-checked, ~one branch per deref

Quantitatively: where MTE allows a UAF on the same slot to succeed with probability ≈ 6.25% (1/16), vm3’s 12-bit generation makes that ≈ 0.0244% (1/4096) — about 256x better. The cost is the explicit check; on a JITted hot path that branch is one predictable compare. Where the MEP-40 design already pays off most strongly, MTE provides essentially nothing extra; where MEP-40 might want to drop the check (e.g., behind a “known-fresh” optimisation), MTE-on-native could be a fallback safety net.

MTE is most interesting to us not as something to copy but as something to inherit. If a Mochi binary is JITted to AArch64 v8.5+ and runs under an MTE-enabled allocator, the Go-allocated slab pages themselves gain MTE protection against C-level corruption — useful for the FFI boundary even though vm3 itself is already covered by the generation check.

Where vm3 falls short: we have no protection against bit-flip / Rowhammer attacks on the slab metadata itself. MTE doesn’t either, really, but its tag is sideband. Closing this gap would mean storing the slot generation in a separate page guarded by mprotect or in MTE-tagged memory when available.

References: MEP-40 (the 12-bit gen sizing decision), MEP-16 (null-safety which already routes through generation check).

§7 Open questions for MEP-41 design

  1. Do we ever want to shrink the 12-bit generation? MTE got away with 4 bits because the OS rotates tags well; if our scheduler does aggressive arena recycling, can we drop to 8 bits and reclaim 4 bits for permissions?
  2. Should the vm3 generation increment be random (MTE-style IRG) rather than monotonic, to defeat predictability attacks on long-running processes?
  3. If we run on MTE hardware, do we route Go slab allocations through mte_tag_region so the underlying memory also gets hw-tagged?
  4. TIKTAG-style side channels exist when an attacker has a fast timing oracle. vm3 currently has no such oracle exposed — should MEP-41 explicitly forbid features that would create one (high-resolution timers, certain debug effects)?
  5. Is there a meaningful integration story with kernel MTE for the Go runtime page heap that backs our slabs?