A hardware second stack that records return addresses, plus indirect-branch landing pads. Shipping on Intel CPUs since Tiger Lake and on AMD CPUs since Zen 3 (shadow stack), default-on for kernel mode in Windows 11 on supported hardware, opt-in on Linux.
§1 Provenance
- Intel, “Control-flow Enforcement Technology Specification.” Document 334525, Rev. 3. https://kib.kiev.ua/x86docs/Intel/CET/334525-003.pdf
- Linux kernel “Control-flow Enforcement Technology (CET) Shadow Stack” docs (v6.14+). https://docs.kernel.org/next/x86/shstk.html
- LWN, “Shadow stacks for userspace.” Sept 2022. https://lwn.net/Articles/913934/
- Phoronix, “Glibc Updated For Recent Linux CET Shadow Stack Support.” Jan 2024. https://www.phoronix.com/news/Glibc-Intel-CET-Shadow-Stack
- McGarr, “Exploit Development: Investigating Kernel Mode Shadow Stacks on Windows.” https://connormcgarr.github.io/km-shadow-stacks/
- Synacktiv, “Analyzing the Windows kernel shadow stack mitigation.” SSTIC 2025. https://www.synacktiv.com/sites/default/files/2025-06/sstic_windows_kernel_shadow_stack_mitigation.pdf
- h3xduck, “How to enable Intel CET.” June 2025. https://h3xduck.github.io/cfi/2025/06/26/enabling-intel-cet.html
- x86.lol, “Hardening C Against ROP: Getting CET Shadow Stacks Working.” Sept 2024. https://x86.lol/generic/2024/09/23/user-shadow-stacks.html
§2 Mechanism
CET has two architectural components:
- Shadow Stack (SHSTK). The CPU maintains a second stack at a virtual address held in SSP (Shadow Stack Pointer), mapped with a special page-table encoding (Write=0, Dirty=1 in the PTE) such that ordinary stores fault and only dedicated instructions (such as WRSS and RSTORSSP) can write to it, while INCSSP merely advances SSP. On CALL, the CPU pushes the return address to both the regular stack and the shadow stack; on RET, it pops both and compares, and a mismatch raises a #CP (Control Protection) exception.
- Indirect Branch Tracking (IBT). Every legal target of an indirect JMP/CALL must begin with ENDBR64 (or ENDBR32). A notrack prefix can be used on jumps known to dispatch to non-instrumented code. An indirect branch into a non-ENDBR instruction causes #CP.
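The CALL/RET rule for the shadow stack can be illustrated with a toy model. This is a minimal sketch in plain Python, not real CET: the class and function names (ToyCpu, ControlProtectionFault, demo) are made up for illustration, and the "shadow stack" is an ordinary list that the simulated attacker simply does not touch.

```python
class ControlProtectionFault(Exception):
    """Stands in for the #CP exception in this toy model."""


class ToyCpu:
    """Toy model of SHSTK: two stacks, compared on every return."""

    def __init__(self):
        self.stack = []         # regular stack: writable by ordinary stores
        self.shadow_stack = []  # shadow stack: only call/ret touch it here

    def call(self, return_address):
        # CALL pushes the return address onto both stacks.
        self.stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self):
        # RET pops both copies and faults if they disagree.
        addr = self.stack.pop()
        shadow_addr = self.shadow_stack.pop()
        if addr != shadow_addr:
            raise ControlProtectionFault(
                f"return to {addr:#x}, shadow stack says {shadow_addr:#x}"
            )
        return addr


def demo():
    cpu = ToyCpu()
    cpu.call(0x401000)
    cpu.call(0x401234)
    cpu.ret()                    # untouched frame returns normally
    cpu.stack[-1] = 0xDEADBEEF   # simulated overflow rewrites the return address
    try:
        cpu.ret()
    except ControlProtectionFault:
        return "fault"
    return "no fault"
```

Calling demo() exercises one clean return followed by a return through a corrupted slot; only the second one faults, because the attacker's write reached the regular stack but not the shadow copy.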
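Shadow stacks are per-thread; on Linux the kernel allocates one when a thread first calls arch_prctl(ARCH_SHSTK_ENABLE, ARCH_SHSTK_SHSTK). The kernel can choose to enable shadow stack for itself separately (Windows does; the Linux 6.6 support covers userspace only, and mainline has no kernel-mode shadow stack as far as we know).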
Hardware: Intel Tiger Lake (11th gen, 2020) introduced SHSTK; Sapphire Rapids (4th gen Xeon, 2023) brought it to server. AMD Zen 3 (2020) introduced shadow stack. IBT ships on the same Intel generations; AMD parts, to our knowledge, implement the shadow stack but not IBT.
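The IBT landing-pad rule described above can be sketched the same way. The byte encodings of ENDBR64 (F3 0F 1E FA) and ENDBR32 (F3 0F 1E FB) are the architectural ones; everything else (is_landing_pad, indirect_branch, the exception class) is a hypothetical model of what the CPU checks, not an API of any library.

```python
ENDBR64 = bytes.fromhex("f30f1efa")
ENDBR32 = bytes.fromhex("f30f1efb")


class ControlProtectionFault(Exception):
    """Stands in for the #CP exception in this toy model."""


def is_landing_pad(code, offset, long_mode=True):
    """True if the bytes at `offset` are the ENDBR marker for this mode."""
    marker = ENDBR64 if long_mode else ENDBR32
    return code[offset:offset + len(marker)] == marker


def indirect_branch(code, target, notrack=False):
    """Model an indirect JMP/CALL: allowed only onto ENDBR, unless notrack."""
    if notrack or is_landing_pad(code, target):
        return target
    raise ControlProtectionFault(f"indirect branch to non-ENDBR offset {target}")


# A tiny function: endbr64; xor eax, eax; ret
FUNC = ENDBR64 + bytes.fromhex("31c0c3")
```

Branching to offset 0 of FUNC is accepted, while branching into the middle of the function (offset 4, the xor) faults unless the branch carries the notrack prefix.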
§3 Threat model + guarantees
- Backward-edge CFI: SHSTK fully closes naive stack-buffer-overflow → ROP chains, provided the kernel has shadow stack on too. An attacker who corrupts the regular stack now has to corrupt the shadow stack as well, but the shadow stack is page-protected so this requires a privileged write primitive.
- Forward-edge CFI: IBT eliminates JOP gadgets that don’t begin with ENDBR. Mostly equivalent to BTI on Arm.
- Not protected by SHSTK:
  - In-band control transfers (sigreturn-oriented programming, exception-handling chains): recent CVEs in the Windows kernel showed KiSwapStack and __C_specific_handler gadgets.
  - JIT-emitted code that doesn’t emit ENDBR at each entry point.
  - Data-only attacks (write-what-where targeting application data, not return addresses).
- Not protected by IBT on Windows: Windows specifically did not deploy IBT in production; it uses CFG (Control Flow Guard) bitmap instead. So Windows protection is SHSTK + CFG, not SHSTK + IBT.
- Side channels: no direct CET-specific leak as of May 2026, but standard Spectre v1/v2 still applies to indirect branches that pass IBT.
§4 Production status (May 2026)
- Windows 11: kernel SHSTK is exposed via the “Kernel-mode Hardware-enforced Stack Protection” toggle in the Windows Security UI (since 22H2). User-mode apps must opt in via the /CETCOMPAT link flag or via SetProcessMitigationPolicy(ProcessUserShadowStackPolicy) at runtime. Windows 11 23H2 and 24H2 enable kernel SHSTK by default on supported hardware. Win11 does not enforce IBT (uses CFG).
- Linux: SHSTK landed in mainline in 6.6 (Oct 2023); a fully usable interface stabilised by 6.8. Glibc 2.39 (Feb 2024) added userspace SHSTK enablement; it’s opt-in via GLIBC_TUNABLES=glibc.cpu.hwcaps=SHSTK. As of May 2026, no major distro enables it by default for all binaries; selected hardened distros (Gentoo hardened, Alpine) ship it on. Linux does not enforce userspace IBT either as of 6.x: too many older binaries lack ENDBR and would fail.
- GCC/Clang: -fcf-protection=full emits both ENDBR and shadow-stack-compatible code (i.e., uses standard CALL/RET, since SHSTK is transparent).
- Glibc 2.39 also added the dynamic loader’s verification that all dependent shared libraries are CET-compatible before enabling it for a process; this is the chief reason it’s gated.
- 2025 status (SSTIC paper): Synacktiv showed real-world Windows kernel SHSTK bypasses via the exception-handling and stack-swap paths, motivating ongoing work to extend the protection to those code paths. The mitigation is real, but the kernel TCB has plenty of gadgets that don’t go through RET.
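On Linux, whether a given process actually runs with a shadow stack can be checked from procfs. The sketch below assumes the field names described in the kernel shadow-stack documentation cited in §1 (a user_shstk flag in /proc/cpuinfo and an x86_Thread_features line in /proc/PID/status); treat those names as assumptions to verify against the running kernel. The helper names are hypothetical.

```python
def cpu_has_user_shstk(cpuinfo_text):
    """True if the first 'flags' line of /proc/cpuinfo lists user_shstk."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "flags":
            return "user_shstk" in value.split()
    return False


def thread_features(status_text):
    """Return the set of features on the x86_Thread_features line, if any."""
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "x86_Thread_features":
            return set(value.split())
    return set()


def read_text(path):
    """Read a procfs file, returning an empty string if it is unavailable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


def shstk_report():
    """Summarise CPU support and this process's current state."""
    return {
        "cpu_supports_user_shstk": cpu_has_user_shstk(read_text("/proc/cpuinfo")),
        "shstk_active": "shstk" in thread_features(read_text("/proc/self/status")),
    }
```

shstk_report() returns False for both keys on machines or kernels without the feature, which is the expected result for a process that was not started with the glibc tunable.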
§5 Software emulation cost
SHSTK in software (e.g., LLVM -fsanitize=shadow-call-stack):
- AArch64 shadow-call-stack reserves x18 as the shadow-stack pointer and saves each return address through it; ~1-3% overhead, requires no hardware.
- x86-64 shadow-call-stack via a gs:-offset shadow region: also ~1-3% overhead (this variant was removed in LLVM 9, so it is historical rather than deployable).
- LLVM SafeStack (split-stack model, unsafe locals moved to a separate stack so overflows cannot reach return addresses): ~0.1% overhead, very lightweight but only partial protection.
- Full software CFI (e.g., LLVM -fsanitize=cfi, which requires LTO): ~1% overhead, works on x86/Arm without CET.
IBT in software: there is no fast software equivalent. CFG (Windows) adds ~1-2% overhead from a bitmap lookup on each indirect call. Clang’s -fsanitize=cfi-icall checks indirect calls against the callee’s function type at ~1%; the class-hierarchy-aware check for C++ virtual calls is the separate -fsanitize=cfi-vcall.
Net: hardware CET costs effectively zero on cycle-by-cycle benchmarks; the software equivalents cost 1-5%. The reason CET is exciting is that this overhead drops to effectively zero, so the protection is essentially free.
§6 Mochi adaptation note
Like the PAC/BTI story, vm3-classic (the bytecode interpreter) doesn’t need shadow stacks at user-program scope: the Mochi user cannot smash a return address into the Mochi runtime in any way. The interpreter dispatch loop is the only RET user bytecode interacts with, and Mochi return values do not touch the C-level stack.
vm3jit is different. Once Mochi functions are lowered to native x86-64 / AArch64:
- Each JIT’d function’s prologue should be CET-compatible: ENDBR64 as first instruction (for x86-64 builds), standard CALL/RET sequence (so SHSTK applies transparently), no manual stack-pointer tricks that confuse SHSTK.
- The vm3jit code-cache should be mapped as ordinary executable memory (mmap, then mprotect to make it executable). SHSTK is enabled per thread rather than per code region, so JIT’d code running on a SHSTK-enabled thread is checked like any other code.
- The interpreter→JIT trampoline and JIT→interpreter return-slot must use the same calling convention so SHSTK’s call/return bookkeeping doesn’t desync.
Where vm3 currently falls short: as of May 2026, neither vm3 nor vm3jit emits ENDBR64 at JIT-function entries on x86-64, and the runtime is not built with -fcf-protection=full by default (Go toolchain has partial support since 1.22). The smallest gap-closer for MEP-41:
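- Build the C side of the mochi binary (anything compiled through cgo) with -fcf-protection=full, e.g. by setting CGO_CFLAGS. The flag belongs to GCC/Clang; passing it through -gcflags does not work because the Go compiler does not accept it, so Go-generated code has to wait for the Go toolchain to enable CET itself.
- JIT-emit ENDBR64 (x86-64) / BTI c (AArch64) as the first instruction of every JIT entry point.
- Use standard CALL/RET in JIT. Returns that were not produced by a matching CALL (push/ret dispatch, rewritten return addresses) defeat SHSTK; a plain JMP tail call into the interpreter leaves the shadow stack untouched, but under IBT an indirect JMP target still needs ENDBR64.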
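The entry-point rule from the list above amounts to prepending a fixed 4-byte marker to each emitted function. The following is a sketch in Python for brevity, not vm3jit's actual emitter (which this note does not show); with_landing_pad and LANDING_PADS are hypothetical names. The encodings are the architectural ones: ENDBR64 is F3 0F 1E FA, and BTI c is the 32-bit word 0xD503245F stored little-endian.

```python
ENDBR64 = bytes.fromhex("f30f1efa")          # x86-64 landing pad
BTI_C = (0xD503245F).to_bytes(4, "little")   # AArch64 "bti c"

# Hypothetical table keyed by the JIT's target architecture name.
LANDING_PADS = {"x86_64": ENDBR64, "aarch64": BTI_C}


def with_landing_pad(arch, body):
    """Return `body` with the arch's landing pad as its first instruction.

    Idempotent: code that already starts with the pad is returned unchanged.
    Raises KeyError for an architecture without a known landing pad.
    """
    pad = LANDING_PADS[arch]
    if body.startswith(pad):
        return body
    return pad + body


# Minimal bodies for illustration: a bare return on each architecture.
X86_RET = bytes.fromhex("c3")            # ret
A64_RET = (0xD65F03C0).to_bytes(4, "little")  # ret (x30)
```

Because both markers execute as no-ops on CPUs without the corresponding feature, emitting them unconditionally keeps one code path for CET and non-CET hosts at the cost of four bytes per entry point.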
This doesn’t touch the Cell layout (MEP-40) at all; it’s purely a JIT code-generation discipline. Reference: MEP-39 (the JIT).
§7 Open questions for MEP-41 design
- Does Go’s runtime cooperate well with userspace SHSTK as of 1.23+? Goroutine stack switches must update SSP; if Go’s runtime.cgocallback path mis-handles SSP, we’ll have spurious #CP faults.
- Should vm3jit ever emit non-standard returns (for example, returning to an address that no CALL pushed, for inlined fast paths)? These break SHSTK. notrack only relaxes IBT, so the SHSTK-side options are to rebalance SSP explicitly (INCSSP) or to abandon them.
notrack-equivalent or to abandon them. - On x86-64 platforms without CET (older AMD, pre-Tiger-Lake Intel), should vm3jit fall back to a software shadow-call-stack, or accept the gap?
- Apple Silicon doesn’t have CET (it has PAC instead — see file 06). Should the same JIT codepath use PAC where available and fall through to nothing where not, treating these as two non-overlapping platform features?
- The Synacktiv 2025 results show kernel SHSTK is still bypassable. Does the threat model we accept for Mochi include “the host kernel may have an SHSTK bypass”, and if so, what does the runtime do about it (nothing? alert? abort?)?