Intel CET Shadow Stack + IBT

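A hardware second stack that records return addresses, plus indirect-branch landing pads. Shadow stack ships on Intel CPUs from Tiger Lake and AMD CPUs from Zen 3; IBT has so far shipped on the Intel parts only. Kernel-mode shadow stack is default-on in recent Windows 11 releases; on Linux, userspace shadow stack is opt-in.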

§1 Provenance

§2 Mechanism

CET has two architectural components:

  1. Shadow Stack (SHSTK). The CPU maintains a second stack at the virtual address held in SSP (Shadow Stack Pointer). Its pages carry a dedicated shadow-stack page-table encoding (on x86-64, Write=0 with Dirty=1), so ordinary stores to them fault. Besides CALL itself, only a few dedicated instructions can write to it (WRSS/WRUSS and the token-handling RSTORSSP/SAVEPREVSSP); INCSSP only advances SSP. On CALL, the CPU pushes the return address to both the regular stack and the shadow stack; on RET, it pops both and compares, and a mismatch raises a #CP (Control Protection) exception.
  2. Indirect Branch Tracking (IBT). Every legal target of an indirect JMP/CALL must begin with ENDBR64 (or ENDBR32 in 32-bit code). A NOTRACK prefix exempts an individual indirect jump or call, for branches known to dispatch to non-instrumented code. An indirect branch that lands on anything other than ENDBR raises #CP.
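
The CALL/RET comparison in item 1 can be modelled in a few lines. This is a toy software model, not how the hardware is implemented: the class and exception names (ToyCpu, ControlProtectionFault) are invented for illustration, and return addresses are plain integers.

```python
class ControlProtectionFault(Exception):
    """Stands in for the #CP exception in this toy model."""


class ToyCpu:
    def __init__(self):
        self.stack = []         # regular stack: ordinary stores can reach it
        self.shadow_stack = []  # shadow stack: only call()/ret() touch it

    def call(self, return_address):
        # CALL pushes the return address onto both stacks.
        self.stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self):
        # RET pops both copies and faults if they differ.
        regular = self.stack.pop()
        shadow = self.shadow_stack.pop()
        if regular != shadow:
            raise ControlProtectionFault(
                f"return to {regular:#x}, shadow stack says {shadow:#x}")
        return regular


if __name__ == "__main__":
    cpu = ToyCpu()
    cpu.call(0x401000)
    print(hex(cpu.ret()))           # balanced CALL/RET: no fault

    cpu.call(0x401000)
    cpu.stack[-1] = 0xDEADBEEF      # simulated stack-buffer overflow
    try:
        cpu.ret()
    except ControlProtectionFault as fault:
        print("#CP:", fault)
```

The overflow only reaches the regular stack, so the second ret() sees two different values and faults instead of returning to the attacker's address.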

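Shadow stacks are per-thread. On Linux, the kernel allocates one for the calling thread when it invokes arch_prctl(ARCH_SHSTK_ENABLE, ARCH_SHSTK_SHSTK), and gives threads created afterwards their own. Kernel-mode (supervisor) shadow stack is a separate switch: Windows enables it for its kernel, whereas Linux 6.6 added the userspace side only, and the Linux kernel protects itself with IBT (since 5.18) rather than a shadow stack.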
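
A process can check whether shadow stack is actually on for it without touching arch_prctl. The sketch below assumes the x86_Thread_features field that the Linux shadow-stack documentation describes in /proc/<pid>/status; the helper names are invented here, and an empty result covers every case where the field is missing.

```python
def thread_cet_features(status_text):
    """Return the CET features listed in /proc/<pid>/status text."""
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        # Exact match, so x86_Thread_features_locked is not picked up.
        if key.strip() == "x86_Thread_features":
            return set(value.split())
    return set()  # field absent: older kernel, non-x86, or CET not built in


def current_thread_features(path="/proc/self/status"):
    """Read the calling process's status file; empty set if unreadable."""
    try:
        with open(path) as status_file:
            return thread_cet_features(status_file.read())
    except OSError:
        return set()


if __name__ == "__main__":
    print(sorted(current_thread_features()))
```

A runtime could log this at startup to confirm that an opt-in actually took effect, rather than inferring it from the build flags.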

Hardware: Intel Tiger Lake (11th gen, 2020) introduced SHSTK; Sapphire Rapids (4th gen Xeon, 2023) brought it to server. AMD Zen 3 (2020) introduced shadow stack. IBT is supported on the same Intel generations; AMD parts have so far shipped shadow stack without IBT.
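
Rather than keying on CPU generation, software normally asks CPUID. The two feature bits live in leaf 7, subleaf 0: ECX bit 7 for shadow stack and EDX bit 20 for IBT. The sketch below only decodes register values; how they are obtained (a small native helper, for instance) is left out, and the function name is an assumption for illustration.

```python
CET_SS_BIT = 1 << 7    # CPUID.(EAX=7,ECX=0):ECX bit 7, shadow stack
CET_IBT_BIT = 1 << 20  # CPUID.(EAX=7,ECX=0):EDX bit 20, indirect branch tracking


def decode_cet_support(leaf7_ecx, leaf7_edx):
    """Decode CET feature bits from CPUID leaf 7 subleaf 0 register values."""
    return {
        "shstk": bool(leaf7_ecx & CET_SS_BIT),
        "ibt": bool(leaf7_edx & CET_IBT_BIT),
    }


if __name__ == "__main__":
    # Register values made up for the demo: shadow stack without IBT.
    print(decode_cet_support(leaf7_ecx=0x80, leaf7_edx=0x0))
```

CPU support alone is not enough: the OS must also enable the feature, so this check answers "could CET be on" rather than "is CET on".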

§3 Threat model + guarantees

  • Backward-edge CFI: SHSTK closes naive stack-buffer-overflow → ROP chains in whichever context has it enabled (user-mode SHSTK covers the process, kernel-mode SHSTK covers the kernel). An attacker who corrupts the regular stack now has to corrupt the shadow stack as well, and because ordinary stores to shadow-stack pages fault, that takes WRSS (if the process has enabled it) or a kernel-level write primitive.
  • Forward-edge CFI: IBT eliminates JOP gadgets that don’t begin with ENDBR. Mostly equivalent to BTI on Arm.
  • Not protected by SHSTK:
    • In-band control transfers that do not go through RET (sigreturn-oriented programming, exception-handling chains); recent CVEs in the Windows kernel showed KiSwapStack and __C_specific_handler gadgets.
    • JIT-emitted code that doesn’t emit ENDBR at each entry point (an IBT concern rather than a SHSTK one).
    • Data-only attacks (write-what-where targeting application data rather than return addresses).
  • Not protected by IBT on Windows: Windows specifically did not deploy IBT in production; it uses CFG (Control Flow Guard) bitmap instead. So Windows protection is SHSTK + CFG, not SHSTK + IBT.
  • Side channels: no direct CET-specific leak as of May 2026, but standard Spectre v1/v2 still applies to indirect branches that pass IBT.
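
The forward-edge rule in the list above (an indirect branch must land on ENDBR) can be sketched as a byte check. This is a simplified model with assumed names: real hardware tracks a WAIT_FOR_ENDBRANCH state after each indirect branch, and NOTRACK is honoured only when the OS has enabled it.

```python
ENDBR64 = bytes.fromhex("f30f1efa")
ENDBR32 = bytes.fromhex("f30f1efb")


def indirect_branch_allowed(code, target, notrack=False, long_mode=True):
    """Model IBT: would an indirect JMP/CALL to code[target] be accepted?"""
    if notrack:
        return True  # NOTRACK-prefixed branches skip the landing-pad check
    marker = ENDBR64 if long_mode else ENDBR32
    return code[target:target + 4] == marker


if __name__ == "__main__":
    # A function (endbr64; ret) followed by a classic gadget (pop rdi; ret).
    code = ENDBR64 + b"\xc3" + b"\x5f\xc3"
    print(indirect_branch_allowed(code, 0))  # function entry
    print(indirect_branch_allowed(code, 5))  # mid-function gadget
```

The gadget at offset 5 is still reachable through a corrupted return address, which is why IBT is paired with SHSTK rather than used on its own.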

§4 Production status (May 2026)

  • Windows 11: kernel-mode SHSTK is exposed as the “Kernel-mode Hardware-enforced Stack Protection” toggle in the Windows Security UI (since 22H2); 23H2 and 24H2 enable it by default on supported hardware. User-mode SHSTK is per-process: apps must opt in via the /CETCOMPAT link flag or via SetProcessMitigationPolicy(ProcessUserShadowStackPolicy) at runtime. Windows 11 does not enforce IBT (it uses CFG).
  • Linux: SHSTK landed mainline in 6.6 (Oct 2023); fully usable interface stabilised by 6.8. Glibc 2.39 (Feb 2024) added userspace SHSTK enablement; it’s opt-in via GLIBC_TUNABLES=glibc.cpu.hwcaps=SHSTK. As of May 2026, no major distro enables it by default for all binaries; selected hardened distros (Gentoo hardened, Alpine) ship it on. Linux does not enforce userspace IBT either as of 6.x — too many older binaries fail without ENDBR.
  • GCC/Clang: -fcf-protection=full emits both ENDBR and shadow-stack-compatible code (i.e., uses standard CALL/RET, since SHSTK is transparent).
  • Glibc 2.39 also added the dynamic loader’s verification that all dependent shared libraries are CET-compatible before enabling for a process; this is the chief reason it’s gated.
  • 2025 status (SSTIC paper): Synacktiv showed real-world Windows kernel SHSTK bypasses via the exception-handling and stack-swap paths, motivating ongoing work to extend the protection to those code paths. The mitigation is real but the kernel TCB has plenty of gadgets that don’t go through RET.
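
The loader check mentioned in the Glibc bullet above amounts to an AND across every loaded object's CET marking. The sketch below models that rule only; it is not glibc's code. The bit values are the GNU_PROPERTY_X86_FEATURE_1_* constants from the x86-64 psABI, and the function name is invented for illustration.

```python
from functools import reduce

GNU_PROPERTY_X86_FEATURE_1_IBT = 1 << 0
GNU_PROPERTY_X86_FEATURE_1_SHSTK = 1 << 1


def process_cet_features(object_feature_masks):
    """AND the feature masks of the executable and all its shared libraries."""
    masks = list(object_feature_masks)
    if not masks:
        return 0
    # A feature survives only if every object advertises it.
    return reduce(lambda left, right: left & right, masks)


if __name__ == "__main__":
    both = GNU_PROPERTY_X86_FEATURE_1_IBT | GNU_PROPERTY_X86_FEATURE_1_SHSTK
    # One legacy library without any CET marking disables CET for the process.
    print(process_cet_features([both, both, 0]))
```

This is why a single unmarked dependency is enough to keep a whole process out of CET, and why distro-wide enablement depends on rebuilding everything.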

§5 Software emulation cost

SHSTK in software (e.g., LLVM -fsanitize=shadow-call-stack):

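  • AArch64 shadow-call-stack reserves x18 as the shadow-stack pointer and saves the return address (LR) to the memory it points at; ~1-3% overhead, requires no hardware support.
  • x86-64 shadow-call-stack via a gs-relative shadow region: also ~1-3% overhead, but LLVM removed this implementation in LLVM 9, so current Clang offers ShadowCallStack only on AArch64 and RISC-V.
  • LLVM SafeStack (moves address-taken buffers to a separate unsafe stack, leaving return addresses and safe locals on the regular one): ~0.1% overhead, very lightweight but only partial protection.
  • Software forward-edge CFI (e.g., LLVM -fsanitize=cfi, which requires LTO): ~1% overhead, works on x86/Arm without CET.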

IBT in software: there is no fast software equivalent. CFG (Windows) costs ~1-2% on each indirect call due to bitmap lookup. Clang’s -fsanitize=cfi-icall checks indirect calls against the callee’s function type at ~1%; the class-hierarchy-aware check for C++ virtual calls is the separate -fsanitize=cfi-vcall.

Net: hardware CET costs effectively zero on cycle-by-cycle benchmarks, while the software equivalents cost 1-5%. The appeal of CET is that this overhead effectively disappears.

§6 Mochi adaptation note

Like the PAC/BTI story, vm3-classic (the bytecode interpreter) doesn’t need shadow stacks at user-program scope: Mochi bytecode has no way to overwrite a return address in the Mochi runtime. The interpreter dispatch loop is the only RET that user bytecode interacts with, and Mochi return values do not touch the C-level stack.

vm3jit is different. Once Mochi functions are lowered to native x86-64 / AArch64:

  • Each JIT’d function’s prologue should be CET-compatible: ENDBR64 as first instruction (for x86-64 builds), standard CALL/RET sequence (so SHSTK applies transparently), no manual stack-pointer tricks that confuse SHSTK.
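  • The vm3jit code cache should be mapped the usual way (mmap, then mprotect to read+execute). SHSTK attaches no attribute to code pages; the requirement is that everything placed in the cache follows the CALL/RET and ENDBR64 rules above, so that loading JIT code does not make the process CET-incompatible.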
  • The interpreter→JIT trampoline and JIT→interpreter return-slot must use the same calling convention so SHSTK’s call/return bookkeeping doesn’t desync.

Where vm3 currently falls short: as of May 2026, neither vm3 nor vm3jit emits ENDBR64 at JIT-function entries on x86-64, and the runtime is not built with -fcf-protection=full by default (Go toolchain has partial support since 1.22). The smallest gap-closer for MEP-41:

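  1. Build the C/cgo parts of the mochi binary with CGO_CFLAGS="-fcf-protection=full". The flag belongs to GCC/Clang; Go’s own compiler does not accept it through -gcflags, and -gcflags=-N would only disable optimisations. For the pure-Go code, wait for the Go toolchain to emit CET-compatible code by default.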
  2. JIT-emit ENDBR64 (x86-64) / BTI c (AArch64) as the first instruction of every JIT entry point.
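  3. Keep CALL and RET balanced in JIT code: no returns synthesised with PUSH+RET or by jumping through a popped return address, since those desync SHSTK. A JMP-based tail call leaves the shadow stack untouched and is fine as long as the eventual RET returns to the original CALL site.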
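
Step 2 above comes down to four fixed bytes per entry point. The sketch below shows the encodings and a check a JIT test suite might run; the names (ENTRY_MARKERS, emit_entry, has_entry_marker) are hypothetical and not part of vm3jit.

```python
ENTRY_MARKERS = {
    "x86-64": bytes.fromhex("f30f1efa"),            # ENDBR64
    "aarch64": (0xD503245F).to_bytes(4, "little"),  # BTI c
}


def emit_entry(arch, body):
    """Prefix a JIT function body with the landing pad for its architecture."""
    return ENTRY_MARKERS[arch] + body


def has_entry_marker(arch, code):
    """Check that a JIT entry point begins with the expected landing pad."""
    return code.startswith(ENTRY_MARKERS[arch])


if __name__ == "__main__":
    x86_ret = b"\xc3"                               # ret
    arm_ret = (0xD65F03C0).to_bytes(4, "little")    # ret (x30)
    print(emit_entry("x86-64", x86_ret).hex())
    print(emit_entry("aarch64", arm_ret).hex())
```

Both instructions decode as NOPs on CPUs without CET or BTI, so emitting them unconditionally costs four bytes per entry and needs no runtime feature check.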

This doesn’t touch the Cell layout (MEP-40) at all; it’s purely a JIT code-generation discipline. Reference: MEP-39 (the JIT).

§7 Open questions for MEP-41 design

  1. Does Go’s runtime cooperate well with userspace SHSTK as of 1.23+? Goroutine stack switches must update SSP; if Go’s runtime.cgocallback path mis-handles SSP, we’ll have spurious #CP faults.
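  2. Should vm3jit ever emit non-standard returns (for example, leaving an inlined fast path by skipping one or more frames instead of executing each RET)? Anything that unbalances CALL/RET breaks SHSTK unless the code also adjusts SSP (INCSSP), so we may need that or to abandon them. A plain tail-call-as-jump stays balanced, though its indirect form needs an ENDBR target (or NOTRACK) under IBT.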
  3. On x86-64 platforms without CET (older AMD, pre-Tiger-Lake Intel), should vm3jit fall back to a software shadow-call-stack, or accept the gap?
  4. Apple Silicon doesn’t have CET (it has PAC instead — see file 06). Should the same JIT codepath use PAC where available and fall through to nothing where not, treating these as two non-overlapping platform features?
  5. The Synacktiv 2025 results show kernel SHSTK is still bypassable. Does the threat model we accept for Mochi include “the host kernel may have an SHSTK bypass”, and if so, what does the runtime do about it (nothing? alert? abort?)?