x86_64 System V AMD64 ABI

§1 Provenance

Canonical specifications:

Official psABI (community maintained): https://gitlab.com/x86-psABIs/x86-64-ABI (the LaTeX source and PDF builds live here, currently revision 1.0+, with continuous incremental amendments tracked in git).
Historical baseline PDF (v0.99): https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf.
Intel 64 and IA-32 Architectures Software Developer's Manual (Intel SDM), Volumes 1, 2, 3, 4: https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html.
AMD64 Architecture Programmer's Manual, Vols 1 to 5: https://www.amd.com/en/search/documentation/hub.html?keyword=AMD64+Architecture+Programmer%27s+Manual.
Intel APX Architecture Specification: https://cdrdv2-public.intel.com/784266/355828-intel-apx-spec.pdf (revised through 2025).
Apple's notes on x86_64 ABI deviations: scattered through Xcode release notes; the canonical reference is the LLVM/clang Darwin x86_64 backend (lib/Target/X86/X86CallingConv.td).

§2 Mechanism / specification

Integer/pointer arguments go in RDI, RSI, RDX, RCX, R8, R9 (in that order). Floating-point and SSE arguments go in XMM0 through XMM7. Beyond that, arguments spill to the stack in right-to-left order, each occupying 8-byte slots (or 16 for __m128, 32 for __m256). The return value sits in RAX (and RDX for 128-bit returns, XMM0/XMM1 for FP, or RAX+XMM0 for mixed-class structs).

Aggregates are classified per 8-byte chunk into INTEGER, SSE, or MEMORY classes (§3.2.3 of the psABI). Anything larger than 16 bytes, or anything with a non-trivial copy constructor, falls back to MEMORY and is passed by hidden pointer in RDI (which then shifts the rest of the arg sequence).

The stack must be 16-byte aligned immediately before a CALL, which means RSP is 8 modulo 16 at the entry of a function (the return address ate 8 bytes). A 32-byte alignment is required if an __m256 spills to stack; 64-byte alignment for __m512.

The red zone: 128 bytes below RSP are reserved as a leaf-function scratch area that signal handlers will not clobber. The kernel does not honor the red zone for interrupt frames, which is why kernel and -mno-red-zone exists.

Variadic functions require AL/RAX to contain the count of XMM registers used (0 to 8) before the call so callees know how to spill them.

Callee-saved registers: RBX, RBP, RSP, R12, R13, R14, R15. Caller-saved: everything else, including all XMM/YMM/ZMM.

§3 Platform coverage (May 2026)

Linux x86_64 (every distribution since ~2003), FreeBSD/OpenBSD/NetBSD, illumos, Solaris, Haiku, and macOS through Rosetta 2 and on Intel Macs (the latter group is shrinking but still ships). The Linux ELF psABI is the canonical implementation. macOS additions on top of SysV:

Symbols are prefixed with a leading underscore (_main, not main).
Syscalls use syscall with class-prefixed numbers (0x2000000 | nr); user code generally calls libsystem rather than raw syscalls.
The position-independent ABI is mandatory (PIE on by default since 10.7).
Mach-O LC_BUILD_VERSION load command carries the deployment target. There is no equivalent in SysV-ELF; ELF uses .note sections and SONAME for similar plumbing.

FreeBSD on amd64 uses the SysV ABI exactly. OpenBSD diverges only in tightening: W^X enforcement, stack-protector mandatory, RELRO required.

§4 Current status (May 2026)

The psABI gets minor revisions every few months on GitLab. Recent changes include AVX-512 classification rules, AMX tile types, and (in 2025) the Intel APX prefix encoding plus EGPRs.

Intel APX (Advanced Performance Extensions) added 16 new general-purpose registers (R16 to R31), a new REX2 prefix (0xD5), three-operand new-data-destination forms, conditional load/store/compare, NF (no-flags), zero-upper SETcc, PUSH2/POP2, and a 64-bit absolute jump. APX EGPRs are all caller-saved in the launch ABI. As of May 2026 the APX-aware silicon (Nova Lake on the client side, Diamond Rapids on the server side) is shipping or about to ship. GCC 14 added foundation support, GCC 15 added -march=diamondrapids, and Linux 6.16+ has KVM patches landed for guest APX state. XSAVE for APX reuses the deprecated MPX state area (XCR0[19]).

AVX10 is in the same wave: a versioned vector ISA that unifies AVX-512 capabilities across client and server. AVX10.2 ships with Diamond Rapids.

The red zone, syscall conventions, and the six-register integer convention have not changed.

§5 Engineering cost for Mochi

A SysV backend is the cheapest first target. Mochi can:

Pick an existing Go assembler (Go's own asm via cmd/internal/obj/x86), an external as, or emit pure machine code into an ELF.
Use Go's debug/elf to write objects (with the elf.NewFile reader being more developed than the writer, but cmd/link/internal/loadelf exists).
Rely on system ld/lld for final linking, at least in Phase 1.

The must-have surface: integer arg classification, SSE arg classification, MEMORY return for big structs, 16-byte stack alignment, callee-saved register spilling, variadic AL convention. The nice-to-have: APX EGPRs, AVX-512 classification, x32 (sub)ABI. Mochi can entirely skip APX/AVX10/AMX in Phase 1 and still produce competitive binaries.

Cross-compiling from any host to x86_64-linux-gnu, x86_64-apple-darwin, x86_64-unknown-freebsd is well supported by Zig's CC frontend, LLVM, and Go's own assembler. Mochi's compiler is in Go, so calling out to go tool asm and go tool link is feasible.

§6 Mochi adaptation note

In compiler3, the planned backend/native/x86_64 package would consume the typed-IR produced after the arena-IR pass (per MEP-40 phase 1 work in runtime/vm3) and lower it to instructions. The 8-byte Cell handle from MEP-40 fits perfectly in a single GPR, so most Mochi values pass in RDI/RSI/RDX/RCX/R8/R9 without classification gymnastics. Aggregates wider than 16 bytes (lists, maps, strings as triplets) will need a small ABI lowering pass.

The runtime/vm3 arena allocator stays in-process; nothing about SysV constrains its memory layout. Stack frames for compiled functions should keep RBP threaded so SysV unwinders (and Go's runtime, if Mochi ever hosts) work.

§7 Open questions for MEP-42

Does Mochi target SysV on Linux only, or also FreeBSD/OpenBSD in Phase 1? Phase 1 should at minimum include Linux x86_64 because it is the cheapest target with the largest test surface.
APX support: ship in Phase 1 (low value, narrow hardware) or defer? Recommend defer until 2027.
Linker choice: invoke system ld/lld, or vendor mold/lld source? Recommend invoking system linker in Phase 1 to ship faster, with a path to in-process linking later.
Red-zone usage: emit -mno-red-zone style code (safe under kernels) or assume userspace? Recommend userspace red-zone use, with a flag to disable.