# x86_64 System V AMD64 ABI

## §1 Provenance

Canonical specifications:

- Official psABI (community maintained): https://gitlab.com/x86-psABIs/x86-64-ABI (the LaTeX source and PDF builds live here, currently revision 1.0+, with continuous incremental amendments tracked in git).
- Historical baseline PDF (v0.99): https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf.
- Intel 64 and IA-32 Architectures Software Developer's Manual (Intel SDM), Volumes 1, 2, 3, 4: https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html.
- AMD64 Architecture Programmer's Manual, Vols 1 to 5: https://www.amd.com/en/search/documentation/hub.html?keyword=AMD64+Architecture+Programmer%27s+Manual.
- Intel APX Architecture Specification: https://cdrdv2-public.intel.com/784266/355828-intel-apx-spec.pdf (revised through 2025).
- Apple's notes on x86_64 ABI deviations: scattered through Xcode release notes; the canonical reference is the LLVM/clang Darwin x86_64 backend (lib/Target/X86/X86CallingConv.td).

## §2 Mechanism / specification

Integer/pointer arguments go in RDI, RSI, RDX, RCX, R8, R9 (in that order). Floating-point and SSE arguments go in XMM0 through XMM7. Beyond that, arguments spill to the stack in right-to-left order, each occupying 8-byte slots (or 16 for __m128, 32 for __m256). The return value sits in RAX (and RDX for 128-bit returns, XMM0/XMM1 for FP, or RAX+XMM0 for mixed-class structs).

Aggregates are classified per 8-byte chunk into INTEGER, SSE, or MEMORY classes (§3.2.3 of the psABI). Anything larger than 16 bytes, or anything with a non-trivial copy constructor, falls back to MEMORY and is passed by hidden pointer in RDI (which then shifts the rest of the arg sequence).

The stack must be 16-byte aligned immediately before a CALL, which means RSP is 8 modulo 16 at the entry of a function (the return address ate 8 bytes). A 32-byte alignment is required if an __m256 spills to stack; 64-byte alignment for __m512.

The red zone: 128 bytes below RSP are reserved as a leaf-function scratch area that signal handlers will not clobber. The kernel does not honor the red zone for interrupt frames, which is why kernel and `-mno-red-zone` exists.

Variadic functions require AL/RAX to contain the count of XMM registers used (0 to 8) before the call so callees know how to spill them.

Callee-saved registers: RBX, RBP, RSP, R12, R13, R14, R15. Caller-saved: everything else, including all XMM/YMM/ZMM.

## §3 Platform coverage (May 2026)

Linux x86_64 (every distribution since ~2003), FreeBSD/OpenBSD/NetBSD, illumos, Solaris, Haiku, and macOS through Rosetta 2 and on Intel Macs (the latter group is shrinking but still ships). The Linux ELF psABI is the canonical implementation. macOS additions on top of SysV:

- Symbols are prefixed with a leading underscore (`_main`, not `main`).
- Syscalls use `syscall` with class-prefixed numbers (`0x2000000 | nr`); user code generally calls libsystem rather than raw syscalls.
- The position-independent ABI is mandatory (PIE on by default since 10.7).
- Mach-O LC_BUILD_VERSION load command carries the deployment target. There is no equivalent in SysV-ELF; ELF uses .note sections and SONAME for similar plumbing.

FreeBSD on amd64 uses the SysV ABI exactly. OpenBSD diverges only in tightening: W^X enforcement, stack-protector mandatory, RELRO required.

## §4 Current status (May 2026)

The psABI gets minor revisions every few months on GitLab. Recent changes include AVX-512 classification rules, AMX tile types, and (in 2025) the Intel APX prefix encoding plus EGPRs.

Intel APX (Advanced Performance Extensions) added 16 new general-purpose registers (R16 to R31), a new REX2 prefix (0xD5), three-operand new-data-destination forms, conditional load/store/compare, NF (no-flags), zero-upper SETcc, PUSH2/POP2, and a 64-bit absolute jump. APX EGPRs are all caller-saved in the launch ABI. As of May 2026 the APX-aware silicon (Nova Lake on the client side, Diamond Rapids on the server side) is shipping or about to ship. GCC 14 added foundation support, GCC 15 added `-march=diamondrapids`, and Linux 6.16+ has KVM patches landed for guest APX state. XSAVE for APX reuses the deprecated MPX state area (XCR0[19]).

AVX10 is in the same wave: a versioned vector ISA that unifies AVX-512 capabilities across client and server. AVX10.2 ships with Diamond Rapids.

The red zone, syscall conventions, and the six-register integer convention have not changed.

## §5 Engineering cost for Mochi

A SysV backend is the cheapest first target. Mochi can:

1. Pick an existing Go assembler (Go's own asm via `cmd/internal/obj/x86`), an external `as`, or emit pure machine code into an ELF.
2. Use Go's `debug/elf` to write objects (with the elf.NewFile reader being more developed than the writer, but `cmd/link/internal/loadelf` exists).
3. Rely on system `ld`/`lld` for final linking, at least in Phase 1.

The must-have surface: integer arg classification, SSE arg classification, MEMORY return for big structs, 16-byte stack alignment, callee-saved register spilling, variadic AL convention. The nice-to-have: APX EGPRs, AVX-512 classification, x32 (sub)ABI. Mochi can entirely skip APX/AVX10/AMX in Phase 1 and still produce competitive binaries.

Cross-compiling from any host to x86_64-linux-gnu, x86_64-apple-darwin, x86_64-unknown-freebsd is well supported by Zig's CC frontend, LLVM, and Go's own assembler. Mochi's compiler is in Go, so calling out to `go tool asm` and `go tool link` is feasible.

## §6 Mochi adaptation note

In compiler3, the planned `backend/native/x86_64` package would consume the typed-IR produced after the arena-IR pass (per MEP-40 phase 1 work in `runtime/vm3`) and lower it to instructions. The 8-byte Cell handle from MEP-40 fits perfectly in a single GPR, so most Mochi values pass in RDI/RSI/RDX/RCX/R8/R9 without classification gymnastics. Aggregates wider than 16 bytes (lists, maps, strings as triplets) will need a small ABI lowering pass.

The runtime/vm3 arena allocator stays in-process; nothing about SysV constrains its memory layout. Stack frames for compiled functions should keep RBP threaded so SysV unwinders (and Go's runtime, if Mochi ever hosts) work.

## §7 Open questions for MEP-42

1. Does Mochi target SysV on Linux only, or also FreeBSD/OpenBSD in Phase 1? Phase 1 should at minimum include Linux x86_64 because it is the cheapest target with the largest test surface.
2. APX support: ship in Phase 1 (low value, narrow hardware) or defer? Recommend defer until 2027.
3. Linker choice: invoke system `ld`/`lld`, or vendor `mold`/`lld` source? Recommend invoking system linker in Phase 1 to ship faster, with a path to in-process linking later.
4. Red-zone usage: emit `-mno-red-zone` style code (safe under kernels) or assume userspace? Recommend userspace red-zone use, with a flag to disable.

