Cranelift as a Code-Generation Backend for Mochi

§1 Provenance

Project home: https://cranelift.dev/
Source: in the Wasmtime monorepo, https://github.com/bytecodealliance/wasmtime/tree/main/cranelift
ISLE language reference: https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/isle/docs/language-reference.md
2022 roadmap RFC (most recent formal): https://github.com/bytecodealliance/rfcs/blob/main/accepted/cranelift-roadmap-2022.md
ISLE RFC: https://github.com/bytecodealliance/rfcs/blob/main/accepted/cranelift-isel-isle-peepmatic.md
E-graph midend RFC: https://github.com/bytecodealliance/rfcs/blob/main/accepted/cranelift-egraph.md
Wikipedia summary: https://en.wikipedia.org/wiki/Cranelift
"Cranelift Progress in 2022": https://bytecodealliance.org/articles/cranelift-progress-2022
Bjorn3 progress reports (cg_clif): https://bjorn3.github.io/

§2 Mechanism

Cranelift consumes CLIF, a target-independent SSA IR with block parameters (no phi nodes) and explicit memory operations. The pipeline is: legalization, an optional e-graph-based midend (constant folding, redundant load elimination, GVN, simple peepholes, enabled by default since 2023), instruction selection through ISLE (a Lisp-shaped term-rewriting DSL that compiles to Rust match trees), the regalloc2 register allocator (adapted from IonMonkey, https://github.com/bytecodealliance/regalloc2), prologue/epilogue insertion, and direct emission to a MachBuffer of raw bytes plus relocations. Output is in-memory machine code suitable for JIT or for being wrapped in an object file via the cranelift-object crate.

Cranelift is designed for fast compile times, not for the best possible code. There is no inlining, no loop vectorization, and no instruction scheduling; optimization is limited to the midend passes listed above and simple peepholes. Generated code typically runs 1.5-2x slower than LLVM -O2 output, but compiles 5-10x faster.

§3 Target coverage (May 2026)

x86_64 (SysV and Win64): production-quality, the primary Wasmtime target.
AArch64 (Linux, macOS, Windows): production-quality.
RISC-V RV64GC: production-quality since 2023.
IBM s390x (z/Architecture): production-quality (IBM-funded).
32-bit Arm: not supported; not on the roadmap.
WebAssembly: not a Cranelift target. Cranelift consumes Wasm via Wasmtime and emits native code.

Object formats: ELF and Mach-O via cranelift-object; PE/COFF support exists but is less battle-tested.

DWARF: line-table support yes; type info partial. cg_clif emits enough for gdb to do source-level stepping.

§4 Production / language adoption status (May 2026)

Wasmtime (https://wasmtime.dev): primary user, ships major releases roughly monthly. Wasmtime v35 (per https://bytecodealliance.org/articles) added AArch64 support to Winch, the single-pass tier-0 baseline compiler complementary to Cranelift.
Firefox SpiderMonkey: previously carried an experimental Cranelift-based Wasm backend, used as an optimizing tier on AArch64 rather than as the baseline tier. That integration has since been removed, so Firefox is not a current production user.
rustc_codegen_cranelift (cg_clif): alternative rustc backend (https://github.com/rust-lang/rustc_codegen_cranelift). 2025 was a milestone year: exception/unwind support landed, inline asm became stable, and AArch64 macOS shipped as a rustup component. The formal "production-ready cranelift backend" Rust Project Goal targets full readiness on Linux/macOS x86_64+aarch64 (https://rust-lang.github.io/rust-project-goals/2025h2/production-ready-cranelift.html). Reported performance: roughly 20% less codegen time than the LLVM backend, about a 5% speedup on clean builds, plus a further 10-50% from enabling lld.
Lucet, Fastly Compute@Edge: heavy production use of Cranelift for Wasm AOT. Lucet itself is now end-of-life and has been superseded by Wasmtime.

Maintainership is active and healthy: the project is funded through the Bytecode Alliance and holds roughly weekly meetings. License is Apache 2.0 with LLVM Exceptions.

§5 Engineering cost for Mochi

Binary footprint: A cranelift-codegen + cranelift-frontend + cranelift-jit static blob is roughly 15-25 MB, an order of magnitude smaller than LLVM.
Build complexity: Cranelift is pure Rust with no C dependencies. The hard problem for a Go-hosted Mochi is the cgo bridge: there is no maintained Go binding, and Cranelift itself exposes a Rust API rather than a C API, so any in-process option needs a small Rust shim that exports extern "C" functions over cranelift-jit. Options are (a) build that shim as a Rust static library (staticlib) and link it into the Go binary via cgo, (b) run Cranelift in a child process and exchange CLIF text, or (c) build the shim as a Rust shared library (cdylib) and call it via cgo. None of these is turnkey.
License: Apache 2.0 with LLVM Exceptions.
Cross-compilation: Excellent. A single Cranelift build can target every supported triple, provided the corresponding backends are enabled when cranelift-codegen is built.
Debugging: Adequate. Line tables work; complex variable inspection is limited.
Runtime startup: Sub-millisecond engine setup; JIT compiles a function in microseconds to low-millisecond range.

§6 Mochi adaptation note

Cranelift's CLIF is almost a direct match for the compiler3 IR shape (/Users/apple/github/mochilang/mochi/compiler3/ir): SSA values, basic blocks with block parameters, explicit memory ops. The compiler3 register allocator under /Users/apple/github/mochilang/mochi/compiler3/regalloc becomes obsolete (regalloc2 is better than anything Mochi will write); the vm3 op table in /Users/apple/github/mochilang/mochi/runtime/vm3/op.go becomes a set of CLIF idioms. The integration question is purely the Go-to-Rust bridge, not the IR mapping.

Mochi's existing runtime/jit/vm2jit (/Users/apple/github/mochilang/mochi/runtime/jit/vm2jit/lower.go) uses twitchyliquid64/golang-asm to handcraft amd64+arm64; Cranelift would replace that lower step with a higher-level lower-then-let-Cranelift-finish flow, at the cost of cgo.

§7 Open questions for MEP-42

Is cgo acceptable for Mochi? If no, Cranelift is essentially out unless we run it as a subprocess.
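Subprocess vs in-process? A subprocess design (Mochi emits .clif and invokes Cranelift's clif-util compile --target=... or a custom mochi-cranelift worker; wasmtime compile takes Wasm input rather than CLIF, so it only fits the Wasm route discussed below) avoids cgo at the cost of process-spawn latency.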
Rust toolchain on every Mochi build host? Even a staticlib requires cargo at build time. This is a step change in Mochi's build prerequisites.
Should Phase 1 instead go via Wasm + Wasmtime AOT? Cranelift's biggest user is Wasmtime; a Mochi-to-Wasm path (see 12_wasmtime_aot.md) gives us Cranelift's codegen quality for free, without ever touching the Cranelift API directly.
Code quality gap vs LLVM: Cranelift-generated code will be 1.5-2x slower than LLVM -O2 output on tight loops. Is that acceptable for Phase 1? Almost certainly yes, given Mochi's current baseline is the vm3 interpreter.