Skip to content

The Compile-Time vs Runtime Trade-Off

A conceptual essay on where Mochi MEP-42 should sit on the curve from "compile slowly, run fast" (LLVM -O3) through "compile and run at medium speed" (Cranelift, B3) to "compile instantly, run okay" (copy-and-patch, Sparkplug). Includes a back-of-envelope budget derived from MEP-17 and MEP-23 bench numbers.

§1 Provenance

  • The trade-off is folklore across the JIT community; quantified in many published papers and blog posts. Key data points:
  • Mochi-specific bench numbers: MEP-17 (vm2 micro-benchmarks) and MEP-23 (compiler1 codegen budgets), referenced from /Users/apple/notes/Spec/ (not enumerated here since they are internal).

§2 Technique / contribution

The conceptual model:

quality of generated code (runtime perf)
  ^
  |                                          * LLVM-O3, B3, TurboFan
  |                                    * Cranelift, FTL
  |                              * QBE
  |                        * Cranelift-fast, DFG
  |                  * Liftoff, Maglev
  |             * Sparkplug, JSC Baseline
  |       * Copy-and-Patch
  |  * Interpreter (no compile)
  +-------------------------------------------------------> compile speed

The right-hand side is “compile instantly, run okay”. The left-hand side is “compile slowly, run fast”. MEP-42 is a phase-1 effort; the user explicitly said “naive”. That puts us on the right-hand side.

Three concrete choices to evaluate:

  1. Copy-and-patch (CPython 3.13 model): compile speed ~10 MB/s of source equivalent; code ~2x slower than LLVM -O2; ~2-3x faster than vm3 interpreter for hot loops.
  2. Per-opcode template JIT (Sparkplug, Liftoff style): compile speed ~5 MB/s; code ~1.5-2x slower than optimized; ~3-5x faster than vm3 interpreter.
  3. chibicc-style single-pass AOT to GAS asm + shell out to cc: compile speed limited by cc (~1-2 MB/s); code ~3-5x slower than optimized (but cc does some opt for us); ~3x faster than vm3 interpreter.
  4. QBE backend (textual SSA + shell out): compile speed ~3-5 MB/s; code ~30% slower than LLVM -O2; ~5x faster than vm3 interpreter.

§3 Where it shines, where it fails

The trade-off curve is not a single number; it depends on workload:

Short-lived programs (CLI tools, scripts, build helpers):

  • Compile time dominates total time.
  • Copy-and-patch and Sparkplug shine.
  • chibicc-style AOT loses because cc invocation is slow.

Long-running programs (servers, batch jobs, data pipelines):

  • Runtime dominates.
  • QBE, Cranelift, or LLVM shine.
  • Copy-and-patch and Sparkplug leave performance on the table.

Mixed workloads (typical for Mochi: interactive REPL plus running production handlers):

  • A tiered strategy wins: ship a baseline JIT for cold code, an optimizing tier for hot code.
  • But phase 1 of MEP-42 cannot be tiered; we pick one point.

§4 Status (May 2026)

  • Every major dynamic-language VM is tiered (V8, JSC, SpiderMonkey, BEAM, HotSpot, OpenJ9, .NET CoreCLR, Wasmtime).
  • Recent research (Lesbre/Lemerre PLDI 2024) shows even baseline JITs can be 30% faster with cheap abstract interpretation.
  • For static languages with AOT-only output (Go, Rust, OCaml), the trade-off is one-sided: only runtime perf matters because compile time is amortized over many runs.
  • Mochi sits between these worlds. The vm3 interpreter exists, so we have a runtime tier; MEP-42 is asking us to add a compile-to-native tier.

§5 Engineering cost for Mochi

Back-of-envelope budget from MEP-17 and MEP-23:

  • vm3 interpreter executes Mochi bench programs at roughly the speed of CPython 3.13 (within 1.5x, based on MEP-17 numbers).
  • Copy-and-patch JIT for CPython yielded 2-9% over interpreter. Mochi would likely see similar (let’s call it 5-15%).
  • A per-opcode template JIT (Sparkplug-style) would likely yield 30-50% improvement, based on V8’s measured 5-15% on optimized JS interpreter that is much better than vm3.
  • A QBE-via-textual-SSA emission would likely yield 2-5x improvement on numeric workloads.
  • An LLVM/MLIR-based optimizing tier would yield 5-10x on hot loops.

Compile-time budget: MEP-23 set an informal target of “compile a 10,000-line Mochi program in under 1 second on a 2024 laptop.” This rules out LLVM/MLIR for phase 1. It comfortably fits copy-and-patch and template JIT; it tightly fits QBE; it slightly exceeds chibicc-style shell-to-cc (cc on 10k LOC is closer to 2-5 seconds).

Recommendation by workload:

  • For Mochi REPL + interactive use: copy-and-patch (fastest compile, decent run).
  • For Mochi scripts: template JIT (compile once per run, decent code).
  • For Mochi long-running services: QBE (slower compile, 5x better code).
  • For Mochi AOT binaries we ship to users: chibicc-style + cc -O2. Slow compile, fastest code that doesn’t require LLVM.

§6 Mochi adaptation note

  • runtime/vm3/ is the existing baseline (no compile cost; interpreter speed).
  • A phase-1 naive emitter should target the fastest compile time point on the curve. That argues for copy-and-patch.
  • A phase-2 optimizing emitter should aim for QBE-equivalent quality.
  • The vm3 interpreter remains the always-available fallback for unsupported ops or platforms.

§7 Open questions for MEP-42

  • What is the right phase-1 point on the curve? My recommendation (see naive/00_naive_summary.md): copy-and-patch.
  • Do we want to support multiple emitters simultaneously, selected per file or per function? (Yes, eventually.)
  • How do we measure “naive code is good enough”? What is the Mochi MEP-42 acceptance benchmark?
  • Should we bench against CPython 3.13 (interpreter), CPython 3.13 with JIT, Go, and Rust on Mochi corpus equivalents?
  • Tiering: phase 1 is single-tier. When do we add a second tier?

§8 References