1714. Port CPython's Tools/cases_generator into gopy
Ground rule
Same rule as 1704 / 1705 / 1708 / 1712 / 1713. Vendor full
subsystems, file by file. No partial slices, no name-only shims.
This spec is the most aggressive application of that rule yet:
every opcode in gopy, every cache layout, every stack effect,
every dispatch arm, every uop body gets re-rooted onto CPython's
own DSL. After this spec lands, vm/eval_*.go,
specialize/*.go cache-access calls, compile/opcode_caches.go,
and compile/opcodes_gen.go are all generated from the same
inputs CPython compiles itself from. The hand-rolled dispatch
loop, the hand-counted cache cells, and the hand-mirrored stack
effect tables go away.
This spec pauses spec 1713. Byte-equality work resumes once 1714 lands, because every remaining 1713 row (codegen audit, flowgraph audit, marshal port, .pyc parity) is downstream of the same single-source-of-truth question: gopy can only match CPython's compiled output if both sides agree on opcode numbering, cache widths, and stack effects, and gopy's current copies of those tables are hand-maintained and divergent. The LOAD_GLOBAL cache cell-4 vs cell-1 bug that surfaced during 1713 P2 work is the canonical example: the specializer wrote past the cache and overwrote the next opcode's first byte; the VM read from the same out-of-bounds slot, so both sides "agreed" on a wrong layout for months. The bug only surfaced when an unrelated change exposed a codepath where the next instruction happened to be PUSH_NULL (opcode 33), got rewritten as RESERVED (17), and CALL popped a tuple iterator from the wrong stack slot. There is no test that catches this class of bug today. There is no design pattern in gopy that prevents it. There is in CPython, and it is Tools/cases_generator.
Goal
tools/regen-cases.sh
git diff --exit-code
# CI fails if any generated file is out of date
Concretely, the deliverable is a generator under
tools/cases_generator/ that:
- Reads CPython 3.14.5's Python/bytecodes.c,
  Python/optimizer_bytecodes.c, and Include/internal/pycore_code.h
  from the vendored copy under tools/cases_generator/inputs/.
- Emits a fixed set of Go files under compile/, specialize/, and
  vm/ named *_gen.go.
- Is byte-for-byte reproducible: running tools/regen-cases.sh on a
  clean checkout leaves the tree unchanged.
Spec done = every opcode dispatch arm in gopy is generated, every
cache access goes through a typed accessor whose layout is
generated, every stack effect comes from the generated metadata
table, every family / deopt relation is generated, and the
hand-rolled equivalents are deleted. A reproducibility gate
(test/gate/cases_generator_reproducibility_test.go) keeps it
that way.
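The comparison at the heart of the reproducibility gate can be sketched in Go. This is a minimal illustration, not the gate's actual code: staleFiles is a hypothetical helper, and it assumes the regenerated outputs and the checked-in files have already been read into maps keyed by path.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// staleFiles returns the sorted paths of generated files whose checked-in
// bytes differ from a fresh regeneration, or that exist on only one side.
// Hypothetical helper for illustration; not part of gopy.
func staleFiles(regenerated, checkedIn map[string][]byte) []string {
	var stale []string
	for name, want := range regenerated {
		got, ok := checkedIn[name]
		if !ok || !bytes.Equal(got, want) {
			stale = append(stale, name)
		}
	}
	for name := range checkedIn {
		if _, ok := regenerated[name]; !ok {
			stale = append(stale, name) // checked in but no longer emitted
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	regenerated := map[string][]byte{
		"compile/opcode_ids_gen.go": []byte("package compile\n"),
		"specialize/family_gen.go":  []byte("package specialize\n"),
	}
	checkedIn := map[string][]byte{
		"compile/opcode_ids_gen.go": []byte("package compile\n"),
		"specialize/family_gen.go":  []byte("package specialize // hand edit\n"),
	}
	// Only the hand-edited file is reported.
	fmt.Println(staleFiles(regenerated, checkedIn))
}
```

An empty result corresponds to git diff --exit-code succeeding after a regeneration.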
Why this spec exists
vm/eval_*.go is hand-rolled. specialize/*.go is hand-rolled.
compile/opcode_caches.go is hand-rolled. They were ported one
opcode at a time from CPython 3.14's Python/bytecodes.c and
related headers, with each port translating both sides
(specializer write, VM read, compile-side cache size, stack
effect, family membership) separately. Five places per opcode
that all have to agree.
CPython does not have this problem. Every opcode in CPython is
defined exactly once in Python/bytecodes.c using a small DSL
(inst, op, macro, family, pseudo, tier1, tier2,
label). A pipeline of generators under Tools/cases_generator/
then emits:
| CPython output file | What it drives |
|---|---|
| Python/generated_cases.c.h | Tier-1 dispatch bodies inside _PyEval_EvalFrameDefault |
| Python/executor_cases.c.h | Tier-2 uop bodies inside the executor loop |
| Include/internal/pycore_opcode_metadata.h | Stack effects, cache sizes, names, family table, deopt map |
| Include/internal/pycore_uop_metadata.h | Uop names, flags, output effects |
| Include/opcode_ids.h | Opcode numeric IDs |
| Python/opcode_targets.h | Computed-goto dispatch table |
| Lib/_opcode_metadata.py | Python-side mirror used by dis |
The cache layout (_PyLoadGlobalCache { uint16_t counter; uint16_t index; uint16_t module_keys_version; uint16_t builtin_keys_version; } in Include/internal/pycore_code.h:117)
is paired with the inst(LOAD_GLOBAL_MODULE, ...) body in
Python/bytecodes.c via a _PyLoadGlobalCache *cache = ...
declaration the generator parses out of the body. The generator
enforces the pairing. Drift is a build error.
Hand-porting throws this out. We have, today, in gopy:
- compile/opcode_caches.go declaring LOAD_GLOBAL: 4 codeunits.
- specialize/load_global.go writing cells {2,3,4} (until 1714 P0 prework fixed it to {1,2,3}).
- vm/eval_specialized_load_global.go reading cell 4 (until the same fix moved it to cell 1).
- objects/dict.go exposing GetKeysVersion().
- compile/opcodes_gen.go numbering LOAD_GLOBAL_MODULE independently of CPython's opcode_ids.h.
Five files. Five sources of truth. The same bug class is latent in LOAD_ATTR_INSTANCE_VALUE, LOAD_ATTR_SLOT, STORE_ATTR_INSTANCE_VALUE, CALL_PY_EXACT_ARGS, BINARY_SUBSCR_LIST_INT, and every other specialized arm. We have not seen the bugs yet because the inputs that trigger them have not been exercised. The 1712 specializer audit and the 1713 byte-equality gate are both going to surface them, one by one, the slow way. Or we can port the generator and delete the class.
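The failure mode is easy to reproduce in miniature. The sketch below is an illustration under stated assumptions, not gopy's code: it models the bytecode as a slice of 16-bit codeunits with the opcode in the low byte, takes cell 0 to be the first codeunit after the instruction, and uses a hypothetical setCacheCell that, like any index-based accessor, does no bounds check against the cache width. The opcode numbers are the ones quoted in this spec.

```go
package main

import "fmt"

// Opcode numbers as quoted in this spec; the layout model is an assumption.
const (
	opReserved   = 17
	opPushNull   = 33
	opLoadGlobal = 91
	cacheCells   = 4 // LOAD_GLOBAL reserves 4 codeunits of inline cache
)

// setCacheCell mimics an untyped, index-based cache write. Cell 0 is the
// first codeunit after the instruction at instr. Nothing checks cell
// against the cache width, which is the whole problem.
func setCacheCell(code []uint16, instr, cell int, v uint16) {
	code[instr+1+cell] = v
}

// opcodeAt returns the opcode held in the low byte of codeunit i.
func opcodeAt(code []uint16, i int) uint8 {
	return uint8(code[i] & 0xff)
}

func main() {
	// LOAD_GLOBAL, four cache cells, then PUSH_NULL.
	code := []uint16{opLoadGlobal, 0, 0, 0, 0, opPushNull}
	next := 1 + cacheCells // index of the following instruction

	fmt.Println(opcodeAt(code, next)) // 33: PUSH_NULL is intact

	// Valid cells are 0..3. Cell 4 lands on the next instruction.
	setCacheCell(code, 0, 4, 17)

	fmt.Println(opcodeAt(code, next)) // 17: PUSH_NULL now decodes as RESERVED
}
```

A reader that uses the same wrong cell index sees a consistent value, which is why the two sides appeared to agree.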
The same argument applies to tier-2 uops. Spec 1712 hand-ported 14
of ~285 uops. Each one is a fresh translation from
Python/optimizer_bytecodes.c. Without the generator, every uop
is a fresh chance to drift from the tier-1 body of the same name.
CPython's generator emits both sides from the same source.
CPython architecture
CPython's pipeline, with line counts taken from a fresh
$HOME/cpython-314 clone at v3.14.5:
Python/bytecodes.c 5549 lines
├── 92 inst() tier-1 + tier-2 fused body
├── 145 op() tier-2-only or composable
├── 109 macro() composition of op + cache cells
├── 17 family() specialization families
├── 11 pseudo() compiler-only synthetics
└── 3 label() shared error-handler labels
Python/optimizer_bytecodes.c 1107 lines
├── op() bodies that override the tier-1 op() of the same name
└── used for tier-2 abstract interpretation (sym values, guards)
Include/internal/pycore_code.h cache structs per opcode family
Include/opcode_ids.h opcode numeric IDs (generated)
Include/internal/pycore_opcode_metadata.h per-opcode metadata (generated)
Tools/cases_generator/ 5811 lines
├── lexer.py 395 lines tokenizer for the C-with-DSL input
├── plexer.py 124 lines peekable lexer wrapper
├── parser.py 78 lines thin entry point on top of parsing.py
├── parsing.py 743 lines real parser; produces inst/op/macro/family AST
├── analyzer.py 1207 lines resolves macros, computes effects, walks bodies
├── stack.py 737 lines stack-effect tracker; emits push/pop sequences
├── cwriter.py 179 lines C output sink with indent tracking
├── generators_common.py 708 lines body emission shared between tier-1 / tier-2
├── tier1_generator.py 306 lines emits generated_cases.c.h
├── tier2_generator.py 228 lines emits executor_cases.c.h
├── optimizer_generator.py 244 lines emits abstract interpreter cases
├── opcode_id_generator.py 63 lines emits opcode_ids.h
├── opcode_metadata_generator.py 418 lines emits pycore_opcode_metadata.h
├── py_metadata_generator.py 95 lines emits Lib/_opcode_metadata.py
├── target_generator.py 94 lines emits opcode_targets.h (computed-goto)
├── uop_id_generator.py 79 lines emits pycore_uop_ids.h
└── uop_metadata_generator.py 98 lines emits pycore_uop_metadata.h
The full toolchain is 5811 lines of Python. The inputs are 6656 lines of C (bytecodes.c + optimizer_bytecodes.c). The C-side generated output across all targets is roughly 18000 lines.
Five concepts carry most of the weight, and the gopy port must preserve all of them:
Stack effects. Every inst() declaration looks like
inst(LOAD_FAST, (-- value)) or
inst(BINARY_OP, (lhs, rhs -- res)). The names and order on each
side declare the stack inputs (popped) and outputs (pushed). The
generator uses these to emit PEEK, STACK_SHRINK, STACK_GROW,
and the local-variable initialization that gives the C body a
typed binding for each input/output. stack.py tracks the running
effect across a macro() composition and emits the minimum-cost
push/pop sequence.
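As a rough illustration of what a stack-effect declaration encodes, the following Go sketch splits a declaration into its popped and pushed names and derives the net effect. It is a simplification under assumptions: parseEffect and the effect type are hypothetical, and only plain comma-separated names are handled, whereas the real DSL also allows types, array sizes, and conditions.

```go
package main

import (
	"fmt"
	"strings"
)

// effect holds the named inputs (popped) and outputs (pushed) of one
// DSL stack-effect declaration such as "(lhs, rhs -- res)".
type effect struct {
	inputs, outputs []string
}

// parseEffect splits a declaration on "--". Only plain names are supported.
func parseEffect(decl string) (effect, error) {
	decl = strings.TrimSpace(decl)
	if !strings.HasPrefix(decl, "(") || !strings.HasSuffix(decl, ")") {
		return effect{}, fmt.Errorf("missing parentheses: %q", decl)
	}
	in, out, ok := strings.Cut(decl[1:len(decl)-1], "--")
	if !ok {
		return effect{}, fmt.Errorf("missing \"--\": %q", decl)
	}
	return effect{inputs: names(in), outputs: names(out)}, nil
}

// names splits a comma-separated list and drops empty entries.
func names(s string) []string {
	var out []string
	for _, f := range strings.Split(s, ",") {
		if f = strings.TrimSpace(f); f != "" {
			out = append(out, f)
		}
	}
	return out
}

// net is the change in stack depth after the instruction runs.
func (e effect) net() int { return len(e.outputs) - len(e.inputs) }

func main() {
	for _, d := range []string{"(-- value)", "(lhs, rhs -- res)"} {
		e, err := parseEffect(d)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-20s pops %d, pushes %d, net %+d\n",
			d, len(e.inputs), len(e.outputs), e.net())
	}
}
```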
Cache cells. A macro(LOAD_GLOBAL) = unused/1 + counter/1 + globals_version/1 + builtins_version/1 + _LOAD_GLOBAL + _PUSH_NULL_CONDITIONAL declares the cache layout inline, four codeunits in total. The
/N suffix declares how many codeunits each cell occupies. The
generator computes the offset of each named cell, threads a
_PyLoadGlobalCache *cache = (_PyLoadGlobalCache *)next_instr; at
the top of the body, and rewrites cache->index references to
the right offset in the codeunit stream. The struct in
pycore_code.h matches by name and order. Drift is a compile
error in CPython.
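The offset computation itself is a running sum over the cell widths. The Go sketch below is illustrative only: the cell type and offsets function are hypothetical, and the layout uses the four uint16_t fields of _PyLoadGlobalCache as quoted earlier in this spec, one codeunit each.

```go
package main

import "fmt"

// cell is one named inline-cache entry and its width in codeunits,
// i.e. the "name/N" form used in macro() declarations.
type cell struct {
	name  string
	width int
}

// offsets returns the codeunit offset of each cell relative to the first
// cache codeunit, plus the total cache size in codeunits.
func offsets(cells []cell) (map[string]int, int) {
	off := make(map[string]int, len(cells))
	total := 0
	for _, c := range cells {
		off[c.name] = total
		total += c.width
	}
	return off, total
}

func main() {
	// Field names and order follow _PyLoadGlobalCache; each uint16_t
	// field occupies exactly one codeunit.
	layout := []cell{
		{"counter", 1},
		{"index", 1},
		{"module_keys_version", 1},
		{"builtin_keys_version", 1},
	}
	off, total := offsets(layout)
	fmt.Println("index at cell", off["index"], "of", total)

	// The drift check: the layout must fill the reserved cache exactly.
	const reserved = 4 // codeunits reserved for LOAD_GLOBAL in this spec
	if total != reserved {
		panic("cache layout does not match reserved size")
	}
}
```

With the offsets derived from one declaration, no call site needs to carry a cell number of its own.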
Families and deopt. family(LOAD_GLOBAL, INLINE_CACHE_ENTRIES_LOAD_GLOBAL) = { LOAD_GLOBAL_MODULE, LOAD_GLOBAL_BUILTIN }; declares that
LOAD_GLOBAL's specialized arms are LOAD_GLOBAL_MODULE and
LOAD_GLOBAL_BUILTIN, all sharing the same cache size. The
generator emits the deopt map: a specialized arm that hits
DEOPT_IF(cond) jumps back to the family parent without changing
the cache layout. This is the mechanism gopy implements ad-hoc in
specialize/deopt.go today.
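Deriving the deopt map from the family table is a simple inversion, sketched here in Go. The function name deoptMap is hypothetical and opcodes are represented as strings for brevity; the point is only that the second table need not be written by hand.

```go
package main

import "fmt"

// deoptMap inverts a family table: every specialized arm maps back to its
// family parent. An arm claimed by two families is reported as an error.
func deoptMap(family map[string][]string) (map[string]string, error) {
	deopt := make(map[string]string)
	for parent, arms := range family {
		for _, arm := range arms {
			if prev, dup := deopt[arm]; dup {
				return nil, fmt.Errorf("%s belongs to both %s and %s", arm, prev, parent)
			}
			deopt[arm] = parent
		}
	}
	return deopt, nil
}

func main() {
	family := map[string][]string{
		"LOAD_GLOBAL": {"LOAD_GLOBAL_MODULE", "LOAD_GLOBAL_BUILTIN"},
	}
	deopt, err := deoptMap(family)
	if err != nil {
		panic(err)
	}
	fmt.Println(deopt["LOAD_GLOBAL_MODULE"], deopt["LOAD_GLOBAL_BUILTIN"])
}
```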
DEOPT_IF / EXIT_IF / ERROR_IF. These three macros in the
body desugar into the three exit paths the generator must emit:
deopt back to the parent opcode, exit the tier-2 trace, or jump
to the per-opcode error label. The generator knows which macro is
legal in which context (DEOPT_IF only inside specialized arms,
EXIT_IF only in tier-2, ERROR_IF anywhere) and emits the right
restoration code (stack rollback, refcount adjustment) for each.
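The legality rule stated above fits in a few lines of Go. This is a sketch of the rule as this spec describes it, with a hypothetical context type and legal function; it does not reproduce the checks in CPython's analyzer.

```go
package main

import "fmt"

// context describes where a body is being emitted.
type context struct {
	specializedArm bool // body of a specialized tier-1 arm
	tier2          bool // body of a tier-2 uop
}

// legal reports whether an exit macro may appear in the given context,
// following the rule in this spec: DEOPT_IF only inside specialized arms,
// EXIT_IF only in tier-2, ERROR_IF anywhere.
func legal(macro string, ctx context) bool {
	switch macro {
	case "DEOPT_IF":
		return ctx.specializedArm
	case "EXIT_IF":
		return ctx.tier2
	case "ERROR_IF":
		return true
	}
	return false // not an exit macro
}

func main() {
	arm := context{specializedArm: true}
	fmt.Println(legal("DEOPT_IF", arm), legal("EXIT_IF", arm), legal("ERROR_IF", arm))
}
```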
Tier-1 / tier-2 fusion. inst(NAME, ...) declares a body that
serves both tiers. op(NAME, ...) declares a tier-2-only body or
a tier-1-only body, depending on whether it's referenced by a
macro(). optimizer_bytecodes.c overrides selected op()
bodies with abstract-interpretation versions used during
optimization. The generator emits the tier-1 case, the tier-2 uop
body, and the optimizer-case body from the same source. Spec
1712's hand-rolled approach has all three drift independently.
gopy current state (2026-05-16)
| Layer | gopy file(s) | What it should generate from | Status |
|---|---|---|---|
| Opcode IDs | compile/opcodes_gen.go | Include/opcode_ids.h | hand-rolled, generated-name notwithstanding. Source-of-truth: a Python script we run by hand, not the DSL. |
| Cache widths | compile/opcode_caches.go | pycore_opcode_metadata.h _PyOpcode_Caches[] | hand-rolled. 4 codeunits for LOAD_GLOBAL, etc. |
| Stack effects | inline in each eval_*.go arm | pycore_opcode_metadata.h _PyOpcode_num_popped/num_pushed | hand-rolled, no cross-check. |
| Cache layouts (struct) | specialize/cache.go SetCacheCell / CacheCell | pycore_code.h _Py<Op>Cache structs | hand-rolled offsets. Cell index is a magic number at every call site. |
| Tier-1 dispatch | vm/eval_simple.go | Python/generated_cases.c.h | hand-rolled switch statement. |
| Tier-1 specialized arms | vm/eval_specialized_*.go | Python/generated_cases.c.h (same cases) | hand-rolled. |
| Tier-2 uops | vm/uops/*.go (per 1712) | Python/executor_cases.c.h | 14 of ~285 hand-rolled. |
| Family table | specialize/quicken.go | pycore_opcode_metadata.h _PyOpcode_Caches+family arrays | hand-rolled. |
| Deopt map | specialize/deopt.go | pycore_opcode_metadata.h _PyOpcode_Deopt[] | hand-rolled. |
| Specializer skeletons | specialize/*.go per-family | DSL family declarations | hand-rolled. |
Every row above is in scope. By the end of this spec, every "hand-rolled" becomes "generated", or the row gets explicitly carved out with a documented reason.
Files in scope
Sources of truth live under $HOME/cpython-314/ and are
mirrored into tools/cases_generator/inputs/ so the generator
runs hermetically. Every file below is ported in full, with
// CPython: <file>:<line> <function> citations on the Go
emitters and on any non-trivial bridging glue.
| # | CPython source | gopy target | Why |
|---|---|---|---|
| A | Tools/cases_generator/lexer.py | tools/cases_generator/lexer.py (vendored verbatim) | DSL tokenizer. No Go port. Run under host Python 3.14. |
| B | Tools/cases_generator/plexer.py | tools/cases_generator/plexer.py (vendored) | Peekable wrapper. |
| C | Tools/cases_generator/parser.py + parsing.py | vendored | DSL parser. Produces Inst, Op, Macro, Family, Pseudo AST nodes. |
| D | Tools/cases_generator/analyzer.py | vendored | Resolves macros, computes stack effects, walks bodies for DEOPT_IF/ERROR_IF/EXIT_IF. |
| E | Tools/cases_generator/stack.py | vendored | Stack-effect tracker; emits push/pop sequences. |
| F | Tools/cases_generator/cwriter.py | vendored and re-implemented as gowriter.py | C writer + a Go writer sharing the same indent-tracking and emit API. |
| G | Tools/cases_generator/generators_common.py | vendored, plus a go_generators_common.py companion | Body emission. The companion handles Go-specific macro expansion (DEOPT_IF → return 0, false, ERROR_IF → return e.raise(err), etc). |
| H | Tools/cases_generator/opcode_id_generator.py | + gopy_opcode_id_generator.py | New emitter targeting compile/opcode_ids_gen.go. |
| I | Tools/cases_generator/opcode_metadata_generator.py | + gopy_opcode_metadata_generator.py | New emitter targeting compile/opcode_metadata_gen.go (replaces compile/opcode_caches.go). |
| J | Tools/cases_generator/uop_id_generator.py + uop_metadata_generator.py | + Go companions | Tier-2 uop tables. |
| K | Tools/cases_generator/tier1_generator.py | + gopy_tier1_generator.py | Emits vm/eval_dispatch_gen.go: the dispatch switch + per-opcode body harness. |
| L | Tools/cases_generator/tier2_generator.py | + gopy_tier2_generator.py (landed) | Emits optimizer/uops_dispatch_gen.go + optimizer/uops_stubs_gen.go: dispatch fan-out on *Tier2State plus one StatusDeopt stub per Tier-2-viable uop whose body lives inline as a //-prefixed C block. The retired tools/uops_gen/tier2_generator.go Go duplicate is removed. The Phase L landing intentionally keeps the outputs in optimizer/ (not vm/eval_uops_gen.go) because *Tier2State lives there; lifting them into vm/ is left to a follow-up phase. |
| M | Tools/cases_generator/optimizer_generator.py | + gopy_optimizer_generator.py (landed) | Emits optimizer/optimizer_cases_gen.go: one // comment block per Tier-2-viable uop carrying the abstract-interp C body (overlaid optimizer_bytecodes.c on bytecodes.c, falling back to emit_default when no override). The hand-written symbolic interpreter in optimizer/symbols.go + optimizer/analysis.go uses these blocks as the porting spec; when dispatch-by-uop lands (phase 7+), bodies translate to Go one-by-one and analysis.go shrinks. The output lands in optimizer/ instead of the originally-planned compile/ because JitOptSymbol/JitOptContext live there. The retired tools/uops_gen/optimizer_generator.go Go duplicate plus the optimizer-cases mode in tools/uops_gen/main.go are removed. |
| N | Tools/cases_generator/target_generator.py | (not ported) | CPython-specific computed-goto. Go's switch is fine. Documented carve-out. |
| O | Tools/cases_generator/py_metadata_generator.py | (vendored only) | Emits Lib/_opcode_metadata.py; gopy already vendors that file via 1710 T5.1. No regeneration needed; we ship CPython's. |
| P | Python/bytecodes.c (v3.14.5) | tools/cases_generator/inputs/bytecodes.c | The single source. Frozen per CPython tag; bumped together with 1707 sync. |
| Q | Python/optimizer_bytecodes.c (v3.14.5) | tools/cases_generator/inputs/optimizer_bytecodes.c | Tier-2 source. |
| R | Include/internal/pycore_code.h | tools/cases_generator/inputs/pycore_code.h | Cache struct definitions. Parsed by a new cache_struct_parser.py to emit specialize/cache_layouts_gen.go. |
Output Go files (all *_gen.go, all under generator control,
none hand-edited):
| Output file | Lines (estimated) | Replaces |
|---|---|---|
| compile/opcode_ids_gen.go | ~600 | compile/opcodes_gen.go |
| compile/opcode_metadata_gen.go | ~1500 | compile/opcode_caches.go |
| optimizer/optimizer_cases_gen.go | ~2700 | hand-rolled abstract-interp cases in optimizer/analysis.go (currently doc-only; replaces them once dispatch-by-uop lands) |
| specialize/cache_layouts_gen.go | ~400 | implicit layout knowledge across specialize/*.go |
| specialize/family_gen.go | ~200 | specialize/quicken.go family table + specialize/deopt.go map |
| vm/eval_dispatch_gen.go | ~4000 | core of vm/eval_simple.go + vm/eval_specialized*.go |
| vm/eval_uops_gen.go | ~3000 | core of vm/uops/*.go |
Phase index
Each phase ports one block end to end. Status lives on the Checklist at the bottom of this spec, mirrored per row here. The phase order is chosen so that every phase ends with a green CI, including phases where the generator is partially wired: each emitter ships with a parity test that diffs its output against the hand-rolled file it will eventually replace, and only flips the hand-rolled file's role to "fallback" when the parity test goes green.
| Phase | Block | Gate | Status |
|---|---|---|---|
| 0 | Vendor Tools/cases_generator/ verbatim. Mirror inputs (bytecodes.c, optimizer_bytecodes.c, pycore_code.h) under Tools/cases_generator/inputs/ at the 3.14.5 hash. Add Tools/regen-cases/ (Go driver) that invokes the generators against our vendored inputs and reproduces CPython's own outputs into a scratch dir for diffing. | upstream reproducibility: regenerating CPython's 9 generator outputs from our vendored inputs matches the files in $HOME/cpython-314 byte for byte (header lines excluded) | DONE (CI pending) |
| 1 | Output abstraction. Port cwriter.py into a gowriter.py sibling that shares the indent/scope API but emits Go syntax. Implement go_generators_common.py with macro→Go bindings for the constant macros (PyStackRef_FromPyObject*, PyStackRef_AsPyObject*, STACK_SHRINK, STACK_GROW, PEEK, POKE, etc). | unit-test corpus: 30 hand-written macro snippets emit known-good Go | DONE (corpus at 20/30; remaining 10 stage with Phase 5 op signatures) |
| 2 | Metadata + opcode-id emitters. Ship gopy_opcode_id_generator.py and gopy_opcode_metadata_generator.py. Output compile/opcode_ids_gen.go + compile/opcode_metadata_gen.go. Parity test: the generated tables equal compile/opcodes_gen.go + compile/opcode_caches.go for every opcode currently in gopy. Once green, the hand-rolled files get deleted. | go test ./compile -run TestOpcodeMetadataParity green; deletion lands | DONE (2.1-2.3; 2.4 deletion pending) |
| 3 | Cache-layout emitter. New cache_struct_parser.py reads pycore_code.h, emits specialize/cache_layouts_gen.go with typed accessors ((*LoadGlobalCache).Index, .ModuleKeysVersion, etc) backed by the codeunit slice. Migrate every SetCacheCell / CacheCell call site in specialize/*.go and vm/eval_specialized_*.go to typed accessors. The LOAD_GLOBAL cell-1 vs cell-4 bug class becomes a compile error. | every specialize/* + eval_specialized_* file builds; existing tests green; one new test (TestCacheLayoutTypedAccess) asserts the struct sizes match _PyOpcode_Caches[] | TODO |
| 4 | Family + deopt emitter. Generator emits specialize/family_gen.go carrying the family table (map[Opcode][]Opcode), deopt map (Opcode→Opcode parent), and per-family cache-size guard. Replace specialize/quicken.go family literal + specialize/deopt.go map. Parity test: generated tables equal current hand-rolled ones for every opcode we specialize today. | parity test green; deletion lands | DONE (8abf069) |
| 5 | Tier-1 harness emitter. Generator emits vm/eval_dispatch_gen.go: the switch statement, per-opcode prologue (stack peek to typed locals), epilogue (stack push of typed outputs, cache advance), deopt path, error path. The body itself remains in hand-written Go: each opcode has a function op<NAME>(e *evalState, oparg uint32, in <inputs>) (out <outputs>, err error) whose signature is derived from the DSL and enforced by the generator. Any hand-written function whose signature diverges from the DSL stack effect is a build error. | every opcode in gopy currently routed through vm/eval_simple.go is now routed through vm/eval_dispatch_gen.go; full go test ./... green | TODO |
| 6 | Specialized arms. Same harness as Phase 5 but generates the specialized cases (LOAD_GLOBAL_MODULE etc), wires DEOPT_IF to a generated return deoptTo<PARENT>(e) shim. Hand-written specializer fast paths in vm/eval_specialized_*.go are reduced to per-opcode body functions; the cache decode, deopt branch, and cache advance live in the generator output. | every vm/eval_specialized_*.go file shrinks to opcode bodies only; the LOAD_GLOBAL bug's regression test (a fixture that stresses cache cell boundaries) is added | TODO |
| 7 | Tier-2 uop harness. gopy_tier2_generator.py emits vm/eval_uops_gen.go. The 14 uops 1712 hand-ported are re-rooted onto generated harness; remaining ~270 uops become trivial Go-body inserts. Parity test: for every uop name shared with tier-1 (e.g. _LOAD_FAST), the tier-1 body and the tier-2 uop body call into the same opcode-body function. | tier-2 trace executor runs the same 1712 P2 microbenchmark with hand-rolled and generated uops side by side; perf within 2%, results identical | TODO |
| 8 | Body translation pilot. Pick 10 trivial opcodes (NOP, POP_TOP, POP_TOP_LOAD_CONST_INLINE_BORROW, LOAD_FAST, LOAD_FAST_BORROW, STORE_FAST, LOAD_CONST, RETURN_VALUE, RESUME_CHECK, END_FOR). Add a C-body→Go-body translator under tools/cases_generator/body_translator.py that handles the constrained subset of C these bodies use. Generated Go bodies replace the hand-written ones; CI must stay green. | 10 opcodes have zero hand-written Go; CI green; size of the body-translator subset documented in tools/cases_generator/SUBSET.md | TODO |
| 9 | Body translation scale-up. Translate every remaining opcode body. Each opcode that survives translation has its hand-written Go body deleted. Opcodes that the translator cannot handle (calls into runtime helpers gopy spells differently, refcount idioms gopy doesn't have because Go is GC'd) stay hand-written but with a generator-emitted stub asserting the signature. | hand-written opcode bodies count drops below 30 (escape hatches only, documented per opcode); reproducibility gate green | TODO |
| Gate | tools/regen-cases.sh && git diff --exit-code runs in CI. Any drift between source DSL and emitted Go fails the build. | gate green | TODO |
Phase 0 — vendor the generator
Bring Tools/cases_generator/ into the gopy repo under
tools/cases_generator/ verbatim. The directory layout mirrors
CPython's: lexer.py, plexer.py, parser.py, parsing.py,
analyzer.py, stack.py, cwriter.py, generators_common.py,
and the nine per-target emitters. No edits. The vendored copy
carries a header comment naming the CPython commit it was pulled
from, identical to how stdlib/ files name their source.
Inputs: Python/bytecodes.c, Python/optimizer_bytecodes.c,
Include/internal/pycore_code.h get mirrored under
tools/cases_generator/inputs/. Same pin: CPython 3.14.5. Spec
1707 (CPython 3.14.x sync) is the upstream rollup; this spec adds
three rows to 1707's checklist for these inputs.
Driver: tools/regen-cases.sh is a thin bash wrapper that exports
PYTHONPATH=tools/cases_generator and invokes each generator with
the right input files and output paths. In Phase 0 the script
emits only into a scratch directory under /tmp and runs diff
against CPython's actual output in $HOME/cpython-314/Python/ and
$HOME/cpython-314/Include/internal/. A green diff proves the
toolchain is wired correctly before any Go work starts.
The Phase 0 gate runs in CI: tools/regen-cases.sh --check-upstream
regenerates the C files into /tmp, diffs against the vendored
CPython tree, fails on any divergence. This catches accidental
edits to the vendored generator and the case where a CPython 3.14
patch release moves the inputs without touching the generator.
| Step | Status | Commit |
|---|---|---|
| Vendor Tools/cases_generator/ under Tools/cases_generator/ | DONE | - |
| Mirror bytecodes.c, optimizer_bytecodes.c, pycore_code.h under Tools/cases_generator/inputs/ | DONE | - |
| Tools/regen-cases/ (Go driver) invokes each upstream generator into a scratch dir | DONE | - |
| go run ./Tools/regen-cases --check-upstream diff-clean against CPython 3.14.5 generated files | DONE | - |
| CI job cases-generator-upstream-parity green | TODO | - |
Phase 1 — output abstraction
CPython's cwriter.py is a 179-line indent-tracking C-syntax sink.
It exposes emit(text), start_line(), block() (context manager
for {...}), set_position(), set_lineno(). Generators write to
a CWriter and get well-formatted C with #line directives.
Phase 1 introduces tools/cases_generator/gowriter.py with the
same surface area, emitting Go. The block() context emits { /
} the same way; start_line() honors gofmt-friendly
indentation; set_lineno() emits //line directives keyed to the
DSL source location so that runtime panics in generated code point
back to Python/bytecodes.c.
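The writer contract is small enough to sketch. gowriter.py itself is Python; the Go version below only illustrates the same surface (line emission, a block scope, a line directive) and every name in it is hypothetical. One detail it does reflect from the Go toolchain: a //line directive is only recognized when the comment starts at the beginning of a line, so it is written without indentation.

```go
package main

import (
	"fmt"
	"strings"
)

// goWriter is a minimal indent-tracking sink for Go source text.
type goWriter struct {
	buf   strings.Builder
	depth int
}

// emitLine writes one line at the current indentation, using tabs as
// gofmt does.
func (w *goWriter) emitLine(s string) {
	w.buf.WriteString(strings.Repeat("\t", w.depth))
	w.buf.WriteString(s)
	w.buf.WriteByte('\n')
}

// block writes "head {", runs body one level deeper, then closes the brace.
func (w *goWriter) block(head string, body func()) {
	w.emitLine(head + " {")
	w.depth++
	body()
	w.depth--
	w.emitLine("}")
}

// setLineno emits a //line directive at column 1 so that positions in the
// code that follows are reported against the DSL source.
func (w *goWriter) setLineno(file string, line int) {
	fmt.Fprintf(&w.buf, "//line %s:%d\n", file, line)
}

func main() {
	w := &goWriter{}
	w.block("func opNOP()", func() {
		w.setLineno("Python/bytecodes.c", 100) // line number is illustrative
		w.emitLine("return")
	})
	fmt.Print(w.buf.String())
}
```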
A second file, go_generators_common.py, mirrors
generators_common.py but binds DSL macros to Go expressions. The
constant macros (the ones whose expansion does not depend on the
surrounding stack effect) are the Phase 1 deliverable:
| DSL macro | Go expansion |
|---|---|
| PyStackRef_AsPyObjectBorrow(r) | r.AsObject() |
| PyStackRef_FromPyObjectNew(o) | stackref.FromObject(o) |
| PyStackRef_FromPyObjectImmortal(o) | stackref.FromObjectImmortal(o) |
| PyStackRef_IsNull(r) | r.IsNull() |
| STACK_SHRINK(n) | (handled by the harness, not the body) |
| STACK_GROW(n) | (handled by the harness) |
| PEEK(i) | (handled by the harness) |
| POKE(i, v) | (handled by the harness) |
| Py_INCREF(o) / Py_DECREF(o) | (no-op; Go is GC'd) |
| Py_XDECREF(o) | (no-op) |
| JUMPBY(n) | e.jumpBy(n) |
| next_instr | e.f.NextInstr |
| _PyFrame_GetCode(frame) | e.f.Code |
| frame->localsplus[i] | e.f.Locals[i] |
| tstate | e (the evalState) |
| oparg | oparg (passed as parameter) |
| DEOPT_IF(cond) | if cond { return 0, false } (specialized arms) |
| EXIT_IF(cond) | if cond { return e.tier2Exit() } (tier-2 only) |
| ERROR_IF(cond, label) | if cond { return e.<label>() } |
| DECREF_INPUTS() | (no-op; harness clears stack refs) |
| DEAD(name) | (no-op; informational) |
The Phase 1 corpus is 30 hand-written macro snippets covering each
binding above. The test is mechanical: gowriter.py emits the
binding, the result compiles, the result matches a checked-in
golden file. No real opcode bodies translated yet.
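A table-driven expander conveys what each corpus snippet exercises. The Go sketch below is an illustration only: the real bindings live in go_generators_common.py, the expand function is hypothetical, and it handles just a single-argument call with no recursive expansion of nested macros. The templates are copied from the binding table above.

```go
package main

import (
	"fmt"
	"strings"
)

// bindings maps a DSL macro name to a Go template; %s receives the
// macro's single argument.
var bindings = map[string]string{
	"PyStackRef_AsPyObjectBorrow": "%s.AsObject()",
	"PyStackRef_FromPyObjectNew":  "stackref.FromObject(%s)",
	"PyStackRef_IsNull":           "%s.IsNull()",
	"JUMPBY":                      "e.jumpBy(%s)",
	"DEOPT_IF":                    "if %s { return 0, false }",
	"EXIT_IF":                     "if %s { return e.tier2Exit() }",
}

// expand rewrites one call of the form NAME(arg), with an optional
// trailing semicolon, into its Go binding.
func expand(call string) (string, error) {
	call = strings.TrimSuffix(strings.TrimSpace(call), ";")
	name, rest, ok := strings.Cut(call, "(")
	if !ok || !strings.HasSuffix(rest, ")") {
		return "", fmt.Errorf("not a macro call: %q", call)
	}
	tmpl, ok := bindings[name]
	if !ok {
		return "", fmt.Errorf("no Go binding for %s", name)
	}
	return fmt.Sprintf(tmpl, strings.TrimSuffix(rest, ")")), nil
}

func main() {
	for _, src := range []string{"DEOPT_IF(version != cached);", "JUMPBY(oparg);"} {
		out, err := expand(src)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-30s => %s\n", src, out)
	}
}
```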
| Step | Status | Commit |
|---|---|---|
Tools/cases_generator/gowriter.py lands | DONE | - |
Tools/cases_generator/go_generators_common.py lands with the binding table above | DONE | - |
30-snippet golden-file corpus under Tools/cases_generator/testdata/snippets/ | PARTIAL (20/30) | - |
go test ./Tools/regen-cases -run TestSnippetParity (Go harness shelling out to Python) green | DONE | - |
Phase 2 — opcode IDs + metadata
Two Go emitters land. gopy_opcode_id_generator.py walks the
analyzer's opcode list and emits compile/opcode_ids_gen.go:
// Code generated by tools/cases_generator. DO NOT EDIT.
// Source: Python/bytecodes.c (CPython 3.14.5)
package compile
const (
NOP Opcode = 0
RESERVED Opcode = 17
LOAD_FAST Opcode = 85
LOAD_GLOBAL Opcode = 91
LOAD_GLOBAL_MODULE Opcode = 158
LOAD_GLOBAL_BUILTIN Opcode = 159
// ...
)
var OpcodeName = map[Opcode]string{ ... }
gopy_opcode_metadata_generator.py emits
compile/opcode_metadata_gen.go:
package compile
// CacheSize is the number of codeunits the inline cache occupies
// for this opcode. Generated from family() declarations and the
// macro() cache-cell list in Python/bytecodes.c.
var CacheSize = map[Opcode]int{
LOAD_GLOBAL: 4,
LOAD_GLOBAL_MODULE: 4,
LOAD_GLOBAL_BUILTIN: 4,
LOAD_ATTR: 9,
// ...
}
// StackEffect carries the popped/pushed counts derived from the
// DSL stack effect declaration. Both counts may depend on oparg
// for variadic opcodes (BUILD_TUPLE, CALL, etc); in that case
// StackEffect.PoppedFn / PushedFn is non-nil.
var StackEffect = map[Opcode]Effect{ ... }
// Family lists the specialized arms of each adaptive opcode.
var Family = map[Opcode][]Opcode{
LOAD_GLOBAL: {LOAD_GLOBAL_MODULE, LOAD_GLOBAL_BUILTIN},
// ...
}
// Deopt is the inverse: every specialized arm maps to its family
// parent.
var Deopt = map[Opcode]Opcode{
LOAD_GLOBAL_MODULE: LOAD_GLOBAL,
LOAD_GLOBAL_BUILTIN: LOAD_GLOBAL,
// ...
}
The parity test (compile/opcode_metadata_parity_test.go) asserts
that the generated maps cover every key currently in
compile/opcode_caches.go with the same value. When green, the
hand-rolled file gets deleted in the same commit; the generated
file becomes the only source.
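The parity check is one-directional: the generated table must cover the hand-rolled one, while extra generated keys are fine. A minimal Go sketch of that comparison follows; parityGaps is a hypothetical name, opcodes are strings for brevity, and the sample values are the cache sizes shown in the listing above.

```go
package main

import (
	"fmt"
	"sort"
)

// parityGaps lists every key of the hand-rolled table that is missing from
// the generated table or carries a different value. Keys that exist only in
// the generated table are not reported.
func parityGaps(hand, gen map[string]int) []string {
	var gaps []string
	for op, want := range hand {
		got, ok := gen[op]
		switch {
		case !ok:
			gaps = append(gaps, fmt.Sprintf("%s: missing from generated table", op))
		case got != want:
			gaps = append(gaps, fmt.Sprintf("%s: hand-rolled %d, generated %d", op, want, got))
		}
	}
	sort.Strings(gaps)
	return gaps
}

func main() {
	hand := map[string]int{"LOAD_GLOBAL": 4, "LOAD_ATTR": 9}
	gen := map[string]int{"LOAD_GLOBAL": 4, "LOAD_GLOBAL_MODULE": 4, "LOAD_ATTR": 9}
	fmt.Println(len(parityGaps(hand, gen)), "gaps")
}
```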
| Step | Status | Commit |
|---|---|---|
| tools/cases_generator/gopy_opcode_id_generator.py | DONE | 78e434b |
| tools/cases_generator/gopy_opcode_metadata_generator.py | DONE | this commit |
| compile/opcode_ids_gen.go + compile/opcode_metadata_gen.go checked in | DONE | this commit |
| Parity test green | DONE (3 tests; YIELD_VALUE escapes-flag delta logged, generator wins) | this commit |
| compile/opcode_caches.go deleted; references redirected | TODO | - |
Phase 3 — typed cache layouts
The biggest single bug-class reduction in this spec. Today,
specialize.SetCacheCell(code, instr, 1, idx) writes a magic
number to a magic offset. The relationship between cell number
and the meaning of that cell lives in a comment, sometimes wrong.
Phase 3 ships tools/cases_generator/cache_struct_parser.py. It
reads Include/internal/pycore_code.h, parses each
_Py<Op>Cache struct, emits specialize/cache_layouts_gen.go:
// Code generated by tools/cases_generator. DO NOT EDIT.
// Source: Include/internal/pycore_code.h (CPython 3.14.5)
package specialize
import "github.com/tamnd/gopy/compile"
// LoadGlobalCache mirrors Include/internal/pycore_code.h:117
// struct _PyLoadGlobalCache. Each codeunit cell maps to one
// uint16 field. Field offsets are checked at init time against
// compile.CacheSize[compile.LOAD_GLOBAL].
type LoadGlobalCache struct {
code []byte
instr int
}
func LoadGlobalCacheAt(code []byte, instr int) LoadGlobalCache { ... }
func (c LoadGlobalCache) Counter() uint16 { ... }
func (c LoadGlobalCache) SetCounter(v uint16) { ... }
func (c LoadGlobalCache) Index() uint16 { ... }
func (c LoadGlobalCache) SetIndex(v uint16) { ... }
func (c LoadGlobalCache) ModuleKeysVersion() uint16 { ... }
func (c LoadGlobalCache) SetModuleKeysVersion(v uint16) { ... }
func (c LoadGlobalCache) BuiltinKeysVersion() uint16 { ... }
func (c LoadGlobalCache) SetBuiltinKeysVersion(v uint16) { ... }
Every call site in specialize/*.go and
vm/eval_specialized_*.go migrates to typed access. The
specializer writes cache.SetIndex(uint16(idx)). The VM reads
cache.Index(). Cell-4 vs cell-1 cannot happen: there is no cell
4, the struct has four fields and the type system enforces the
mapping.
A new test (specialize/cache_layout_size_test.go) asserts that
the codeunit size implied by each typed struct matches
compile.CacheSize[op] for the family parent of that struct. A
struct that overflows the family's reserved size is a test
failure.
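To make the accessor idea concrete, here is one way the elided bodies could look. This is a sketch under explicit assumptions, not the generated file: it assumes instr is the byte offset of the owning instruction, that a codeunit is two bytes stored little-endian, and that the cache starts at the codeunit right after the instruction. The unexported cell helper is hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	codeunitBytes        = 2 // assumption: one codeunit is two bytes
	loadGlobalCacheUnits = 4 // counter, index, module and builtin versions
)

// LoadGlobalCache is a typed view over the inline cache of one LOAD_GLOBAL.
type LoadGlobalCache struct {
	code  []byte
	instr int // assumption: byte offset of the owning instruction
}

func LoadGlobalCacheAt(code []byte, instr int) LoadGlobalCache {
	return LoadGlobalCache{code: code, instr: instr}
}

// cell returns the two bytes backing cache cell i. It is unexported, so
// callers can only reach the four named fields below.
func (c LoadGlobalCache) cell(i int) []byte {
	start := c.instr + codeunitBytes*(1+i)
	return c.code[start : start+codeunitBytes]
}

func (c LoadGlobalCache) Counter() uint16     { return binary.LittleEndian.Uint16(c.cell(0)) }
func (c LoadGlobalCache) SetCounter(v uint16) { binary.LittleEndian.PutUint16(c.cell(0), v) }
func (c LoadGlobalCache) Index() uint16       { return binary.LittleEndian.Uint16(c.cell(1)) }
func (c LoadGlobalCache) SetIndex(v uint16)   { binary.LittleEndian.PutUint16(c.cell(1), v) }
func (c LoadGlobalCache) ModuleKeysVersion() uint16 {
	return binary.LittleEndian.Uint16(c.cell(2))
}
func (c LoadGlobalCache) SetModuleKeysVersion(v uint16) {
	binary.LittleEndian.PutUint16(c.cell(2), v)
}
func (c LoadGlobalCache) BuiltinKeysVersion() uint16 {
	return binary.LittleEndian.Uint16(c.cell(3))
}
func (c LoadGlobalCache) SetBuiltinKeysVersion(v uint16) {
	binary.LittleEndian.PutUint16(c.cell(3), v)
}

func main() {
	// One LOAD_GLOBAL (91) with its cache, followed by PUSH_NULL (33).
	next := codeunitBytes * (1 + loadGlobalCacheUnits)
	code := make([]byte, next+codeunitBytes)
	code[0], code[next] = 91, 33

	cache := LoadGlobalCacheAt(code, 0)
	cache.SetIndex(7)
	cache.SetBuiltinKeysVersion(0xBEEF)

	// The following instruction is untouched by any accessor.
	fmt.Println(cache.Index(), code[next])
}
```

Because every write goes through a named field, the specializer and the VM cannot disagree on which cell holds the index.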
| Step | Status | Commit |
|---|---|---|
| tools/cases_generator/cache_struct_parser.py | DONE | this commit |
| specialize/cache_layouts_gen.go lands with every _Py<Op>Cache typed | DONE | this commit |
| Migrate specialize/load_global.go + vm/eval_specialized_load_global.go to typed access | TODO | - |
| Migrate every other specialize/*.go + vm/eval_specialized_*.go | TODO | - |
| specialize/cache.go SetCacheCell / CacheCell deleted (no callers) | TODO | - |
| Cache-layout size test green | DONE | this commit |
Phase 4 — family + deopt
specialize/family_gen.go carries:
// Code generated by tools/cases_generator. DO NOT EDIT.
// Source: family() declarations in Python/bytecodes.c.
package specialize
import "github.com/tamnd/gopy/compile"
var Family = map[compile.Opcode][]compile.Opcode{
compile.LOAD_GLOBAL: {
compile.LOAD_GLOBAL_MODULE,
compile.LOAD_GLOBAL_BUILTIN,
},
// ... 16 more
}
var DeoptParent = map[compile.Opcode]compile.Opcode{
compile.LOAD_GLOBAL_MODULE: compile.LOAD_GLOBAL,
compile.LOAD_GLOBAL_BUILTIN: compile.LOAD_GLOBAL,
// ...
}
Existing hand-rolled equivalents in specialize/quicken.go and
specialize/deopt.go shrink to consumers of these tables. The
parity test asserts that for every opcode in either side, the
hand-rolled and generated versions agree. Once green, the literal
tables are deleted; the consumers stay.
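One way such a parity check could work, as a sketch under assumptions: the opcode constants are local stand-ins for the compile package, and invertFamily and diffDeopt are hypothetical helper names. The check walks every opcode present on either side and reports those whose deopt parent differs or is missing.

```go
package main

import (
	"fmt"
	"sort"
)

type Opcode int

// Local stand-ins for the compile package's opcode constants.
const (
	LOAD_GLOBAL Opcode = iota
	LOAD_GLOBAL_MODULE
	LOAD_GLOBAL_BUILTIN
)

var Family = map[Opcode][]Opcode{
	LOAD_GLOBAL: {LOAD_GLOBAL_MODULE, LOAD_GLOBAL_BUILTIN},
}

// invertFamily derives the member -> parent table from the family table.
func invertFamily(family map[Opcode][]Opcode) map[Opcode]Opcode {
	parents := make(map[Opcode]Opcode)
	for parent, members := range family {
		for _, m := range members {
			parents[m] = parent
		}
	}
	return parents
}

// diffDeopt returns, in ascending order, every opcode that appears in
// either table but is missing from the other or maps to a different parent.
func diffDeopt(hand, gen map[Opcode]Opcode) []Opcode {
	seen := make(map[Opcode]bool)
	var bad []Opcode
	check := func(op Opcode) {
		if seen[op] {
			return
		}
		seen[op] = true
		h, inHand := hand[op]
		g, inGen := gen[op]
		if inHand != inGen || h != g {
			bad = append(bad, op)
		}
	}
	for op := range hand {
		check(op)
	}
	for op := range gen {
		check(op)
	}
	sort.Slice(bad, func(i, j int) bool { return bad[i] < bad[j] })
	return bad
}

func main() {
	gen := invertFamily(Family)
	// A hand-rolled table that forgot one specialization.
	hand := map[Opcode]Opcode{LOAD_GLOBAL_MODULE: LOAD_GLOBAL}
	fmt.Println("disagreeing opcodes:", diffDeopt(hand, gen))
}
```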
| Step | Status | Commit |
|---|---|---|
| gopy_family_generator.py (or fold into Phase 2's metadata emitter) | DONE | 8abf069 |
| specialize/family_gen.go lands | DONE | 8abf069 |
| specialize/quicken.go + specialize/deopt.go consume generated tables | DONE | 8abf069 |
| Parity test green; old literals deleted | DONE | 8abf069 |
Phase 5 — tier-1 dispatch harness
The harness owns:
- The dispatch switch keyed on opcode.
- Per-opcode prologue: peek stack inputs into typed locals named per the DSL declaration.
- Per-opcode epilogue: push outputs, advance next_instr by 1 + CacheSize[op].
- Error path: every ERROR_IF(cond, label) becomes a generated helper call.
- Deopt path (for specialized arms): every DEOPT_IF(cond) becomes a return that signals fallback to the unspecialized arm.
The body is not generated in this phase. Each opcode has a hand-written Go function whose signature is fully determined by the DSL:
// Generator-derived signature:
// inst(LOAD_FAST, (-- value))
// produces:
func opLOAD_FAST(e *evalState, oparg uint32) (value stackref.Ref, err error) {
return e.f.Locals[oparg], nil
}
// inst(BINARY_OP, (lhs, rhs -- res))
// produces:
func opBINARY_OP(e *evalState, oparg uint32, lhs, rhs stackref.Ref) (res stackref.Ref, err error) {
...
}
The generator emits a stub per opcode at the bottom of
vm/eval_dispatch_gen.go:
//go:linkname op<NAME> github.com/tamnd/gopy/vm.op<NAME>
var _ = op<NAME> // build error if op<NAME> is missing or signature drifts
A missing body or a wrong signature is a build error, not a runtime error. This is the entire point of the spec: the same class of bug that took down 1713 P2 work for half a day cannot exist by the end of Phase 5.
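A note on the Go mechanics: a bare reference such as var _ = op<NAME> fails the build when the function is missing, but it accepts any signature. Pinning the signature takes a typed assignment to the blank identifier. The sketch below shows that idiom; Ref and evalState are simplified stand-ins for stackref.Ref and the real eval state, and the function bodies are placeholders.

```go
package main

import "fmt"

// Ref is a simplified stand-in for stackref.Ref.
type Ref struct{ v int }

// evalState is a simplified stand-in for the VM's eval state.
type evalState struct{ locals []Ref }

// inst(LOAD_FAST, (-- value))
func opLOAD_FAST(e *evalState, oparg uint32) (value Ref, err error) {
	return e.locals[oparg], nil
}

// inst(BINARY_OP, (lhs, rhs -- res)); placeholder body that adds.
func opBINARY_OP(e *evalState, oparg uint32, lhs, rhs Ref) (res Ref, err error) {
	return Ref{lhs.v + rhs.v}, nil
}

// Typed assertions: each line stops compiling if the function is missing
// or if its parameter or result list drifts from the declared type.
var (
	_ func(*evalState, uint32) (Ref, error)           = opLOAD_FAST
	_ func(*evalState, uint32, Ref, Ref) (Ref, error) = opBINARY_OP
)

func main() {
	e := &evalState{locals: []Ref{{1}, {2}}}
	v, _ := opLOAD_FAST(e, 1)
	r, _ := opBINARY_OP(e, 0, Ref{3}, Ref{4})
	fmt.Println(v.v, r.v)
}
```

The assertions cost nothing at run time; the compiler discards assignments to the blank identifier after type checking.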
The harness also drives the cache: a specialized arm gets
cache := LoadGlobalCacheAt(code, instr) injected before the
body, so the body never reaches for raw codeunits.
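To make the division of labor concrete, here is a rough sketch of two harness arms wrapped around hand-written bodies. The types, the stack representation, and the cache sizes are simplified placeholders, not gopy's real definitions; only the shape (prologue, body call, error path, epilogue, advance by 1 + CacheSize[op]) follows the description above.

```go
package main

import (
	"errors"
	"fmt"
)

type Opcode uint8

const (
	LOAD_FAST Opcode = iota
	BINARY_OP
)

// cacheSize stands in for compile.CacheSize; the values are placeholders.
var cacheSize = map[Opcode]int{LOAD_FAST: 0, BINARY_OP: 5}

type Ref int // simplified stand-in for stackref.Ref

type evalState struct {
	locals    []Ref
	stack     []Ref
	nextInstr int
}

// Hand-written bodies; their signatures follow the DSL stack effects.
func opLOAD_FAST(e *evalState, oparg uint32) (Ref, error) {
	return e.locals[oparg], nil
}

func opBINARY_OP(e *evalState, oparg uint32, lhs, rhs Ref) (Ref, error) {
	if oparg != 0 {
		return 0, errors.New("unsupported operator")
	}
	return lhs + rhs, nil
}

// dispatch is the part the generator would own in this sketch.
func dispatch(e *evalState, op Opcode, oparg uint32) error {
	switch op {
	case LOAD_FAST:
		value, err := opLOAD_FAST(e, oparg)
		if err != nil {
			return err // error path
		}
		e.stack = append(e.stack, value) // epilogue: push output
	case BINARY_OP:
		// Prologue: peek inputs into named locals (no depth check here).
		n := len(e.stack)
		lhs, rhs := e.stack[n-2], e.stack[n-1]
		res, err := opBINARY_OP(e, oparg, lhs, rhs)
		if err != nil {
			return err // error path: stack and nextInstr stay untouched
		}
		e.stack = append(e.stack[:n-2], res) // epilogue: pop 2, push 1
	default:
		return fmt.Errorf("opcode %d has no generated arm", op)
	}
	e.nextInstr += 1 + cacheSize[op]
	return nil
}

func main() {
	e := &evalState{locals: []Ref{10, 32}}
	for _, step := range []struct {
		op    Opcode
		oparg uint32
	}{{LOAD_FAST, 0}, {LOAD_FAST, 1}, {BINARY_OP, 0}} {
		if err := dispatch(e, step.op, step.oparg); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Println(e.stack, e.nextInstr)
}
```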
| Step | Status | Commit |
|---|---|---|
| gopy_tier1_generator.py lands (as Tools/bytecodes_gen Go emitter) | DONE | (prior phase) |
| vm/eval_dispatch_gen.go covers every unspecialized opcode (skeleton; bodies pending Phase 8) | PARTIAL (107 arms, bodies stubbed) | this commit |
| vm/eval_simple.go shrinks to evalLoop scaffolding only (frame setup, exit handling) | PARTIAL (NOP, POP_TOP routed) | this commit |
| go test ./vm green | DONE | this commit |
| CPython-parity harness (Tools/bytecodes_gen/cpython_parity_test.go) lifts Lib/test/test_generated_cases.py fixtures and prints rolling coverage | DONE (5 / 10 fixtures translate today) | f97a926 |
| Bytecodes.c coverage gauge (Tools/bytecodes_gen/cpython_coverage_test.go) walks every inst() in CPython 3.14.5's Python/bytecodes.c and reports the bail histogram | DONE (14 / 118 inst() bodies translate today) | a93336d |
CPython-parity gate
Tools/bytecodes_gen/cpython_parity_test.go is the spec's authority
on action-translator faithfulness. Each fixture is a verbatim copy
of an (input, output) pair from
Lib/test/test_generated_cases.py, wrapped in BEGIN/END markers and
fed through ParseBytecodes → AnalyzeInst → TranslateBody. Each
row carries:
- bail=true while the translator falls back to a panic-stub. The harness asserts the fallback note's prefix stays stable so a drift in error wording shows up as a test failure rather than a silent regression.
- want=[...substr...] once the translator handles the shape. The harness asserts the rendered Go body contains every substring.
TestCPythonParityFixtures logs coverage: PASS / N fixtures translate (bail=B) so the porting auto-flow can read progress
without parsing test output structure. Coverage growth is
monotonic: a fixture never moves from bail=false back to
bail=true. Rows are never removed on gopy's initiative; when
CPython retires a test, we mirror the deletion in a separate commit
so blame stays honest.
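The row semantics can be sketched as follows. The fixture and translation types, the field names, and the sample notes are hypothetical; the real harness feeds each fixture through ParseBytecodes, AnalyzeInst, and TranslateBody, which this sketch replaces with a precomputed result. Treating a bail=true row that now translates as a failure is a design choice of the sketch, meant to force the row to be flipped.

```go
package main

import (
	"fmt"
	"strings"
)

// fixture is one table row (hypothetical shape).
type fixture struct {
	name       string
	bail       bool     // true while the translator falls back
	notePrefix string   // expected prefix of the fallback note
	want       []string // substrings the rendered Go body must contain
}

// translation is the pipeline's result; note is non-empty on fallback.
type translation struct {
	body string
	note string
}

func checkFixture(f fixture, got translation) error {
	if f.bail {
		if got.note == "" {
			return fmt.Errorf("%s: translated, but the row still says bail=true", f.name)
		}
		if !strings.HasPrefix(got.note, f.notePrefix) {
			return fmt.Errorf("%s: fallback note %q lost prefix %q", f.name, got.note, f.notePrefix)
		}
		return nil
	}
	if got.note != "" {
		return fmt.Errorf("%s: regressed to a bail: %s", f.name, got.note)
	}
	for _, sub := range f.want {
		if !strings.Contains(got.body, sub) {
			return fmt.Errorf("%s: body is missing %q", f.name, sub)
		}
	}
	return nil
}

// coverageLine renders the progress line the auto-flow reads.
func coverageLine(rows []fixture) string {
	bails := 0
	for _, r := range rows {
		if r.bail {
			bails++
		}
	}
	return fmt.Sprintf("coverage: %d / %d fixtures translate (bail=%d)",
		len(rows)-bails, len(rows), bails)
}

func main() {
	rows := []fixture{
		{name: "pop", want: []string{"value.Close()"}},
		{name: "loop", bail: true, notePrefix: "unsupported statement"},
	}
	fmt.Println(coverageLine(rows))
}
```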
Bytecodes.c coverage gauge
Tools/bytecodes_gen/cpython_coverage_test.go is the
complementary, exhaustive gauge: it walks every inst() in CPython
3.14.5's Python/bytecodes.c, runs each through the full pipeline
(ParseBytecodes → AnalyzeInst → TranslateBody), and groups the
bail reasons. The headline number is N / total inst() bodies translate; the bail histogram (bail (count) reason names...)
exposes which translator extension yields the most leverage.
A hard floor (const minTranslates) in the test refuses to let the
count regress. Bump it (never down) when a translator change flips
more bodies; the porting auto-flow reads that constant to know
progress without scraping logs.
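A sketch of the floor-plus-histogram idea, under assumptions: the input is one string per inst() body (empty when it translated, otherwise the bail reason), the bucket type and helper names are hypothetical, and the floor value is a toy number rather than the repository's constant.

```go
package main

import (
	"fmt"
	"sort"
)

// minTranslates is the regression floor. The value here is a toy number.
const minTranslates = 2

// bucket is one histogram row (hypothetical shape).
type bucket struct {
	reason string
	count  int
}

// summarize counts translated bodies and groups the rest by bail reason,
// most frequent reason first, ties broken alphabetically.
func summarize(outcomes []string) (translated int, hist []bucket) {
	counts := make(map[string]int)
	for _, reason := range outcomes {
		if reason == "" {
			translated++
			continue
		}
		counts[reason]++
	}
	for reason, n := range counts {
		hist = append(hist, bucket{reason, n})
	}
	sort.Slice(hist, func(i, j int) bool {
		if hist[i].count != hist[j].count {
			return hist[i].count > hist[j].count
		}
		return hist[i].reason < hist[j].reason
	})
	return translated, hist
}

// checkFloor refuses any run that translates fewer bodies than the floor.
func checkFloor(translated int) error {
	if translated < minTranslates {
		return fmt.Errorf("translated %d bodies, floor is %d", translated, minTranslates)
	}
	return nil
}

func main() {
	outcomes := []string{"", "goto", "", "macro", "goto"}
	translated, hist := summarize(outcomes)
	fmt.Printf("%d / %d inst() bodies translate\n", translated, len(outcomes))
	for _, b := range hist {
		fmt.Printf("bail (%d) %s\n", b.count, b.reason)
	}
	if err := checkFloor(translated); err != nil {
		fmt.Println("FAIL:", err)
	}
}
```

Sorting the histogram by count puts the bail reason with the most leverage at the top.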
Caveat: the gauge tracks parser coverage — a body "translates"
when no stage rejects it. Compile-correctness of the emitted Go is
verified separately by the strict dispatchGenSupported
whitelist, which gates which opcodes route through dispatchGen in
the live eval loop. The two layers measure different things on
purpose: the gauge tells us "how much of bytecodes.c does the
translator parse without bailing", the whitelist tells us "which
opcodes have we audited end-to-end and run in production".
Migration progress
The harness routes every opcode through one of three layers, in
order: generated (dispatchGen, gated by dispatchGenSupported),
hand-written staging (dispatchHandwritten), and the legacy
trySimple panel. The goal is to flip every opcode into the
generated column. Each row migrates exactly once. Update these
tables whenever an opcode moves.
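The routing order can be pictured with a small sketch. The three maps are stand-ins for the real layers (dispatchGen behind dispatchGenSupported, dispatchHandwritten, and the trySimple panel), and EXAMPLE_LEGACY and EXAMPLE_UNKNOWN are hypothetical opcodes added only to exercise every branch.

```go
package main

import "fmt"

type Opcode int

const (
	NOP Opcode = iota
	JUMP_BACKWARD
	EXAMPLE_LEGACY  // hypothetical opcode still in the legacy panel
	EXAMPLE_UNKNOWN // hypothetical opcode no layer handles
)

// Stand-ins for the three layers. NOP appears in two of them on purpose,
// to show that the whitelist wins.
var (
	dispatchGenSupported = map[Opcode]bool{NOP: true}
	handwritten          = map[Opcode]bool{NOP: true, JUMP_BACKWARD: true}
	legacy               = map[Opcode]bool{EXAMPLE_LEGACY: true}
)

// route reports which layer would execute op, consulting the layers in
// the documented order: generated, hand-written staging, legacy.
func route(op Opcode) string {
	switch {
	case dispatchGenSupported[op]:
		return "generated"
	case handwritten[op]:
		return "staging"
	case legacy[op]:
		return "legacy"
	}
	return "unhandled"
}

func main() {
	for _, op := range []Opcode{NOP, JUMP_BACKWARD, EXAMPLE_LEGACY, EXAMPLE_UNKNOWN} {
		fmt.Println(op, route(op))
	}
}
```

Because the whitelist is checked first, flipping an opcode to the generated column is a one-line change that can land before the older arm is deleted.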
Generated (dispatchGen)
These arms come straight out of Tools/bytecodes_gen and the
action translator. The whitelist in
vm/dispatch_gen_whitelist.go controls which ones the dispatcher
actually consults; an opcode lands here only after its generated
body has been audited byte-equivalent to the prior arm.
Each row carries a Status (generated once it lands in
dispatchGen and the whitelist; staging while still in
dispatchHandwritten; legacy while still in trySimple and
friends) and a Commit stamp so the porting auto-flow can
diff the tables against the tree without re-reading every
panel.
| Opcode | Status | Commit | Translator shape | Notes |
|---|---|---|---|---|
| NOP | generated | 71b1fd1 | empty body | trivial |
| POP_TOP | generated | 71b1fd1 | PyStackRef_CLOSE(value) | exercises stack-ref close |
| JUMP_FORWARD | generated | 0d54073 | JUMPBY(oparg) | body-driven terminator |
| PUSH_NULL | generated | 0d47064 | output = PyStackRef_NULL | output-assignment statement |
| LOAD_FAST | generated | b842a4a | output = PyStackRef_DUP(GETLOCAL(oparg)) | GETLOCAL rvalue + Dup |
| LOAD_FAST_BORROW | generated | b842a4a | same body as LOAD_FAST | borrow collapses under Go GC |
| LOAD_FAST_AND_CLEAR | generated | b842a4a | LOAD_FAST plus GETLOCAL(oparg) = PyStackRef_NULL | GETLOCAL lvalue |
| STORE_FAST | generated | b842a4a | _PyStackRef tmp = GETLOCAL(oparg); GETLOCAL(oparg) = value; PyStackRef_XCLOSE(tmp) | C-local decl + lvalue |
| JUMP_BACKWARD_NO_INTERRUPT | generated | 337d126 | JUMPBY(-oparg) | shares JUMPBY body with JUMP_FORWARD; JUMP_BACKWARD proper stays handwritten for breaker poll |
| END_SEND | generated | 8ac6d1c | val = value; DEAD(value); PyStackRef_CLOSE(receiver) | bit-equivalent to handwritten body in eval_simple.go |
| LOAD_BUILD_CLASS | generated | 55440dc | int err = PyMapping_GetOptionalItem(BUILTINS(), &_Py_ID(__build_class__), &bc_o) + NameError when absent | first Bucket B flip; lit _PyErr_SetString payload now flows through setPendingErr |
| SETUP_ANNOTATIONS | generated | 07aa060 | LOCALS() + PyMapping_GetOptionalItem(LOCALS(), &_Py_ID(__annotations__), &ann_dict) + PyDict_New() fallback + PyObject_SetItem(LOCALS(), &_Py_ID(__annotations__), ann_dict) | second Bucket B flip; PyDict_New registered as expression-side helper; EvalCode now defaults f.Locals = globals for module frames so LOCALS() matches CPython at module scope |
| LOAD_FROM_DICT_OR_GLOBALS | generated | 40a53e3 | GETITEM(FRAME_CO_NAMES, oparg) + _PyDict_LoadGlobal cascade | A6 helper-call vocabulary registered the dict-globals lookup |
| LOAD_SMALL_INT | generated | a7a4f7f | PyStackRef_FromPyObjectBorrow(_PyLong_GetSmallInt(oparg)) | small-int constant pool |
| LOAD_LOCALS | generated | a7a4f7f | LOCALS() lifted into Go via e.frame.Locals() | Bucket B LOCALS() shim |
| UNARY_NEGATIVE | generated | e2c5275 | PyNumber_Negative(value) → objects.NumberNegative | Bucket B helper |
| UNARY_INVERT | generated | e2c5275 | PyNumber_Invert(value) → objects.NumberInvert | Bucket B helper |
| UNARY_NOT | generated | e2c5275 | int err = PyObject_IsTrue(...) → objects.IsTruthy; output is PyStackRef_True/False | shares IsTruthy plumbing with POP_JUMP_IF |
| LIST_APPEND | generated | f658e40 | int err = _PyList_AppendTakeRef(list, v) | A1 sized-input flip |
| SET_ADD | generated | f658e40 | int err = PySet_Add(set, v) | A1 sized-input flip |
| MAP_ADD | generated | f658e40 | int err = _PyDict_SetItem_Take2(dict, key, value) | A1 sized-input flip |
| DELETE_SUBSCR | generated | f658e40 | int err = PyObject_DelItem(container, sub) → objects.DelItem | A2 int-local + helper port |
| GET_LEN | generated | 44eff3d | Py_ssize_t len_i = PyObject_Length(obj) → objects.Length | A7 C-type table (Py_ssize_t) |
| BUILD_STRING | generated | 44eff3d | _PyUnicode_JoinArray over the stackref slice | A8 STACKREFS_TO_PYOBJECTS macro |
| FORMAT_SIMPLE | generated | 44eff3d | if (!PyUnicode_CheckExact(value)) ... → objects.Str fallback | A3 if-statement parser |
| COPY | generated | 90c4fce | output = PyStackRef_DUP(bottom) over a sized-input region | A1 sized-input flip |
| SWAP | generated | 90c4fce | swap top with top[1 - oparg] over a sized-input region | A1 sized-input flip |
| SET_UPDATE | generated | ab365fa | int err = _PySet_Update(set, iterable) | A1 sized-input flip |
| DICT_UPDATE | generated | ab365fa | int err = PyDict_Update(dict, mapping) | A1 sized-input flip |
| LOAD_COMMON_CONSTANT | generated | 4ac50b5 | PyStackRef_FromPyObjectImmortal(tstate->interp->common_consts[oparg]) | A7 C-type table (tstate); also drops dead END_SEND arm |
| POP_EXCEPT | generated | ee55ed4 | _PyErr_StackItem swap into tstate->exc_info | A7 C-type table (_PyErr_StackItem); routes through setHandledException |
| PUSH_EXC_INFO | generated | ee55ed4 | mirrors POP_EXCEPT in the opposite direction | A7 C-type table |
| STORE_GLOBAL | generated | d6ad44b | GETITEM(FRAME_CO_NAMES, oparg) + PyDict_SetItem(GLOBALS(), name, v) | A4 GETITEM helper |
| DELETE_GLOBAL | generated | d6ad44b | GETITEM(FRAME_CO_NAMES, oparg) + PyDict_DelItem(GLOBALS(), name) | A4 GETITEM helper |
| FORMAT_WITH_SPEC | generated | 67735f0 | PyObject_Format(value, format_spec) → objects.Format | Bucket B helper |
| GET_ITER | generated | 02e72c3 | PyObject_GetIter(iterable) → e.objectGetIter | Bucket B helper; routes errors via e.pendingErr |
| BUILD_LIST | generated | b0819a5 | _PyList_FromStackRefStealOnSuccess(values, oparg) → e.listFromStackRef | sized-input peek + drop; bottom-first order matches handwritten pop-in-reverse |
| BUILD_TUPLE | generated | b0819a5 | _PyTuple_FromStackRefStealOnSuccess(values, oparg) → e.tupleFromStackRef | sized-input peek + drop |
| BUILD_SLICE | generated | b0819a5 | PySlice_New(start, stop, step) → e.sliceNew over args[0..2] | step is nil when oparg==2 |
| BUILD_MAP | generated | 60c7912 | _PyDict_FromItems(values_o, 2, values_o+1, 2, oparg) → e.dictFromItems | bottom-first key/value pairs match handwritten reverse-pop order |
| BUILD_TEMPLATE | generated | 60c7912 | _PyTemplate_Build(strings, interpolations) → e.templateBuild | t-string runtime; helper is a thin objects.NewTemplateStr wrapper |
| GET_AWAITABLE | generated | 419072c | _PyEval_GetAwaitable(iter, opcode) → e.getAwaitable | helper already wired; flipped after auditing handwritten arm against generated body |
| GET_ANEXT | generated | 419072c | _PyEval_GetANext(aiter) → e.getANext | async-gen __anext__ wrapper; flipped after audit |
| MAKE_CELL | generated | a8d7afd | PyCell_New(PyStackRef_AsPyObjectBorrow(GETLOCAL(oparg))) → e.cellNew(e.localAt(int(oparg)).AsObject()) + setLocal + tmp.Close() | Bucket B PyCell_New → objects.NewCell helper flip. AsObject() returns nil for a null stackref, which matches the borrow-of-NULL semantics CPython documents for unset cell slots. Close() is a no-op under Go GC, matching PyStackRef_XCLOSE for the discarded prior ref. Handwritten arm in vm/eval_simple.go deleted in the same commit. |