6. From Source Code to Execution
End-to-end journey of a .py file through lexing, parsing, compilation, and bytecode evaluation.
CPython does not execute Python source text directly. It transforms source text through several internal representations before the first bytecode instruction runs.
The path is:
source text
↓
tokens
↓
parse tree
↓
abstract syntax tree
↓
symbol table
↓
code object
↓
frame
↓
bytecode evaluation
↓
object operations
Each stage has a separate job. The tokenizer understands characters. The parser understands syntax. The AST represents program structure. The symbol table classifies names. The compiler emits bytecode. The evaluator runs bytecode against Python objects.
6.1 Source Text
The input begins as text.
x = 1 + 2
print(x)
Before CPython can execute this, it must know:
where statements begin and end
which characters form names
which characters form numbers
which indentation levels define blocks
which tokens form expressions
which names are local or global
which bytecode instructions are needed
Python source code is not just a string. It has encoding, line structure, indentation, comments, string literal rules, and syntax rules.
The first stage converts raw text into tokens.
6.2 Tokenization
The tokenizer reads source characters and produces tokens.
For this code:
x = 1 + 2
the tokenizer produces a stream similar to:
NAME("x")
EQUAL("=")
NUMBER("1")
PLUS("+")
NUMBER("2")
NEWLINE
ENDMARKER
For block structure, indentation becomes tokens too.
if ok:
    run()
else:
    stop()
Conceptually:
NAME("if")
NAME("ok")
COLON
NEWLINE
INDENT
NAME("run")
LPAR
RPAR
NEWLINE
DEDENT
NAME("else")
COLON
NEWLINE
INDENT
NAME("stop")
LPAR
RPAR
NEWLINE
DEDENT
ENDMARKER
This is important. Python block structure is not inferred later from whitespace. The tokenizer emits explicit INDENT and DEDENT tokens.
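The token stream can be observed with the standard `tokenize` module. This is a minimal sketch: `tokenize` mirrors the internal tokenizer closely but is not the same code path, and it reports every operator under the generic type `OP` rather than as `COLON` or `LPAR`.

```python
import io
import tokenize

source = "if ok:\n    run()\nelse:\n    stop()\n"

# generate_tokens expects a readline callable that returns str lines.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Map each numeric token type to its symbolic name (NAME, OP, INDENT, ...).
token_names = [tokenize.tok_name[tok.type] for tok in tokens]

for tok in tokens:
    print(tokenize.tok_name[tok.type], repr(tok.string))
```

The printed stream contains two INDENT and two DEDENT tokens, one pair for each indented block.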
6.3 Parsing
The parser consumes tokens and checks whether they form valid Python syntax.
For:
x = 1 + 2
the parser recognizes an assignment statement whose right side is a binary expression.
For:
def add(a, b):
    return a + b
the parser recognizes:
function definition
function name
parameter list
function body
return statement
binary expression
The parser rejects invalid token sequences:
x = + * 3
This tokenizes without error and reaches the parser, but the parser cannot match it against any grammar rule, so it raises SyntaxError.
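The parser's output is a structured representation of the program. Older CPython versions built a concrete parse tree and then converted it into an AST. Since the PEG parser became the default in Python 3.9, the parser builds AST nodes directly, so the parse tree stage is best read as a conceptual step.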
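The accept-or-reject behavior of the parser can be probed from Python with the built-in `compile`. The helper name `is_valid_syntax` is hypothetical and exists only for this sketch.

```python
def is_valid_syntax(source):
    """Return True if the source parses, False if it raises SyntaxError."""
    try:
        # Mode "exec" compiles a sequence of statements; nothing is executed.
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


print(is_valid_syntax("x = 1 + 2"))
print(is_valid_syntax("x = + * 3"))
```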
6.4 Abstract Syntax Tree
The AST represents the syntactic structure of the program in abstract form.
For:
x = 1 + 2
the AST is conceptually:
Module
    Assign
        target: Name("x", Store)
        value:
            BinOp
                left: Constant(1)
                op: Add
                right: Constant(2)
The AST removes many surface details and keeps the structure needed by later compiler stages.
You can inspect the AST from Python:
import ast
tree = ast.parse("x = 1 + 2")
print(ast.dump(tree, indent=4))
Example output shape:
Module(
    body=[
        Assign(
            targets=[
                Name(id='x', ctx=Store())],
            value=BinOp(
                left=Constant(value=1),
                op=Add(),
                right=Constant(value=2)))],
    type_ignores=[])
The AST says what the program means structurally. It does not yet say which bytecode instructions to emit.
6.5 Name Contexts
AST names carry context.
In this code:
x = x + 1
the two uses of x have different roles.
left side x     Store
right side x    Load
Conceptually:
Assign
    target: Name("x", Store)
    value:
        BinOp
            left: Name("x", Load)
            op: Add
            right: Constant(1)
This distinction matters because loading a name and storing a name compile to different operations.
Load context     read a value
Store context    assign a value
Del context      delete a binding
The compiler relies on this information when emitting bytecode.
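The contexts can be read directly off the tree with the `ast` module. This is a small sketch; the variable name `contexts` is chosen only for illustration.

```python
import ast

tree = ast.parse("x = x + 1")

# Collect (identifier, context class name) for every Name node in the tree.
contexts = [
    (node.id, type(node.ctx).__name__)
    for node in ast.walk(tree)
    if isinstance(node, ast.Name)
]

print(contexts)
```

The same identifier appears twice, once with a Store context and once with a Load context.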
6.6 Symbol Table Analysis
Before bytecode generation, CPython analyzes names.
It decides whether each name is:
local
global
nonlocal
free
cell
implicit builtin lookup
Example:
x = 10
def f(y):
    return x + y
Inside f:
y is local
x is global or builtin lookup
Another example:
def outer():
    x = 10
    def inner():
        return x
    return inner
Here:
x is local in outer
x is free in inner
x becomes a cell variable in outer
A cell variable is a local variable that must survive because an inner function captures it. A free variable is a variable used by a function but stored in an enclosing scope.
This stage is essential for closures.
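The standard `symtable` module exposes a view of this analysis. The sketch below assumes the nested scopes are returned in source order, which holds for this single-child example.

```python
import symtable

source = """
def outer():
    x = 10
    def inner():
        return x
    return inner
"""

module_table = symtable.symtable(source, "<example>", "exec")
outer_table = module_table.get_children()[0]  # scope of outer
inner_table = outer_table.get_children()[0]   # scope of inner

x_in_outer = outer_table.lookup("x")
x_in_inner = inner_table.lookup("x")

print("outer:", x_in_outer.is_local(), x_in_outer.is_free())
print("inner:", x_in_inner.is_local(), x_in_inner.is_free())
```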
6.7 Code Objects
After parsing and symbol analysis, CPython compiles code into code objects.
A code object contains:
bytecode
constants
names
local variable names
free variable names
cell variable names
stack size
flags
filename
function name
line mapping information
exception table
You can inspect a code object:
def add(a, b):
    return a + b
code = add.__code__
print(code.co_name)
print(code.co_varnames)
print(code.co_consts)
print(code.co_names)
print(code.co_freevars)
print(code.co_cellvars)
The code object is immutable. It describes executable code, but it does not contain the current runtime values of local variables.
6.8 Bytecode
Bytecode is CPython’s instruction format.
For:
def add(a, b):
    return a + b
disassembly may look conceptually like:
LOAD_FAST a
LOAD_FAST b
BINARY_OP +
RETURN_VALUE
Actual bytecode names and layout vary by Python version. The core idea remains: bytecode instructions operate on a frame.
Use dis:
import dis
def add(a, b):
    return a + b
dis.dis(add)
Bytecode is lower-level than the AST. It is close to execution.
The AST says:
return a + b
The bytecode says:
load a
load b
perform addition
return result
6.9 Constants and Names
A code object stores constants and names separately from bytecode.
For:
x = 10
print(x)
the constant 10 is stored in the constants table. The names x and print are stored in the names table.
Conceptually:
co_consts = (10, None)
co_names = ("x", "print")
The bytecode then references these tables by index.
LOAD_CONST 0      load constant 10
STORE_NAME 0      store into name x
LOAD_NAME 1       load name print
LOAD_NAME 0       load name x
CALL 1            call print with 1 argument
POP_TOP           discard the call result
LOAD_CONST 1      load None
RETURN_VALUE      return None from the module code
This makes bytecode compact. Instructions store small indexes instead of full strings or objects.
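The two tables can be inspected by compiling the snippet without running it. Treat the printed constants as version-dependent: some CPython releases encode small integers directly in the instruction, so the exact contents of `co_consts` may differ from the conceptual listing.

```python
source = "x = 10\nprint(x)\n"

# compile() produces a module-level code object; nothing is executed here.
code = compile(source, "<example>", "exec")

print("constants:", code.co_consts)  # contents vary by CPython version
print("names:    ", code.co_names)
```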
6.10 Module Execution
A Python file is compiled into a module-level code object.
For:
# demo.py
x = 1
def f():
    return x
CPython compiles the whole file into one code object. Executing that code object creates module bindings.
Conceptually:
create module object
create module dictionary
execute module code object in that dictionary
bind x
create function object f
bind f
The function body is also compiled into its own code object. The module code object contains that function code object as a constant.
This explains why a def statement does real work at module import time: the body does not run, but the function object is created and bound.
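The relationship between the module code object and the nested function code object can be sketched with `compile` and `exec`. The plain dictionary `namespace` stands in for a real module dictionary here; that substitution is an assumption of the sketch.

```python
import types

source = "x = 1\ndef f():\n    return x\n"
module_code = compile(source, "demo.py", "exec")

# The function body is a separate code object stored among the module constants.
nested = [c for c in module_code.co_consts if isinstance(c, types.CodeType)]

# Executing the module code in a dictionary creates the bindings x and f.
namespace = {}
exec(module_code, namespace)

print(nested[0].co_name)
print(namespace["x"], namespace["f"]())
```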
6.11 Function Definition
A function definition is executable code.
For:
def add(a, b):
    return a + b
CPython does not run the body immediately. It creates a function object.
Conceptually:
load code object for add
load function name
create function object
store function object in current namespace
The function object contains:
code object
globals dictionary
default values
closure cells
annotations
metadata
Later, when the function is called, CPython creates a frame from that function object and executes the function’s code object.
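The steps above can be approximated by hand with `types.FunctionType`, which pairs a code object with a globals dictionary. This is only an illustration of the relationship, not the path the interpreter takes for a `def` statement.

```python
import types

source = "def add(a, b):\n    return a + b\n"
module_code = compile(source, "<example>", "exec")

# Pull the nested code object for add out of the module constants.
add_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))

# Build a function object from the code object, an empty globals dict, and a name.
add = types.FunctionType(add_code, {}, "add")

print(add(2, 3))
```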
6.12 Frame Creation
A frame is created when CPython executes a code object.
For a function call:
add(2, 3)
CPython creates a frame with:
code object for add
globals from add.__globals__
builtins
local slots
argument values
value stack
instruction pointer
exception state
The arguments are placed into local variable slots.
a = 2
b = 3
Then the bytecode evaluator starts executing the frame.
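A running function can look at its own frame through `sys._getframe`, a CPython-specific helper. The `snapshot` dictionary is a hypothetical structure used only to carry the observations out of the call.

```python
import sys


def add(a, b):
    frame = sys._getframe()  # frame object for this call (CPython-specific)
    snapshot = {
        "code_name": frame.f_code.co_name,
        "a": frame.f_locals["a"],
        "b": frame.f_locals["b"],
        "shares_globals": frame.f_globals is add.__globals__,
    }
    return a + b, snapshot


result, snapshot = add(2, 3)
print(result, snapshot)
```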
6.13 Evaluation Stack
Most bytecode instructions communicate through the frame’s value stack.
For:
return a + b
the execution is:
LOAD_FAST a      push value of a
LOAD_FAST b      push value of b
BINARY_OP +      pop two values, add, push result
RETURN_VALUE     pop result and return it
The local variables are stored separately from temporary stack values.
locals:
a = 2
b = 3
stack:
temporary values used by bytecode
This is why CPython is called a stack-based virtual machine.
6.14 Object Operations
Bytecode instructions operate on Python objects, not raw C primitives.
When CPython executes:
a + b
it does not assume that a and b are machine integers.
They may be:
integers
floats
strings
lists
tuples
NumPy arrays
user-defined objects
The operation dispatches through the object protocol.
For int + int, CPython uses integer addition. For str + str, it uses string concatenation. For user-defined classes, it may call __add__.
Conceptually:
BINARY_OP +
inspect operand types
find the matching type slot (numeric add or sequence concatenation)
call appropriate slot
return Python object
This is why bytecode remains generic while types provide concrete behavior.
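One function body can serve very different operand types because the addition is resolved at runtime. The `Meters` class below is a hypothetical user-defined type for this sketch.

```python
class Meters:
    """Hypothetical user-defined type that supports +."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if not isinstance(other, Meters):
            return NotImplemented  # let Python try the other operand
        return Meters(self.value + other.value)


def add(a, b):
    # The same bytecode runs for every call; the operand types pick the behavior.
    return a + b


print(add(1, 2))
print(add("a", "b"))
print(add(Meters(1), Meters(2)).value)
```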
6.15 Attribute Access
For:
obj.name
CPython compiles an attribute load.
Conceptually:
LOAD_FAST obj
LOAD_ATTR name
At runtime, LOAD_ATTR performs Python attribute lookup rules:
check data descriptors on the type
check instance dictionary
check non-data descriptors and class attributes
possibly call __getattr__
raise AttributeError if missing
Attribute access is not a raw field lookup in the general case. It is a protocol operation.
This explains why attribute access can run Python code.
class C:
    @property
    def name(self):
        print("computed")
        return 42
obj = C()
obj.name
The attribute read calls descriptor code.
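The lookup order can be checked with a small experiment. A property is a data descriptor, so it takes precedence over an entry of the same name in the instance dictionary, while a plain class attribute does not. The class and attribute names are hypothetical.

```python
class C:
    plain = "class attribute"

    @property
    def name(self):
        return "from property"

    def __getattr__(self, attr):
        # Called only after normal lookup has failed.
        return f"fallback for {attr}"


obj = C()
# Write into the instance dictionary directly, bypassing attribute assignment.
obj.__dict__["name"] = "from instance dict"
obj.__dict__["plain"] = "from instance dict"

print(obj.name)     # the data descriptor on the type is found first
print(obj.plain)    # the instance dictionary shadows the class attribute
print(obj.missing)  # nothing found, so __getattr__ runs
```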
6.16 Calls
For:
result = f(2, 3)
CPython evaluates the callable and arguments, then performs a call.
Conceptually:
load f
load 2
load 3
call with 2 positional arguments
store result
At runtime, the callable may be:
Python function
built-in C function
bound method
class object
object with __call__
partial object
method descriptor
A Python function call creates a new frame. A C built-in call invokes a C function wrapper. A class call allocates and initializes an instance.
The bytecode instruction is generic. Runtime dispatch decides the exact call path.
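The same call syntax works across several of the callable kinds listed above. The names `py_add`, `Adder`, and `Total` are hypothetical and exist only for this sketch.

```python
import functools
import operator


def py_add(a, b):
    return a + b


class Adder:
    def __call__(self, a, b):
        return a + b


class Total:
    def __init__(self, a, b):
        self.value = a + b


callables = [
    py_add,                     # Python function
    operator.add,               # built-in C function
    Adder(),                    # object with __call__
    Adder().__call__,           # bound method
    functools.partial(py_add),  # partial object
    int.__add__,                # descriptor looked up on the type
    Total,                      # class object
]

# Identical call expression for every entry; dispatch happens at runtime.
results = [c(2, 3) for c in callables]

for c, r in zip(callables, results):
    print(type(c).__name__, r)
```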
6.17 Control Flow
Control flow is compiled into jumps.
For:
if x:
    a()
else:
    b()
CPython emits bytecode shaped like:
load x
jump if false to else
call a
jump to end
else:
call b
end:
For loops compile into iterator protocol operations plus jumps.
for x in items:
    use(x)
Conceptually:
get iterator
loop_start:
get next item
if exhausted, jump to loop_end
store x
call use(x)
jump to loop_start
loop_end:
The language feature is high-level. The execution model is bytecode jumps and protocol calls.
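The loop outline above corresponds to the iterator protocol, which can be written out by hand. The two hypothetical functions below should behave the same; the second only makes the implicit `iter` and `next` calls visible.

```python
def loop_with_for(items):
    out = []
    for x in items:
        out.append(x * 2)
    return out


def loop_by_hand(items):
    out = []
    iterator = iter(items)        # get iterator
    while True:
        try:
            x = next(iterator)    # get next item
        except StopIteration:     # exhausted: leave the loop
            break
        out.append(x * 2)
    return out


print(loop_with_for([1, 2, 3]))
print(loop_by_hand([1, 2, 3]))
```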
6.18 Exception Handling
Exception handling compiles into protected bytecode ranges and handler metadata.
For:
try:
    risky()
except ValueError:
    recover()
CPython needs to know:
which bytecode range is protected
where the handler starts
which exception type to match
how to unwind the stack
where execution continues
When an exception occurs, the evaluator consults exception handling metadata and transfers control to the appropriate handler if one matches.
If no handler matches in the current frame, the exception propagates to the caller.
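Propagation through frames can be observed by walking the traceback attached to a caught exception. The function names are hypothetical; `middle` deliberately has no handler.

```python
def risky():
    raise ValueError("bad input")


def middle():
    # No handler here, so the exception leaves this frame.
    risky()


def caller():
    try:
        middle()
    except ValueError as exc:
        # Each traceback entry records one frame the exception passed through.
        names = []
        tb = exc.__traceback__
        while tb is not None:
            names.append(tb.tb_frame.f_code.co_name)
            tb = tb.tb_next
        return names


print(caller())
```

The list runs from the frame that caught the exception down to the frame that raised it.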
6.19 Imports
An import statement is executable code.
import math
At runtime, CPython uses the import machinery to:
check sys.modules
find a module spec
load or create the module
execute module code if needed
bind the name
The import system is partly implemented in Python through importlib and partly supported by C runtime code.
A module file is compiled and executed just like other Python code, but its execution namespace is the module dictionary.
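Parts of this sequence are visible through `importlib` and `sys.modules`. The sketch uses the standard library module `json` purely as an example target.

```python
import importlib
import importlib.util
import sys

name = "json"

# Locate the module without binding any name in this namespace.
spec = importlib.util.find_spec(name)

# The first import loads and executes the module if it is not already cached.
module = importlib.import_module(name)

# Later imports are served from the sys.modules cache.
cached = sys.modules[name]
again = importlib.import_module(name)

print(spec.name, module is cached, again is module)
```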
6.20 Comprehensions
A comprehension has its own execution scope.
For:
squares = [x * x for x in range(10)]
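CPython creates code for the comprehension body. Before Python 3.12, that code was a separate nested code object called like a function. Since Python 3.12, list, dict, and set comprehensions are inlined into the enclosing code object, but the loop variable stays isolated.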
Conceptually:
call range(10)
get iterator
create result list
run comprehension code
append each computed value
store final list in squares
The loop variable x belongs to the comprehension’s internal scope, not the surrounding function scope.
This is why:
[x for x in range(3)]
print(x)
does not bind x in the surrounding scope in Python 3, so print(x) raises NameError unless x is already defined there.
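The isolation of the loop variable can be tested directly. The sketch uses the hypothetical name `loop_var` so that no existing global can mask the result.

```python
def leaks_loop_variable():
    """Return True if the comprehension variable is visible afterwards."""
    [loop_var for loop_var in range(3)]
    try:
        loop_var  # not bound in this scope, so this lookup should fail
    except NameError:
        return False
    return True


print(leaks_loop_variable())
```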
6.21 Closures
Closures require cells.
For:
def outer():
    x = 10
    def inner():
        return x
    return inner
inner uses a variable from outer.
CPython cannot store x as an ordinary fast local that disappears when outer returns. It stores x in a cell object.
Conceptually:
outer local x becomes cell variable
inner references x as free variable
inner function stores reference to the cell
cell keeps x alive after outer returns
This is why the returned function still works:
f = outer()
print(f())
The value survives through the closure cell.
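The cell is reachable from Python through the function and code object attributes, which makes the mechanism easy to confirm.

```python
def outer():
    x = 10

    def inner():
        return x

    return inner


f = outer()

print(outer.__code__.co_cellvars)      # names outer stores in cells
print(f.__code__.co_freevars)          # names inner reads from enclosing cells
print(f.__closure__[0].cell_contents)  # value currently held by the cell
```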
6.22 Generators
A generator function compiles differently from an ordinary function.
def count():
    yield 1
    yield 2
Calling count() creates a generator object. It does not immediately run the body.
The generator object stores suspended execution state:
code object
frame or frame-like execution state
instruction offset
local variables
value stack state
running status
Each next() resumes execution until the next yield.
g = count()
next(g)
next(g)
A yield is not just a return. It suspends the frame and preserves execution state.
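The suspended state can be followed with `inspect.getgeneratorstate`. The list `states` is a hypothetical accumulator used to record the state after each step.

```python
import inspect


def count():
    yield 1
    yield 2


g = count()
states = [inspect.getgeneratorstate(g)]  # body has not started yet

first = next(g)
states.append(inspect.getgeneratorstate(g))  # paused at the first yield

second = next(g)
states.append(inspect.getgeneratorstate(g))  # paused at the second yield

try:
    next(g)  # the body finishes, which raises StopIteration
except StopIteration:
    pass
states.append(inspect.getgeneratorstate(g))

print(first, second, states)
```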
6.23 Coroutines
A coroutine is similar to a generator, but it participates in the await protocol.
async def fetch():
    value = await operation()
    return value
Calling fetch() creates a coroutine object. The body runs only when the coroutine is awaited or scheduled.
The coroutine stores suspended execution state and resumes around await points.
Conceptually:
create coroutine object
start execution
reach await
suspend coroutine
resume later with result
continue execution
return final value
The event loop is outside the core bytecode model, but coroutine suspension and resumption are runtime features implemented by CPython objects and frames.
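A coroutine can be driven by hand with `send`, which shows suspension and resumption without any event loop. The `Operation` class is a hypothetical awaitable that suspends exactly once; real code would normally rely on an event loop such as asyncio.

```python
class Operation:
    """Hypothetical awaitable that suspends once and returns what it is sent."""

    def __await__(self):
        result = yield "waiting"  # suspend and hand a signal to the driver
        return result


async def fetch():
    value = await Operation()
    return value


coro = fetch()              # creates the coroutine object; the body has not run

signal = coro.send(None)    # start execution; runs until the await suspends

try:
    coro.send(42)           # resume with a result; the body runs to its return
except StopIteration as stop:
    final = stop.value      # the return value travels on StopIteration

print(signal, final)
```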
6.24 Class Definition
A class statement is executable code.
class C:
    x = 1
    def f(self):
        return self.x
CPython does not simply allocate a static type. It executes the class body in a temporary namespace.
Conceptually:
load class name
prepare class namespace
execute class body code object
collect attributes and methods
call metaclass
bind resulting class object to name C
This explains why class bodies can contain arbitrary code:
class C:
    print("building class")
    x = 1 + 2
The class body executes immediately when the class statement runs.
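The steps above can be approximated by executing the class body in a dictionary and then calling the default metaclass `type`. This is a simplified sketch: the real class statement also determines the metaclass, may call its `__prepare__` hook, and sets attributes such as `__qualname__`.

```python
body_source = "x = 1\ndef f(self):\n    return self.x\n"

# Execute the class body; its bindings land in the namespace dictionary.
namespace = {}
exec(body_source, {}, namespace)

# Call the metaclass with name, bases, and the collected namespace.
C = type("C", (), namespace)

print(sorted(k for k in namespace if not k.startswith("__")))
print(C().f())
```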
6.25 End-to-End Example
Consider:
x = 10
def add(y):
    return x + y
print(add(5))
The pipeline is:
tokenize source
parse tokens
build AST
analyze symbols
compile module code object
execute module frame
bind x = 10
create function object add
bind add
load print
load add
load 5
call add
create function frame
bind y = 5
load global x
load local y
add objects
return 15
call print
finish module execution
The important point is that CPython has already done substantial work before the first line appears to run.
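The main stages can be run one at a time from Python. This sketch captures the printed output in a buffer so that the result can be inspected; the plain dictionary `namespace` stands in for a module dictionary.

```python
import ast
import contextlib
import io

source = "x = 10\ndef add(y):\n    return x + y\nprint(add(5))\n"

# Stage 1: tokenize and parse the source into an AST.
tree = ast.parse(source, filename="<example>")

# Stage 2: symbol analysis and compilation into a module code object.
code = compile(tree, "<example>", "exec")

# Stage 3: execute the code object, capturing what print writes.
namespace = {}
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(code, namespace)

output = buffer.getvalue()
print(type(tree).__name__, type(code).__name__, repr(output))
```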
6.26 Where Each Stage Lives
A useful source map:
Tokenizer           Parser/
Parser              Parser/ and Grammar/
AST support         Python/ast.c
Symbol table        Python/symtable.c
Compiler            Python/compile.c
Code objects        Objects/codeobject.c
Function objects    Objects/funcobject.c
Frames              Python/ and Objects/frameobject.c
Evaluation loop     Python/ceval.c and generated/interpreter files
Objects             Objects/
Imports             Lib/importlib/ and Python/import.c
Exact filenames shift over time, but this map is stable enough for reading the repository.
6.27 Mental Model
Keep this compact model:
Source code becomes tokens.
Tokens become syntax.
Syntax becomes AST.
AST plus scope analysis becomes bytecode.
Bytecode lives in code objects.
Code objects execute inside frames.
Frames use local slots and a value stack.
Bytecode instructions operate on PyObject references.
Types decide concrete behavior.
The full system is large, but this sequence is the backbone.
6.28 Chapter Summary
CPython execution is a pipeline. It starts with source text and ends with object operations inside the bytecode evaluator. The tokenizer handles characters. The parser handles syntax. The AST represents structure. The symbol table classifies names. The compiler emits code objects. Frames execute code objects. Bytecode manipulates Python objects through type-defined behavior.
This pipeline explains how high-level Python constructs become concrete runtime actions.