3. Repository Layout

Tour of the CPython source tree: Modules/, Objects/, Python/, Parser/, Include/, and the build system.

The CPython repository is organized around the major subsystems of the interpreter: object implementations, runtime machinery, compiler pipeline, parser, built-in modules, standard library, tests, documentation, and platform build files.

A good first pass is to treat the source tree as a map of responsibilities.

cpython/
    Include/
    Objects/
    Python/
    Parser/
    Modules/
    Lib/
    Programs/
    Tools/
    Doc/
    Grammar/
    PC/
    PCbuild/
    Mac/

Each directory has a different role. Some contain core runtime code. Some contain generated code. Some contain test fixtures. Some exist mainly for platform-specific builds.

3.1 Top-Level Structure

Directory    Main role
Include/     Public, internal, and private C headers
Objects/     Implementations of core object types
Python/      Runtime, compiler, interpreter loop, initialization
Parser/      Tokenizer and parser support code
Grammar/     Grammar input files
Modules/     Built-in and extension modules written in C
Lib/         Python standard library
Lib/test/    CPython regression test suite
Programs/    Executable entry points
Tools/       Developer and build tools
Doc/         Documentation source
PC/          Windows-specific source and config files
PCbuild/     Windows build system
Mac/         macOS-specific support

The most important directories for internals reading are:

Include/
Objects/
Python/
Parser/
Modules/
Lib/test/

Those directories cover the object model, execution engine, compiler, parser, built-in types, C API, and tests.

3.2 Include/: C Header Files

Include/ contains the header files used by CPython itself, extension modules, and embedders.

A simplified layout:

Include/
    Python.h
    object.h
    unicodeobject.h
    listobject.h
    dictobject.h
    cpython/
    internal/

The most important header is Python.h, which extension code includes with:

#include <Python.h>

Python.h is the umbrella public header for extension modules. It includes many other public headers and exposes the C API most extension authors use.

Header categories

Header area          Audience                  Stability
Include/*.h          Public C API users        Relatively stable
Include/cpython/     CPython-specific API      Less portable
Include/internal/    CPython internals only    Can change freely

This distinction matters. Code inside CPython can include internal headers. Third-party extensions generally should not.

For example:

#include "Python.h"

is normal for extension modules.

But:

#include "internal/pycore_runtime.h"

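is for CPython core code. It exposes internal runtime structures that are not part of the stable public API. Internal headers also refuse to compile unless Py_BUILD_CORE is defined, which marks the including code as part of the core build.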
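
To see where these header areas sit for an installed interpreter, the following sketch asks sysconfig for the include directory and reports which of the three areas are present. It assumes a CPython installation; development headers may be missing on some systems, so the function reports what it finds rather than requiring anything. The helper name describe_include_dir is hypothetical.

```python
import os
import sysconfig


def describe_include_dir():
    """Report which header areas exist in this interpreter's include directory."""
    # sysconfig.get_path("include") is the directory that holds Python.h.
    include_dir = sysconfig.get_path("include") or ""
    entries = ("Python.h", "cpython", "internal")
    return {name: os.path.exists(os.path.join(include_dir, name)) for name in entries}


report = describe_include_dir()
for name, present in report.items():
    print(f"{name}: {'found' if present else 'not found'}")
```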

3.3 Objects/: Built-In Object Implementations

Objects/ contains the C implementations of many core Python object types.

Examples:

File                 Implements
object.c             Base object operations
typeobject.c         Type objects, classes, MRO, slots
longobject.c         Python integers
floatobject.c        Python floats
unicodeobject.c      Python strings
bytesobject.c        bytes
bytearrayobject.c    bytearray
listobject.c         list
tupleobject.c        tuple
dictobject.c         dict
setobject.c          set and frozenset
funcobject.c         Function objects
methodobject.c       Built-in method objects
moduleobject.c       Module objects
genobject.c          Generators and coroutines
frameobject.c        Frame object support
codeobject.c         Code objects
cellobject.c         Closure cells
descrobject.c        Descriptors

This directory is the best place to study how Python values are represented and operated on.

For example, list behavior lives mainly in:

Objects/listobject.c

Dictionary behavior lives mainly in:

Objects/dictobject.c

String behavior lives mainly in:

Objects/unicodeobject.c

When Python runs:

items = []
items.append(1)

the underlying list allocation, resizing, method lookup, and append operation eventually involve code in Objects/listobject.c and type machinery in Objects/typeobject.c.
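
Some of this is visible from Python without reading any C. The sketch below, which assumes CPython, looks up append on the list type to show that it is a C-level method descriptor, then records sys.getsizeof after each append to show that the list grows in occasional steps rather than on every call. The exact sizes vary by version and platform, so none are hard-coded.

```python
import sys

# list.append is implemented in C and exposed through a descriptor on the type.
append_descriptor = list.__dict__["append"]
descriptor_kind = type(append_descriptor).__name__

# Track the reported size of the list object as it grows.
items = []
sizes = []
for i in range(64):
    items.append(i)
    sizes.append(sys.getsizeof(items))

distinct_sizes = sorted(set(sizes))
print("descriptor type:", descriptor_kind)
print(len(distinct_sizes), "distinct sizes observed across 64 appends")
```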

3.4 Python/: Runtime, Compiler, and Interpreter Core

Python/ contains much of CPython’s central machinery.

Important files include:

File                Role
ceval.c             Bytecode evaluation loop
bytecodes.c         Bytecode instruction definitions in modern CPython
compile.c           AST to code object compiler
symtable.c          Symbol table analysis
ast.c               AST support
pythonrun.c         High-level execution entry points
pylifecycle.c       Runtime initialization and finalization
import.c            Import support
errors.c            Exception state and error APIs
traceback.c         Traceback support
sysmodule.c         Implementation of sys
bltinmodule.c       Built-in functions and builtins module
marshal.c           Internal serialization format for code objects
thread.c            Thread abstraction layer
context.c           Context variable support
bootstrap_hash.c    Hash secret initialization

The file names map closely to runtime subsystems, so the directory listing doubles as an index of the interpreter core.

Interpreter execution

The bytecode interpreter is centered on the evaluation loop in ceval.c. Since CPython 3.12, instruction definitions are written in bytecodes.c, and the per-instruction interpreter code is generated from that file rather than hand-written in ceval.c.

Conceptual role:

frame enters evaluation
    ↓
bytecode instruction fetched
    ↓
instruction dispatch
    ↓
object operation
    ↓
stack and frame state updated
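
The loop above can be sketched as a toy stack machine. This is an illustration of the fetch, dispatch, and update cycle only. The opcode names PUSH_CONST, ADD, and RETURN and the function run are hypothetical and do not correspond to CPython's real instruction set or to the code in ceval.c.

```python
def run(instructions):
    """Execute a list of (opname, argument) pairs on a value stack."""
    stack = []
    pc = 0  # index of the next instruction to fetch
    while pc < len(instructions):
        opname, arg = instructions[pc]  # fetch
        pc += 1
        if opname == "PUSH_CONST":  # dispatch on the opcode name
            stack.append(arg)
        elif opname == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)  # object operation, result back on the stack
        elif opname == "RETURN":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {opname}")
    raise RuntimeError("program ended without RETURN")


program = [("PUSH_CONST", 1), ("PUSH_CONST", 2), ("ADD", None), ("RETURN", None)]
result = run(program)
print(result)
```

The real loop differs in almost every detail (compact bytecode, frames, exceptions, specialization), but the shape of fetch, dispatch, operate, and update is the same.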

Compiler pipeline

The compiler code lowers AST nodes into code objects.

A simplified path:

source text
    ↓
tokens
    ↓
AST (built directly by the PEG parser since Python 3.9)
    ↓
symbol table
    ↓
compiler
    ↓
code object

The key files are usually:

Parser/
Python/ast.c
Python/symtable.c
Python/compile.c
Objects/codeobject.c
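
The same stages can be walked from Python using standard library modules. This is a sketch of the pipeline as seen from the outside: tokenize, ast, and compile are public interfaces to these stages, not the internal C entry points, and the symbol table pass runs inside compile rather than as a separate call here.

```python
import ast
import io
import tokenize

source = "x = 1 + 2\n"

# Stage 1: source text to tokens.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
token_strings = [tok.string for tok in tokens if tok.string.strip()]

# Stage 2: source text to AST.
tree = ast.parse(source, filename="<example>")

# Stage 3: AST to code object (symbol table analysis happens inside compile).
code = compile(tree, filename="<example>", mode="exec")

# Stage 4: the code object is what the evaluation loop executes.
namespace = {}
exec(code, namespace)

print(token_strings)
print(type(tree).__name__, type(code).__name__, namespace["x"])
```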

3.5 Parser/: Tokenizer and Parser Support

Parser/ contains the tokenizer, parser generator support, parser implementation files, and generated parser-related code.

Important areas include:

Area                      Role
tokenizer                 Converts source text into tokens
parser                    Builds syntax structures from tokens
PEG machinery             Supports Python’s PEG parser
generated parser files    Produced from grammar definitions

The parser’s job is to decide whether source text is valid Python syntax and to build the AST. Since Python 3.9 the PEG parser constructs AST nodes directly, without an intermediate concrete parse tree.

Example:

x = 1 + 2

The parser needs to recognize:

assignment statement
target name x
expression 1 + 2
integer literal 1
integer literal 2
binary addition operator

Parsing precedes semantic analysis. The parser knows syntax. The symbol table pass later decides whether names are local, global, free, or cell variables.
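
Both halves of that split can be inspected from Python. The sketch below first reads the AST for x = 1 + 2 to confirm the pieces listed above, then uses the symtable module on a small closure to show the later classification of a name as local in one scope and free in another. The closure source is an illustrative example and is not taken from CPython.

```python
import ast
import symtable

# Syntax: what the parser recognizes in "x = 1 + 2".
assign = ast.parse("x = 1 + 2").body[0]
target_name = assign.targets[0].id
operator_name = type(assign.value.op).__name__
operands = (assign.value.left.value, assign.value.right.value)

# Semantics: how the symbol table pass classifies a name per scope.
closure_source = (
    "def outer():\n"
    "    y = 1\n"
    "    def inner():\n"
    "        return y\n"
    "    return inner\n"
)
module_table = symtable.symtable(closure_source, "<example>", "exec")
outer_table = module_table.get_children()[0]
inner_table = outer_table.get_children()[0]
y_in_outer = outer_table.lookup("y")
y_in_inner = inner_table.lookup("y")

print(type(assign).__name__, target_name, operator_name, operands)
print("y local in outer:", y_in_outer.is_local())
print("y free in inner:", y_in_inner.is_free())
```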

3.6 Grammar/: Grammar Definitions

Grammar/ contains grammar input files used to generate parser-related code.

The grammar defines Python syntax in a form consumed by CPython’s parser tooling.

For internals work, grammar changes are high impact. A syntax change can affect:

parser generation
AST generation
compiler behavior
error messages
tests
documentation
tools that parse Python

A grammar-level change requires regenerating the parser (on Unix-like builds, via the make regen-pegen target) and running targeted tests.

The usual workflow is:

edit grammar input
regenerate parser files
rebuild CPython
run parser, AST, compiler, and syntax tests

3.7 Modules/: Built-In and Extension Modules

Modules/ contains many C modules shipped with CPython.

Examples:

File or directory     Module
_io/                  I/O implementation
_decimal/             Decimal accelerator
_sqlite/              SQLite module
_ssl.c                SSL support
_hashopenssl.c        Hashing with OpenSSL
_ctypes/              ctypes
arraymodule.c         array
mathmodule.c          math
itertoolsmodule.c     itertools
_functoolsmodule.c    _functools
posixmodule.c         posix / nt, the platform layer behind os
timemodule.c          time

Some standard library modules are written in Python and use C accelerators from Modules/.

For example, a public Python module may live in Lib/, while a private performance-critical helper lives in Modules/.

This pattern gives CPython a clean public API while preserving fast C implementations for hot paths.

3.8 Lib/: Standard Library

Lib/ contains the Python standard library.

Examples:

Path                  Role
Lib/os.py             OS interface layer
Lib/pathlib/          Object-oriented paths
Lib/importlib/        Import system implementation
Lib/asyncio/          Async I/O framework
Lib/collections/      Collection utilities
Lib/dataclasses.py    Dataclass support
Lib/typing.py         Typing support
Lib/unittest/         Unit testing framework
Lib/json/             JSON implementation
Lib/concurrent/       Futures and process/thread pools

Many CPython internals are easier to understand by reading the Python-level standard library first.

For example, importlib contains much of the import system in Python code. CPython bootstraps it specially, but large parts remain readable as Python.
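
The bootstrapping is observable at runtime. This sketch, which assumes CPython, checks that the frozen copy of the import bootstrap is already in sys.modules and then asks importlib where it would load json from. The loader type and origin depend on how the standard library is installed, so they are printed rather than assumed.

```python
import importlib.util
import sys

# CPython loads a frozen copy of importlib's bootstrap code during startup.
frozen_bootstrap_loaded = "_frozen_importlib" in sys.modules

# The Python-level import machinery can describe how a module would be found.
spec = importlib.util.find_spec("json")

print("frozen bootstrap loaded:", frozen_bootstrap_loaded)
print("json loader:", type(spec.loader).__name__)
print("json origin:", spec.origin)
```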

3.9 Lib/test/: Regression Tests

Lib/test/ contains CPython’s test suite.

This directory is essential for internals work.

Examples:

Test file            Focus
test_dict.py         Dictionary behavior
test_list.py         List behavior
test_gc.py           Garbage collector
test_sys.py          sys module
test_dis.py          Bytecode disassembly
test_compile.py      Compiler behavior
test_ast.py          AST behavior
test_importlib/      Import system
test_capi/           C API behavior
test_threading.py    Threading behavior

When studying a subsystem, pair the implementation file with its tests.

Implementation          Tests
Objects/dictobject.c    Lib/test/test_dict.py
Objects/listobject.c    Lib/test/test_list.py
Python/compile.c        Lib/test/test_compile.py
Python/symtable.c       Lib/test/test_symtable.py
Python/sysmodule.c      Lib/test/test_sys.py
Modules/mathmodule.c    Lib/test/test_math.py

This habit prevents reading code in isolation. CPython behavior is defined by code plus tests plus documentation plus compatibility expectations.
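
While reading an implementation file, it can help to write small probe tests in the same unittest style the regression suite uses. The class below is a hypothetical example written for this chapter and is not copied from Lib/test/test_dict.py. It checks two dict behaviors that Objects/dictobject.c is responsible for.

```python
import io
import unittest


class DictProbeTests(unittest.TestCase):
    """Small behavioral probes for dict, in the style of the regression suite."""

    def test_insertion_order_preserved(self):
        d = {}
        for key in ("b", "a", "c"):
            d[key] = None
        self.assertEqual(list(d), ["b", "a", "c"])

    def test_missing_key_raises(self):
        with self.assertRaises(KeyError):
            {}["missing"]


# Run the probes in-process and capture the runner's report in memory.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DictProbeTests)
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
print("tests run:", result.testsRun, "ok:", result.wasSuccessful())
```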

3.10 Programs/: Executable Entry Points

Programs/ contains source files for CPython executable programs.

Typical files include:

Programs/python.c
Programs/_testembed.c

Programs/python.c is the normal command-line interpreter entry point.

A simplified startup path looks like:

main()
    ↓
initialize runtime
    ↓
configure interpreter
    ↓
run command, script, module, stdin, or REPL
    ↓
finalize runtime
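
Each branch of the "run command, script, module, stdin, or REPL" step can be exercised from the outside. The sketch below starts fresh interpreter processes with sys.executable and selects three of those branches through command-line arguments. It assumes sys.executable points at a working interpreter. The helper run_interpreter is hypothetical.

```python
import subprocess
import sys


def run_interpreter(args, stdin_text=None):
    """Start a new interpreter process and return its stripped standard output."""
    completed = subprocess.run(
        [sys.executable, *args],
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


# -c runs a command string, "-" reads the program from stdin, -m runs a module.
from_command = run_interpreter(["-c", "print(1 + 2)"])
from_stdin = run_interpreter(["-"], stdin_text="print('from stdin')\n")
from_module = run_interpreter(["-m", "platform"])

print(from_command)
print(from_stdin)
print(from_module)
```

Every one of these processes goes through the full initialize, configure, run, and finalize sequence before exiting.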

This directory is useful when studying startup, embedding, command-line options, and interpreter initialization.

3.11 Tools/: Developer Tools

Tools/ contains helper programs for CPython development.

Examples include tools for:

generated file maintenance
bytecode and opcode metadata
C API inspection
test support
build support
freeze tooling
scripts used by maintainers

The exact contents change over time. The important rule is that many generated files in CPython have source inputs and regeneration tools. Tools/ is often where those tools live.

When changing grammar, Argument Clinic blocks, opcode definitions, or generated metadata, check the relevant tool workflow before editing generated output.

3.12 Doc/: Documentation Source

Doc/ contains CPython’s documentation source.

The documentation covers:

language reference
library reference
C API reference
extending and embedding
how-to guides
tutorial
installing and using Python

For internals work, the documentation matters because changes to behavior often require documentation updates.

A CPython change may require edits in several places:

implementation code
tests
documentation
news entry
C API docs
library docs

Documentation source uses reStructuredText rather than Markdown.

3.13 Platform Directories

CPython supports many platforms. Some directories exist mainly for platform-specific configuration and builds.

Directory                                 Platform role
PC/                                       Windows-specific source/config
PCbuild/                                  Visual Studio build files
Mac/                                      macOS support
platform files in Python/ and Modules/    OS-specific implementations

Platform support appears throughout the tree through conditional compilation.

Example pattern:

#ifdef MS_WINDOWS
    /* Windows-specific code */
#else
    /* POSIX-like code */
#endif
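
The effect of those compile-time branches shows up at the Python level. As a sketch assuming CPython on a POSIX-like or Windows system, the code below checks which platform module backs os: the same source file, Modules/posixmodule.c, is built as posix on POSIX-like systems and as nt on Windows.

```python
import os
import sys

# os.name names the platform-specific module that os is built on top of.
platform_module = os.name
is_builtin = platform_module in sys.builtin_module_names

if platform_module == "nt":
    branch = "Windows-specific code"
else:
    branch = "POSIX-like code"

print("os is backed by:", platform_module)
print("compiled into the interpreter:", is_builtin)
print("branch taken at build time:", branch)
```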

Internals reading often requires distinguishing portable runtime logic from platform-specific branches.

3.14 Generated Code and Source Inputs

CPython contains both hand-written files and generated files.

Common generated areas include:

parser output from grammar
AST-related generated files
opcode metadata and bytecode tables
Argument Clinic output
frozen importlib modules
configuration files

A generated file usually has a comment near the top explaining how it was produced.

Before editing a suspiciously mechanical block, look for markers such as:

generated by
do not edit
clinic start generated code
autogenerated

Manual edits to generated regions are usually lost during regeneration.
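
A check for these markers is easy to script. The sketch below scans text for the marker phrases listed above, ignoring case, and returns the line numbers where they occur. The function name and the sample text are hypothetical. Real files use varied wording, so a clean scan does not prove that a file is hand-written.

```python
GENERATED_MARKERS = (
    "generated by",
    "do not edit",
    "clinic start generated code",
    "autogenerated",
)


def find_generated_markers(text):
    """Return (line_number, marker) pairs for each marker phrase found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        for marker in GENERATED_MARKERS:
            if marker in lowered:
                hits.append((lineno, marker))
    return hits


sample = (
    "/* Generated by some tool; DO NOT EDIT */\n"
    "int x;\n"
    "/*[clinic start generated code]*/\n"
)
hits = find_generated_markers(sample)
for lineno, marker in hits:
    print(f"line {lineno}: {marker}")
```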

3.15 A Reading Path Through the Repository

A good first reading path is:

Programs/python.c
    ↓
Python/pylifecycle.c
    ↓
Python/pythonrun.c
    ↓
Python/compile.c
    ↓
Python/ceval.c
    ↓
Objects/object.c
    ↓
Objects/typeobject.c
    ↓
Objects/dictobject.c
    ↓
Objects/listobject.c

This path follows execution from process startup to runtime behavior.

A second reading path for object internals:

Include/object.h
    ↓
Include/cpython/object.h
    ↓
Objects/object.c
    ↓
Objects/typeobject.c
    ↓
Objects/longobject.c
    ↓
Objects/unicodeobject.c
    ↓
Objects/listobject.c
    ↓
Objects/dictobject.c

A third path for source-to-bytecode:

Grammar/
    ↓
Parser/
    ↓
Python/ast.c
    ↓
Python/symtable.c
    ↓
Python/compile.c
    ↓
Objects/codeobject.c
    ↓
Python/ceval.c

3.16 How to Find Code for a Python Feature

Start from the Python feature, then map it to a subsystem.

Feature        First files to inspect
list.append    Objects/listobject.c
dict[key]      Objects/dictobject.c
x.y            Objects/object.c, Objects/typeobject.c
class C:       Objects/typeobject.c, Python/compile.c
try/except     Python/compile.c, Python/ceval.c
import x       Lib/importlib/, Python/import.c
async def      Python/compile.c, Objects/genobject.c
with           Python/compile.c, Python/ceval.c
len(x)         Python/bltinmodule.c, type slots
print(x)       Python/bltinmodule.c, file I/O modules

Use tests to confirm behavior, running from the root of a built checkout (on macOS the in-tree binary is named python.exe):

./python -m test test_dict
./python -m test test_descr
./python -m test test_importlib
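
A quick first step in that mapping is deciding whether a feature lives in C or in Lib/. The helper below is a hypothetical sketch that uses inspect to classify a callable: built-in functions and method descriptors are reported as C, and anything else is reported by its Python source file. It assumes CPython with source files available for the standard library.

```python
import inspect
import json


def implementation_kind(obj):
    """Return 'C' for C-implemented callables, else the Python source file path."""
    if inspect.isbuiltin(obj) or inspect.ismethoddescriptor(obj):
        return "C"
    return inspect.getsourcefile(obj)


print("len:", implementation_kind(len))
print("list.append:", implementation_kind(list.append))
print("json.dumps:", implementation_kind(json.dumps))
```

A result of "C" points toward Objects/, Python/, or Modules/. A file path points into Lib/.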

3.17 Repository Layout as Architecture

The directory structure reflects CPython’s architecture.

Include/    exposes C interfaces
Objects/    defines runtime values
Python/     executes and manages programs
Parser/     understands syntax
Modules/    provides C-backed modules
Lib/        provides Python-level standard library
Lib/test/   protects behavior
Programs/   starts the executable
Tools/      maintains generated and developer workflows
Doc/        explains public behavior

This layout is not perfect. Older code, platform constraints, backward compatibility, and generated files create exceptions. Still, the structure is coherent enough to guide source reading.

3.18 Chapter Summary

The CPython repository is a working system, not a collection of isolated files. Objects/ defines what Python values are. Python/ defines how programs compile and execute. Parser/ and Grammar/ define syntax handling. Modules/ and Lib/ provide the standard library. Include/ exposes C interfaces. Lib/test/ defines much of the regression safety net.

A productive reader moves between implementation, tests, and documentation. CPython internals become easier once each source file has a clear place in the runtime architecture.