82. Running the Test Suite
python -m test flags, regrtest test selection, parallel execution (-j), and interpreting test output.
The CPython test suite is one of the most important parts of the project. It protects language semantics, runtime behavior, standard library correctness, ABI stability, memory management invariants, and platform compatibility across operating systems and architectures.
A CPython contributor spends a large amount of time inside the test system. Even small changes to parser logic, object lifetime handling, reference counting, imports, or threading can break unrelated parts of the runtime. The test suite exists to detect those regressions early.
This chapter explains how the CPython test infrastructure works, how tests are organized, how to execute subsets of tests efficiently, and how core developers use the suite during development.
82.1 Purpose of the Test Suite
The CPython test suite serves several independent purposes.
| Purpose | Description |
|---|---|
| Correctness | Verify language and library behavior |
| Regression prevention | Prevent old bugs from reappearing |
| Platform validation | Ensure behavior across Linux, macOS, Windows, BSD, mobile, embedded targets |
| Memory safety | Detect leaks, corruption, dangling references |
| Concurrency validation | Test thread safety and signal handling |
| API compatibility | Protect the C API and ABI |
| Performance stability | Detect pathological slowdowns |
| Build validation | Verify generated artifacts and extension modules |
CPython evolves continuously. A change in one subsystem often affects another subsystem indirectly.
For example:
parser change
→ compiler output changes
→ bytecode layout changes
→ traceback formatting changes
→ debugger tests fail
The test suite exists to make those interactions visible.
82.2 Repository Layout
Most tests live in:
Lib/test/
This directory contains thousands of files.
Important directories include:
| Path | Purpose |
|---|---|
| Lib/test/ | Core regression tests |
| Lib/test/test_* | Standard test modules and test packages |
| Lib/test/support/ | Shared utilities |
| Modules/_test* | Native test extensions |
| Python/ | Core runtime sources that the tests exercise (not a test directory) |
| Tools/ | Developer tools and helpers |
Example:
Lib/test/test_dict.py
Lib/test/test_list.py
Lib/test/test_gc.py
Lib/test/test_asyncio/
Lib/test/test_importlib/
The naming convention is usually:
test_<feature>.py
Each file focuses on one subsystem or module.
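The naming convention can be illustrated with a small sketch that lists test modules and test packages in a directory. The helper name list_test_modules is hypothetical and is not part of regrtest; it only mirrors the test_<feature> convention described above.

```python
from pathlib import Path


def list_test_modules(test_dir):
    """Return sorted names of test modules and test packages in test_dir."""
    names = []
    for entry in Path(test_dir).iterdir():
        if not entry.name.startswith("test_"):
            continue
        if entry.is_file() and entry.suffix == ".py":
            names.append(entry.stem)      # test_dict.py -> "test_dict"
        elif entry.is_dir() and (entry / "__init__.py").exists():
            names.append(entry.name)      # test_asyncio/ -> "test_asyncio"
    return sorted(names)
```

Pointing it at a checkout, for example list_test_modules("Lib/test"), would be one way to see which names can be passed to the runner.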
82.3 The regrtest Framework
CPython uses a custom test runner called regrtest.
You usually invoke it through:
python -m test
or:
./python -m test
when using a locally built interpreter.
The entry point lives in:
Lib/test/libregrtest/
regrtest handles:
test discovery
parallel execution
timeouts
resource management
reference leak checks
test isolation
reruns
randomization
output formatting
platform skipping
This framework evolved specifically for CPython’s needs. Generic Python test runners are insufficient for many interpreter-level tasks.
82.4 Building Before Running Tests
You typically run tests against a locally built interpreter.
Example:
./configure --with-pydebug
make -j8
Then:
./python -m test
Using the system Python is usually incorrect when modifying CPython internals because:
wrong binary
wrong stdlib
wrong extension modules
wrong bytecode format
wrong ABI
The local build ensures tests execute against the modified runtime.
82.5 Running the Entire Test Suite
To execute the full suite:
./python -m test
This may take a long time depending on hardware and build type.
Typical execution includes:
thousands of test files
tens of thousands of test cases
subprocess spawning
network simulation
filesystem operations
thread scheduling
extension module loading
Parallel execution is common:
./python -m test -j8
where:
-j8
runs eight worker processes.
With -j, regrtest runs test files in separate worker processes, so tests are isolated from each other at the process level rather than the thread level.
82.6 Running Individual Tests
During development, you rarely run the full suite repeatedly.
Instead:
./python -m test test_dict
or:
./python -m test test_gc
You can also run multiple tests:
./python -m test test_dict test_list test_set
This workflow is essential for fast iteration.
Example development loop:
edit source
rebuild CPython
run focused tests
inspect failure
repeat
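The "run focused tests" step of this loop can be scripted. The sketch below builds the command line for a focused run and executes it with subprocess. The helper names focused_test_command and run_focused are hypothetical; the only runner options used are -j and -v, both described in this chapter.

```python
import subprocess
import sys


def focused_test_command(tests, python=sys.executable, jobs=None, verbose=False):
    """Build an argv list that runs the named test modules with regrtest."""
    cmd = [python, "-m", "test"]
    if jobs is not None:
        cmd.append(f"-j{jobs}")   # number of worker processes
    if verbose:
        cmd.append("-v")          # show individual test cases
    cmd.extend(tests)
    return cmd


def run_focused(tests, **options):
    """Run the tests and return the runner's exit status (0 means success)."""
    return subprocess.run(focused_test_command(tests, **options)).returncode
```

From the root of a CPython checkout, run_focused(["test_dict", "test_list"], python="./python", jobs=4) would be a typical call after each rebuild.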
82.7 Verbose Output
Verbose mode:
./python -m test -v test_dict
shows individual test cases as they execute.
Useful for:
debugging hangs
tracking failures
observing execution order
finding flaky tests
Very verbose mode:
./python -m test -vv
prints even more internal details.
82.8 Fail Fast Mode
During debugging:
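./python -m test -v -G
stops at the first failure. The long form is --failfast, and regrtest's help text notes that it takes effect together with -v or -W. Do not confuse it with -x, which in regrtest means --exclude and treats the listed tests as tests to skip.
This reduces noise when diagnosing a regression.
Combined example:
./python -m test -v -G test_gc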
82.9 Rerunning Failed Tests
Useful option:
--rerun
Example:
./python -m test --rerun
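This re-runs, in verbose mode, the tests that failed during the same invocation. It does not remember failures from earlier sessions. The short form is -w, and older releases spell the long option --verbose2.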
Helpful for:
flaky failures
intermittent race conditions
long test sessions
incremental debugging
82.10 Test Discovery
regrtest discovers tests dynamically.
Convention:
test_*.py
Classes usually inherit from:
unittest.TestCase
Example:
import unittest

class DictTests(unittest.TestCase):
    def test_lookup(self):
        d = {"x": 1}
        self.assertEqual(d["x"], 1)
Discovery scans the test package and imports matching modules.
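To make the loading step concrete, here is a minimal sketch that collects the test methods of one TestCase class and runs them programmatically. It uses only the standard unittest loader and runner; regrtest layers its own selection and reporting on top of this, so treat it as an illustration of the idea rather than a description of regrtest internals. The helper name run_case is hypothetical.

```python
import io
import unittest


class DictTests(unittest.TestCase):
    def test_lookup(self):
        d = {"x": 1}
        self.assertEqual(d["x"], 1)

    def test_missing_key(self):
        with self.assertRaises(KeyError):
            {}["missing"]


def run_case(case_class):
    """Load every test_* method of case_class and run it quietly."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case_class)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)


result = run_case(DictTests)
print(result.testsRun, "tests run; success:", result.wasSuccessful())
```

The loader finds methods by the test prefix, which is the same convention that applies to file names.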
82.11 Test Isolation
Isolation is critical.
Many tests modify:
environment variables
working directories
signal handlers
sys.modules
thread state
filesystem state
locale settings
warning filters
CPython’s test infrastructure attempts to restore interpreter state after each test.
Utilities in:
test.support
provide helpers for isolation.
Example:
from test.support.os_helper import EnvironmentVarGuard
This prevents global state contamination across tests.
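The same save-and-restore idea can be shown with the standard library alone. The sketch below uses unittest.mock.patch.dict as a stand-in for the test.support helpers, which makes it runnable outside a CPython checkout. The variable name CPYTHON_BOOK_DEMO_VAR is a made-up example.

```python
import io
import os
import unittest
from unittest import mock

VAR = "CPYTHON_BOOK_DEMO_VAR"  # hypothetical variable used only for this demo


class EnvIsolationTests(unittest.TestCase):
    def test_variable_is_restored(self):
        before = os.environ.get(VAR)
        # patch.dict records the original mapping and restores it on exit.
        with mock.patch.dict(os.environ, {VAR: "1"}):
            self.assertEqual(os.environ[VAR], "1")
        self.assertEqual(os.environ.get(VAR), before)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(EnvIsolationTests)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

Whatever value the variable had before the test, including no value at all, is what later tests will see.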
82.12 Temporary Directories and Files
Tests should avoid polluting the filesystem.
Typical pattern:
import tempfile

with tempfile.TemporaryDirectory() as d:
    ...
CPython also provides helpers:
from test.support import os_helper
These utilities handle platform-specific cleanup issues.
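A slightly fuller sketch of the temporary-directory pattern inside a test class follows. It registers the cleanup with addCleanup so the directory is removed even when the test fails. This version uses only tempfile and unittest, not the os_helper utilities.

```python
import io
import os
import tempfile
import unittest


class TempDirTests(unittest.TestCase):
    def setUp(self):
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)   # runs after the test, pass or fail
        self.tmpdir = tmp.name

    def test_write_and_read(self):
        path = os.path.join(self.tmpdir, "data.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("hello")
        with open(path, encoding="utf-8") as f:
            self.assertEqual(f.read(), "hello")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TempDirTests)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

Because every test gets a fresh directory, tests cannot observe files left behind by one another.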
82.13 Skipped Tests
Some tests require optional features:
network access
IPv6
large memory
GUI support
SSL
specific OS behavior
Tests can skip dynamically:
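import sys
import unittest

class FeatureTests(unittest.TestCase):
    @unittest.skipUnless(sys.platform.startswith("linux"), "requires Linux")
    def test_feature(self):
        ...
or, from inside a test method:
self.skipTest("reason")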
Skipping is common in CPython because supported platforms vary widely.
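Both skipping styles can be seen together in a runnable sketch. The first test is skipped by a decorator when os.fork is unavailable; the second skips itself at run time unless an environment variable is set. The variable name CPYTHON_BOOK_DEMO_FEATURE is hypothetical and exists only for this example.

```python
import io
import os
import unittest


class OptionalFeatureTests(unittest.TestCase):
    @unittest.skipUnless(hasattr(os, "fork"), "requires os.fork")
    def test_fork_is_callable(self):
        self.assertTrue(callable(os.fork))

    def test_needs_opt_in(self):
        # Dynamic skip: the decision is made while the test is running.
        if os.environ.get("CPYTHON_BOOK_DEMO_FEATURE") != "1":
            self.skipTest("demo feature not enabled")
        self.assertTrue(True)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(OptionalFeatureTests)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
for test, reason in result.skipped:
    print("skipped:", test.id(), "-", reason)
```

A skipped test counts as neither a failure nor a success, so the run as a whole still passes.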
82.14 Resource-Intensive Tests
Some tests are disabled by default.
Examples:
network tests
large file tests
CPU-intensive tests
memory-heavy tests
Enable them with:
./python -m test -u all
or specific resources:
./python -m test -u network
Resource categories include:
| Resource | Meaning |
|---|---|
| network | Internet access |
| largefile | Very large files |
| audio | Audio devices |
| gui | GUI interaction |
| cpu | Expensive CPU workloads |
This prevents accidental long-running executions.
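Inside the CPython tree, tests declare their needs with helpers such as test.support.requires_resource. The sketch below is a stand-in written with plain unittest so it runs anywhere: the ENABLED_RESOURCES set and the local requires_resource decorator are assumptions for illustration, not the real implementation.

```python
import io
import unittest

# Hypothetical: the resources a "-u cpu" style option would have enabled.
ENABLED_RESOURCES = {"cpu"}


def requires_resource(name):
    """Skip the decorated test unless the named resource is enabled."""
    return unittest.skipUnless(
        name in ENABLED_RESOURCES, f"resource {name!r} is not enabled"
    )


class ResourceTests(unittest.TestCase):
    @requires_resource("network")
    def test_download(self):
        self.fail("would need Internet access")

    @requires_resource("cpu")
    def test_expensive_sum(self):
        self.assertEqual(sum(range(100_000)), 4_999_950_000)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ResourceTests)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

With only cpu enabled, the network test is reported as skipped and the CPU test runs.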
82.15 Reference Leak Testing
One of the most important CPython-specific features is reference leak detection.
Debug builds support:
./python -m test -R 3:3 test_dict
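The argument has the form warmups:runs. Here the test is executed three times as warm-up, so that caches and lazily created objects settle, and then three more times while regrtest compares reference counts between runs. A count that keeps growing across the measured runs is reported as a leak.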
This detects leaked references caused by incorrect Py_INCREF or Py_DECREF usage.
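Example leak source:
PyObject *x = PyLong_FromLong(1);    /* new reference owned by this function */
if (PyList_Append(list, x) < 0)      /* the list takes its own reference */
    return NULL;                     /* leak: x is never Py_DECREF'd */
Here the error path returns without releasing x. PyList_Append does not steal the reference, so the success path also needs a matching Py_DECREF(x). (list is assumed to be an existing list object.)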
Reference leaks are critical because CPython relies heavily on deterministic reference counting.
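The warm-up-then-measure idea can be imitated in pure Python. The sketch below is only an analogy: regrtest relies on debug-build counters such as sys.gettotalrefcount(), which release builds do not provide, so this version counts GC-tracked objects instead. The names object_growth, leaky, and clean are hypothetical.

```python
import gc

_cache = []  # module-level state that the leaky function grows on every call


def leaky():
    _cache.append([])            # each call keeps one more list alive


def clean():
    scratch = [i for i in range(10)]
    return len(scratch)          # scratch is released when the call returns


def object_growth(func, warmups=3, runs=3):
    """Return the change in GC-tracked object count for each measured run."""
    for _ in range(warmups):     # warm-up runs are not measured
        func()
    deltas = []
    for _ in range(runs):
        gc.collect()
        before = len(gc.get_objects())
        func()
        gc.collect()
        deltas.append(len(gc.get_objects()) - before)
    return deltas


leaky_growth = object_growth(leaky)
clean_growth = object_growth(clean)
print("leaky:", leaky_growth, "clean:", clean_growth)
```

Steady growth across the measured runs is the signal to look for; a function that releases everything it allocates should show no sustained growth.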
82.16 Debug Builds
Many tests are meaningful only under debug builds.
Configure:
./configure --with-pydebug
Debug builds enable:
extra assertions
memory poisoning
reference tracking
debug allocators
interpreter consistency checks
Debug builds are slower but substantially more informative.
Typical debug-only checks include:
negative refcounts
invalid GC state
object lifecycle corruption
API misuse
82.17 Memory Allocator Debugging
CPython has specialized allocator diagnostics.
Setting the environment variable:
PYTHONMALLOC=debug
can detect:
buffer overflows
double frees
invalid memory access
allocator misuse
Combined with tests, these tools expose subtle runtime bugs.
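One way to combine the allocator hooks with a test run is to launch a child interpreter with PYTHONMALLOC set. The helper name run_with_debug_allocator is hypothetical; the sketch uses only subprocess and the documented environment variable.

```python
import os
import subprocess
import sys


def run_with_debug_allocator(args):
    """Run a child interpreter with the debug memory allocator hooks enabled."""
    env = dict(os.environ, PYTHONMALLOC="debug")
    return subprocess.run(
        [sys.executable, *args], env=env, capture_output=True, text=True
    )


completed = run_with_debug_allocator(["-c", "print('ok')"])
print(completed.returncode, completed.stdout.strip())
```

In a CPython checkout, the same helper could be given ["-m", "test", "test_dict"] to run a focused test under the debug hooks.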
82.18 Running Tests Under Sanitizers
Advanced debugging often uses compiler sanitizers.
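Examples, set when running ./configure:
CFLAGS="-fsanitize=address" LDFLAGS="-fsanitize=address"
or:
CFLAGS="-fsanitize=undefined" LDFLAGS="-fsanitize=undefined"
CPython's configure script also offers --with-address-sanitizer and --with-undefined-behavior-sanitizer, which set the matching flags.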
Sanitizers help detect:
heap corruption
use-after-free
integer overflow
undefined behavior
stack corruption
These tools are extremely valuable for C-level interpreter work.
82.19 Parallel Testing
Parallel execution:
./python -m test -j0
lets regrtest choose the number of workers from the number of available CPUs.
Internally, regrtest spawns worker processes.
Benefits:
faster CI
better CPU utilization
reduced wall-clock time
Challenges:
race conditions
filesystem contention
port conflicts
test order assumptions
Flaky tests often appear first under parallel execution.
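The worker-process model can be sketched in a few lines. Each "test" below is a code snippet run in its own interpreter, so state changes made by one snippet cannot leak into the parent or into other workers. This is an illustration of the concept under simplified assumptions, not regrtest's actual worker protocol; the helper names and the DEMO variable are hypothetical.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_in_worker(snippet):
    """Run one snippet in a fresh interpreter process; return its exit code."""
    return subprocess.run([sys.executable, "-c", snippet]).returncode


def run_parallel(snippets, jobs=2):
    """Run snippets in up to `jobs` concurrent processes, keeping input order."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_in_worker, snippets))


snippets = [
    "import os; os.environ['CPYTHON_BOOK_DEMO_LEAK'] = '1'",  # mutates its own process only
    "import sys; sys.exit(0)",                                # passing "test"
    "import sys; sys.exit(1)",                                # failing "test"
]
exit_codes = run_parallel(snippets)
print(exit_codes)
```

The environment change made by the first snippet is invisible to the parent, which is the isolation property that process-based workers provide.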
82.20 Flaky Tests
A flaky test passes sometimes and fails sometimes.
Common causes:
timing assumptions
thread scheduling
signal races
network instability
clock precision
resource exhaustion
platform variance
CPython developers treat flaky tests seriously because they reduce CI reliability.
Strategies include:
timeouts
retry loops
stronger synchronization
reduced timing assumptions
process isolation
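As an example of "stronger synchronization", the sketch below waits on a threading.Event instead of sleeping for a guessed duration. The generous timeout exists only to stop a broken test from hanging; it is not a timing assumption the test depends on.

```python
import io
import threading
import unittest


class WorkerTests(unittest.TestCase):
    def test_worker_publishes_result(self):
        results = []
        done = threading.Event()

        def worker():
            results.append(42)
            done.set()               # signal completion explicitly

        thread = threading.Thread(target=worker)
        thread.start()
        # Wait for the signal rather than calling time.sleep() and hoping.
        self.assertTrue(done.wait(timeout=10))
        thread.join(timeout=10)
        self.assertEqual(results, [42])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(WorkerTests)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

The test finishes as soon as the worker signals, so it is both faster and less sensitive to scheduler behavior than a fixed sleep.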
82.21 Platform-Specific Behavior
CPython supports many platforms.
Tests often include conditional branches:
import sys

if sys.platform == "win32":
    ...
or:
import sys
import unittest

@unittest.skipIf(sys.platform == "win32", "POSIX only")
class PosixOnlyTests(unittest.TestCase):
    ...
Platform differences include:
filesystem semantics
path handling
signals
process APIs
thread scheduling
encoding defaults
socket behavior
A test passing on Linux does not guarantee correctness on Windows or macOS.
82.22 Running Tests After Bytecode Changes
Compiler or interpreter modifications often invalidate .pyc files.
Common workflow:
make clean
make
or manually removing:
__pycache__/
Incorrect bytecode caches can produce misleading failures.
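Manual removal of the caches can be scripted. The sketch below deletes every __pycache__ directory under a root and reports what it removed. The helper name remove_pycache is hypothetical, and the function deletes files, so point it only at a tree you intend to clean.

```python
import shutil
from pathlib import Path


def remove_pycache(root):
    """Delete every __pycache__ directory under root; return the removed paths."""
    removed = []
    # Materialize the matches first so the tree is not modified while walking it.
    for cache_dir in sorted(Path(root).rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
            removed.append(cache_dir)
    return removed
```

Source files are left untouched; the interpreter regenerates the caches on the next import.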
82.23 Test Support Utilities
Lib/test/support/ contains many helpers.
Examples:
| Utility | Purpose |
|---|---|
| import_helper | Import isolation |
| threading_helper | Thread coordination |
| socket_helper | Network helpers |
| warnings_helper | Warning capture |
| os_helper | Filesystem helpers |
These utilities reduce duplicated infrastructure across tests.
82.24 Continuous Integration
CPython uses extensive CI infrastructure.
Typical CI runs include:
Linux
Windows
macOS
debug builds
release builds
sanitizer builds
free-threaded builds
multiple architectures
A patch may pass locally but fail in CI due to platform-specific behavior.
Core developers therefore rely heavily on automated infrastructure.
82.25 Common Contributor Workflow
Typical workflow:
modify source
build interpreter
run focused tests
run broader related tests
run full suite if necessary
check reference leaks
push branch
wait for CI
investigate failures
Small targeted testing is essential for productivity.
Running the full suite after every edit is usually impractical.
82.26 Writing Good Tests
Good CPython tests are:
deterministic
isolated
cross-platform
minimal
fast
clear
specific
Bad tests often:
depend on timing
depend on execution order
leave global state modified
depend on external services
assume specific memory layout
A good regression test usually targets one bug precisely.
82.27 Example Minimal Regression Test
Suppose a dictionary regression existed.
A focused test might look like:
import unittest

class DictRegressionTests(unittest.TestCase):
    def test_resize_preserves_entries(self):
        d = {}
        for i in range(1000):
            d[i] = i
        for i in range(1000):
            self.assertEqual(d[i], i)

if __name__ == "__main__":
    unittest.main()
This directly validates the invariant under investigation.
82.28 Reading Failures
A test failure may indicate:
logic bug
memory corruption
reference leak
ABI mismatch
undefined behavior
test bug
platform assumption
CPython failures are sometimes nonlocal.
Example:
GC corruption
→ unrelated test crashes later
The first visible failure is not always the root cause.
82.29 Core Principle
The CPython test suite is part of the interpreter itself.
It is not an optional accessory.
The runtime, compiler, object system, import machinery, and standard library evolve together with the tests. Every important subsystem in CPython is tightly coupled to regression validation.
Understanding the test infrastructure is therefore a prerequisite for serious CPython development.