Performance

This page records benchmark results from the v0.12.x performance cycle. Numbers are updated each release cycle. All measurements come from the Go benchmark suite in benchmarks/; see the v0.12.x performance post for the full story behind each number.

Machine: Apple M4, macOS, Go 1.26.


Package installation

go test -bench=BenchmarkInstall -benchmem -benchtime=3s -count=5 ./benchmarks/
| Benchmark | v0.12.1 (sequential) | v0.12.2 (parallel pool) | Improvement |
|---|---|---|---|
| Install 47 wheels (warm cache) | ~65 ms | ~51 ms | −22% |

The parallel pool uses min(len(pins), GOMAXPROCS*2) workers. The gain is larger on slow disks or networks, where the sequential path serialised all I/O latency.


Package resolution

go test -bench=BenchmarkPMLock -benchmem -benchtime=3s -count=5 ./benchmarks/
| Benchmark | v0.12.1 | v0.12.3 (prefetch) | v0.12.4 (concurrency ×4) |
|---|---|---|---|
| pm lock, 47 pkgs, warm (fixture) | ~14 ms | ~11 ms | ~11 ms |
| pm lock, 47 pkgs, cold (real PyPI) | ~4.8 s | ~4.2 s | ~3.1 s |

The fixture benchmark uses an in-process index (no network), so it measures pure resolver overhead. The cold-cache number makes real requests to PyPI and reflects network latency; your numbers will differ with connection speed.
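The fixture/real split works when the resolver can be pointed at either an in-memory index or a network-backed one. The sketch below shows that idea under stated assumptions: Index, fixtureIndex and lockLatest are hypothetical names, and lockLatest is a toy that picks the last listed version rather than a real dependency resolver.

```go
package main

import (
	"errors"
	"fmt"
)

// Index is a hypothetical abstraction over a package index; bunpy's real
// resolver interface is not described on this page.
type Index interface {
	Versions(name string) ([]string, error)
}

// fixtureIndex answers from memory, so no network time is measured.
type fixtureIndex map[string][]string

func (f fixtureIndex) Versions(name string) ([]string, error) {
	vs, ok := f[name]
	if !ok {
		return nil, errors.New("unknown package: " + name)
	}
	return vs, nil
}

// lockLatest is a toy resolver: it pins each package to the last version
// the index lists (assumed newest). A real resolver also compares versions
// per PEP 440 and satisfies dependency constraints.
func lockLatest(idx Index, pkgs []string) (map[string]string, error) {
	pins := make(map[string]string, len(pkgs))
	for _, p := range pkgs {
		vs, err := idx.Versions(p)
		if err != nil {
			return nil, err
		}
		if len(vs) == 0 {
			return nil, errors.New("no versions for " + p)
		}
		pins[p] = vs[len(vs)-1]
	}
	return pins, nil
}

func main() {
	idx := fixtureIndex{
		"requests": {"2.31.0", "2.32.3"},
		"idna":     {"3.6", "3.7"},
	}
	pins, err := lockLatest(idx, []string{"requests", "idna"})
	fmt.Println(pins, err)
}
```

Because lockLatest only sees the Index interface, a benchmark can pass fixtureIndex to time resolver logic alone, while a network-backed implementation would fold PyPI latency into the same measurement.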


Test runner

go test -bench=BenchmarkTestRunner -benchmem -benchtime=3s -count=5 ./benchmarks/
| Benchmark | v0.12.1 (unbounded) | v0.12.5 (bounded pool) |
|---|---|---|
| RunParallel, 100 test files | ~14 ms | ~14 ms |

Throughput is unchanged on a machine with sufficient RAM. The improvement from B-4 is in peak goroutine count and GC pressure: the unbounded implementation launched one goroutine per file, whereas the bounded pool caps concurrency at GOMAXPROCS*2. On a 200-file suite on a 4-core machine (GOMAXPROCS=4), that is 200 live interpreter allocations versus 8.


Build cache

go test -bench=BenchmarkBuild -benchmem -benchtime=3s -count=3 ./benchmarks/
| Benchmark | Time |
|---|---|
| BenchmarkBuild_CacheMiss (full build, tiny fixture) | ~14 ms |
| BenchmarkBuild_CacheHit (cache hit, tiny fixture) | ~8 ms |
| BenchmarkCheckCache_Hit (Go-level hash check, 10 files) | ~55 µs |

On a real project, where file collection, minification, and zip writing dominate, a cache hit skips all three, so the second build's work drops to the hash check (~55 µs for the 10-file fixture) instead of growing with the cost of a full build. The remaining ~8 ms in the CLI benchmark is process startup.


Startup

go test -bench=BenchmarkStartup -benchmem -benchtime=3s -count=3 ./benchmarks/
| Benchmark | v0.12.1 | v0.12.8 |
|---|---|---|
| BenchmarkStartup (run test fixture file) | ~8 ms | ~8 ms |
| BenchmarkStartup_InlinePass (bunpy -c "pass") | n/a | ~7.2 ms |

At ~7.2 ms, bunpy -c "pass" is inside the 10 ms target and below CPython 3.14’s 14 ms cold start on M-series hardware. Lazy module loading in v0.12.8 skips all 40+ bunpy.* factory calls for scripts that never import bunpy. The remaining startup cost is Go runtime init and goipy.New().
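Lazy loading of this kind is commonly built by registering a factory per module and running it only on first import. The sketch below assumes that design, using sync.Once; Module, registry, register and importModule are hypothetical names and do not reflect bunpy's or goipy's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// Module is a placeholder for whatever object a factory builds.
type Module struct{ Name string }

// lazyModule wraps a factory so it runs at most once, on first import.
type lazyModule struct {
	once    sync.Once
	factory func() *Module
	mod     *Module
}

func (l *lazyModule) get() *Module {
	l.once.Do(func() { l.mod = l.factory() })
	return l.mod
}

// registry maps module names to lazy wrappers. Registering is cheap:
// no factory runs until importModule asks for that name.
type registry struct {
	mods map[string]*lazyModule
}

func newRegistry() *registry { return &registry{mods: map[string]*lazyModule{}} }

func (r *registry) register(name string, f func() *Module) {
	r.mods[name] = &lazyModule{factory: f}
}

func (r *registry) importModule(name string) (*Module, bool) {
	l, ok := r.mods[name]
	if !ok {
		return nil, false
	}
	return l.get(), true
}

func main() {
	calls := 0
	r := newRegistry()
	for _, name := range []string{"bunpy.http", "bunpy.sql", "bunpy.fs"} {
		n := name // explicit copy for the closure
		r.register(n, func() *Module {
			calls++
			return &Module{Name: n}
		})
	}
	fmt.Println("factory calls after startup:", calls) // 0

	m, _ := r.importModule("bunpy.sql")
	r.importModule("bunpy.sql") // second import reuses the cached module
	fmt.Println(m.Name, "factory calls:", calls)   // bunpy.sql factory calls: 1
}
```

A script that never imports a bunpy.* module pays only for the map inserts, not for any factory.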


Running the benchmarks yourself

# Generate fixtures once
go run ./benchmarks/fixtures/build_fixtures.go

# Run all benchmarks
go test -bench=. -benchmem -benchtime=3s -count=3 ./benchmarks/

# Run a specific benchmark
go test -bench=BenchmarkStartup -benchmem -benchtime=5s -count=5 ./benchmarks/

# Cross-tool comparison (bunpy vs uv vs CPython)
go test -bench=. -benchmem -benchtime=3s -count=3 ./benchmarks/compare/

The scripts/bench.sh script runs all benchmarks and writes a snapshot to benchmarks/baseline.txt.
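If you want a quick numeric diff between baseline.txt and a fresh run, the standard `go test -bench` output is straightforward to parse. This hypothetical helper averages the ns/op column per benchmark name, which matters because -count=N repeats each benchmark N times. The sample lines in main are made up to exercise the parser and are not measurements. For statistically sound comparisons, benchstat (golang.org/x/perf/cmd/benchstat) is the usual tool.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// meanNsPerOp averages the ns/op column for each benchmark in standard
// `go test -bench` output. With -count=N each benchmark appears N times.
func meanNsPerOp(output string) map[string]float64 {
	sums := map[string]float64{}
	counts := map[string]int{}
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 4 || !strings.HasPrefix(f[0], "Benchmark") {
			continue // skip goos/goarch/PASS/ok lines
		}
		// After the name and iteration count, fields come in value/unit pairs.
		for i := 2; i+1 < len(f); i += 2 {
			if f[i+1] != "ns/op" {
				continue
			}
			v, err := strconv.ParseFloat(f[i], 64)
			if err != nil {
				break
			}
			sums[f[0]] += v
			counts[f[0]]++
		}
	}
	means := make(map[string]float64, len(sums))
	for name, s := range sums {
		means[name] = s / float64(counts[name])
	}
	return means
}

func main() {
	// Made-up sample lines, not real measurements.
	out := "goos: darwin\n" +
		"BenchmarkExample-10  100  1000 ns/op  64 B/op  2 allocs/op\n" +
		"BenchmarkExample-10  100  3000 ns/op  64 B/op  2 allocs/op\n" +
		"PASS\n"
	for name, ns := range meanNsPerOp(out) {
		fmt.Printf("%s: %.0f ns/op\n", name, ns) // BenchmarkExample-10: 2000 ns/op
	}
}
```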


Environment variables that affect performance

| Variable | Effect |
|---|---|
| BUNPY_TEST_PARALLELISM=N | Override test runner worker count (default: GOMAXPROCS*2) |
| BUNPY_PYPI_CONCURRENCY=N | Override PyPI page fetch concurrency (default: 16) |
| BUNPY_PYPI_INDEX_URL=url | Use an alternate PyPI index (e.g. a local mirror) |
| BUNPY_DEBUG=http2 | Log HTTP/2 negotiation for each PyPI connection |
| BUNPY_PROFILE_STARTUP=1 | Write a pprof CPU profile to /tmp/bunpy-startup.pprof |
| BUNPY_STARTUP_PPROF=path | Override the pprof output path |