On our test suite (a big Django app) it takes about 15s just to collect tests. So much so that we added a util that uses ripgrep to find the relevant file and pass it as an argument to pytest alongside `pytest -k <testname>`.
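A minimal sketch of what such a wrapper can look like, assuming ripgrep is on PATH and tests live under tests/ (the helper name and layout are made up, not the actual util):

    import subprocess
    import sys

    def run_single_test(test_name: str) -> int:
        # rg -l lists the files whose contents match "def <test_name>".
        result = subprocess.run(
            ["rg", "-l", f"def {test_name}", "tests/"],
            capture_output=True, text=True,
        )
        files = result.stdout.split()
        # Passing only the matching files means pytest collects one file
        # instead of the whole suite; -k still narrows to the named test.
        return subprocess.call(["pytest", "-k", test_name, *files])

    if __name__ == "__main__":
        sys.exit(run_single_test(sys.argv[1]))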
How much time actually goes by after you click "run test" (or run the equivalent CLI command) until the test has finished running?
Any project using the JVM that I've ever worked on (none of which were Clojure, admittedly) has always taken at least 10-15s before the pre-phases finished and the actual test setup began.
Another thing that can slow down pytest collection and bootstrap is how fixtures are loaded, so reducing the number or scope of fixtures may help too.
- Creating and migrating the test DB is slow. There is no shame in storing and committing a pre-migrated SQLite test DB generated upon release; it's often small in size and will save time for everyone (a conftest sketch appears at the end of this comment).
- Stash your old migrations that nobody uses anymore.
- Use `python -X importtime` and paste the result into an online viewer. Sometimes moving heavy imports into functions instead of the global scope will make individual tests slower, but collection will be faster (a sketch follows this list).
- Use pytest-xdist
- Disable transactions / rollback on read-only tests. Ideally you want most of your non-inserting tests to work on the migrated/preloaded data in your SQLite DB.
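For the `python -X importtime` item, a sketch of the import-deferral idea (module names here are made up):

    # Find slow imports first (importtime writes to stderr):
    #     python -X importtime -c "import myapp" 2> imports.log
    # Then defer the heavy imports into the functions that need them.

    # Before: paid on every interpreter start, including every pytest collection.
    # import pandas as pd

    def load_report(path):
        import pandas as pd  # heavy import deferred until the function actually runs
        return pd.read_csv(path)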
We can go into more detail if you want, but the pre-migrated DB + xdist alone let me speed up tests on a huge project from 30m to 1m.
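For the pre-migrated DB, a conftest-level sketch of the idea (paths are assumptions; a Django project would point the test database settings at the copy instead):

    # conftest.py
    import shutil
    import pytest

    @pytest.fixture(scope="session")
    def sqlite_db(tmp_path_factory):
        # Copying a committed, already-migrated DB is far cheaper than running migrations.
        src = "tests/fixtures/premigrated.sqlite3"  # regenerated and committed at each release
        dst = tmp_path_factory.mktemp("db") / "test.sqlite3"
        shutil.copy(src, dst)
        return dst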
Imho, they are one of the best auditors out there for smart contracts. Wouldn't be surprising to see some of these talented teams find bigger markets.
Source: I run the group that produced this work.
Seriously, you are my heroes!
Blockchain clients tend to want to publish the report, but that isn't true for our business lines/projects/clients that are more interesting to HN's audience.
For high-security applications the test suite should be boring and straightforward. pytest is full of magic, which is what makes it so slow.
Python in general has become so complex, informally specified, and bug-ridden that it only survives because of AI, while silencing critics in their bubble.
The complexity includes PSF development processes, which lead to:
https://www.schneier.com/blog/archives/2024/08/leaked-github...
I don't disagree that it's "complex, informally specified" (idk about bug-ridden or silencing critics), but it's just silly to say it only survives because of AI. It was a top-used language for web development, data science, and all sorts of scientific analysis before AI got big, and those uses haven't gone away: I don't expect Python has lost much ground in these fields, if any.
The scientific ecosystem was always there, but relied on heavy marketing to academics, who (sadly) in turn indoctrinate new students to use Python as a first language.
I did forget about sysadmin use cases in Linux distributions, but those could easily be replaced even by Perl, as leaner BSD distributions already show.
Developers avoid refactoring costs by using dependency inversion, fixtures, and functional test assertions, without OO in the tests, too.
Pytest collection could be made faster with ripgrep; does it even need an AST? A comment in this thread mentions preparing a list of .py test files containing functions that start with `test_` (for example with ripgrep) and passing them to pytest alongside the `-k` option.
One day I did too much work refactoring tests to minimize maintenance burden, and wrote myself a functional test runner that captures AssertionErrors and reports them using only the stdlib.
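A toy sketch of what such a runner might look like (my illustration, not the commenter's actual code):

    import traceback

    def run_tests(*tests):
        """Call each zero-argument test function, catch AssertionError, report with stdlib only."""
        failures = 0
        for test in tests:
            try:
                test()
                print(f"PASS {test.__name__}")
            except AssertionError:
                failures += 1
                print(f"FAIL {test.__name__}")
                traceback.print_exc()
        return failures

    def test_addition():
        assert 1 + 1 == 2

    def test_broken():
        assert 0 == 1, "deliberate failure for demonstration"

    if __name__ == "__main__":
        raise SystemExit(run_tests(test_addition, test_broken))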
It's possible to use unittest.TestCase() assertion methods functionally:

    assert 0 == 1
    # AssertionError

    import unittest
    test = unittest.TestCase()
    test.assertEqual(0, 1)
    # AssertionError: 0 != 1
unittest.TestCase assertion methods have default error messages, but the `assert` keyword does not. In order to support one-file, stdlib-only modules, I have mocked pytest.mark.parametrize a number of times.
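A sketch of one way such a stand-in can work (an assumption about the approach, not the commenter's code):

    def parametrize(argnames, argvalues):
        """Minimal stand-in for pytest.mark.parametrize: run the test once per parameter set."""
        names = [n.strip() for n in argnames.split(",")]

        def decorator(func):
            def wrapper():
                for values in argvalues:
                    if len(names) == 1 and not isinstance(values, tuple):
                        values = (values,)
                    func(**dict(zip(names, values)))
            wrapper.__name__ = func.__name__
            return wrapper
        return decorator

    @parametrize("a,b,expected", [(1, 2, 3), (2, 2, 4)])
    def test_add(a, b, expected):
        assert a + b == expected

    test_add()  # runs both parameter sets without pytest installed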
chmp/ipytest is one way to get pytest-style assertion rewriting for plain `assert a == b` (rather than having to write `assertEqual(a, b)`) inside Jupyter notebooks.
Python continues to top language-usage and popularity rankings.
Python is not a formally specified language, mostly does not have constant-time operations (or documented complexity in docstrings), has a stackless variant, supported asynchronous coroutines natively before C++, gains an experimental tail-call interpreter in 3.14, now has a free-threaded (no-GIL) mode, and is GPU-accelerated in many different ways.
How best could they scan for API tokens committed to public repos?
Critics of the PSF, well, that's another story.
As for complexity, it's not so much that new features are added, but that people are using Python in larger systems, and demanding things to help manage the complexity (that end up adding more complexity of their own). The Zen of Python is forgotten - and that's largely on the users.
pytest is full of magic, but at least it uses that magic to present a pleasant UI. Certainly better than unittest's JUnit-inspired design. But it'd be that much nicer to have something that gets there directly rather than wrapping the bad stuff, and which honours "simple is better than complex" and "explicit is better than implicit" (test discovery, but also fixtures).
I disagree. The public bans are just the tip of the iceberg. Here is a relatively undocumented one:
https://lwn.net/Articles/1003436/
It is typical for a variety of reasons. Someone complains about breakage and is banned. Later, when the right people complain about the same issue, the breakage is reverted.
The same pattern happens over and over. The SC and the PSF are irresponsible, incompetent and malicious.
And further optimization is really hard when the CI plumbing starts to dominate. For example, the last Warehouse `test` job I checked had 43s of GitHub Actions overhead for 51s of pytest execution time (half the test job's time, approaching 100% overhead).
Disclosure: I've been tinkering on a side project trying to provide 90% of these pytest optimizations automatically, while also getting "time-to-first-test-failure" down to ~10 seconds (via warm runners, container snapshotting, etc.). Email in profile if anyone would like to swap notes.
In most distros, /tmp is mounted as tmpfs, but YMMV.
For example, a great speed optimization in our tests recently was to mock time.sleep.
Why do we have so many sleeps? This is testing a test framework for embedded devices, where there is plenty of fiddling with the hardware and then waiting for it.
I also mocked some filesystem accesses. Unit testing is about our application logic, not about Linux kernel behavior anyway.
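A self-contained sketch of the time.sleep mocking with unittest.mock (the code under test here is a stand-in, not the real framework):

    import time
    from unittest import mock

    def poll_device():
        """Stand-in for hardware-fiddling code that waits between retries."""
        for _ in range(3):
            time.sleep(5)  # 15s of real waiting per call in the actual suite
        return "ready"

    def test_poll_device_without_waiting():
        # Patch sleep so the retry/wait logic still runs but costs no wall-clock time.
        with mock.patch("time.sleep") as fake_sleep:
            assert poll_device() == "ready"
            assert fake_sleep.call_count == 3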
masklinn•9mo ago
If you use standard unittest discovery the third item might apply as well, though probably not to the same degree.
I don’t think unittest has any support for distribution so the xdist stuff is a no.
On the other hand, you could use unittest as the API with pytest as your test runner. Then you can also use xdist. And eventually migrate to the pytest test API because it's so much better.
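That is, something like this minimal illustration:

    import unittest

    class TestMath(unittest.TestCase):
        def test_add(self):
            self.assertEqual(1 + 1, 2)

    # pytest collects unittest.TestCase subclasses out of the box, so the same file
    # can be run with `pytest -n auto` to get xdist parallelism, and the methods can
    # later be rewritten as plain pytest test functions at your own pace.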
darkamaul•9mo ago
[0]: https://coverage.readthedocs.io/en/7.8.0/changes.html#versio...
anticodon•9mo ago
So, I'd run the tests under cProfile first.
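For example, one way to do that (paths are placeholders):

    # profile_tests.py: run the suite under cProfile and show the hottest functions.
    import cProfile
    import pstats
    import pytest

    cProfile.runctx("pytest.main(['tests/'])", globals(), locals(), "pytest.prof")
    pstats.Stats("pytest.prof").sort_stats("cumulative").print_stats(20)

    # Roughly equivalent from a shell:
    #     python -m cProfile -o pytest.prof -m pytest tests/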
dmurray•9mo ago
Saying you got a 5-7% improvement from a single change, discovered using the profiler, that took understanding of the test suite and the domain to establish it was OK, and that actually changed the functionality under test - that's all an argument for doing exactly the opposite of what you recommend.
anticodon•9mo ago
It was old functionality. Someone wrote a superclass that, in order to test filesystem functionality, created extremely large images. Not only was there no need to test with such large images, but other developers eventually inherited more test cases from that setup code (because it had other utility methods), so the setUp code ended up needlessly creating images that no test used.
Generating a huge 4K image takes significant time with Pillow.
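A hedged sketch of the kind of fix that helps here (a reconstruction, not the original code):

    import functools
    import unittest
    from PIL import Image

    class FilesystemTestBase(unittest.TestCase):
        # Before: every inheriting test paid for a 4K image in setUp, used or not.
        # def setUp(self):
        #     self.image = Image.new("RGB", (3840, 2160), "white")

        # After: built lazily and far smaller; only tests that touch it pay the cost.
        @functools.cached_property
        def image(self):
            return Image.new("RGB", (64, 64), "white")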