Revisiting Knuth's "Premature Optimization" Paper

https://probablydance.com/2025/06/19/revisiting-knuths-premature-optimization-paper/

66•signa11•3d ago

Comments

mjd•5h ago

This is my all-time favorite paper. It's so easy to read, and there's so much to think about, so much that still applies to everyday programming and language design.

Also there's Knuth admitting he avoids GO TO because he is afraid of being scolded by Edsger Dijkstra.

https://pic.plover.com/knuth-GOTO.pdf

apricot•4h ago

Reading Knuth is always a pleasure.

From the paper:

"It is clearly better to write programs in a language that reveals the control structure, even if we are intimately conscious of the hardware at each step; and therefore I will be discussing a structured assembly language called PL/MIX in the fifth volume of The art of computer programming"

Looking forward to that!

subharmonicon•4h ago

Love this paper and read it several times, most recently around 10 years ago when thinking about whether there were looping constructs missing from popular programming languages.

I have made the same point several times online and in person that the famous quote is misunderstood and often suggest people take the time to go back to the source and read it since it’s a wonderful read.

wewewedxfgdf•3h ago

It's no longer relevant. It was written when people were writing IBM operating systems in assembly language.

Things have changed.

Remember this: "speed is a feature"?

If you need fast software to make it appealing, then make it fast.

hinkley•1h ago

I will say I have worked on projects where sales and upper management were both saying the same thing: customers don't like that our product is slow(er than a competitor's), and the devs just shrug and say we've already done everything we can. In one case the most senior devs even came up with charts to prove this was all they were going to get.

Somehow I found 40%. Some of it was clever, a lot of it was paying closer attention to the numbers, but most of it was grunt work.

Besides the mechanical sympathy, the other two main tools were 1) stubbornness, and 2) figuring out how to group changes along functional testing boundaries, so that you can justify making someone test a change that only improves perf by 1.2% because they're testing a raft of related changes that add up to more than 10%.

Most code has orphaned performance improvements in 100 little places that all account for half of the runtime because nobody can ever justify going in and fixing them. And those can also make parallelism seem unpalatable due to Amdahl.

godelski•3h ago

I think the problem with the quote is that everyone forgets the line that comes after it.

  We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

  vvvvvvvvvv
  Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified.
  ^^^^^^^^^^

This makes it clear, in context, that Knuth defines "Premature Optimization" as "optimizing before you profile your code"

@OP, I think you should lead with this. I think it gets lost by the time you actually reference it. If I can suggest, place the second paragraph after

  > People always use this quote wrong, and to get a feeling for that we just have to look at the original paper, and the context in which it was written.

The optimization part gets lost in the middle, and this, I think, could provide a better hook for those who aren't going to read the whole thing. I think the way you wrote it works well for that, but the point (IMO) will be missed by more inattentive readers. The post is good, so this is just a minor critique because I want to see it do better.

https://dl.acm.org/doi/10.1145/356635.356640 (alt) https://sci-hub.se/10.1145/356635.356640

Swizec•3h ago

Amdahl’s Law is the single best thing I learned in 4 years of university. It sounds obvious when spelled out but it blew my mind.

No amount of parallelization will make your program faster than the slowest non-parallelizable path. You can be as clever as you want and it won’t matter squat unless you fix the bottleneck.

This extends to all types of optimization and even teamwork. Just make the slowest part faster. Really.

https://en.wikipedia.org/wiki/Amdahl%27s_law
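
For anyone who wants to plug in numbers, here is a minimal sketch of the formula in Python. The function names and parameters are made up for this illustration: `p` is the fraction of the runtime that can be parallelized and `n` is the number of workers.

```python
def amdahl_speedup(p: float, n: float) -> float:
    """Overall speedup when a fraction p of the runtime is spread over n workers."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    serial = 1.0 - p  # the part that no amount of parallelism touches
    return 1.0 / (serial + p / n)


def amdahl_limit(p: float) -> float:
    """Upper bound on the speedup as the number of workers grows without bound."""
    return float("inf") if p == 1.0 else 1.0 / (1.0 - p)
```

With 90% of the runtime parallelizable, ten workers give roughly a 5.3x speedup, and no number of workers gets past 10x.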

godelski•2h ago

  > It sounds obvious when spelled out but it blew my mind.

I think there's a weird thing that happens with stuff like this. Cliches are a good example, and I'll propose an alternative definition to them.

  A cliche is a phrase that's so obvious everyone innately knows or understands it; yet, it is so obvious no one internalizes it, forcing the phrase to be used ad nauseam

At least, it works for a subset of cliches. Like "road to hell," "read between the lines," Goodhart's Law, and I think even Amdahl's Law fits (though certainly not others. e.g. some are bastardized, like Premature Optimization or "blood is thicker than water"). Essentially they are "easier said than done," so require system 2 thinking to resolve but we act like system 1 will catch them.

Like Amdahl's Law, I think many of these take a surprising amount of work to prove despite the result sounding so obvious. The big question is if it was obvious a priori or only post hoc. We often confuse the two, getting us into trouble. I don't think the genius of the statement hits unless you really dig down into proving it and trying to make your measurements in a nontrivially complex parallel program. I think that's true about a lot of things we take for granted

chinchilla2020•1h ago

another commonly misinterpreted one is the `shouting fire in a crowded theatre` quote.

In its original context it means the opposite of how people use it today.

hinkley•2h ago

> faster than the slowest non-parallelizable path

Rather, than the slowest non-parallelized path. Ultimately you may reach maximum speed on that path but the assumptions that we are close to it often turn out to be poorly considered, or considered before 8 other engineers added bug fixes and features to that code.

From a performance standpoint you need to challenge all of those assumptions. Re-ask all of those questions. Why is this part single threaded? Does the whole thing need to be single threaded? What about in the middle here? Can we rearrange this work? Maybe by adding an intermediate state?

ilc•1h ago

Interestingly, you didn't learn the full lesson:

When optimizing, always consider the cost of doing the optimization vs. its impact.

In a project where you are looking at a 45/30/25 type split, the 45 may actually be well optimized, so the real gains may be in the 30 or 25.

The key is to understand the impact you CAN have, and what the business value of that impact is. :)

The other rule I've learned is: There is always a slowest path.
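
A rough sketch of that arithmetic in Python. Only the 45/30/25 split comes from the comment above; the function name `overall_speedup` and the per-part speedup factors are hypothetical numbers chosen for illustration.

```python
def overall_speedup(shares, speedups):
    """Overall speedup when each part of the runtime gets its own speedup factor.

    shares: fraction of total runtime spent in each part (should sum to 1).
    speedups: factor by which each part gets faster (1.0 means unchanged).
    """
    if len(shares) != len(speedups):
        raise ValueError("shares and speedups must have the same length")
    new_time = sum(share / factor for share, factor in zip(shares, speedups))
    return 1.0 / new_time


shares = [0.45, 0.30, 0.25]

# Hypothetical: the 45% part is already tuned, so hard work buys only 10% there.
tuned_part = overall_speedup(shares, [1.1, 1.0, 1.0])

# Hypothetical: the 30% part was never looked at and an easy fix doubles its speed.
neglected_part = overall_speedup(shares, [1.0, 2.0, 1.0])
```

Under these made-up numbers, fixing the neglected 30% gives about 1.18x overall, while squeezing the biggest slice gives about 1.04x.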

Swizec•59m ago

> The key is to understand the impact you CAN have, and what the business value of that impact is. :)

Like I tell everyone in system design interviews: AWS will rent you a machine with 32TB of RAM. Are you still sure about all this extra complexity?

godelski•6m ago

I didn't get that impression from their response. I mean I could be wrong, but in context of "use a profiler" I don't think anything you said runs counter. I think it adds additional information, and it's worth stating explicitly, but given my read yours comes off as unnecessarily hostile. I think we're all on the same page, so let's make sure we're on the same side because we have the same common enemy: those who use Knuth's quote to justify the slowest piece of Lovecraftian spaghetti and duct tape imaginable

chinchilla2020•1h ago

There is more to it than that.

1. Decide if optimization is even necessary.

2. Then optimize the slowest path

nurettin•54m ago

There is also the "death by a thousand cuts" kind of slowness that accrues over time. There it doesn't really matter where you start peeling the onion, and the part you started with is rarely the best.

hinkley•2h ago

It is exactly this "lulled into complacency" that I rail against when most people cite that line. Far too many people are trying to shut down dialog on improving code (not just performance) and they're not above Appeal to Authority in order to deflect.

"Curiosity killed the cat, but satisfaction brought it back" is practically on the same level.

If you're careful to exclude creeping featurism and architectural astronautics from the definition of 'optimization', then very few people I've seen be warned off of digging into that sort of work actually needed to be reined in. YAGNI covers a lot of those situations, and generally with fewer false positives. Still false positives though. In large part because people disagree on what "The last responsible moment" is, in part because our estimates are always off by 2x, so by the time we agree to work on things we've waited about twice as long as we should have and now it's all half assed. Irresponsible.

bluGill•3h ago

If you have benchmarked something then optimizations are not premature. People often used to 'optimize' code that was rarely run, often making the code harder to read for no gain.

Beware too of premature pessimization. Don't write bubble sort just because you haven't benchmarked your code to show it is a bottleneck - which is what some will do and then incorrectly cite premature optimization when you tell them they should do better. Note that any competent language has a sort in the standard library that is better than bubble sort.
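
To put a number on the bubble sort point, here is a small Python sketch. It counts comparisons instead of measuring wall time so the result is repeatable; the `Counted` wrapper, `bubble_sort`, and `count_comparisons` are hypothetical helpers written for this illustration, and `sorted` is Python's built-in.

```python
import random


class Counted:
    """Wraps a value and counts every `<` comparison made on it."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def bubble_sort(items):
    """Plain bubble sort; returns a new sorted list."""
    result = list(items)
    for end in range(len(result) - 1, 0, -1):
        for i in range(end):
            if result[i + 1] < result[i]:
                result[i], result[i + 1] = result[i + 1], result[i]
    return result


def count_comparisons(sort_fn, values):
    """Run sort_fn on wrapped copies of values; return (sorted values, comparisons)."""
    Counted.comparisons = 0
    out = sort_fn([Counted(v) for v in values])
    return [c.value for c in out], Counted.comparisons


random.seed(0)  # fixed seed so the run is repeatable
data = random.sample(range(10_000), 500)

bubble_result, bubble_cmps = count_comparisons(bubble_sort, data)
stdlib_result, stdlib_cmps = count_comparisons(sorted, data)
print(f"bubble sort: {bubble_cmps} comparisons, sorted(): {stdlib_cmps} comparisons")
```

This bubble sort always makes n(n-1)/2 comparisons, which is 124,750 for 500 items; the built-in sort needs far fewer on the same input.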

yubblegum•3h ago

Ironically, all pdfs of the famous paper have atrocious typesetting and are a pain to read.

layer8•3h ago

> Usually people say “premature optimization is the root of all evil” to say “small optimizations are not worth it” but […] Instead what matters is whether you benchmarked your code and whether you determined that this optimization actually makes a difference to the runtime of the program.

In my experience the latter is actually often expressed. What else would “premature” mean, other than you don’t know yet whether the optimization is worth it?

The disagreement is usually more about small inefficiencies that may compound in the large but whose combined effects are difficult to assess, compiler/platform/environment-dependent optimizations that may be pessimizations elsewhere, reasoning about asymptotic runtime (which shouldn’t require benchmarking — but with cache locality effects sometimes it does), the validity of microbenchmarks, and so on.

parpfish•3h ago

The way I often hear it expressed has nothing to do with small efficiency changes or benchmarking and it’s more of a yagni/anticipating hyper scale issue. For example, adding some complexity to your code so it can efficiently handle a million users when you’re just fine writing the simple to read and write version that isn’t optimal but will work just fine for the twenty users you actually have.

userbinator•3h ago

I understood it to mean that optimising for speed at the expense of size is a bad idea unless there are extremely obvious performance improvements in doing so. By default you should always optimise for size.

dan-robertson•3h ago

I like this article. It’s easy to forget what these classic CS papers were about, and I think that leads to poorly applying them today. Premature optimisation of the kind of code discussed by the paper (counting instructions for some small loop) does indeed seem like a bad place to put optimisation efforts without a good reason, but I often see this premature optimisation quote used to:

- argue against thinking about any kind of design choice for performance reasons, eg the data structure decisions suggested in this article

- argue for a ‘fix it later’ approach to systems design. I think for lots of systems you have some ideas for how you would like them to perform, and you could, if you thought about it, often tell that some designs would never meet them, but instead you go ahead with some simple idea that handles the semantics without the performance only to discover that it is very hard to ‘optimise’ later.

godelski•1h ago

  > a ‘fix it later’ approach

Oh man, I hate how often this is used. Everyone knows there's nothing more permanent than a temporary fix lol.

But what I think people don't realize is that this is exactly what tech debt is. You're moving fast but doing so makes you slow once you are no longer working on a very short timeline. That's because these issues compound. Not only do we repeat that same mistake, but we're building on top of shaky ground. So going back to fix things ends up requiring far more effort than it would have taken to fix them early. Fix early and your efforts similarly compound, but this time to your benefit.

I think a good example of this is when you see people rewrite a codebase. You'll see headlines like "by switching to rust we got a 500% improvement!" Most of that isn't rust, most of that is better algorithms and design.

Of course, you can't always write your best code. There's practical constraints and no code can be perfect. But I think Knuth's advice still fits today, despite a very different audience. He was talking to people who were too obsessed with optimization while today we're overly obsessed with quickly getting to some checkpoint. But the advice is the same "use a fucking profiler". That's how you find the balance and know what actually can be put off till later. It's the only way you can do this in an informed way. Yet, when was the last time you saw someone pull out a profiler? I'm betting the vast majority of HN users can't remember and I'd wager a good number never have
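
For anyone who has never pulled one out, a minimal sketch using Python's standard-library `cProfile` and `pstats`. The functions `cheap_setup`, `hot_loop`, and `main` are hypothetical stand-ins for real code.

```python
import cProfile
import io
import pstats


def cheap_setup():
    """Cheap step that is easy to suspect but costs almost nothing."""
    return list(range(100))


def hot_loop(values):
    """Hypothetical hot spot: quadratic work hiding behind a harmless-looking call."""
    total = 0
    for a in values:
        for b in values:
            total += a * b
    return total


def main():
    values = cheap_setup()
    return hot_loop(values)


profiler = cProfile.Profile()
result = profiler.runcall(main)  # run main() under the profiler

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)  # up to ten entries, by cumulative time
report = buffer.getvalue()
print(report)
```

The report lists each function with its call count and time, so the hot spot shows up by name rather than by guesswork. For a whole script, `python -m cProfile -s cumulative script.py` gives the same kind of table without touching the code.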

osigurdson•2h ago

The real root of all evil is reasoning by unexamined phrases.

hinkley•1h ago

"A clever saying proves nothing." - Voltaire

osigurdson•2h ago

It is generally better just to focus on algorithmic complexity - O(xN^k). The first version of the code should bring code to the lowest possible k (unless N is very small then who cares). Worry about x later. Don't even think about parallelizing until k is minimized. Vectorize before parallelizing.

For parallel code, you basically have to know in advance that it is needed. You can't normally just take a big stateful / mutable codebase and throw some cores at it.
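
As a small illustration of lowering k before worrying about x, here is a Python sketch that takes a duplicate check from k=2 to k=1. Both function names are made up for this example, and the linear version assumes the items are hashable and relies on average-case constant-time set lookups.

```python
def has_duplicates_quadratic(items):
    """Compare every pair: about N^2 / 2 comparisons in the worst case."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """One pass with a set: about N hash lookups (items must be hashable)."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
```

Both return the same answers; only the growth rate differs, which is the part no amount of constant-factor tuning recovers.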

monkeyelite•2h ago

- Knuth puts sentinels at the end of an array to avoid having to bounds check in the search.

- Knuth uses the register keyword.

- Knuth custom writes each data structure for each application.
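
For readers who haven't seen the sentinel trick from the first bullet, a sketch of its shape in Python. `find_with_sentinel` is a hypothetical name, and since Python bounds-checks every index anyway, this only shows the structure of the technique, not a speedup.

```python
def find_with_sentinel(items, target):
    """Linear search with a sentinel; returns the index of target or -1."""
    n = len(items)
    items.append(target)  # the sentinel guarantees the loop finds a match
    try:
        i = 0
        while items[i] != target:  # no separate "i < n" test inside the loop
            i += 1
    finally:
        items.pop()  # restore the caller's list
    return i if i < n else -1
```

The loop tests one condition per element instead of two; whether the match was real or just the sentinel is decided once, after the loop.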

hinkley•1h ago

When I was young someone pointed out to me how each hype cycle we cherry-pick a couple interesting bits out of the AI space, name them something else respectable, and then dump the rest of AI on the side of the road.

When I got a bit older I realized people were doing this with performance as well. We just call this part architecture, and that part Best Practices.

ethan_smith•43m ago

The famous "premature optimization" quote isn't from a dedicated paper on optimization, but from Knuth's 1974 "Structured Programming with go to Statements" paper where he was discussing broader programming methodology.
