Strategic optimisation is often basically free if you have domain expertise. When it's all internal thought processes, it's easy to know that the business wants outcome X and that algorithm Y is the right choice. Whereas if you don't know enough, you're likely to make decisions that are very expensive to undo.
> There is no way to provide both optimized assembly and equivalent C code and let the compiler use the former in the general case and the latter in special cases.
This is true, but can be seen as a failure of language and tooling. For example, Halide [1] pioneered (AFAIK) the concept of separating algorithm from implementation at the language level. This separation lets you express the algorithm once, and then "schedule" it by specifying parallelism, vectorization, etc. You can provide multiple schedules for one algorithm, which allows you to specialize / make different choices depending on varying factors.
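A rough sketch of what that looks like in Halide's C++ front end (written from memory against a recent Halide, so details may be off; the pipeline here is a trivial stand-in):

    #include "Halide.h"
    using namespace Halide;

    int main() {
        // Algorithm: written once, independent of how it will be executed.
        Func f("f");
        Var x("x"), y("y");
        f(x, y) = x + y;  // stand-in for a real pipeline stage

        // Schedule: one of several possible mappings onto the machine.
        // Swap this block out (e.g. plain f.compute_root() for tiny sizes)
        // without touching the algorithm above.
        Var xo("xo"), yo("yo"), xi("xi"), yi("yi");
        f.tile(x, y, xo, yo, xi, yi, 64, 64)
         .vectorize(xi, 8)
         .parallel(yo);

        Buffer<int> out = f.realize({1024, 1024});
        return 0;
    }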
It's a really interesting concept, though maybe limited in practice to DSLs. I'm not sure a general purpose language would be a good fit for this model, but then, for general purpose programs written in general purpose languages, perf optimization at the level TFA discusses is frequently limited to just specific hot sections. Those hot sections could be extracted out into specialized components written in such a DSL.
https://gcc.gnu.org/onlinedocs/gcc/Function-Multiversioning....
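For reference, that looks roughly like this with GCC (and newer Clang) on x86: the compiler emits one clone per listed target plus a resolver that picks one at load time. The function itself is just a made-up example:

    // Compiled once per target; the "default" clone is the portable fallback.
    __attribute__((target_clones("avx2", "sse4.2", "default")))
    float dot(const float *a, const float *b, int n) {
        float s = 0.0f;
        for (int i = 0; i < n; i++)
            s += a[i] * b[i];
        return s;
    }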
you don't need to go all the way to Halide to do what the article is claiming isn't possible - you can do it just by including a "micro-kernel" in your library and having the code branch to that impl (depending on something at runtime) instead of whatever the C code compiled down to. this is done every single day in every single GPU lib (famously, cublas ships with hundreds or thousands of such ukernels for gemms, selected depending on shapes).
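a sketch of that dispatch pattern in C++ (names, shapes, and heuristics are all made up - the selection logic in something like cublas is far more elaborate):

    #include <cstddef>

    static void gemm_generic(const float* a, const float* b, float* c,
                             int m, int n, int k) {
        // Plain C-style fallback; the compiler optimizes it however it can.
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++) {
                float s = 0.0f;
                for (int p = 0; p < k; p++) s += a[i*k + p] * b[p*n + j];
                c[i*n + j] = s;
            }
    }

    static void gemm_ukernel_avx2(const float* a, const float* b, float* c,
                                  int m, int n, int k) {
        // Stand-in for a hand-written AVX2/assembly micro-kernel.
        gemm_generic(a, b, c, m, n, k);
    }

    void gemm(const float* a, const float* b, float* c, int m, int n, int k) {
        // Pick an implementation at runtime from CPU features and shape,
        // the way BLAS-style libraries choose among their shipped ukernels.
        if (__builtin_cpu_supports("avx2") && m % 8 == 0 && n % 8 == 0)
            gemm_ukernel_avx2(a, b, c, m, n, k);
        else
            gemm_generic(a, b, c, m, n, k);
    }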
One of my early-career successes was just creating a framework that generated every permutation of perf optimizations for every (log-scaled -- clz is very fast) input size, checked which was best, and dropped the results into a lookup table of function pointers to branch on. The university had a large supply of heterogeneous computers, replete with all the usual quirks, like being able to double floating-point addition throughput on Haswell CPUs by abusing the fmadd instruction. So I made a framework (probably closer to a DSL) for encoding your algorithms in a way that let you analyze perf tradeoffs at compile time and tune the result for the given computer. It's kind of like what ATLAS does for some linear algebra tasks.
Such practices are almost never optimal, but they're pretty easy to implement, and the results are near-optimal for almost all inputs. In the tradeoff between human and computer performance, I think it's a nice option.
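Roughly, the dispatch half of such a scheme looks like this (a sketch with made-up kernel names; the real table would be filled in by benchmarking every variant on the target machine, and init_table() called once at startup):

    #include <cstddef>

    using Kernel = void (*)(float* data, size_t n);

    // Stand-ins for the generated variants (scalar, SIMD, cache-blocked, ...).
    static void kernel_scalar(float* d, size_t n)     { for (size_t i = 0; i < n; i++) d[i] *= 2.0f; }
    static void kernel_vectorized(float* d, size_t n) { for (size_t i = 0; i < n; i++) d[i] *= 2.0f; }
    static void kernel_blocked(float* d, size_t n)    { for (size_t i = 0; i < n; i++) d[i] *= 2.0f; }

    static Kernel table[64];  // one slot per log2(n) bucket

    static void init_table() {
        // In the real thing, each slot holds whichever variant benchmarked
        // fastest for sizes around 2^i on this particular machine.
        for (int i = 0; i < 64; i++)
            table[i] = (i < 10) ? kernel_scalar
                     : (i < 20) ? kernel_vectorized
                                : kernel_blocked;
    }

    static void run(float* d, size_t n) {
        int bucket = (n == 0) ? 0 : 63 - __builtin_clzll(n);  // floor(log2(n)), one instruction
        table[bucket](d, n);
    }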
This seems like a nonsensical statement to me. How could measuring be a substitute for thinking/analyzing/predicting/forming a plan?
Measuring/profiling just means observing the system you want to optimize in a systematic way. You certainly won't be very effective at optimizing anything if you don't observe it.
Doing theoretical calculations means you've formed a model of what's happening and you're devising a plan to optimize against that model. But of course, a useful model needs to represent the significant aspects of your system (and a good model should exclude most insignificant ones). Failing to observe your system means your model could be bad -- focused on insignificant aspects and missing significant ones -- and you'd never know.
Measuring doesn't mean don't think. Measuring and thinking are two different things. You need to do them both to optimize effectively.
The fact remains that most projects that build small, trivial, modular prototypes first will know which paths are viable before painting themselves into a corner algorithmically.
Best of luck =3
You do profiling because it's WAY too easy to get obsessed with theoretical problems when a simple measurement will show you the actual ones.
You do the math on the actual problem location, not on the O(n!) method that only ever gets called with n=3.
You still have to look at the entire call stack when profiling (which means thinking about the overarching algorithm).
If a problem area is so intuitively obvious, why would you introduce the problem in the first place? In reality, performance optimizations are usually needed where you least expect them. Which means that you can't get there intuitively. Hence, the suggestion of using profiling to help track down where the problem is instead.
Imagine:

    function F() {
      for (i = 0; i < 10; i++) { A(); B(); C(); }
    }
If we profile this code, we might find out, e.g. B takes the majority of the time--let's say 90%. So you spend hours, days, weeks, making B 2X faster. Great. Now you removed 45% of execution time. But the loop in the outer function F is just a few instructions, it is not "hot"--it won't show up in profiles except for ones that capture stacks.
If you're just stuck in the weeds optimizing hot functions that show up in profiles, it's possible to completely overlook F. That loop might be completely redundant, causing 10X the workload by repeatedly computing A, B, and C, which may not need to be recomputed.
There are bazillions of examples like this. Say you find out that a function is super, super hot. But it's just a simple function. There are calls to it all over the code. You can't make it any faster. Instead you need to figure out how to not call it at all, e.g. by caching or rethinking the whole algorithm.
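A toy version of the F/A/B/C case above (stand-in functions): if B's result is loop-invariant, the real win isn't making B 2X faster, it's calling it once:

    #include <cstdio>

    static int A() { return 1; }
    static int B() { return 2; }  // pretend this is the 90%-of-runtime function
    static int C() { return 3; }

    static int F_hot_loop() {     // what a flat profile points you at
        int total = 0;
        for (int i = 0; i < 10; i++) total += A() + B() + C();
        return total;
    }

    static int F_hoisted() {      // what looking at the whole call stack suggests
        int b = B();              // the expensive, loop-invariant part, computed once
        int total = 0;
        for (int i = 0; i < 10; i++) total += A() + b + C();
        return total;
    }

    int main() { printf("%d %d\n", F_hot_loop(), F_hoisted()); }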
> How could measuring be a substitute for thinking/analyzing/predicting/forming a plan?
This happens more than you think. Understanding how the system works, in enough detail and at a high enough level to formulate a plan, is in short supply. Jumping in and hacking things in, like a cache or something, is surprisingly common.
I don't think I've ever used a profiler that couldn't report you were in F() here. One that only captures your innermost functions really doesn't seem that useful, for exactly the reasons you give.
But it's also very easy to mislead yourself that way; many "optimizations" might do much less than you think. So you should avoid writing more complex or harder-to-understand code just because you think it is faster, but otherwise I'd certainly try to write faster code by default in areas I know well enough to judge that.
It's really just debugging and troubleshooting, but with a different goal in mind.
In my experience, most webapps can fix a lot of low-hanging performance issues by shaping the API to match how it's used in the client. That removes so much mapping and combining of data all over the place.