Link Time Optimizations: New Way to Do Compiler Optimizations

https://johnnysswlab.com/link-time-optimizations-new-way-to-do-compiler-optimizations/
39•signa11•1y ago

Comments

sakex•12mo ago
Maybe add the date to the title, because it's hardly new at this point
vsl•12mo ago
...or in 2020 (the year of the article).
Deukhoofd•12mo ago
What do you mean, new? LTO has been in GCC since 2011. It's old enough to have a social media account in most jurisdictions.
jeffbee•12mo ago
Pretty sure MSVC ".NET" was doing link-time whole-program optimization in 2001.
andyayers•12mo ago
HPUX compilers were doing this back in 1993.
jeffbee•12mo ago
Oh yeah, well ... actually I got nothin'. You win.

I will just throw in some nostalgia for how good that compiler was. My college roommate brought an HP pizza box that his dad secured from HP, and the way the C compiler quoted chapter and verse from ISO C in its error messages was impressive.

abainbridge•12mo ago
Or academics in 1986: https://dl.acm.org/doi/abs/10.1145/13310.13338

The idea of optimizations running at different stages in the build, with different visibility of the whole program, was discussed in 1979, but the world was so different back then that the discussion seems foreign. https://dl.acm.org/doi/pdf/10.1145/872732.806974

srean•12mo ago
Yes and if I remember correctly there used to be Linux distros that had all the distro binaries LTO'ed.
phkahler•12mo ago
I tried LTO with Solvespace 4 years ago and got about 15 percent better performance:

https://github.com/solvespace/solvespace/issues/972

Build time was terrible, taking a few minutes vs. 30-40 seconds for a full build without LTO. Have they done anything to use multi-core for LTO? It only used one core for that.

Also tested OpenMP which was obviously a bigger win. More recently I ran the same test after upgrading from an AMD 2400G to a 5700G which has double the cores and about 1.5x the IPC. The result was a solid 3x improvement so we scale well with cores going from 4 to 8.

wahern•12mo ago
Both clang and GCC support multi-core LTO, as does Rust. However, you have to partition the code, so the more cores you use, the less benefit you get from LTO. Rust partitions by crate by default, but it can increase parallelism by partitioning each crate. I think "fat LTO" is the term typically used for whole-program, or at least in the case of Rust, whole-crate LTO, whereas "thin LTO" is what you get when you LTO partitions and then link those together normally. For clang and GCC, you can either have them automatically partition the code for thin LTO, or do it explicitly via your Makefile rules[1].

[1] Interestingly, GCC actually invokes Make internally to implement thin LTO, which lets it play nice with GNU Make's job control and obey the -j switch.

WalterBright•12mo ago
Link time optimizations were done in the 1980s if I recall correctly.

I never tried to implement them, finding it easier and more effective for the compiler to simply compile all the source files at the same time.

The D compiler is designed to be able to build one object file per source file at a time, or one object file which combines all of the source files. Most people choose the one object file.

srean•12mo ago
I think MLton does it this way.

http://mlton.org/WholeProgramOptimization

Dynamically linked and dynamically loaded libraries are useful though (and they come with their own problems, of course).

tester756•12mo ago
Yea, generating many object files seems like a weird thing. Maybe it was a good thing decades ago, but now?

Because then you need to link them, thus you need some kind of linker.

Just generate one output file and skip the linker.

WalterBright•12mo ago
I've considered many times doing just that.
tester756•12mo ago
And what was the result/conclusion of such considerations?
WalterBright•12mo ago
Not worth the effort.

1. linkers have increased enormously in complexity

2. little commonality between linkers for different platforms

3. compatibility with the standalone linkers

4. trying to keep up with constant enhancement of existing linkers

yencabulator•12mo ago
Not maybe. Sufficient RAM for compilation was a serious issue back in the day.
kazinator•12mo ago
Sure, and if any file is touched, just process them all.
adrian_b•12mo ago
Some compilers had incremental compilation to handle this during development builds.

Then only the functions touched inside some file would be recompiled, not the remainder of the file or other files.

Obviously, choosing incremental compilation inhibited some optimizations.

adrian_b•12mo ago
Generating many object files is pointless for building an executable or a dynamic library, but it remains the desired behavior for building a static library.

Many software projects that must generate multiple executables are better structured as a static library plus one source file with the "main" function for each executable.

WalterBright•12mo ago
One thing the D compiler does is it can generate a library in one step (no need to use the librarian). Give a bunch of source files and object files on the command line, specify a library as the output, and boom! library created directly (compiling the source files, and adding the object files).

I haven't used a librarian program for maybe a decade.

senkora•12mo ago
In C++, there is a trick to get this behavior called "unity builds", where you include all of your source files into a single file and then invoke the compiler on that file.

Of course, being C++, this subtly changes behavior and must be done carefully. I like this article that explains the ins and outs of using unity builds: https://austinmorlan.com/posts/unity_jumbo_build/

WalterBright•12mo ago
> this subtly changes behavior

The D module design ensures that module imports are independent of each other and are independent of the importer.

YorickPeterse•12mo ago
For Inko (https://inko-lang.org/) I went a step further: it generates an object file for each type, instead of per source file or per project. The idea is that if e.g. a generic type is specialized into a new instance (or has some methods added to it), only the object file for that type needs to be re-generated. This in turn should allow for much more fine-grained incremental compilation.

The downside is that you can end up with thousands of object files, but for modern linkers that isn't a problem.

dooglius•12mo ago
It sounds like this would prevent the inherent concurrency you would get out of handling files separately?
WalterBright•12mo ago
It's complicated and not at all clear. For example, most modules import other modules. With separate compilation, most of the modules need to be compiled multiple times, with all-together, it's only once.

On the other hand, the optimizer and code generator can be run concurrently in multiple processes/threads.

Remnant44•12mo ago
Link time optimization is definitely not new, but it is incredibly powerful - I have personally had situations where the failure to be able to inline functions from a static library without lto cut performance in half.

It's easy to dismiss a basic article like this, but it's basically a discovery that every junior engineer will make, and it's useful to talk about those too!

srean•12mo ago
The inline keyword should really have been intended for call sites rather than definitions.

Perhaps language designers thought that if a function needs to be inlined everywhere, it would lead to verbose code. In any case, it's a weak hint that compilers generally treat with much disdain.

lilyball•12mo ago
ffmpeg has a lot of assembly code in it, so it's a very odd choice of program to use for this kind of test as LTO is presumably not going to do anything to the assembly.
mcdeltat•12mo ago
Different .c/.cpp files being a barrier to optimisation always struck me as an oddly low bar for the 21st century. Yes I know the history of compilation units but these days that's not how we use the system. We don't split code into source files for memory reasons, we do it for organisation. On a small/medium codebase and a decent computer you could probably fit dozens of source files into memory to compile and optimise together. The memory constraint problem has largely disappeared.

So why do we still use the old way? LTO seems effectively like a hack to compensate for the fact that the compilation model doesn't fit our modern needs. Obviously this will never change in C/C++ due to momentum and backwards compatibility. But a man can dream.

kazinator•12mo ago
LTO breaks code which assumes that the compiler has no idea what is behind an external function call and must not assume anything about the values of objects that the code might have access to:

    securely_wipe_memory(&obj, sizeof obj);
    return;
  }
Compiler peeks into securely_wipe_memory and sees that it has no effect because obj is a local variable which has no "next use" in the data flow graph. Thus the call is removed.
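A minimal sketch of one common way to keep such a wipe from being optimized away, assuming the names `wipe_volatile` and `use_secret_then_wipe` (both hypothetical, not from any real library). It writes through a volatile-qualified pointer, which compilers in practice treat as observable stores even when they can see the whole function body. Where available, library routines such as `explicit_bzero` or C23's `memset_explicit` are meant for the same job.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: zero a buffer through a volatile-qualified pointer
 * so the stores count as observable side effects and are not dropped as
 * dead stores, even if the optimizer inlines this function. */
static void wipe_volatile(void *ptr, size_t len)
{
    assert(ptr != NULL || len == 0);
    volatile unsigned char *p = ptr;
    while (len--) {
        *p++ = 0;
    }
}

/* Hypothetical caller mirroring the pattern above: a local object is
 * wiped right before the function returns and is never read again. */
static int use_secret_then_wipe(void)
{
    unsigned char key[16];
    int sum = 0;
    for (size_t i = 0; i < sizeof key; i++) {
        key[i] = (unsigned char)(i + 1);
        sum += key[i];
    }
    wipe_volatile(key, sizeof key);
    return sum;
}
```

This only addresses the dead-store problem; it makes no claim about copies of the secret left in registers or spilled stack slots.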

Another example:

    gc_protect(object);
    return;
  }
Here, gc_protect is an empty function. Without LTO, the compiler must assume that the value of object is required for the gc_protect call and so the generated code has to hang on to that value until that call is made. With LTO, the compiler peeks at the definition of gc_protect and sees the ruse: the function is empty! Therefore, that line of code does not represent a use of the variable. The generated code can use the register or memory location for something else long before that line. If the garbage collector goes off in that part of the code, the object is prematurely collected (if what was lost happens to be the last reference to it).
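One way to make the call a real use even under LTO is to give gc_protect a body the optimizer cannot discard. The sketch below is an assumption about how that could look, not how any particular garbage collector does it: the sink variable `gc_keepalive_sink` and the caller `demo` are hypothetical names. Storing the pointer into a volatile object is a side effect, so the value has to stay available until that line. With GCC or Clang, an empty inline asm statement that takes the pointer as an input is another commonly used barrier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sink: stores to a volatile object may not be elided,
 * so writing the pointer here is a genuine use of its value. */
static void *volatile gc_keepalive_sink;

/* Non-empty replacement for the empty gc_protect described above. */
static void gc_protect(void *object)
{
    gc_keepalive_sink = object;
}

/* Hypothetical caller: the pointer must stay live until gc_protect runs,
 * even if the compiler can see gc_protect's definition. */
static int demo(void)
{
    static int payload = 42;
    int *object = &payload;
    int result = *object + 1;
    gc_protect(object);
    assert(gc_keepalive_sink == (void *)object);
    return result;
}
```

The trade-off is one extra store per call, which is usually negligible next to the cost of a premature collection.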

Some distros have played with turning on LTO as a default compiler option for building packages. It's a very, very bad idea.

djmips•12mo ago
So slow
jordiburgos•12mo ago
Any idea of the performance improvements from LTO?
