You Can't Fool the Optimizer

https://xania.org/202512/03-more-adding-integers
62•HeliumHydride•1h ago•26 comments

Anthropic acquires Bun

https://bun.com/blog/bun-joins-anthropic
1935•ryanvogel•19h ago•916 comments

Mathematics is hard for mathematicians to understand too

https://www.science.org/doi/10.1126/science.aec9014
55•mmaaz•5d ago•36 comments

Zig quits GitHub, says Microsoft's AI obsession has ruined the service

https://www.theregister.com/2025/12/02/zig_quits_github_microsoft_ai_obsession/
460•Brajeshwar•5h ago•243 comments

A Look at Rust from 2012

https://purplesyringa.moe/blog/a-look-at-rust-from-2012/
23•todsacerdoti•1w ago•0 comments

IBM CEO says there is 'no way' spending on AI data centers will pay off

https://www.businessinsider.com/ibm-ceo-big-tech-ai-capex-data-center-spending-2025-12
631•nabla9•19h ago•701 comments

Interview with RollerCoaster Tycoon's Creator, Chris Sawyer (2024)

https://medium.com/atari-club/interview-with-rollercoaster-tycoons-creator-chris-sawyer-684a0efb0f13
155•areoform•8h ago•28 comments

The "Mad Men" in 4K on HBO Max Debacle

http://fxrant.blogspot.com/2025/12/the-mad-men-in-4k-on-hbo-max-debacle.html
97•tosh•1h ago•33 comments

AI agents break rules under everyday pressure

https://spectrum.ieee.org/ai-agents-safety
180•pseudolus•6d ago•84 comments

Super fast aggregations in PostgreSQL 19

https://www.cybertec-postgresql.com/en/super-fast-aggregations-in-postgresql-19/
120•jnord•1w ago•9 comments

The Writing Is on the Wall for Handwriting Recognition

https://newsletter.dancohen.org/archive/the-writing-is-on-the-wall-for-handwriting-recognition/
72•speckx•6d ago•38 comments

Paged Out

https://pagedout.institute
474•varjag•17h ago•52 comments

Researchers Find Microbe Capable of Producing Oxygen from Martian Soil

https://scienceclock.com/microbe-that-could-turn-martian-dust-into-oxygen/
42•ashishgupta2209•6h ago•21 comments

Quad9 DOH HTTP/1.1 Retirement, December 15, 2025

https://quad9.net/news/blog/doh-http-1-1-retirement/
75•pickledoyster•7h ago•20 comments

OpenAI declares 'code red' as Google catches up in AI race

https://www.theverge.com/news/836212/openai-code-red-chatgpt
709•goplayoutside•22h ago•793 comments

I designed and printed a custom nose guard to help my dog with DLE

https://snoutcover.com/billie-story
542•ragswag•3d ago•65 comments

Trying Out C++26 Executors

https://mropert.github.io/2025/11/21/trying_out_stdexec/
26•ingve•5d ago•14 comments

Learning music with Strudel

https://terryds.notion.site/Learning-Music-with-Strudel-2ac98431b24180deb890cc7de667ea92
515•terryds•1w ago•122 comments

Understanding ECDSA

https://avidthinker.github.io/2025/11/28/understanding-ecdsa/
75•avidthinker•9h ago•19 comments

Qwen3-VL can scan two-hour videos and pinpoint nearly every detail

https://the-decoder.com/qwen3-vl-can-scan-two-hour-videos-and-pinpoint-nearly-every-detail/
218•thm•3d ago•65 comments

What, if anything, is universal to music cognition? (2024)

https://www.nature.com/articles/s41562-023-01800-9
26•Hooke•1w ago•16 comments

Zig's new plan for asynchronous programs

https://lwn.net/SubscriberLink/1046084/4c048ee008e1c70e/
306•messe•22h ago•217 comments

Counter Galois Onion: Improved encryption for Tor circuit traffic

https://blog.torproject.org/introducing-cgo/
86•wrayjustin•1w ago•28 comments

India scraps order to pre-install state-run cyber safety app on smartphones

https://www.bbc.com/news/articles/clydg2re4d1o
30•wolpoli•2h ago•4 comments

Amazon launches Trainium3

https://techcrunch.com/2025/12/02/amazon-releases-an-impressive-new-ai-chip-and-teases-a-nvidia-f...
182•thnaks•18h ago•65 comments

All about automotive lidar

https://mainstreetautonomy.com/blog/2025-08-29-all-about-automotive-lidar/
173•dllu•1d ago•67 comments

Sending DMARC reports is somewhat hazardous

https://utcc.utoronto.ca/~cks/space/blog/spam/DMARCSendingReportsProblems
51•zdw•8h ago•21 comments

School cell phone bans and student achievement

https://www.nber.org/digest/202512/school-cell-phone-bans-and-student-achievement
179•harias•19h ago•165 comments

DOOM could have had PC Speaker Music

https://lenowo.org/viewtopic.php?t=45
105•minki_the_avali•14h ago•70 comments

Load ZX Spectrum – first Museum dedicated to our first personal computer

https://loadzx.com/en/
62•elvis70•6d ago•28 comments

You Can't Fool the Optimizer

https://xania.org/202512/03-more-adding-integers
60•HeliumHydride•1h ago

Comments

jagged-chisel•52m ago
I always code with the mindset “the compiler is smarter than me.” No need to twist my logic around attempting to squeeze performance out of the processor - write something understandable to humans, let the computer do what computers do.
qsort•42m ago
> I always code with the mindset “the compiler is smarter than me.”

Like with people in general, it depends on what compiler/interpreter we're talking about. I'll freely grant that clang is smarter than me, but CPython for sure isn't. :)

More generally, canonicalization goes very far, but no farther than language semantics allow. Not even the notorious "sufficiently smart compiler" with infinite time can figure out what you don't tell it.

manbitesdog•3m ago
To add to this, the low-level constraints also make this assumption noisy, no matter how smart the compiler is. In the CPython case, if you do `dis.dis('DAY = 24 * 60 * 60')` you will see that constant folding nicely converts it to `LOAD_CONST 86400`. However, if you try `dis.dis('ATOMS_IN_THE_WORLD = 10**50')` you will get `LOAD_CONST 10`, `LOAD_CONST 50`, `BINARY_OP **`.
adrianN•39m ago
This is decent advice in general, but it pays off to try and express your logic in a way that is machine friendly. That mostly means thinking carefully about how you organize the data you work with. Optimizers generally don't change data structures or memory layout but that can make orders of magnitude difference in the performance of your program. It is also often difficult to refactor later.
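A minimal sketch of the data-layout point, in C. The `particle` and `particles` types, their fields, and both functions are hypothetical names made up for illustration. Both functions compute the same sum of `x` values; the second layout stores each field contiguously, so the loop reads only the bytes it needs. Choosing between the two layouts is a decision left to the programmer here, not something the optimizer is assumed to do.

```c
#include <stddef.h>

/* Array-of-structs: consecutive x values are separated by y, z and mass. */
struct particle {
    float x, y, z;
    float mass;
};

float sum_x_aos(const struct particle *p, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += p[i].x;   /* stride of sizeof(struct particle) per read */
    return total;
}

/* Struct-of-arrays: all x values sit next to each other in memory. */
struct particles {
    float *x, *y, *z;
    float *mass;
    size_t count;
};

float sum_x_soa(const struct particles *p)
{
    float total = 0.0f;
    for (size_t i = 0; i < p->count; i++)
        total += p->x[i];  /* unit-stride read */
    return total;
}
```

Which layout wins in practice depends on the access pattern, so it is worth measuring both before committing to one.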
lou1306•13m ago
To give a more specific example, if you malloc()/free() within a loop, it's unlikely that the compiler will fix that for you. However, moving those calls outside of the loop (plus maybe adding some realloc()s within, only if needed) is probably going to perform better.
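A rough sketch of that pattern in C, under stated assumptions: `count_palindromes` and its scratch-buffer task are hypothetical, chosen only because each iteration needs temporary storage. The buffer is allocated once before the loop, grown with realloc() only when a longer input appears, and freed once at the end.

```c
#include <stdlib.h>
#include <string.h>

/* Counts palindromes in words[0..n-1]; returns -1 on allocation failure.
 * The scratch buffer is allocated once and grown only when a longer word
 * shows up, instead of calling malloc()/free() on every iteration. */
long count_palindromes(const char *const *words, size_t n)
{
    size_t cap = 16;
    char *scratch = malloc(cap);
    if (scratch == NULL)
        return -1;

    long count = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(words[i]);
        if (len + 1 > cap) {
            /* Grow only if needed. */
            char *bigger = realloc(scratch, len + 1);
            if (bigger == NULL) {
                free(scratch);
                return -1;
            }
            scratch = bigger;
            cap = len + 1;
        }
        /* Write the reversed word into the reused buffer. */
        for (size_t j = 0; j < len; j++)
            scratch[j] = words[i][len - 1 - j];
        scratch[len] = '\0';

        if (strcmp(scratch, words[i]) == 0)
            count++;
    }

    free(scratch);
    return count;
}
```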
amiga386•8m ago
I find the same too. I find gcc and clang can inline functions, but can't decide to break apart a struct used only among those inlined functions and make every struct member a local variable, and then decide that one or more of those local variables should be allocated as a register for the full lifetime of the function, rather than spill onto the local stack.

So if you use a messy solution where something that should be a struct and operated on with functions, is actually just a pile of local variables within a single function, and you use macros operating on local variables instead of inlineable functions operating on structs, you get massively better performance.

tonyhart7•38m ago
Also, not all software needs optimization to the bone.

Pareto principle, like always: you don't need the best, just good enough.

Not every company is Google-level anyway.

ErroneousBosh•27m ago
You say that, but I was able to reduce the code size of some avr8 stuff I was working on by removing a whole bunch of instructions that zero out registers and then shift a value around. I don't need it to literally shift the top byte 24 bits to the right and zero out the upper 24 bits; I just need it to pass the value in the top 8 bits direct to the next operation.

I agree that most people are not writing hand-tuned avr8 assembly. Most people aren't attempting to do DSP on 8-bit AVRs either.

IshKebab•22m ago
The fact that compilers are smart isn't an excuse to not think about performance at all. They can't change your program architecture, algorithms, memory access patterns, etc.

You can mostly not think about super low level integer manipulation stuff though.

jaccola•19m ago
I would take it one step further: trying to eke out performance gains with clever tricks can often hurt performance by causing you to "miss the forest for the trees".

I work with CUDA kernels a lot for computer vision. I am able to consistently and significantly improve on the performance of research code without any fancy tricks, just with good software engineering practices.

By organising variables into structs, improving naming, using helper functions, etc... the previously impenetrable code becomes so much clearer and the obvious optimisations reveal themselves.

Not to say there aren't certain tricks / patterns / gotchas / low level hardware realities to keep in mind, of course.

toonewbie•51m ago
Sometimes you can fool the compiler :-)

See "Example 2: Tricking the compiler" in my blog post about O3 sometimes being slower than O2: https://barish.me/blog/cpp-o3-slower/

317070•44m ago
"The compiler" and "The optimizer" are doing a lot of the heavy lifting here in the argument. I definitely know compilers and optimizers which are not that great. Then again, they are not turning C++ code into ARM instructions.

You absolutely can fool a lot of compilers out there! And I am not only looking at you, NVCC.

Almondsetat•28m ago
But the point should be to follow the optimization cycle: develop, benchmark, evaluate, profile, analyze, optimize. Writing performant code is no joke and very often destroys readability and introduces subtle bugs, so before trying to outsmart the compiler, evaluate whether what it produces is already good enough.
amelius•40m ago
One undesirable property of optimizers is that in theory one day they produce good code and the next day they don't.
sureglymop•39m ago
With this one I instead wondered: If there are 4 functions doing exactly the same thing, couldn't the compiler also only generate the code for one of them?

E.g. if in `main` you called two different add functions, couldn't it optimize one of them away completely?

It probably shouldn't do that if you create a dynamic library that needs a symbol table but for an ELF binary it could, no? Why doesn't it do that?

cyco130•28m ago
It would but it's harder to trigger. Here, it's not safe because they're public functions and the standard would require `add_v1 != add_v2` (I think).

If you declare them as static, it eliminates the functions and the calls completely: https://aoco.compiler-explorer.com/z/soPqe7eYx

I'm sure it could also perform definition merging like you suggest but I can't think of a way of triggering it at the moment without also triggering their complete elision.
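To illustrate the address-identity concern, here is a small C sketch. The bodies of `add_v1` and `add_v2` below are hypothetical stand-ins, not the article's actual functions, and `have_distinct_addresses` is a helper invented for this example. In C, pointers to two different functions compare unequal, which is why a compiler cannot silently make two externally visible functions share one address.

```c
/* Hypothetical stand-ins for two functions that compute the same thing. */
unsigned add_v1(unsigned x, unsigned y) { return x + y; }
unsigned add_v2(unsigned x, unsigned y) { return y + x; }

/* Returns 1 when the two functions have distinct addresses. */
int have_distinct_addresses(void)
{
    unsigned (*f)(unsigned, unsigned) = add_v1;
    unsigned (*g)(unsigned, unsigned) = add_v2;
    return f != g;
}
```

Linker features that fold identical functions can change this observation, so treat it as a statement about the language rules and not about every toolchain configuration.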

moefh•22m ago
> It probably shouldn't do that if you create a dynamic library that needs a symbol table but for an ELF binary it could, no?

It can't do that because the program might load a dynamic library that depends on the function (it's perfectly OK for a `.so` to depend on a function from the main executable, for example).

That's one of the reasons why a very cheap optimization is to always use `static` for functions when you can. You're telling the compiler that the function doesn't need to be visible outside the current compilation unit, so the compiler is free to even inline it completely and never produce an actual callable function, if appropriate.
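A minimal sketch of the `static` advice in C; `add_internal` and `add_three` are hypothetical names used only for this example. The helper has internal linkage, so nothing outside this file can refer to it, and the compiler is free to inline it into its callers and emit no standalone copy.

```c
/* Internal linkage: no other translation unit can reference this helper,
 * so the compiler may inline it at every call site and drop the function. */
static unsigned add_internal(unsigned x, unsigned y)
{
    return x + y;
}

/* External linkage: this symbol must remain callable from other code. */
unsigned add_three(unsigned a, unsigned b, unsigned c)
{
    return add_internal(add_internal(a, b), c);
}
```

One way to check what actually happened is to compile with optimization and list the object file's symbols with `nm`; whether `add_internal` still appears depends on the compiler and flags.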

bruce343434•17m ago
Sadly most C++ projects are organized in a way that hampers static functions. To achieve incremental builds, stuff is split into separate source files that are compiled and optimized separately, and only at the final step linked, which requires symbols of course.

I get it though, because carefully structuring your #includes to get a single translation unit is messy, and compile times get too long.

cyco130•9m ago
That’s where link-time optimization enters the picture. It’s expensive but tolerable for production builds of small projects and feasible for mid-sized ones.
apple1417•18m ago
The MSVC linker has a feature where it will merge byte-for-byte identical functions. It's most noticeable for default constructors: you might get hundreds of functions which all boil down to "zero the first 32 bytes of this type".

A quick google suggests it's called "identical comdat folding" https://devblogs.microsoft.com/oldnewthing/20161024-00/?p=94...

daft_pink•39m ago
Is this an argument for compiled code?
0xTJ•29m ago
It's not really an argument for anything, it's just showing off how cool compilers are!
mkornaukhov•19m ago
Better tell me how to make the compiler not fool me!
Scene_Cast2•16m ago
This post assumes C/C++ style business logic code.

Anything HPC will benefit from thinking about how things map onto hardware (or, in case of SQL, onto data structures).

I think way too few people use profilers. If your code is slow, profiling is the first tool you should reach for. Unfortunately, the state of profiling tools outside of Nsight and Visual Studio (non-Code) is pretty disappointing.

asah•16m ago
I want an AI optimization helper that recognizes patterns that could-almost be optimized if I gave it a little help, e.g. hints about usage, type, etc.
stabbles•14m ago
For people who enjoy these blogs, you would definitely like the Julia REPL as well. I used to play with this a lot to discover compiler things.

For example:

    $ julia
    julia> function f(n)
             total = 0
             for x in 1:n
               total += x
             end
             return total
           end
    julia> @code_native f(10)
        ...
        sub    x9, x0, #2
        mul    x10, x8, x9
        umulh    x8, x8, x9
        extr    x8, x8, x10, #1
        add    x8, x8, x0, lsl #1
        sub    x0, x8, #1
        ret
        ...

It shows this with nice colors right in the REPL.

In the example above, you see that LLVM figured out the arithmetic series and replaced the loop with a simple multiplication.
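The same idea as a rough C sketch: `sum_loop` mirrors the Julia function above, and `sum_closed_form` is the arithmetic-series formula n(n+1)/2 that the loop is equivalent to. Both function names are made up for this example, and whether a particular compiler performs this rewrite depends on the compiler and the optimization level.

```c
#include <stdint.h>

/* Straightforward loop: 1 + 2 + ... + n.
 * The counter is 64-bit while n is 32-bit, so the loop always terminates
 * and the total cannot overflow. */
uint64_t sum_loop(uint32_t n)
{
    uint64_t total = 0;
    for (uint64_t x = 1; x <= n; x++)
        total += x;
    return total;
}

/* Closed form of the same series: n * (n + 1) / 2.
 * The product fits in 64 bits for every 32-bit n. */
uint64_t sum_closed_form(uint32_t n)
{
    return (uint64_t)n * ((uint64_t)n + 1) / 2;
}
```

Compiling `sum_loop` with optimizations enabled and reading the generated assembly is a quick way to see whether the loop survived.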