Both GCC and Clang generate strange/inefficient code

https://codingmarginalia.blogspot.com/2026/02/both-gcc-and-clang-generate.html
38•rsf•4d ago

Comments

the_fall•4d ago
It's common for compilers to generate mildly unusual code because they translate high-level code into an abstract intermediate notation, run a variety of optimization steps on that notation, and then emit machine-specific code to perform whatever the optimizations yielded. There's no constraint along the lines of "but select the most logical opcode for this task".

The claim that the code is inefficient is really not substantiated well in this blog post. Sometimes, long-winded assembly actually runs faster because of pipelining, register aliasing, and other quirks. Other times, a "weird" way of zeroing a register may actually take up less space in memory, etc.

rsf•4d ago
> The claim that the code is inefficient is really not substantiated well in this blog post.

I didn't run benchmarks, but in the case of clang writing zeros to memory (which are never used thereafter), there's no way that particular code is optimal.

For the gcc output, it seems unlikely that the three versions are all optimal, given the inconsistent strategies used. In particular, the code that sets the output value to 0 or 1 in the size = 3 version is highly unlikely to be optimal in my opinion. I'd be amazed if it is!

Your point that unintuitive code is sometimes actually optimal is well taken though :)

its_magic•4d ago
Stefan Kanthak has previously noted that GCC's code generator is quite horrible, in these extensive investigations:

https://skanthak.hier-im-netz.de/gcc.html

DannyBee•1h ago
It's also rarely worth being optimal in scalar code anymore, particularly at a compilation speed cost. The exception here is memory accesses and branches that will miss. So the writing of useless zeros is egregious, but the other stuff just isn't usually worth caring about these days. It's "good enough" in an age where even in embedded land I can run a 48 MHz Cortex-M0 for 10 years on a battery and not worry about a few extra ANDs. I'm much more likely to hit size than speed limitations.

Not to mention that for anything not super battery limited you can get an M55 running at 800 MHz with a separate 1 GHz NPU, hardware video encoders, etc.

This is before you move into the rockchip/etc space.

We really just aren't scalar compute limited in tons of places these days. There are certainly exceptions, but 10-15 years ago missing little scalar optimizations could make very noticeable differences in the performance of lots of apps, and now it just doesn't anymore.

magicalhippo•57m ago
> The claim that the code is inefficient is really not substantiated well in this blog post. Sometimes, long-winded assembly actually runs faster because of pipelining, register aliasing, and other quirks.

I had a case back in the 2010s where I was trying to optimize a hot loop. The loop involved an integer division by a factor which was common for all elements, similar to a vector normalization pass. For reasons I don't recall, I couldn't get rid of the division entirely.

I saw the compiler emitted an "idiv [mem]" instruction, and I thought surely that was suboptimal. So I reproduced the assembly but changed the code slightly so I could have "idiv reg" instead. All it involved was loading the variable into an unused register before the loop and using that inside the loop.

So I benchmarked it and much to my surprise it was a fair bit slower.

I thought it might have been due to loop target alignment, so I spent some time inserting no-ops to align things in various supposedly optimal ways, but it never got as fast. I changed my assembly to mirror what the compiler had spit out and voila, back to the fastest speed again...

Tried to ask around, and someone suggested it had to do with some internal register load/store contention or something along those lines.

At that point I knew I was done optimizing code by writing assembly. Not my cup of tea.

btdmaster•4d ago
In my experience C++ abstractions give the optimizer a harder job and thus it generates worse code. In this case, clang emits different code if you write a C version[0] versus the original C++ version[1].

Usually abstraction like this means that the compiler has to emit generic code which is then harder to flow through constraints and emit the same final assembly since it's less similar to the "canonical" version of the code that wouldn't use a magic `==` (in this case) or std::vector methods or something else like that.

[0] https://godbolt.org/z/vso7xbh61

[1] https://godbolt.org/z/MjcEKd9Tr
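
For readers who don't want to open the links, here is a rough sketch of the two styles being contrasted. It is not the code behind either godbolt link: the function names and the `std::uint32_t` element type are assumptions made only for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// "Abstract" C++ style: compare against a value-initialized (all-zero) array
// using std::array's operator==. Element type is an assumption.
template <std::size_t N>
bool all_zero_eq(const std::array<std::uint32_t, N>& a) {
    return a == std::array<std::uint32_t, N>{};
}

// C style: raw pointer plus length, explicit loop with an early exit.
bool all_zero_loop(const std::uint32_t* p, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (p[i] != 0) return false;
    }
    return true;
}
```

Both functions return the same answer for the same data, so any difference in the generated assembly comes from how the optimizer handles each formulation.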

pjmlp•3d ago
Except that the C++ version doesn't need to be like that.

Abstractions are welcome when performance doesn't matter; when it does, there are other ways to write the code, and it is still standard-compliant C++.

maccard•1h ago
To back up the other commenter - it's not the same. https://godbolt.org/z/r6e443x1c shows that if you write imperfect C++ clang is perfectly capable of optimizing it.

cogman10•39m ago
What's strange is I'm finding that gcc really struggles to correctly optimize this.

This was my function

    for (auto v : array) {
        if (v != 0)
            return false;
    }
    return true;

clang emits basically the same thing yours does. But gcc really struggles to vectorize this for larger arrays.

Here's gcc for 42 elements:

https://godbolt.org/z/sjz7xd8Gs

and here's clang for 42 elements:

https://godbolt.org/z/frvbhrnEK

Very bizarre. Clang pretty readily sees that it can use SIMD instructions and really optimizes this, while GCC seems reluctant to use them. I've even seen strange output where GCC will emit SIMD instructions for the first loop and then fall back on regular x86 compares for the rest.

Edit: Actually, it looks like for large enough array sizes, it flips. At 256 elements, gcc ends up emitting simd instructions while clang does pure x86. So strange.
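
For anyone who wants to reproduce this locally, the loop body above can be wrapped into something compilable roughly like this. The wrapper is a guess: the function name, the `int` element type and the template parameter are assumptions, not necessarily what is behind the godbolt links.

```cpp
#include <array>
#include <cstddef>

// Hypothetical wrapper around the loop posted above; the name, the element
// type (int) and the size template parameter are assumptions.
template <std::size_t N>
bool is_all_zero(const std::array<int, N>& array) {
    for (auto v : array) {
        if (v != 0)
            return false;
    }
    return true;
}

// Explicit instantiations for the two sizes mentioned, so the compiler
// emits standalone code for each that can be inspected.
template bool is_all_zero<42>(const std::array<int, 42>&);
template bool is_all_zero<256>(const std::array<int, 256>&);
```

Changing the instantiated sizes is an easy way to look for the point where each compiler switches strategy.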

rwmj•1h ago
The OP should try with -march=native so the compiler can use vector instructions.

Slightly off-topic but I like this way to test if memory is all zeroes: https://rusty.ozlabs.org/2015/10/20/ccanmems-memeqzero-itera... (see "epiphany #2" at the bottom of the page) I really wish there was a standard libc function for it.
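
As I understand the trick, it checks a short prefix by hand and then compares the buffer against itself at an offset. A minimal sketch, written from memory and not copied from ccan: the name `mem_is_zero` and the 16-byte prefix are assumptions.

```cpp
#include <cstddef>
#include <cstring>

// Sketch of the "compare the buffer with itself" idea.
// mem_is_zero is a hypothetical name, not the ccan function.
bool mem_is_zero(const void* data, std::size_t length) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    constexpr std::size_t kPrefix = 16;  // assumed prefix size

    // Check a short prefix byte by byte.
    std::size_t checked = 0;
    for (; checked < kPrefix && checked < length; ++checked) {
        if (p[checked] != 0) return false;
    }
    if (checked == length) return true;  // whole buffer fit in the prefix

    // The first kPrefix bytes are zero. Comparing the buffer with itself
    // shifted by kPrefix bytes then shows, by induction, that every later
    // byte is zero too. memcmp only reads, so the overlap is fine.
    return std::memcmp(p, p + kPrefix, length - kPrefix) == 0;
}
```

The appeal is that the bulk of the work lands in `memcmp`, which libc implementations usually optimize heavily.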

gspr•57m ago
With `u32` as the element type, rustc 1.93 (with `-O`) does the correct thing for size=1, checks both elements separately (i.e. worse than in the article) for size=2, checks all three elements separately (i.e. not being crazy like in the article) for size=3, and starts doing SIMD at size=4.

https://godbolt.org/z/5PETM5bbn

usamoi•26m ago
This code is not equivalent to the C++ version. You can directly use `*x == [0_u32; SIZE]`. The code generated by the two is different. (But the iterator version not producing optimal code is also an issue.)

newpavlov•38m ago
Compilers also like to unnecessarily copy data to the stack: https://github.com/llvm/llvm-project/issues/53348 This can be particularly annoying in cryptographic code, where you want to minimize the number of copies of sensitive data.
