When Compiler Optimizations Hurt Performance

https://nemanjatrifunovic.substack.com/p/when-compiler-optimizations-hurt
27•rbanffy•6d ago

Comments

userbinator•17m ago
ARM64 has many registers, but I believe the lack of suitably large immediate values and, apparently, of compilers willing to use all of those registers across functions puts it at a disadvantage here. Assuming we want the return value in eax and the leading count comes in cl, this can be done branchlessly and without any data lookup on x86 as follows:

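    mov eax, 0x00043201   ; nibble n holds the result for a leading count of n
    test cl, 8            ; ZF is clear only when bit 3 of the count is set (count = 8)
    setz al               ; al = 1 normally, 0 when the count is 8
    shl cl, 2             ; count * 4 = bit offset of the wanted nibble
    shr eax, cl           ; shift count is taken mod 32, so a count of 8 reads nibble 0
    and eax, 15           ; keep only the selected nibble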
Something similar may be possible on ARM64, but I suspect it would take more than 19 bytes ;-)
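For readers who prefer C, here is a rough sketch of the same packed-nibble idea. It rests on an assumption that is not spelled out in the comment: that the "leading count" is the number of leading one bits in a UTF-8 lead byte, and that the result is the sequence length, with 0 meaning invalid. That mapping is inferred from the constant 0x00043201. The function names `seq_len_switch`, `seq_len_packed`, and `check_seq_len` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Reference mapping (an assumption inferred from the constant 0x00043201):
   leading-one count of a UTF-8 lead byte -> sequence length, 0 = invalid. */
unsigned seq_len_switch(unsigned leading_ones)
{
    switch (leading_ones) {
    case 0: return 1;   /* 0xxxxxxx */
    case 2: return 2;   /* 110xxxxx */
    case 3: return 3;   /* 1110xxxx */
    case 4: return 4;   /* 11110xxx */
    default: return 0;  /* continuation byte or invalid lead byte */
    }
}

/* Same mapping as a packed nibble table, mirroring the assembly step by step. */
unsigned seq_len_packed(unsigned leading_ones)
{
    uint32_t table = 0x00043201u;

    /* test cl, 8 / setz al: the low byte becomes 1 normally, 0 when bit 3 is set. */
    table = (table & 0xFFFFFF00u) | ((leading_ones & 8u) == 0 ? 1u : 0u);

    /* shl cl, 2 / shr eax, cl: x86 masks a 32-bit shift count to 5 bits.
       C leaves a shift by 32 undefined, so the mask is written out here. */
    unsigned shift = (leading_ones * 4u) & 31u;

    /* and eax, 15: keep only the selected nibble. */
    return (unsigned)((table >> shift) & 15u);
}

/* Hypothetical self-check: both versions agree for every count from 0 to 8. */
void check_seq_len(void)
{
    for (unsigned n = 0; n <= 8; ++n)
        assert(seq_len_packed(n) == seq_len_switch(n));
}
```

Calling `check_seq_len()` from a test harness exercises all nine possible counts. The count of 8 is the case that motivates the `setz`: the shift amount becomes 32, which wraps to 0 on x86, so the low nibble has to be cleared beforehand to return 0 instead of 1.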
jandrewrogers•58s ago
I’ve seen many examples of a similar phenomenon. Modern compilers are impressively good at generating bit-twiddling micro-optimizations from idiomatic code. They are also good at larger-scale structural macro-optimization. However, there is a No Man’s Land between micro-optimization and macro-optimization where the effectiveness of compiler optimizations is much less reliable.

Intuitively I understand why it is a hard problem. Micro-optimizations have deterministic properties that are simple enough that optimality is all but certain. Macro-optimization heuristics create incidental minor pessimization effects that are buried below the noise floor by major optimization effects on average.

In the middle are optimizations that are too complex to guarantee effective optimization but too small in effect to offset any incidental pessimization. It is easy to inadvertently make things worse in these cases. Most cases of surprising performance variances I see fall into this gap.

It is also where the codegen from different compilers seems to disagree the most, which supports the idea that the “correct” codegen is far from obvious to a compiler.

AWS multiple services outage in us-east-1

https://health.aws.amazon.com/health/status?ts=20251020
1859•kondro•23h ago•1868 comments

Practical Scheme

https://practical-scheme.net/index.html#docs
14•ufko_org•1h ago•6 comments

A laser pointer at 2B FPS [video]

https://www.youtube.com/watch?v=o4TdHrMi6do
362•thunderbong•2d ago•70 comments

Alibaba Cloud says it cut Nvidia AI GPU use by 82% with new pooling system

https://www.tomshardware.com/tech-industry/semiconductors/alibaba-says-new-pooling-system-cut-nvi...
403•hd4•18h ago•254 comments

Production RAG: what I learned from processing 5M+ documents

https://blog.abdellatif.io/production-rag-processing-5m-documents
366•tifa2up•15h ago•88 comments

60k kids have avoided peanut allergies due to 2015 advice, study finds

https://www.cbsnews.com/news/peanut-allergies-60000-kids-avoided-2015-advice/
107•zdw•3h ago•71 comments

Show HN: I'm making a detective game built on Wikipedia

https://detective.wiki/
48•jasonsmiles•3d ago•10 comments

My trick for getting consistent classification from LLMs

https://verdik.substack.com/p/how-to-get-consistent-classification
166•frenchmajesty•1w ago•36 comments

BERT is just a single text diffusion step

https://nathan.rs/posts/roberta-diffusion/
382•nathan-barry•16h ago•93 comments

Claude Code on the web

https://www.anthropic.com/news/claude-code-on-the-web
445•adocomplete•12h ago•277 comments

Show HN: I created a cross-platform GUI for the JJ VCS (Git compatible)

https://judojj.com
102•bitpatch•15h ago•20 comments

I made a small LED panel

https://www.stavros.io/posts/really-small-led-panel/
53•Brajeshwar•1w ago•11 comments

Today is when the Amazon brain drain sent AWS down the spout

https://www.theregister.com/2025/10/20/aws_outage_amazon_brain_drain_corey_quinn/
553•raw_anon_1111•10h ago•243 comments

The scariest "user support" email I've received

https://www.devas.life/the-scariest-user-support-email-ive-ever-received/
231•hervic•5d ago•159 comments

ChkTag: x86 Memory Safety

https://community.intel.com/t5/Blogs/Tech-Innovation/open-intel/ChkTag-x86-Memory-Safety/post/172...
234•ashvardanian•6d ago•112 comments

Results from blood test for 50 cancers

https://www.bbc.com/news/articles/c205g21n1zzo
85•dabinat•3d ago•47 comments

A magnetic field orientation that changes the fundamental design of motors

https://www.paranetics.com/copy-of-home
36•dillonshook•5d ago•6 comments

x86-64 Playground – An online assembly editor and GDB-like debugger

https://x64.halb.it/
134•modinfo•13h ago•11 comments

How to stop Linux threads cleanly

https://mazzo.li/posts/stopping-linux-threads.html
197•signa11•5d ago•70 comments

Old Computer Challenge – Modern Web for the ZX Spectrum

https://0x00.cl/blog/2025/occ-2025/
33•0x00cl•7h ago•5 comments

Optical diffraction patterns made with a MOPA laser engraving machine [video]

https://www.youtube.com/watch?v=RsGHr7dXLuI
131•emsign•1w ago•24 comments

Space Elevator

https://neal.fun/space-elevator/
1559•kaonwarb•1d ago•363 comments

TernFS – an exabyte scale, multi-region distributed filesystem

https://www.xtxmarkets.com/tech/2025-ternfs/#posix-shaped
113•kirlev•13h ago•18 comments

Code from MIT's 1986 SICP video lectures

https://github.com/felipap/sicp-code
103•felipap•3d ago•14 comments

The longest baseball game took 33 innings to win

https://www.mlb.com/news/the-longest-professional-baseball-game-ever-played
63•mooreds•5d ago•64 comments

DeepSeek OCR

https://github.com/deepseek-ai/DeepSeek-OCR
902•pierre•1d ago•224 comments

Postman which I thought worked locally on my computer, is down

https://status.postman.com
412•helloguillecl•15h ago•194 comments

Servo v0.0.1

https://github.com/servo/servo
516•undeveloper•18h ago•162 comments

Show HN: Playwright Skill for Claude Code – Less context than playwright-MCP

https://github.com/lackeyjb/playwright-skill
151•syntax-sherlock•19h ago•41 comments