An optimizing compiler doesn't help much with long instruction dependencies

https://johnnysswlab.com/an-optimizing-compiler-doesnt-help-much-with-long-instruction-dependencies/
32•ingve•1d ago

Comments

solarexplorer•1d ago
This is not a good article, and the content doesn't support the claim in the title. It talks about memory latency and how it negatively affects instruction-level parallelism, but doesn't offer any solution or advice, except for offering their own (paid) service...
adrian_b•1d ago
Memory latency only matters in chains of dependent instructions.

Otherwise the performance is limited by the memory transfer throughput, not by the latency of individual memory accesses.

The article demonstrates the difference between these 2 cases, even if its title could have been better.

Because the latency of memory loads is many times greater than the latency of any other kind of CPU instruction, both for loads from main memory and for loads from the L3 cache, this effect is more visible in programs with many memory loads, like the examples from the article, than in programs using other instructions with long latencies.
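To make the two cases above concrete, here is a minimal C++ sketch. It is not the article's code: the `Node` type, the `kEnd` sentinel, and both function names are assumptions made up for illustration. In `sum_indexed` every load address is known in advance, so the CPU can keep many loads in flight and the loop tends to be limited by memory throughput. In `sum_chained` each load address comes out of the previous load, so the loop tends to be limited by memory latency.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical node type for illustration; not taken from the article.
struct Node {
    int value;
    std::size_t next;  // index of the next node, or kEnd at the tail
};

// Hypothetical end-of-list marker.
constexpr std::size_t kEnd = static_cast<std::size_t>(-1);

// Independent loads: the address of each element comes from `order`,
// not from a previously loaded element, so loads can overlap.
long long sum_indexed(const std::vector<int>& values,
                      const std::vector<std::size_t>& order) {
    long long sum = 0;
    for (std::size_t idx : order) {
        sum += values[idx];
    }
    return sum;
}

// Dependent loads: the next index is only known once the current
// node has arrived from memory, so the loads form one serial chain.
long long sum_chained(const std::vector<Node>& nodes, std::size_t head) {
    long long sum = 0;
    for (std::size_t idx = head; idx != kEnd; idx = nodes[idx].next) {
        sum += nodes[idx].value;
    }
    return sum;
}
```

Both functions return the same sum for the same data; only the shape of the dependency chain differs, and no compiler optimization can remove the chain in the second one because the addresses are not known ahead of time.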

jjtheblunt•1d ago
Aren't you overlooking memory latency mattering in mmap (MMU) page miss contexts?
adrian_b•14h ago
A page miss in the TLB cache memory that happens for a memory load is just a memory load that happens to have a latency many times greater than its normal latency, which is already very big.

As with normal memory loads, the effect of a page miss depends on the surrounding code. If the load is part of a long dependency chain, the CPU cannot find other instructions to execute concurrently while the chain is stalled waiting for the load result. If the load has only a few instructions depending on it, the CPU goes ahead executing other parts of the program.

Page misses in the TLB do not cause any new behavior, but the very long latencies corresponding to them exacerbate the effects of long dependency chains. With page misses, even a relatively short dependency chain may not allow the CPU to find enough independent instructions to be executed in order to avoid an execution stall.

With certain operating systems that choose to lazily load memory pages from an SSD/HDD, or that choose to implement a virtual memory capacity greater than the physical memory capacity, there is a different kind of page miss: a miss from the memory currently mapped as valid by the OS, which results in an exception handled by the operating system while the executing program is suspended. There are also mostly obsolete CPUs where a TLB page miss causes an exception instead of being handled by dedicated hardware. In these cases, which I assume you are referring to by mentioning mmap, it does not matter whether the exception-causing instruction was part of a long dependency chain or not; the slowdown of the program caused by exception handling is the same.

dahart•1d ago
Even though the example is contrived, and hopefully not too many people are doing massive reductions using a linked list of random pointers, it would still be nice to offer some suggestion on what alternatives there are. Maybe it’s faster to collect all the pointers into an array and use the first loop? If ‘list’ entries are consecutive in memory, you can ignore the list order and consume them in memory order. Collecting and sorting the pointers might improve the cache hit rates, especially if the values are dense in memory. For anything performance sensitive, avoiding linked lists, especially non-intrusive linked lists, is often a good idea, right?
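The collect-and-sort idea could look roughly like the C++ sketch below. The `ListNode` type and the function name are hypothetical, and the approach only applies to reductions such as a sum, where the visiting order does not change the result. Note that the first pass still chases pointers, so any gain would come from the sorted array being reused across several passes; that is an assumption about the workload, not something measured here.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical linked-list node for illustration.
struct ListNode {
    int value;
    ListNode* next;
};

long long sum_in_address_order(const ListNode* head) {
    // Pass 1: walk the list once, only to gather node addresses.
    std::vector<const ListNode*> nodes;
    for (const ListNode* n = head; n != nullptr; n = n->next) {
        nodes.push_back(n);
    }

    // Sort by address so the second pass walks memory in ascending order.
    // std::less provides a total order over pointers.
    std::sort(nodes.begin(), nodes.end(), std::less<const ListNode*>());

    // Pass 2: each address comes from the array, not from the previous
    // node, so these loads do not depend on one another.
    long long sum = 0;
    for (const ListNode* n : nodes) {
        sum += n->value;
    }
    return sum;
}
```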

What’s with the “if (idx == NULLPTR)” block? The loop won’t access an entry outside the list, so this appears to be adding unnecessary instructions and unnecessary divergence. (And maybe even unnecessary dependencies?) Does demonstrating the performance problem depend on having this code in the loop? I hope not, but I’m very curious why it’s there.

A couple of other tiny nits - the first 2 graphs should have a Y axis that starts at zero! That won’t compromise these in any way. There should be a very compelling reason not to show ratios on a graph that starts from zero, and these don’t have any such reason. And I’m curious why the X axis is factors of 8 except the last two, which seem strangely arbitrary?

MatthiasWandel•20h ago
The bottleneck with the pointer table may be the summation. While the fetches of elements can be parallelized, the summation cannot, as each addition depends on the result of the previous addition being available.

Some experiments I have done with something that does summation showed a considerable speedup by summing odd and even values into separate bins. Although this applies only to code that does not too closely resemble signal processing algorithms, as the compiler can otherwise optimize for that itself.

Part of my video titled "new computers don't speed up old code"
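A minimal C++ sketch of the odd/even binning described above, assuming floating-point values (the comment does not say which type was used, and the function name is made up for illustration). With a single accumulator every addition waits for the previous one. With two accumulators there are two independent chains that the CPU can overlap. Because floating-point addition is not associative, a compiler generally will not make this change on its own under default settings, and the result can differ from a strict left-to-right sum in the last bits.

```cpp
#include <cstddef>
#include <vector>

double sum_two_bins(const std::vector<double>& values) {
    double even = 0.0;  // accumulates values[0], values[2], ...
    double odd = 0.0;   // accumulates values[1], values[3], ...

    std::size_t i = 0;
    // The two additions in each iteration do not depend on each other,
    // so they form two separate dependency chains.
    for (; i + 1 < values.size(); i += 2) {
        even += values[i];
        odd += values[i + 1];
    }

    // Handle the leftover element when the length is odd.
    if (i < values.size()) {
        even += values[i];
    }

    return even + odd;
}
```

The same idea extends to more than two bins; how many are useful depends on the add latency and the number of execution units of the target CPU.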

Demo of kons-9 Common Lisp 3D graphics system

https://old.reddit.com/r/lisp/comments/1kxvfmc/demo_of_kons9_common_lisp_3d_graphics_system/
1•kaveh808•1m ago•0 comments

Dev snapshot: Godot 4.5 dev 5

https://godotengine.org/article/dev-snapshot-godot-4-5-dev-5/
1•kelseyfrog•2m ago•0 comments

Japanese Scientists Develop Artificial Blood Compatible with All Blood Types

https://www.tokyoweekender.com/entertainment/tech-trends/japanese-scientists-develop-artificial-blood/
1•Geekette•2m ago•1 comments

The Oracle of Lexiconia – A Fantasy That Explains How LLMs Work

https://medium.com/@isranimohit/the-oracle-of-lexiconia-a-fantasy-story-that-teaches-you-how-ai-understands-language-0c063f836057
1•isranimohit•4m ago•1 comments

Forge – an advanced 3D Gaussian Splatting renderer for Three.js

https://forge.dev/
1•Tycho87•4m ago•0 comments

Street Fighter 2 composer Yoko Shimomura has created a new track for SF6

https://www.videogameschronicle.com/news/street-fighter-2-composer-yoko-shimomura-has-created-a-new-song-for-sf6-returning-to-the-series-after-30-years/
1•mikhael•7m ago•0 comments

Tests should not contain logic

https://blog.snork.dev/posts/tests-should-not-contain-logic.html
1•todsacerdoti•11m ago•0 comments

Tech-bro satire Mountainhead is an insufferable disappointment

https://www.theguardian.com/tv-and-radio/2025/jun/02/mountainhead-tech-bro-satire-disappointment
2•labrador•12m ago•1 comments

Shop Talk Show episode 667

https://webkit.org/blog/16983/shop-talk-show-episode-667/
1•feross•13m ago•0 comments

How to Find a Good Available .COM Domain

https://sive.rs/com
1•jamesgill•18m ago•0 comments

Reverse Engineering Apple's Proprietary NFC Wallet Protocol (2024)

https://gosecure.ai/blog/2024/10/07/reverse-engineering-apple-nfc-wallet-protocol/
1•greyface-•20m ago•0 comments

T1000-E Card Tracker is a thin, credit card-sized GPS with Meshtastic support

https://www.cnx-software.com/2024/09/02/t1000-e-card-tracker-is-a-thin-credit-card-sized-gps-tracker-with-meshtastic-support/
3•janandonly•21m ago•0 comments

Obsidian Smart Composer Plugin

https://github.com/glowingjade/obsidian-smart-composer
1•consumer451•22m ago•0 comments

Gen Z parents don't like reading to their kids

https://www.theguardian.com/lifeandstyle/2025/jun/02/gen-z-parents-reading-kids
1•hbartab•26m ago•1 comments

Hate filling forms – Built an AI that just does that in one click

https://chromewebstore.google.com/detail/ai-form-filler/hnncooienpgelcbhfhoamjglkmhegdmj
1•dheerajmp•28m ago•1 comments

Open Sourced NeurIPS 2025 Position Papers

https://zenodo.org/records/15514317
1•davidkimai•29m ago•0 comments

Everything Is a Prompt

https://www.jlchnc.com/n/prompts
1•pizzuh•29m ago•0 comments

Corpdle – Wordle for S&P 500 companies

https://corpdle.com
1•jasoncartwright•30m ago•0 comments

Iron Pillar of Delhi

https://en.wikipedia.org/wiki/Iron_pillar_of_Delhi
1•Jimmc414•31m ago•0 comments

Making computers multiply FASTER (matrix hacking) [video]

https://www.youtube.com/watch?v=xsZk3c7Oxyw
2•surprisetalk•33m ago•0 comments

An unfiltered conversation with Dwarkesh Patel [video]

https://www.youtube.com/watch?v=6y-VEycAjsE
1•consumer451•36m ago•0 comments

My AI Skeptic Friends Are All Nuts

https://fly.io/blog/youre-all-nuts/
160•tabletcorry•37m ago•159 comments

Show HN: AI makes/answers calls through your own mobile phone number [video]

https://www.youtube.com/watch?v=_JAoT6JmLew
1•smandava•37m ago•0 comments

AI stirs up the optimal recipe for sustainable concrete

https://techxplore.com/news/2025-06-ai-optimal-recipe-sustainable-concrete.html
1•mdp2021•38m ago•1 comments

From Military Brat to Tech Entrepreneur

https://www.causeofakind.com/strictly-from-nowhere/don-mackinnon-co-founder-and-cto-searchcraft
1•mooreds•41m ago•0 comments

3D printed models help blind and low-vision students learn about their world

https://www.abc.net.au/news/2024-12-03/school-for-vision-impaired-3d-models-finger-glance/104643348
1•jonah•41m ago•0 comments

India and Pakistan's Air Battle Is Over. Their Water War Has Begun

https://www.nytimes.com/2025/05/31/world/asia/india-pakistan-indus-water-dispute.html
2•mooreds•46m ago•0 comments

Pattern Matching 20 Habits of Exceptional Startups

https://tylerhogge.com/2025/05/29/pattern-matching-20-habits-of-exceptional-startups/
2•jamietanna•49m ago•0 comments

Large-Scale Research with Historical Newspapers: A Turning Point Through Gen AI

https://dhlab.hypotheses.org/4938
1•gantagonist•50m ago•0 comments

Adult sports leagues took over your city

https://thehustle.co/originals/how-adult-sports-leagues-took-over-your-city
2•paulpauper•51m ago•0 comments