frontpage.

MS-DOS game copy protection and cracks

https://www.dosdays.co.uk/topics/game_cracks.php
1•TheCraiggers•49s ago•0 comments

Updates on GNU/Hurd progress [video]

https://fosdem.org/2026/schedule/event/7FZXHF-updates_on_gnuhurd_progress_rump_drivers_64bit_smp_...
1•birdculture•1m ago•0 comments

Epstein took a photo of his 2015 dinner with Zuckerberg and Musk

https://xcancel.com/search?f=tweets&q=davenewworld_2%2Fstatus%2F2020128223850316274
1•doener•1m ago•0 comments

MyFlames: Visualize MySQL query execution plans as interactive FlameGraphs

https://github.com/vgrippa/myflames
1•tanelpoder•3m ago•0 comments

Show HN: LLM of Babel

https://clairefro.github.io/llm-of-babel/
1•marjipan200•3m ago•0 comments

A modern iperf3 alternative with a live TUI, multi-client server, QUIC support

https://github.com/lance0/xfr
1•tanelpoder•4m ago•0 comments

Famfamfam Silk icons – also with CSS spritesheet

https://github.com/legacy-icons/famfamfam-silk
1•thunderbong•5m ago•0 comments

Apple is the only Big Tech company whose capex declined last quarter

https://sherwood.news/tech/apple-is-the-only-big-tech-company-whose-capex-declined-last-quarter/
1•elsewhen•8m ago•0 comments

Reverse-Engineering Raiders of the Lost Ark for the Atari 2600

https://github.com/joshuanwalker/Raiders2600
2•todsacerdoti•9m ago•0 comments

Show HN: Deterministic NDJSON audit logs – v1.2 update (structural gaps)

https://github.com/yupme-bot/kernel-ndjson-proofs
1•Slaine•13m ago•0 comments

The Greater Copenhagen Region could be your friend's next career move

https://www.greatercphregion.com/friend-recruiter-program
1•mooreds•13m ago•0 comments

Do Not Confirm – Fiction by OpenClaw

https://thedailymolt.substack.com/p/do-not-confirm
1•jamesjyu•14m ago•0 comments

The Analytical Profile of Peas

https://www.fossanalytics.com/en/news-articles/more-industries/the-analytical-profile-of-peas
1•mooreds•14m ago•0 comments

Hallucinations in GPT5 – Can models say "I don't know" (June 2025)

https://jobswithgpt.com/blog/llm-eval-hallucinations-t20-cricket/
1•sp1982•14m ago•0 comments

What AI is good for, according to developers

https://github.blog/ai-and-ml/generative-ai/what-ai-is-actually-good-for-according-to-developers/
1•mooreds•14m ago•0 comments

OpenAI might pivot to the "most addictive digital friend" or face extinction

https://twitter.com/lebed2045/status/2020184853271167186
1•lebed2045•15m ago•2 comments

Show HN: Know how your SaaS is doing in 30 seconds

https://anypanel.io
1•dasfelix•16m ago•0 comments

ClawdBot Ordered Me Lunch

https://nickalexander.org/drafts/auto-sandwich.html
3•nick007•17m ago•0 comments

What the News media thinks about your Indian stock investments

https://stocktrends.numerical.works/
1•mindaslab•18m ago•0 comments

Running Lua on a tiny console from 2001

https://ivie.codes/page/pokemon-mini-lua
1•Charmunk•18m ago•0 comments

Google and Microsoft Paying Creators $500K+ to Promote AI Tools

https://www.cnbc.com/2026/02/06/google-microsoft-pay-creators-500000-and-more-to-promote-ai.html
2•belter•21m ago•0 comments

New filtration technology could be game-changer in removal of PFAS

https://www.theguardian.com/environment/2026/jan/23/pfas-forever-chemicals-filtration
1•PaulHoule•22m ago•0 comments

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

https://github.com/Momciloo/fun-with-clip-path
2•momciloo•22m ago•0 comments

Kinda Surprised by Seadance2's Moderation

https://seedanceai.me/
1•ri-vai•22m ago•2 comments

I Write Games in C (yes, C)

https://jonathanwhiting.com/writing/blog/games_in_c/
2•valyala•22m ago•0 comments

Django scales. Stop blaming the framework (part 1 of 3)

https://medium.com/@tk512/django-scales-stop-blaming-the-framework-part-1-of-3-a2b5b0ff811f
1•sgt•23m ago•0 comments

Malwarebytes Is Now in ChatGPT

https://www.malwarebytes.com/blog/product/2026/02/scam-checking-just-got-easier-malwarebytes-is-n...
1•m-hodges•23m ago•0 comments

Thoughts on the job market in the age of LLMs

https://www.interconnects.ai/p/thoughts-on-the-hiring-market-in
1•gmays•23m ago•0 comments

Show HN: Stacky – certain block game clone

https://www.susmel.com/stacky/
3•Keyframe•26m ago•0 comments

AIII: A public benchmark for AI narrative and political independence

https://github.com/GRMPZQUIDOS/AIII
1•GRMPZ23•26m ago•0 comments

Beating the L1 cache with value speculation (2021)

https://mazzo.li/posts/value-speculation.html
47•shoo•3mo ago
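
For context before the comments, here is a rough sketch of the trick the article describes (illustrative code, not the article's exact listing): sum a linked list whose nodes were allocated back to back, guess that the successor of a node is simply node + 1, and let the branch predictor hide the check. Note that an optimizing compiler may collapse the check back into a plain node = node->next, which is part of what the article has to deal with.

  #include <stddef.h>
  #include <stdint.h>

  typedef struct node {
      uint64_t     value;
      struct node *next;
  } node_t;

  /* Value speculation in software: guess that the next node sits right
     after the current one in memory, and only fall back to the loaded
     pointer when the guess turns out to be wrong. When the nodes were
     bump-allocated in order, the branch is almost always predicted
     correctly, so the next iteration does not have to wait out the L1
     load-to-use latency of node->next. */
  uint64_t sum_speculative(node_t *node) {
      uint64_t sum = 0;
      while (node) {
          node_t *guess = node + 1;   /* speculated successor */
          sum += node->value;
          if (node->next != guess)    /* verify the speculation */
              guess = node->next;     /* wrong guess: take the real link */
          node = guess;
      }
      return sum;
  }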

Comments

signa11•3mo ago
https://news.ycombinator.com/item?id=45499965
hshdhdhehd•3mo ago
I am new to this low-level stuff, but am I right in understanding that this works because he uses a linked list that is usually contiguous in memory? You guess that the next element is contiguous, and if it is, the branch predictor guesses right and saves you a trip to the cache and a broken pipeline.

However, I imagine you'd also get the same great performance using an array?

vlovich123•3mo ago
Yes, as he noted, the trick is of limited value in practice.
kazinator•3mo ago
Consecutive linked list nodes can occur when you bump-allocate them, particularly if you have a copying garbage collector, which ensures that bump allocation takes place from blank-slate heap areas with no gaps.

Idea: what if we implement something that resembles CDR coding, but doesn't compact the cells together (so it's not a space-saving device)? The idea is that when we have two cells A and B such that A->cdr == B and also A + 1 == B, we replace A->cdr with a special constant that says the same thing: it indicates that A->cdr is equivalent to A + 1.

Then, I think, we could have a very simple, stable and portable form of the trick in the article:

  while (node) {
    value += node->value;
    if (node->next == NEXT_IS_CONSECUTIVE)  /* allocator marked the successor as adjacent */
      next = node + 1;                      /* predicted path: address needs no pointer chase */
    else
      next = node->next;                    /* ordinary pointer chase */
    node = next;
  }
The branch predictor can predict that the branch is taken (our bump allocator ensures that is frequently the case) and go straight to next = node + 1. When the load of node->next completes and turns out not to equal the magic value, the speculatively predicted path is canceled and we get node->next instead.

This doesn't look like something that can be optimized away, because we are not comparing node->next to node + 1; there is no tautology there.
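A minimal sketch of the allocation side, to make the fragment above concrete (reusing the node_t layout from the sketch under the submission; NEXT_IS_CONSECUTIVE, the pool, and build_list are hypothetical names): a bump allocator hands out cells back to back, and the list builder stores the sentinel whenever the successor really is the adjacent cell.

  #define POOL_NODES 4096

  /* Hypothetical sentinel meaning "my next is the adjacent cell". */
  #define NEXT_IS_CONSECUTIVE ((node_t *)1)

  static node_t pool[POOL_NODES];
  static size_t pool_used;

  /* Bump allocation: consecutive calls return consecutive cells. */
  static node_t *alloc_node(void) {
      return pool_used < POOL_NODES ? &pool[pool_used++] : NULL;
  }

  /* Build an n-node list, marking adjacent successors with the sentinel. */
  static node_t *build_list(size_t n) {
      node_t *head = NULL, *prev = NULL;
      for (size_t i = 0; i < n; i++) {
          node_t *cur = alloc_node();
          if (!cur) break;
          cur->value = i;
          cur->next  = NULL;
          if (prev)
              prev->next = (prev + 1 == cur) ? NEXT_IS_CONSECUTIVE : cur;
          else
              head = cur;
          prev = cur;
      }
      return head;
  }

The traversal loop above still loads node->next to check it against the sentinel, but the predicted next = node + 1 path no longer has to wait on that load before the next iteration can start.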

bjornsing•3mo ago
Yes. But I don't think the OP is suggesting this as an alternative to using an array. As I read (well, skimmed) it, the linked list is just a simplified example. You can use this trick in more complex situations too, e.g. if you're searching a tree structure and you know that some paths through the tree are much more common than others.
stinkbeetle•3mo ago
Data speculation is a CPU technique too, which Apple CPUs are known to implement. Apparently they can do stride detection when predicting address values.

Someone with an M >= 2 might try the code and find no speedup with the "improved" version, and that it's already iterating faster than the L1 load-to-use latency.

bjornsing•3mo ago
But that works on a different level, right? As I understand it, data speculation is about prefetching from memory into cache, whereas this trick is about using the branch predictor as an ultra-fast “L0” cache, you could say.
stinkbeetle•3mo ago
This is doing value speculation in software, using the branch predictor. The hardware of course does it too, but it uses different tables for deriving a predicted value, and misprediction is detected and flushed in a slightly different way.

But the effect on the main sequence of instructions in the backend will be quite similar. In neither case is it a "prefetch" as such; it is actually executing the load with the predicted value, and the result will be consumed by other instructions, decoupling address generation from the dependency on the previous load's result.

bjornsing•3mo ago
Yeah, that's sort of how I understand the OP too: the CPU will execute speculatively on the assumption that the next element in the linked list is consecutive in memory, so it doesn't have to wait for the L1 cache. It needs to check the real value in L1, of course, but not synchronously.
adrian_b•3mo ago
Value prediction and address prediction are very different things.

Address prediction for loads and stores, by detecting various kinds of strides and access patterns, is done by most CPUs designed during the last 25 years and is used for prefetching the corresponding data. This is as important for loads and stores as branch prediction is for branches.

On the other hand, value prediction for loads is done by very few CPUs and for very restricted use cases, because in general it is too costly in comparison with meager benefits. Unlike for branch direction prediction and branch target prediction, where the set from which the predicted value must be chosen is small, the set from which to choose the value that will be returned by a load is huge, except for very specific applications, e.g. which repeatedly load values from a small table.

The application from the parent article is such a very special case, because the value returned by the load can be computed without loading it, except for exceptional cases, which are detected when the loaded value is different from the pre-computed value.

stinkbeetle•3mo ago
> Value prediction and address prediction are very different things.

Both are classes of data prediction, and Apple CPUs do both.

> Address prediction for loads and stores, by detecting various kinds of strides and access patterns, is done by most CPUs designed during the last 25 years and it is used for prefetching the corresponding data. This is as important for loads and stores as branch prediction for branches.

That is not what is known as load/store address prediction. That is cache prefetching, which of course has to predict addresses in some manner too.

> On the other hand, value prediction for loads is done by very few CPUs and for very restricted use cases, because in general it is too costly in comparison with meager benefits. Unlike for branch direction prediction and branch target prediction, where the set from which the predicted value must be chosen is small, the set from which to choose the value that will be returned by a load is huge, except for very specific applications, e.g. which repeatedly load values from a small table.

I'm talking about load address prediction specifically. Apple has both, but load value prediction would not trigger here, because I don't think it does pattern/stride detection the way the load address predictor does; rather, it is value based, so you'd have to see the same values coming from the load. Their load address predictor does do strides, though.

I don't know whether it needs cache misses or other long-latency sources to kick in and start training, so I'm not entirely sure it would capture this pattern. But it can capture similar ones for sure. I have an M4 somewhere; I should dig it out and try.

> The application from the parent article is such a very special case, because the value returned by the load can be computed without loading it, except for exceptional cases, which are detected when the loaded value is different from the pre-computed value.

rini17•3mo ago
Won't it introduce a risk of invalid memory access when the list isn't contiguous? And if it always is contiguous, then why not use an array instead? Smells like a contrived example.
imtringued•3mo ago
Yeah, this use case is pretty contrived. Even a simple unrolled linked list would beat his implementation.
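
For anyone unfamiliar with the term, an unrolled linked list packs several values into each node, so most of the work is a linear scan of a small contiguous array and the pointer chase happens only once per chunk. A rough sketch (field names and chunk size are illustrative):

  #include <stddef.h>
  #include <stdint.h>

  #define CHUNK 8  /* values per node; a node then spans roughly a cache line or two */

  typedef struct unode {
      size_t        count;          /* slots in use                          */
      uint64_t      values[CHUNK];  /* contiguous payload, prefetch friendly */
      struct unode *next;           /* one dependent load per CHUNK values   */
  } unode_t;

  static uint64_t sum_unrolled(const unode_t *n) {
      uint64_t sum = 0;
      for (; n; n = n->next)
          for (size_t i = 0; i < n->count; i++)
              sum += n->values[i];
      return sum;
  }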
pkhuong•3mo ago
> Won't it introduce risk of invalid memory access

no.