Branch prediction: Why CPUs can't wait?

https://namvdo.ai/cpu-branch-prediction/
50•signa11•5mo ago

Comments

whitten•5mo ago
I know branch prediction is essential if you have instruction pipelining in actual CPU hardware.

It is an interesting thought experiment re instruction pipelining in a virtual machine or interpreter design. What would you change in a design to allow it? Would an asynchronous architecture be necessary? How would you merge control flow together efficiently to take advantage of it?

addaon•5mo ago
> I know branch prediction is essential if you have instruction pipelining in actual CPU hardware.

With sufficiently slow memory, relative to the pipeline speed. A microcontroller executing out of TCM doesn’t gain anything from prediction, since instruction fetches can keep up with the pipeline.

immibis•5mo ago
The head of the pipeline is at least several clock cycles ahead of the tail, by definition. At the time the branch instruction reaches the part of the CPU where it decides whether to branch or not, the next several instructions have already been fetched, decoded and partially executed, and that's thrown away on a mispredicted branch.

There may not be a large delay when executing from TCM with a short pipeline, but it's still there. It can be so small that it doesn't justify the expense of a branch predictor. Many microcontrollers are optimized for power consumption, which means simplicity. I expect microcontroller-class chips to largely run in-order with short pipelines and low-ish clock speeds, although there are exceptions. Older generations of microcontrollers (PIC/AVR) weren't even pipelined at all.

addaon•5mo ago
> but it's still there

Unless you evaluate branches in the second stage of the pipeline and forward them. Or add a delay slot and forward them from the third stage. In the typical case you’re of course correct, but there are many approaches out there.

cogman10•5mo ago
With the way architectures have gone, I think you'd end up recreating VLIW. The thing holding back VLIW was that compilers were too dumb and computers too slow to really take advantage of it; you ended up with a lot of NOPs in the output as a result. VLIW is essentially how modern GPUs operate.

The main benefit of VLIW is that it simplifies the processor design by moving the complicated tasks/circuitry into the compiler. Theoretically, the compiler has more information about the intent of the program which allows it to better optimize things.

It would also be somewhat of a security boon. VLIW moves branch prediction (and rewinding) out of the processor and into the compiler. With exploits like Spectre, pulling that out of the hardware would make it easier to integrate compiler hints on security-sensitive code ("hey, don't speculatively execute here").

_chris_•5mo ago
> The thing holding back VLIW was compilers were too dumb

That’s not really the problem.

The real issue is that VLIW requires branches to be strongly biased, statically, so a compiler can exploit them.

But in fact branches are very dynamic, yet trivially predicted by branch predictors, so branch predictors win.

Not to mention that even VLIW cores use branch predictors, because the branch resolution latency is too long to wait for the outcome to be known.

o11c•5mo ago
You have to ensure that virtual instructions map to distinct hardware instructions.

Computed-goto-after-each-instruction is well known, and copying fragments of machine code is obvious.

Less known is "make an entire copy of your interpreter for each state" - though I'm only aware of this as a depessimization for stack machines.

https://dl.acm.org/doi/pdf/10.1145/223428.207165

But the main problem, which none of these solve, is that most VM languages are designed to be impossible (or very difficult) to optimize, due to aggressive use of dynamic typing. Nothing will save you from dynamic types.
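
For anyone who hasn't seen it, here is a minimal sketch of computed-goto dispatch using the GCC/Clang labels-as-values extension; the opcodes and the bytecode program are invented for illustration. Each handler ends with its own indirect jump, so the indirect-branch predictor gets a distinct location per opcode to learn successor patterns, instead of funneling every dispatch through one hard-to-predict branch:

  #include <stdio.h>

  /* A tiny stack-machine interpreter dispatched via computed goto. */
  enum { OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };   /* hypothetical opcodes */

  static int run(const unsigned char *pc) {
      static void *dispatch[] = { &&push1, &&add, &&print, &&halt };
      int stack[64];
      int *sp = stack;                /* points one past the top of stack */

      goto *dispatch[*pc++];          /* prime the first dispatch */
  push1:
      *sp++ = 1;
      goto *dispatch[*pc++];          /* one indirect branch per handler */
  add:
      --sp;
      sp[-1] += sp[0];
      goto *dispatch[*pc++];
  print:
      printf("%d\n", sp[-1]);
      goto *dispatch[*pc++];
  halt:
      return 0;
  }

  int main(void) {
      const unsigned char prog[] = { OP_PUSH1, OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };
      return run(prog);               /* prints "2" */
  }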

zenolijo•5mo ago
I do wonder how branch prediction actually works in the CPU. Predicting which branch to take also seems like it should be expensive, but I guess something clever is going on.

I've also found G_LIKELY and G_UNLIKELY in glib to be useful when writing some types of performance-critical code. It would be a fun experiment to compare the assembly generated with and without them.
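
For reference, G_LIKELY and G_UNLIKELY expand (on GCC-compatible compilers) to __builtin_expect, which tells the compiler which way a branch usually goes so it can keep the hot path as straight-line fall-through code. A minimal sketch of that experiment; the function and the error condition are invented for illustration:

  #include <stddef.h>

  /* What G_LIKELY/G_UNLIKELY boil down to on GCC-compatible compilers. */
  #define LIKELY(x)   __builtin_expect(!!(x), 1)
  #define UNLIKELY(x) __builtin_expect(!!(x), 0)

  /* Sums an array, bailing out on negative values. The hint tells the
   * compiler the bailout is rare, so it keeps the loop body linear and
   * moves the error path out of line. */
  long sum_nonnegative(const int *data, size_t n) {
      long sum = 0;
      for (size_t i = 0; i < n; i++) {
          if (UNLIKELY(data[i] < 0))
              return -1;              /* rare error path */
          sum += data[i];
      }
      return sum;
  }

Compiling with gcc -O2 -S, once as written and once with the hint removed, shows the layout difference: with the hint, the error return typically lands after the loop, off the hot path.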

hansvm•5mo ago
Semantically it's just a table from instruction location to branch probability. Some nuances exist in:

- Table overflow mitigation: multi-leveled tables, not wasting space on 100% predicted branches, etc

- Table eviction: rolling counts are actually impossible without space consumption; do you accept wasted space, periodic flushing, exponential moving averages, etc.?

- Table initialization: When do you start caring about a branch (and wasting table space), how conservative are the initial parameters, etc

- Table overflow: What do you do when a branch doesn't fit in the table but should

As a rule of thumb, no extra information/context is used for branch prediction. If a program over the course of a few thousand instructions hits a branch X% of the time, then X will be the branch prediction. If you have context you want to use to influence the prediction, you need to manifest that context as additional lines of assembly the predictor can use in its lookup table.

As another rule of thumb, if the hot path has more than a few thousand branches (on modern architectures, often just a few thousand <100% predictable branches; you also want the assembly to emit the jump-if-not-equal in the right direction for that architecture, else you'll get a 100% misprediction rate instead), then you'll hit slow paths -- multi-leveled search, mispredicted branches, etc.

It's reasonably interesting, and given that it's hardware it's definitely clever, but it's not _that_ clever from a software perspective. Is there anything in particular you're curious about?
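
As a concrete (and heavily simplified) software model of that table idea -- the table size, hash, and 2-bit counters here are illustrative choices, not any particular CPU's design:

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Toy model: a direct-mapped table of 2-bit saturating counters,
   * indexed by a hash of the branch instruction's address. A counter
   * value >= 2 means "predict taken". The fixed size is what produces
   * the aliasing/eviction problems described above. */
  #define TABLE_BITS 10
  static uint8_t counters[1 << TABLE_BITS];   /* start "strongly not taken" */

  static size_t slot(uintptr_t branch_addr) {
      return (branch_addr >> 2) & ((1u << TABLE_BITS) - 1);
  }

  bool predict(uintptr_t branch_addr) {
      return counters[slot(branch_addr)] >= 2;
  }

  void update(uintptr_t branch_addr, bool taken) {
      uint8_t *c = &counters[slot(branch_addr)];
      if (taken && *c < 3)
          (*c)++;                     /* saturate at "strongly taken" */
      else if (!taken && *c > 0)
          (*c)--;                     /* saturate at "strongly not taken" */
  }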

NobodyNada•5mo ago
> If a program over the course of a few thousand instructions hits a branch X% of the time, then X will be the branch prediction.

This is not completely true - modern branch predictors can recognize patterns such as "this branch is taken every other time", or "every 5th time", etc. They also can, in some cases, recognize correlations between nearby branches.

However, they won't use factors like register or memory contents to predict branches, because that would require waiting for that data to be available to make the prediction -- which of course defeats the point of branch prediction.

aclindsa•5mo ago
> As a rule of thumb, no extra information/context is used for branch prediction.

As you'll find in the results of the recent branch prediction competition, global branch history (a record of which branches have recently been taken) is a piece of context critical to achieving the high accuracy of modern branch predictors: https://ericrotenberg.wordpress.ncsu.edu/cbp2025-workshop-pr...

dmoy•5mo ago
There's a bunch of ways it works. There's a tradeoff between hardware cost and accuracy. Sometimes it's static, sometimes there's a counter of varying size (1 bit, 2 bit, etc). It can get a lot more complicated.

The basic branch predictors are very cheap, and often good enough (90%+ accuracy).

Patterson & Hennessy goes into a bunch of detail.

rayiner•5mo ago
It’s fairly expensive but well suited to pipelined implementations in hardware circuits: https://medium.com/@himanshu0525125/global-history-branch-pr.... Modern CPU branch predictors can deliver multiple predictions per clock cycle.
delta_p_delta_x•5mo ago
> I do wonder how branch prediction actually works in the CPU, predicting which branch to take also seems like it should be expensive

There are a few hardware algorithms that are vendor-dependent. The earliest branch predictors were two-bit saturating counters that moved between four states of 'strongly taken', 'weakly taken', 'weakly not taken', 'strongly not taken', and the state change depended on the eventual computed result of the branch.

Newer branch predictors are stuff like two-level adaptive branch predictors that are a hardware `std::unordered_map` of branch instruction addresses to the above-mentioned saturating counters; this remembers the result of the last n (where n is the size of the map) branch instructions.

Ryzen CPUs contain perceptron branch predictors that are basically hardware neural networks—not far from LLMs.
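
A sketch of the two-level idea in the gshare style (one common variant), building on the 2-bit counters described above: a global history register of recent branch outcomes is XORed with the branch address to index the counter table. All sizes here are made up. Because the history is part of the index, a pattern like "taken every other time" lands on different counters for its two phases, and each phase becomes trivially predictable:

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  #define GH_BITS 12
  static uint8_t gshare_counters[1 << GH_BITS];  /* 2-bit saturating counters */
  static uint32_t history;                       /* recent taken/not-taken bits */

  static size_t gshare_slot(uintptr_t branch_addr) {
      /* Two levels: branch address XOR global branch history. */
      return ((branch_addr >> 2) ^ history) & ((1u << GH_BITS) - 1);
  }

  bool gshare_predict(uintptr_t branch_addr) {
      return gshare_counters[gshare_slot(branch_addr)] >= 2;
  }

  void gshare_update(uintptr_t branch_addr, bool taken) {
      uint8_t *c = &gshare_counters[gshare_slot(branch_addr)];
      if (taken && *c < 3)
          (*c)++;
      else if (!taken && *c > 0)
          (*c)--;
      history = ((history << 1) | (taken ? 1u : 0u)) & ((1u << GH_BITS) - 1);
  }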

bee_rider•5mo ago
Modern branch predictors are pretty sophisticated. But it is also worth keeping in mind that you can do pretty well, for a lot of code, by predicting simple things like "backwards jumps will probably be followed": a backwards jump is probably a loop, and jumping backwards is by far the most likely thing to do, because most loops go through more than one iteration.

And a lot of programmers are willing to conspire with the hardware folks, to make sure their heuristics work out. Poor branches, never had any chances.
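
That "backwards taken, forwards not taken" heuristic is small enough to state as code -- a sketch using nothing but the two addresses:

  #include <stdbool.h>
  #include <stdint.h>

  /* Static BTFNT heuristic: a backward branch is probably a loop
   * back-edge (predict taken); a forward branch probably skips a
   * rare case such as error handling (predict not taken). */
  bool predict_static(uintptr_t branch_addr, uintptr_t target_addr) {
      return target_addr < branch_addr;
  }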

moregrist•5mo ago
There’s ample information out there. There are quite a few text books, blogs, and YouTube videos covering computer architecture, including branch prediction.

For example:

- Dan Luu has a nice write-up: https://danluu.com/branch-prediction/
- Wikipedia's page is decent: https://en.m.wikipedia.org/wiki/Branch_predictor

> I've also found G_LIKELY and G_UNLIKELY in glib to be useful when writing some types of performance-critical code.

A lot of the time this is a hint to the compiler on what the expected paths are so it can keep those paths linear. IIRC, this mainly helps instruction cache locality.

_chris_•5mo ago
> A lot of the time this is a hint to the compiler on what the expected paths are so it can keep those paths linear. IIRC, this mainly helps instruction cache locality.

The real value is that the easiest branch to predict is a never-taken branch. So if the compiler can turn a branch into a never-taken branch with the common path being straight line code, then you win big.

And it takes no space or effort to predict never taken branches.

o11c•5mo ago
> And it takes no space or effort to predict never taken branches.

Is that actually true, given that branch history is stored lossily? What if other branches that have the same hash are all always taken?

_chris_•5mo ago
A BPU needs to predict 3 things:

  - 1) Is there a branch here?
  - 2) If so, is it taken?
  - 3) If so, where to?
If a conditional branch is never taken, then it's effectively a NOP, and you never store it anywhere, so you treat (1) as "no there isn't a branch here." Doesn't get cheaper than that.

Of course, (1) and (3) are very important, so you pick your hashes to reduce aliasing to some low, but acceptable level. Otherwise you just have to eat mispredicts if you alias too much.

Note: (1) and (3) aren't really functions of history, they're functions of their static location in the binary (I'm simplifying a tad but whatever). You can more freely alias on (2), which is very history-dependent, because (1) will guard it.
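
A sketch of what one entry answering those three questions might look like, with illustrative field sizes:

  #include <stdint.h>

  /* One BTB-style entry:
   *   tag     -- partial branch address, answers (1) "is there a branch here?"
   *   counter -- 2-bit saturating counter, answers (2) "is it taken?"
   *   target  -- predicted destination, answers (3) "where to?" */
  struct btb_entry {
      uint32_t  tag;
      uint8_t   counter;
      uintptr_t target;
  };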

pkaye•5mo ago
Here are some examples of the different branch prediction algorithms.

https://enesharman.medium.com/branch-prediction-algorithms-a...

atq2119•5mo ago
By now I'd assume that all modern high performance CPUs use some form of TAGE (tagged geometric history) branch prediction, so that's a good keyword to search for if you really want to get into it.

ip26•5mo ago
It is expensive, but it's even more expensive not to.

ActorNightly•5mo ago
Branch prediction is probably the main reason CPUs got fast in the past two decades. As Jim Keller described, modern BPs look very much like neural networks.

checker659•5mo ago
There are two things to predict: whether there will be a branch, and if so, to where.

remexre•5mo ago
https://danluu.com/branch-prediction/ is a good illustrated overview of a few algorithms.

Izmaki•5mo ago
My favourite explanation of how Branch Prediction works: https://stackoverflow.com/a/11227902/1150676

zzo38computer•5mo ago
The MMIX instruction set specifies branch prediction explicitly.

If you also have a "branch always" and a "branch never", and the compiler can generate code that modifies such an instruction during the initialization of the program, then for programs where some branch directions are known at initialization time, the code can be patched before it is first executed.

immibis•5mo ago
Pretty much every CPU has a "branch always" (it's called "branch" or "jump") and a "branch never" (it's called "nop"). The language support for this is the tricky part.

zzo38computer•5mo ago
For "branch always", yes, but for "branch never", it is not necessarily the same as the "nop" in many instruction sets, because it would still have an operand of the same size and format of a branch instruction, although the operand is ignored.

(For instruction sets with fixed size instructions, the "nop" will potentially work if it has an operand which is not used for any other purpose; the SWYM instruction on MMIX is also no operation but the operand may be used for communication with debuggers or for other stuff.)

dan_hawkins•5mo ago
You can just use "branch always" but to the next instruction (:

eigenform•5mo ago
Think you're referring to the idea that "my compiler can know that some branch is always/never taken" and turn it into an unconditional control-flow instruction (either "always jump here", or "always continue sequentially" and don't emit anything!).

But the parent comment is talking about "hinting" for branches where the compiler cannot compute this ahead of time, and the CPU is responsible for resolving it at runtime. This is usually exposed by the ISA, i.e., a bit in the branch instruction encoding that tells the machine "when you encounter this instruction for the first time, the default prediction should be 'taken'."

In practice, branches are usually predicted "not-taken" by default:

- It's advantageous to assume that control-flow is sequential because [pre-]fetching sequentially is the easy case

- It's wasteful to track the target addresses of branches in your predictor if they aren't taken at least once!

gpderetta•5mo ago
A few ISAs have used branch prediction hints but they have gone out of fashion (those on x86 are now ignored for example).
imtringued•5mo ago
They are useless because the compiler can simply lay out the code so that the most likely path doesn't trigger the branch and executes the code sequentially.

The pipeline has already loaded these instructions anyway, so continuing execution costs nothing. You then only need to predict the exceptional cases that do trigger the branch.

This means the only use case for branch prediction hints is when the most likely branch changes at runtime. That is such a niche use case that it is never worth it. If you do need this, you are better off investing in a JIT instead of changing the ISA.

eigenform•5mo ago
Yep, you typically don't see it because we learned that it's easier to just assume that the default prediction is "not-taken."

AFAICT if you're hinting that a branch is biased-taken, really the only thing you might be doing is avoiding potential latency/incorrect speculation that comes with an initial misprediction (when you discover a branch that isn't currently being tracked). I think it's basically just an injunction to "don't wait for a misprediction, just install this branch in your BTB immediately" (or something along those lines).

zzo38computer•5mo ago
One of the situations I was describing is a branch whose direction is the same every time it is reached during one execution of the program, but is decided during the program's initialization, before the first time that branch is reached.

bob1029•5mo ago
Branch prediction is getting pretty crazy with the latest hardware generation.

https://hackaday.com/2024/07/28/amd-returns-to-1996-with-zen...