Branch prediction: Why CPUs can't wait?

https://namvdo.ai/cpu-branch-prediction/

13•signa11•3d ago

Comments

whitten•46m ago

I know branch prediction is essential if you have instruction pipelining in actual CPU hardware.

It is an interesting thought experiment re instruction pipelining in a virtual machine or interpreter design. What would you change in a design to allow it ? Would an asynchronous architecture be necessary ? How would you merge control flow together efficiently to take advantage of it ?

addaon•2m ago

> I know branch prediction is essential if you have instruction pipelining in actual CPU hardware.

With sufficiently slow memory, relative to the pipeline speed. A microcontroller executing out of TCM doesn’t gain anything from prediction, since instruction fetches can keep up with the pipeline.

zenolijo•39m ago

I do wonder how branch prediction actually works in the CPU, predicting which branch to take also seems like it should be expensive, but I guess something clever is going on.

I've also found G_LIKELY and G_UNLIKELY in glib to be useful when writing some types of performance-critical code. Would be a fun experiment to compare the assembly when using it and not using it.

hansvm•16m ago

Semantically it's just a table from instruction location to branch probability. Some nuances exist in:

- Table overflow mitigation: multi-leveled tables, not wasting space on 100% predicted branches, etc

- Table eviction: Rolling counts are actually impossible without space consumption; do you have space wasted, periodic flushing, exponential moving averages, etc

- Table initialization: When do you start caring about a branch (and wasting table space), how conservative are the initial parameters, etc

- Table overflow: What do you do when a branch doesn't fit in the table but should

As a rule of thumb, no extra information/context is used for branch prediction. If a program over the course of a few thousand instructions hits a branch X% of the time, then X will be the branch prediction. If you have context you want to use to influence the prediction, you need to manifest that context as additional lines of assembly the predictor can use in its lookup table.

As another rule of thumb, if the hot path has more than a few thousand branches (on modern architectures, often just a few thousand <100% branches (you want the assembly to generate the jump-if-not-equal in the right direction for that architecture though, else you'll get a 100% misprediction rate instead)) then you'll hit slow paths -- multi-leveled search, mispredicted branches, etc.

It's reasonably interesting, and given that it's hardware it's definitely clever, but it's not _that_ clever from a software perspective. Is there anything in particular you're curious about?

NobodyNada•3m ago

> If a program over the course of a few thousand instructions hits a branch X% of the time, then X will be the branch prediction.

This is not completely true - modern branch predictors can recognize patterns such as "this branch is taken every other time", or "every 5th time", etc. They also can, in some cases, recognize correlations between nearby branches.

However, they won't use factors like register or memory contents to predict branches, because that would require waiting for that data to be available to make the prediction -- which of course defeats the point of branch prediction.

dmoy•16m ago

There's a bunch of ways it works. There's a tradeoff between hardware cost and accuracy. Sometimes it's static, sometimes there's a counter of varying size (1 bit, 2 bit, etc). It can get a lot more complicated.

The basic branch predictors are very cheap, and often good enough (90%+ accuracy).

Patterson & Hennessy goes into a bunch of detail.

rayiner•13m ago

It’s fairly expensive but well suited to pipelined implementations in hardware circuits: https://medium.com/@himanshu0525125/global-history-branch-pr.... Modern CPU branch predictors can deliver multiple predictions per clock cycle.

delta_p_delta_x•11m ago

> I do wonder how branch prediction actually works in the CPU, predicting which branch to take also seems like it should be expensive

There are a few hardware algorithms that are vendor-dependent. The earliest branch predictors were two-bit saturating counters that moved between four states of 'strongly taken', 'weakly taken', 'weakly not taken', 'strongly not taken', and the state change depended on the eventual computed result of the branch.

Newer branch predictors are stuff like two-level adaptive branch predictors that are a hardware `std::unordered_map` of branch instruction addresses to the above-mentioned saturating counters; this remembers the result of the last n (where n is the size of the map) branch instructions.

Ryzen CPUs contain perceptron branch predictors that are basically hardware neural networks—not far from LLMs.

bee_rider•8m ago

Modern branch predictors are pretty sophisticated. But, it is also worth keeping in mind that you can do pretty good, for a lot of codes, by predicting simple things like “backwards jumps will probably be followed.” Because a backwards jump is probably a loop, and so jumping backwards is by far the most likely thing to do (because most loops go through more than one iteration).

And a lot of programmers are willing to conspire with the hardware folks, to make sure their heuristics work out. Poor branches, never had any chances.

moregrist•7m ago

There’s ample information out there. There are quite a few text books, blogs, and YouTube videos covering computer architecture, including branch prediction.

For example: - Dan Luu has a nice write-up: https://danluu.com/branch-prediction/ - Wikipedia’s page is decent: https://en.m.wikipedia.org/wiki/Branch_predictor

> I've also found G_LIKELY and G_UNLIKELY in glib to be useful when writing some types of performance-critical code.

A lot of the time this is a hint to the compiler on what the expected paths are so it can keep those paths linear. IIRC, this mainly helps instruction cache locality.

pkaye•6m ago

Here is some examples of the different branch prediction algorithms.

https://enesharman.medium.com/branch-prediction-algorithms-a...

atq2119•2m ago

[delayed]

ip26•1m ago

It is expensive, but it’s even more expensive not to.

Izmaki•5m ago

My favourite explanation of how Branch Prediction works: https://stackoverflow.com/a/11227902/1150676

How we exploited CodeRabbit: From simple PR to RCE and write access on 1M repos

D2 (text to diagram tool) now supports ASCII renders

Emacs as your video-trimming tool

Perfect Freehand – Draw perfect pressure-sensitive freehand lines

Without the futex, it's futile

Show HN: OpenAI/reflect – Physical AI Assistant that illuminates your life

How Figma’s multiplayer technology works (2019)

The new geography of stolen goods

Candle Flame Oscillations as a Clock

Vendors that treat single sign-on as a luxury feature

Notion releases offline mode

AnduinOS

Why Semantic Layers Matter (and how to build one with DuckDB)

Custom telescope mount using harmonic drives and ESP32

Lazy-brush – smooth drawing with mouse or finger

A renovation project in Turkey led to the discovery of a lost city (2023)

Branch prediction: Why CPUs can't wait?

The joy of recursion, immutable data, & pure functions: Making mazes with JS

CRDT: Text Buffer

Launch HN: Uplift (YC S25) – Voice models for under-served languages

How to Build a Medieval Castle

Show HN: Chroma Cloud – serverless search database for AI

Geotoy – Shadertoy for 3D Geometry

CRLite: Certificate Revocation Checking in Firefox

Launch HN: Parachute (YC S25) – Guardrails for Clinical AI

Critical Cache Poisoning Vulnerability in Dnsmasq

Positron, a New Data Science IDE

Prime Number Grid

Medical cannabis patient data exposed by unsecured database

"Remove mentions of XSLT from the html spec"

How we exploited CodeRabbit: From simple PR to RCE and write access on 1M repos

D2 (text to diagram tool) now supports ASCII renders

Emacs as your video-trimming tool

Perfect Freehand – Draw perfect pressure-sensitive freehand lines

Without the futex, it's futile

Show HN: OpenAI/reflect – Physical AI Assistant that illuminates your life

How Figma’s multiplayer technology works (2019)

The new geography of stolen goods

Candle Flame Oscillations as a Clock

Vendors that treat single sign-on as a luxury feature

Notion releases offline mode

AnduinOS

Why Semantic Layers Matter (and how to build one with DuckDB)

Custom telescope mount using harmonic drives and ESP32

Lazy-brush – smooth drawing with mouse or finger

A renovation project in Turkey led to the discovery of a lost city (2023)

Branch prediction: Why CPUs can't wait?

The joy of recursion, immutable data, & pure functions: Making mazes with JS

CRDT: Text Buffer

Launch HN: Uplift (YC S25) – Voice models for under-served languages

How to Build a Medieval Castle

Show HN: Chroma Cloud – serverless search database for AI

Geotoy – Shadertoy for 3D Geometry

CRLite: Certificate Revocation Checking in Firefox

Launch HN: Parachute (YC S25) – Guardrails for Clinical AI

Critical Cache Poisoning Vulnerability in Dnsmasq

Positron, a New Data Science IDE

Prime Number Grid

Medical cannabis patient data exposed by unsecured database

"Remove mentions of XSLT from the html spec"

Branch prediction: Why CPUs can't wait?

Comments