If your CPU runs at 1000 MHz, that's 10^9 cycles per second. On that CPU the right-hand side of the picture corresponds to 1 ms: you can do 1 million register-register operations in 1 ms, or 1 billion in 1 s.
Computers are fast.
This improvement is sufficient to tip the balance toward favoring division in some algorithms where historically programmers went out of their way to avoid it.
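For context, the classic way programmers went out of their way to avoid division by a constant is strength reduction: multiply by a precomputed "magic" reciprocal and shift, which compilers still emit for constant divisors. A minimal C sketch (the constant 0xAAAAAAAB is the standard magic number for unsigned 32-bit division by 3):

```c
#include <stdint.h>

/* Strength-reduced division: x / 3 computed without a divide.
   0xAAAAAAAB = ceil(2^33 / 3), so (x * 0xAAAAAAAB) >> 33 equals
   x / 3 exactly for every 32-bit unsigned x. */
static uint32_t div3(uint32_t x) {
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABULL) >> 33);
}
```

When hardware dividers were dozens of cycles slower than a multiply, this trade was almost always a win; on recent cores the gap has narrowed enough that a plain divide can be the better choice.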
net01•5h ago
https://uops.info/table.html
supports most modern and old architectures
bee_rider•5h ago
Although, reasoning about performance can be hard anyway.
bee_rider•3h ago
I have a sneaking suspicion that this table is satisfying for our brains as a vaguely technical and interesting thing, but I’m not sure how useful it really is. In general the compiler will be really creative in reordering instructions, and the CPU will also be creative about which ones it runs parallel (since it is good at discovering instruction level parallelism). So, I wonder if the level of study necessary to use this information also requires the level of data that is available in the detailed table.
I haven't done much caring about individual instructions; it seems very hard. FWIW I have had some success reducing the number of trips to memory and making sure the dependencies are obvious to the computer, so I'm not totally naive… but I think that caring about instruction timing is mostly for the real hardcore optimization badasses.
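One concrete instance of "making the dependencies obvious": a reduction written as one long chain is bound by add latency, while splitting it into independent accumulators lets the out-of-order core keep several adds in flight. A hedged C sketch (function names are my own, not from the thread):

```c
#include <stddef.h>

/* One long dependency chain: each add must wait for the previous
   result, so throughput is limited by the add's latency. */
double sum_chained(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four independent chains: the core can overlap the adds, often
   running several times faster on the same hardware. */
double sum_unrolled(const double *a, size_t n) {
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```

Note the two versions can differ in the last bits for floating point, since the unrolled one reassociates the sum; that's why compilers won't do this transformation for you without something like `-ffast-math`.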
owlbite•2h ago
For the uninitiated, most high-performance CPUs of recent years:
- Are massively out-of-order: any operation whose inputs are all ready will run in the next available execution slot of the right type.
- Have multiple functional units. A recent Apple CPU can and will run 5+ integer ops, 3+ loads/stores, and 3+ floating-point ops per cycle if it can feed them all, and it may well do zero-cost register renames on the fly for "free".
- Have pipelined functional units: you can throw one op into the front end of the pipe each cycle, but the result may not be available for consumption until 3-20 cycles later (latency depends on the type of the op and whether it can bypass into the next op executed).
- Speculate on branch outcomes, and when they guess wrong they have to flush the pipeline and redo the right path.
- Are subject to assorted hazards, so the timing you measure in one situation may be better or worse in another.
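To make the branch-speculation point concrete, here's the standard branchy-vs-branchless comparison in C (names are my own). On random input the branchy version mispredicts roughly half the time and pays a pipeline flush each miss; the branchless version turns the condition into a mask, so there's nothing to mispredict:

```c
#include <stdint.h>
#include <stddef.h>

/* Branchy: the predictor must guess `v[i] >= 128` every iteration.
   Fast on sorted data, slow on random data. */
int64_t sum_branchy(const uint8_t *v, size_t n) {
    int64_t s = 0;
    for (size_t i = 0; i < n; i++)
        if (v[i] >= 128)
            s += v[i];
    return s;
}

/* Branchless: the comparison produces 0 or 1, negated into a mask
   of all-zeros or all-ones. Timing is the same for any input order. */
int64_t sum_branchless(const uint8_t *v, size_t n) {
    int64_t s = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t mask = -(int64_t)(v[i] >= 128); /* 0 or ~0 */
        s += v[i] & mask;
    }
    return s;
}
```

Which one wins depends on exactly the hazards described above, which is why measuring on your real data beats reasoning from instruction tables alone.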