Wow. That's cool, but what happens to the regular CPU?
For that, a completely different approach would be needed, e.g. implementing something akin to QEMU, where each CPU instruction is translated into a graphics shader program. On many older GPUs it is impossible or difficult to launch a shader program from inside another shader program (instead of from the CPU), but where this is possible, one could obtain a CPU emulation many orders of magnitude faster than what is demonstrated here.
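To make the contrast concrete, here is a toy sketch of that QEMU-style idea (the opcode table, register file, and dispatch loop are all invented for illustration; real binary translation compiles whole blocks, not single instructions):

```python
import numpy as np

# Toy sketch: translate each guest instruction into a "device kernel"
# (here just a Python function standing in for a compiled shader),
# then have the host CPU dispatch them one by one.

regs = np.zeros(4, dtype=np.uint32)  # tiny made-up guest register file

# Each "kernel" mirrors one guest instruction; on a real GPU these
# would be shader programs launched by the host.
kernels = {
    "mov": lambda d, imm, _: regs.__setitem__(d, imm),
    "add": lambda d, a, b: regs.__setitem__(d, regs[a] + regs[b]),
}

program = [("mov", 0, 7, 0), ("mov", 1, 5, 0), ("add", 2, 0, 1)]

# Host-side dispatch loop: this CPU<->GPU round trip per instruction is
# exactly the overhead that launching shaders from inside shaders avoids.
for op, *args in program:
    kernels[op](*args)

print(regs)  # -> [ 7  5 12  0]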
Instead of going for speed, the project demonstrates a simpler self-contained implementation based on the same kind of neural networks used for ML/AI, which might work even on an NPU, not only on a GPU.
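To see why that is plausible, here is a minimal sketch (weights hand-picked rather than trained, and the framing is mine, not necessarily the project's) of a 1-bit full adder built from nothing but matrix multiplies and a step activation, i.e. the primitives an NPU accelerates:

```python
import numpy as np

# A 1-bit full adder as a two-layer network with a step activation.
step = lambda v: (v > 0).astype(np.float32)

W1 = np.ones((3, 3), dtype=np.float32)           # each hidden unit sums a, b, cin
b1 = np.array([-0.5, -1.5, -2.5], np.float32)    # thresholds: >=1, >=2, >=3 inputs set
W2 = np.array([[1, -1, 1],                       # parity of the input count -> sum bit
               [0,  1, 0]], np.float32)          # ">=2 inputs set" -> carry bit
b2 = np.array([-0.5, -0.5], np.float32)

def full_adder(a, b, cin):
    x = np.array([a, b, cin], np.float32)
    h = step(W1 @ x + b1)
    return step(W2 @ h + b2)                     # [sum, carry]

for bits in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]:
    print(bits, "->", full_adder(*bits))
```

Chain the carry through 64 of these and you have the adder behind an AArch64 ADD; in principle the rest of the ALU decomposes the same way, which is why the approach can be complete even though it is slow.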
Because it runs on hardware execution units that were never meant for this, the speed is modest and the speed ratios between different kinds of instructions are odd. Simulating the complete AArch64 ISA by such means is nonetheless an impressive achievement.
This is way cooler though! Instead of efficiently running a neural network on a CPU, I can inefficiently run my CPU on a neural network! With the work being done to make more powerful GPUs and ASICs, I bet in a few years I'll be able to run a 486 at 100MHz(!!) with power consumption just under a megawatt! The mind boggles at the sort of computations this will unlock!
A few more years and I'll even be able to realise the dream of self-hosting ChatGPT on my own neural-network-simulated CPU!
jdjdndnzn•1h ago
This is all a computer does :P
We need LLMs to be able to tap into that, not add the same functionality a layer above, and MUCH less efficiently.
Nuzzerino•1h ago
Agents, tool-integrated reasoning, even chain of thought (to a limited extent, for some math) can address this.
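i.e. let the model delegate the computation instead of imitating it. A toy sketch of the tool-call plumbing (the TOOL(...) format here is made up for illustration, not any real API):

```python
import re

def calculator(expr: str) -> str:
    # The actual CPU does the math natively, in nanoseconds.
    return str(eval(expr, {"__builtins__": {}}))  # toy; don't eval untrusted input

def answer(model_output: str) -> str:
    # Suppose the model emits TOOL(calc: <expr>) when it wants exact math.
    match = re.search(r"TOOL\(calc: (.+?)\)", model_output)
    if match:
        return calculator(match.group(1))
    return model_output

print(answer("TOOL(calc: 6 * 7)"))  # -> 42
```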