RISC-V Is Sloooow

https://marcin.juszkiewicz.com.pl/2026/03/10/risc-v-is-sloooow/

70•todsacerdoti•1h ago

Comments

rbanffy•1h ago

Don't blame the ISA - blame the silicon implementations AND the software with no architecture-specific optimisations.

RISC-V will get there, eventually.

I remember that ARM started as a speed demon with conscious power consumption, then was surpassed by x86s and PPCs on desktops and moved to embedded, where it shone by being very frugal with power, only to now be leaving the embedded space with implementations optimised for speed more than power.

dmitrygr•1h ago

If you care to read the article, they indeed do not blame the architecture but the available silicon implementations.

rbanffy•1h ago

I did read it. A Banana Pi is not the fastest developer platform. The title is misleading.

BTW, it's quite impressive how the s390x is so fast per core compared to the others. I mean, of course it's fast - we all knew that.

And don't let IBM legal see that this can be considered a published benchmark, because they are very shy about s390x performance numbers.

menaerus•1h ago

Which risc-v implementation is considered fast?

patchnull•57m ago

Nothing shipping today is really competitive with modern ARM or x86. The SiFive P870 and Tenstorrent Ascalon (Jim Keller's team) are the most anticipated high-performance designs, but neither is widely available. What you can actually buy today tops out around Cortex-A76 class single-thread performance at best, which is roughly where ARM was five or six years ago.

menaerus•39m ago

I remember taking down some notes wrt SiFive P870 specs, comparing them to x86_64, and reaching the same conclusion. Narrower core width (4-wide vs 8-wide), lower clock frequency (peaks at 3GHz) and no turbo (?), limited support for vector execution (128-bit vs 512-bit), limited L1 bandwidth (1x 128-bit load/cycle?), limited FP compute (2x 128-bit vs 2x 512-bit), load queue is also inconveniently small with 48 entries (affecting already limited load bandwidth), unclear system memory bandwidth and how it scales wrt the number of cores (L3 contention) although for the latter they seem to use what AMD is doing (exclusive L3 cache per chiplet).

NooneAtAll3•34m ago

DC-ROMA 2 is at the Raspberry Pi 4 level of performance, last I heard

gt0•1h ago

I was really surprised by the s390x performance, but I also don't really understand why build times are listed by architecture rather than by the actual processors.

rbanffy•53m ago

Probably because that's just the infrastructure they have.

pantalaimon•2m ago

i686 builds even faster

Aurornis•34m ago

> A Banana Pi is not the fastest developer platform.

What is the current fastest platform that isn’t exorbitantly expensive? Not upcoming releases, but something I can actually buy.

I check in every 3-6 months but the situation hasn’t changed significantly yet.

cestith•25m ago

What is the current fastest ppc64le implementation that isn’t exorbitantly expensive? How about the s390x?

tromp•1h ago

But they didn't reflect that in a title like "current RISC-V silicon Is Sloooow" ...

topspin•1h ago

I keep checking in on Tenstorrent every few months thinking Keller is going to rock our world... losing hope.

At this point the most likely place for truly competitive RISC-V to appear is China.

rbanffy•1h ago

> At this point the most likely place for fast RISC-V to appear is China.

Or we just adopt Loongson.

balou23•56m ago

TBH I still don't really get how it's different from MIPS. As far as I can tell... Loongson seems to be really just MIPS, while LoongArch is MIPS with some extra instructions.

mananaysiempre•48m ago

But legally distinct! I guess calling it Ｍ○ＰＳ was not enough for plausible deniability.

genxy•30m ago

ISAs shouldn't be patentable in the first place.

pantalaimon•6m ago

They did get rid of the delay slots and some other MIPS oddities

throawayonthe•8m ago

(purely on vibes) loongson feels to me like an intermediate step/backup strategy rather than a longterm target (though they'll probably power govt equipment for decades of legacy either way :p)

spiderice•1h ago

Then how do you justify the title?

api•1h ago

A pattern I've noticed for a very long time:

A lot of times the path to the highest performing CPU seems to be to optimize for power first, then speed, then repeat. That's because power and heat are a major design constraint that limits speed.

I first noticed this way back with the Pentium 4 "Netburst" architecture vs. the smaller x86 cores that became the ancestor of the Core architecture. Intel eventually ran into a wall with P4 and then branched high performance cores off those lower-power ones and that's what gave us the venerable Core architecture that made Intel the dominant CPU maker for over a decade.

ARM's history is another example.

jnovek•54m ago

I don’t have a micro architecture background so I apologize if this is obvious — What do power and speed mean in this context?

McP•44m ago

Power - how many Watts does it need? Speed - how quickly can it perform operations?

unethical_ban•32m ago

One could say "Optimize for efficiency first, then performance".

jauntywundrkind•24m ago

Parallels to code design, where optimizing data or code size can end up having fantastic performance benefits (sometimes).

cptskippy•21m ago

Core evolved from the Banias (Centrino) CPU core, which was based on P3, not P4. Banias used the front-side bus from P4 but not the cores.

Banias was hyper optimized for power, the mantra was to get done quickly and go to sleep to save power. Somewhere along the line someone said "hey what happens if we don't go to sleep?" and Core was born.

cpgxiii•11m ago

I think the story is a bit more complicated. Core succeeded precisely because Intel had both the low-power experience with Pentium-M and the high-power experience with Netburst. The P4 architecture told them a lot about what was and wasn't viable and at what complexity. When you look at the successor generations from Core, what you see are a lot of more complex P4-like features being re-added, but with the benefits of improved microarch and fab processes. Obviously we will never know, but I don't think you would get to Haswell or Skylake in the form they were without the learning experience of the P4.

In comparison, I think Arm is actually a very strong cautionary tale that focusing on power will not get you to performance. Arm processors remained pretty poor performers until designers from entirely different CPU families (PowerPC and Intel) took it on at Apple and basically dragged Arm to the performance level it is at today.

rwmj•35m ago

Marcin is working with us on RISC-V enablement for Fedora and RHEL; he's well aware of the problem with current implementations. We're hopeful that this'll be pretty much resolved by the end of the year.

cogman10•31m ago

> AND the software with no architecture-specific optimisations

The optimizations that'd be applied to ARM and MIPS would be equally applicable to RISC-V. I do not believe this is a lack of software optimization issue.

We are well past the days where hand written assembly gives much benefit, and modern compilers like gcc and llvm do nearly identical work right up until it comes to instruction emissions (including determining where SIMD instructions could be placed).

Unless these chips have very very weird performance characteristics (like the weirdness around x86's lea instruction being used for arithmetic) there's just not going to be a lot of missed heuristics.

hrmtst93837•20m ago

One thing compilers still struggle with is exploiting weird microarchitectural quirks or timing behaviors that aren't obvious from the ISA spec, especially with memory, cache and pipeline tuning. If a new RISC-V core doesn't expose the same prefetching tricks or has odd branch prediction you won't get parity just by porting the same backend. If you want peak numbers sometimes you do still need to tune libraries or even sprinkle in a bit of inline asm despite all the "let the compiler handle it" dogma.

bobmcnamara•13m ago

> The optimizations that'd be applied to ARM and MIPS would be equally applicable to RISC-V.

There's no carry bit, and no widening multiply (or MAC)
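
To make the carry-bit point concrete, here is a minimal sketch of a 128-bit addition built from 64-bit limbs. Without a carry flag, the carry out of the low limb has to be recovered with an unsigned comparison (roughly an `add` followed by an `sltu` on RISC-V) rather than consumed by an add-with-carry instruction. Python integers are unbounded, so 64-bit wraparound is simulated with a mask; `MASK64` and `add128` are illustrative names, not from any library.

```python
MASK64 = (1 << 64) - 1  # simulate a 64-bit register (assumption for this sketch)

def add128(a_lo, a_hi, b_lo, b_hi):
    """Add two 128-bit values, each given as (lo, hi) 64-bit limbs."""
    lo = (a_lo + b_lo) & MASK64          # plain add: wraps like a 64-bit register
    carry = 1 if lo < a_lo else 0        # unsigned compare: a wrapped sum is smaller than its operand
    hi = (a_hi + b_hi + carry) & MASK64  # second add folds the recovered carry in
    return lo, hi
```

On an ISA with a carry flag the comparison step disappears, which is why multi-word arithmetic (bignums, some crypto code) is where this difference would most plausibly show up.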

fidotron•30m ago

> RISC-V will get there, eventually.

Not trolling: I legitimately don't see why this is assumed to be true. It is one of those things that is true only once it has been achieved. Otherwise we would be able to create super high performance Sparc or SuperH processors, and we don't.

As you note, Arm once was fast, then slow, then fast. RISC-V has never actually been fast. It has enabled surprisingly good implementations by small numbers of people, but competing at the high end (mobile, desktop or server) it is not.

rwmj•23m ago

RISC-V doesn't have the pitfalls of Sparc (register windows, branch delay slots), largely because we learned from that. It's in fact a very "boring" architecture. There's no one that expects it'll be hard to optimize for. There are at least 2 designs that have taped out in small runs and have high end performance.

fidotron•18m ago

> RISC-V doesn't have the pitfalls of Sparc (register windows, branch delay slots),

You're saying ISA design does have implementation performance implications then? ;)

> There's no one that expects it'll be hard to optimize for

[Raises hand]

> There are at least 2 designs that have taped out in small runs and have high end performance.

Are these public?

Edit: I should add, I'm well aware of the cultural mismatch between HN and the semi industry, and have been caught in it more than a few times, but I also know the semi industry well enough to not trust anything they say. (Everything from well meaning but optimistic through to outright malicious depending on the company).

gt0•21m ago

I don't think anybody suggests Oracle couldn't make faster SPARC processors, it's just that development of SPARC ended almost 10 years ago. At the time SPARC was abandoned, it was very competitive.

newpavlov•23m ago

In some cases RISC-V ISA spec is definitely the one to blame:

1) https://github.com/llvm/llvm-project/issues/150263

2) https://github.com/llvm/llvm-project/issues/141488

Another example is hard-coded 4 KiB page size which effectively kneecaps ISA when compared against ARM.
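
As a rough illustration of why the base page size matters, the sketch below computes "TLB reach" (how much memory a TLB can map without a miss) for different page sizes. The 64-entry TLB is a hypothetical figure chosen for the arithmetic, not a spec of any real core, and the sketch ignores superpages; 16 KiB and 64 KiB are the larger granules ARM allows.

```python
KIB = 1024

def tlb_reach(entries, page_size_bytes):
    """Bytes of address space covered by a fully populated TLB."""
    return entries * page_size_bytes

ENTRIES = 64  # hypothetical TLB size, for illustration only

# Print the reach for each page size, in KiB
for page_kib in (4, 16, 64):
    reach = tlb_reach(ENTRIES, page_kib * KIB)
    print(f"{page_kib:>2} KiB pages: {reach // KIB} KiB reach")
```

With the same number of entries, reach scales linearly with page size, so a 4 KiB base page covers a sixteenth of what 64 KiB pages would.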

adastra22•18m ago

Also the bit manipulation extension wasn't part of the core. So things like bit rotation are slow for no good reason, if you want portable code. Why? Who knows.
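
For a sense of what "slow" means here: without a dedicated rotate instruction, a rotate is typically expanded into two shifts and an OR (plus computing the complementary shift amount). A minimal sketch of that expansion for 32-bit values follows; `rotl32` and `MASK32` are illustrative names, and the mask stands in for a fixed-width register.

```python
MASK32 = 0xFFFFFFFF  # simulate a 32-bit register (assumption for this sketch)

def rotl32(x, n):
    """Rotate a 32-bit value left by n bits using only shifts and OR."""
    x &= MASK32
    n &= 31                      # rotation amount is taken modulo the width
    left = x << n                # bits moving toward the top
    right = x >> ((32 - n) & 31) # bits wrapping around to the bottom
    return (left | right) & MASK32
```

With the bit-manipulation extension the same operation is a single rotate instruction, so portable code targeting only the base ISA pays several instructions per rotate.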

fidotron•14m ago

The fact the Hazard3 designer ended up creating an extension to resolve related oddities was kind of astonishing.

Why did it fall to them to do it? Impressive that he did, but it shouldn't have been necessary.

Dwedit•14m ago

There's the ARM video from LowSpecGamer, where they talk about how they forgot to connect power to the chip, and it was still executing code anyway. According to Steve Furber, the chip was accidentally being powered from the protection diodes alone. So ARM was incredibly power efficient from the very beginning.

leni536•1h ago

Is cross compilation out of the question?

IshKebab•1h ago

It's usually an enormous pain to set up. QEMU is probably the best option.

sofixa•38m ago

Depends on the language, it's pretty trivial with Go.

Zambyte•16m ago

Unless you use CGO. I've heard people using Zig (which has great cross compilation for the Zig language as well) to cross compile C with CGO though.

STKFLT•14m ago

Maybe there are issues I'm not aware of but using dockcross has made cross-compilation quite easy in my experience.

https://github.com/dockcross/dockcross

pantalaimon•5m ago

T2 manages to do it

https://t2linux.com/

STKFLT•19m ago

I'd guess that the issue is running the `%install` and `%check` stages of the .spec file. The Python library rpy (to pull a random example from Marcin's PRs) runs rpy's pytest test suite and had to be modified to avoid running vector tests on RISC-V.

Obviously a solvable problem to split build and test but perhaps the time savings aren't worth the complexity.

https://src.fedoraproject.org/rpms/rpy/pull-request/4#reques...

lifis•1h ago

Or they could fix cross compilation and then compile it on a normal x86_64 server

IshKebab•1h ago

Yeah it's a few years behind ARM, but not that many. Imagine trying to compile this on ARM 10 years ago. It would be similarly painful.

hackerInnen•44m ago

This. While I doubt that there will be a good (whatever that means) desktop risc-v CPU anytime soon, I do think that it will eventually catch up in embedded systems and special applications. Maybe even high core count servers.

It just takes time, people who believe in it and tons of money. We'll see where the journey goes, but I am a big risc-v believer

kllrnohj•14m ago

> Imagine trying to compile this on ARM 10 years ago

Cortex A57 is 14 years old and is significantly faster than the 9 year old Cortex A55 these RISC-V cores are being compared against.

So yes it's many years behind. Many, many years.

rbalint•59m ago

If the builds are slow, build accelerators can help a lot. Ccache would work for sure, and there is also firebuild, which can accelerate the linker phase and many other tools in builds.

brcmthrowaway•54m ago

Why is it slow? I thought we have Rivos chips

Joel_Mckay•53m ago

Any new hardware lags in compiler optimizations.

i. llvm presentation can thrash caches if set up wrong (given the plethora of RISC-V fragmented versions, most compilers won't cover every vanity silicon.)

ii. gcc is also "slow" in general, but is predictable/reliable

iii. emulation is always slower than kvm in qemu

It may seem silly, but I'd try a gcc build with -O0 flag, and a toy unit test with -S to see if the ASM is actually foobar. One may have to brute force the optimizer flags to narrow your search. Best regards =3

Levitating•51m ago

This is why felix has been building the risc-v archlinux repositories[1] using the Milk-V Pioneer.

I think the ban of SOPHGO is partly to blame for the slow development.[2] They had the most performant and interesting SOCs. I had a bunch of pre-orders for the Milk-V Oasis before it was cancelled. It was supposed to come out a while ago, using the SG2380, supposedly much more performant than the Milk-V Titan mentioned in the article (which still isn't out).

It was also SOPHGO's SOCs that powered the crazy cheap/performant/versatile Milk-V DUO boards. They have the ability to switch ARM/RISC-V architecture.

[1]: https://archriscv.felixc.at/

[2]: https://www.tomshardware.com/tech-industry/artificial-intell...

andrepd•36m ago

There's zero mention of hardware specs or cost beyond architecture and core counts... What is the purpose of this post?

Anyway, it's hardly surprising that a young ISA with not a 1/1000th of the investment of x86 or ARM has slower chips than them x)

theodric•7m ago

Careful posting this. There's an HN denizen who is a massive RISC-V apologist and gets very triggered by anyone suggesting that it isn't the best, hottest, most performant ISA out there.

sltkr•5m ago

Are you sure you are comparing apples with apples here?

The fact that i686 is 14% faster than x86_64 is a little suspicious, because usually the same software runs _faster_ on x86_64 (despite the increased memory use) thanks to a larger register set, an optimized ABI, and more vector instructions.

Of course, if you are compiling an i686 binary on i686, and an x86_64 binary on x86_64, then the compilers aren't really doing the same work, since their output is different. I'm not a compiler expert, but I could imagine that compiling x86_64 binaries is intrinsically slower than for i686 for a variety of reasons. For example, x86_64 is mostly a superset of i686, so a compiler has way more instructions to consider, including potential optimizations using e.g. SIMD instructions that don't exist on i686 at all. Or a compiler might assume a larger instruction cache size, by default, and do more unrolling or inlining when compiling for x86_64. And so on.

In that case, compiling on x86_64 is slower not because the hardware is bad but because the compiler does more work. Perhaps something similar is happening on RISC-V.

