In practice, the performance impact of variable length encoding is largely kept in check using predictors. The extra complexity in terms of transistors is comparatively small in a large, high-performance design.
Related reading:
https://patents.google.com/patent/US6041405A/en
https://web.archive.org/web/20210709071934/https://www.anand...
>So fixed-length instructions seem really nice when you're building little baby computers, but if you're building a really big computer, to predict or to figure out where all the instructions are, it isn't dominating the die. So it doesn't matter that much.
Well, efficiency advantages are the domain of little baby computers. Better predictors give you deeper pipelines without stalls, which give you higher clock speeds - and higher wattages.
It's a genuine question; I'm sure both factors make a difference but I don't know their relative importance.
> During active development with virtual machines running, a few calls, and an external keyboard and mouse attached, my laptop running Asahi Linux lasts about 5 hours before the battery drops to 10%. Under the same usage, macOS lasts a little more than 6.5 hours. Asahi Linux reports my battery health at 94%.
[0] https://blog.thecurlybraces.com/2024/10/running-fedora-asahi...
Look at the difference in energy usage between Safari and Chrome on M4s.
The Steam Deck with Windows 11 vs. SteamOS is a whole different experience. When running SteamOS and doing web surfing, the fan doesn't really spin at all. But when running Windows 11 and doing the exact same thing, it spins all the time and the device gets kind of hot.
The kernel would need to have a scheduler that knows it can't use those cores for certain tasks. Think about how hard you would have to work to even identify such a task ...
Create a new compilation target
- You'll probably just end up running a lot of current x86 code exclusively on performance cores, at a net loss. This is how RISC-V deals with optional extensions.
Emulate
- This already happens for some instructions but, as above, it could quickly negate the benefits.
Ask for permission
- This is what AVX code does now: the onus is on the programmer to check whether the optional instructions can be used. But you can't drop many instructions and expect anybody to bother (see the sketch after this list).
Ask for forgiveness
- Run the code anyway and catch illegal-instruction exceptions/signals, then move the task to a performance core. This would take some deep kernel surgery to support. If this happens even remotely often, it will stall everything and make your system hate you.
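To make the last two options concrete, here's a minimal userspace sketch (assuming GCC or Clang on x86-64, with AVX2 standing in for the "dropped" instructions): it first asks for permission via the compiler's CPUID builtin, then asks for forgiveness by running an AVX2 instruction anyway and catching SIGILL. A kernel-level "migrate to a P-core on illegal instruction" scheme would do the moral equivalent of the second half in its trap handler rather than a signal handler.

    /* Sketch only: "ask for permission" vs "ask for forgiveness" for an
       optional ISA extension, with AVX2 standing in for the dropped
       instructions. Assumes GCC or Clang on x86-64; build with
       cc -O2 -mavx2 probe.c (the flag only lets the assembler emit the
       opcode; whether the CPU accepts it is decided at run time). */
    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>

    static sigjmp_buf recover;

    static void on_sigill(int sig) {
        (void)sig;
        siglongjmp(recover, 1);          /* bail out of the faulting spot */
    }

    int main(void) {
        /* Ask for permission: query CPUID through the compiler builtin. */
        if (__builtin_cpu_supports("avx2"))
            puts("CPUID reports AVX2: take the fast path");
        else
            puts("No AVX2 reported: take the scalar fallback");

        /* Ask for forgiveness: run the instruction anyway and catch the
           illegal-instruction trap. */
        struct sigaction sa;
        sa.sa_handler = on_sigill;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        sigaction(SIGILL, &sa, NULL);

        if (sigsetjmp(recover, 1) == 0) {
            __asm__ volatile ("vpxor %%ymm0, %%ymm0, %%ymm0" ::: "xmm0");
            puts("AVX2 instruction executed fine");
        } else {
            puts("Caught SIGILL: fall back (or migrate the thread)");
        }
        return 0;
    }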
That last option ("ask for forgiveness") raises the question: which instructions are we considering 'legacy'? You won't get far into an x86 binary before running into an instruction operating on memory that, in a RISC ISA, would mean first a load instruction, then the operation, then a store. Surely we can't drop those.
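To see how pervasive those memory-operand instructions are, compare what a trivial C function typically compiles to on each side (illustrative output only; the exact code depends on compiler and flags):

    /* Trivial read-modify-write on a global. */
    long counter;

    void bump(void) {
        counter += 1;
    }

    /* Typical -O2 code generation (illustrative, compiler/flag dependent):
     *
     * x86-64 - one instruction reads, adds and writes memory directly:
     *     addq $1, counter(%rip)
     *     ret
     *
     * AArch64 - the same operation as separate load, add and store:
     *     adrp x1, counter
     *     ldr  x0, [x1, :lo12:counter]
     *     add  x0, x0, #1
     *     str  x0, [x1, :lo12:counter]
     *     ret
     */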
Intel dropped their x86-S proposal, but I guess something like that could work for low-power cores. If you provide a way for a 64-bit OS to start application processors directly in 64-bit mode, you could set up the low-power cores so that they can only run in 64-bit mode. I'd be surprised if the juice is worth the squeeze, but it'd be reasonable --- it's pretty rare to be outside 64-bit mode, and systems that do run outside 64-bit mode probably don't need all the cores on a modern processor. If you're running a 64-bit OS, it knows which processes are running in 32-bit mode and can avoid scheduling them on the reduced-functionality cores. If you're running a 32-bit OS, somehow or another the OS needs to not use those cores: either the ACPI tables are different and the cores don't show up for 32-bit, init of those cores fails and the OS moves on, or there is a firmware flag to hide them that must be set before running a 32-bit OS.
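For what it's worth, the closest userspace analogue to "keep legacy-mode processes off those cores" today is plain affinity masking. A real implementation would live in the kernel's scheduler, and the core numbering below (0-3 = full-featured) is entirely made up, but the sketch shows the shape of the idea:

    /* Userspace sketch: restrict this process to the cores assumed to
       implement the full ISA. Core numbering is hypothetical. Linux only. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void) {
        cpu_set_t allowed;
        CPU_ZERO(&allowed);

        /* Pretend cores 0-3 are the only ones with the full instruction set. */
        for (int cpu = 0; cpu < 4; cpu++)
            CPU_SET(cpu, &allowed);

        if (sched_setaffinity(0 /* this process */, sizeof(allowed), &allowed) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        puts("Pinned to the full-ISA cores");
        return 0;
    }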
I think support in the OS/runtime environment* would be more interesting for chips where some cores have larger execution units, such as vector and matmul units. Especially for embedded / low-power systems.
Maybe x87/MMX could be dropped though.
* BTW, if you want to find research papers on the topic, a good search term is "partial-ISA migration".
I am not a chip expert; it's just so night-and-day different using a Mac with an Arm chip compared to an Intel one, from thermals to performance to battery life and everything in between. Intel isn't even in the same ballpark IMO.
But competition is good, so let's hope both Intel and AMD do well, because the consumer wins.
Genuinely asking -- what is it due to? Because, as the person you're replying to said, the M-series processors are simply better: desktop-class perf on battery that hangs with chips that have a 250 W TDP. I have to assume that AMD and Intel would like similar chips, so why don't they have them, if not due to the instruction set? And AMD is using TSMC, so the fab can't be the difference.
- a more advanced silicon process: Apple spends billions to get access to the latest fab generation a couple of years before AMD.
- a world-class team, with ~25 years of experience building high-speed, low-power chips. (Apple bought P.A. Semi to make these chips, which was originally the team that built the DEC StrongARM.) And then Apple paid and treated them properly, unlike Intel and AMD.
- a die budget to spend transistors for performance: the M chips are generally quite large compared to the competition
- ARM's weak memory model also helps, but it's very minor IMO compared to the above 3.
P.S. This is hearsay and speculation, not direct experience. I haven't worked at Apple, and anybody who has is pretty closed-lipped. You have to read between the lines.
re: apple getting exclusive access to the best fab stuff: https://appleinsider.com/articles/23/08/07/apple-has-sweethe... . Interesting.
They historically haven't. They've wanted the higher single-core performance and frequency and they've pulled out all the stops to get it. Everything had been optimized for this. (Also, they underinvested in their uncores, the nastiest part of a modern processor. Part of the reason AMD is beating Intel right now despite being overall very similar is their more recent and more reliable uncore design.)
They are now realizing that this was, perhaps, a mistake.
AMD is only now in a position to afford to invest otherwise (they chose quite well among the options actually available to them, in my opinion), but Intel has no such excuse.
Thank you.
If you don't care to clock that high, you can reduce area and power requirements at all clock speeds; AMD does that for the Zen4c and Zen5c cores, but they don't (currently) ship an all-compact-core mobile processor. Apple can sell a premium-branded CPU where there's no option to burn a lot of power to get a little faster; AMD and Intel just can't: people may say they want efficiency, but having higher clocks is what makes an x86 processor premium.
In addition to the basic efficiency improvements you get from a clock limit, Apple also uses wider execution: they can run more things in parallel. This is enabled to some degree by the lower clock rates, but also by the commitment to higher memory bandwidth via on-package memory; being able to count on higher bandwidth means more operations are waiting on execution rather than waiting on memory, so wider execution pays off more. IIRC, Intel released some chips with on-package memory, but they can't easily just drop a couple more integer units into an existing core.
The weaker memory model of ARM helps as well. The M-series chips have a much wider out-of-order window because they don't need to spend as much effort on ordering constraints (except when running in the x86 support mode); this also helps justify wider execution, because they can keep those units busy.
I think these three things are listed in order of impact, but I'm just an armchair computer architecture philosopher.
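On the memory-model point, the difference is easiest to see from the software side. In the C11 sketch below, x86's TSO means even the relaxed accesses compile to plain MOVs whose ordering the hardware must preserve (store-store and load-load order is architectural), while AArch64 only has to order what the code explicitly asks for (STLR/LDAR here) and can reorder the rest - part of why a weaker model leaves more room for a big out-of-order window. The instruction mappings in the comments are the usual ones, not verbatim compiler output.

    /* C11 message passing; comments note the typical instruction mappings. */
    #include <stdatomic.h>

    atomic_int payload;
    atomic_int ready;

    void producer(void) {
        /* Relaxed store: plain MOV on x86 (still ordered by TSO),
           plain STR on AArch64 (free to be reordered by the core). */
        atomic_store_explicit(&payload, 42, memory_order_relaxed);

        /* Release store: still a plain MOV on x86, STLR on AArch64. */
        atomic_store_explicit(&ready, 1, memory_order_release);
    }

    int consumer(void) {
        /* Acquire load: plain MOV on x86, LDAR on AArch64. */
        if (atomic_load_explicit(&ready, memory_order_acquire))
            return atomic_load_explicit(&payload, memory_order_relaxed);
        return -1;  /* producer hasn't published yet */
    }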
Apple did a ton of work on the power efficiency of iOS on their own ARM chips for iPhone for a decade before introducing the M1.
Since iOS and macOS share the same code base (even when they were on different architectures) it makes much more sense to simplify to a single chip architecture that they already had major expertise with and total control over.
There would be little to no upside for cutting Intel in on it.
Apple Silicon Macs are far less impressive if you came from an 8c/16t Ryzen 7 laptop, especially if you consider that the Apple parts consistently enjoy a TSMC node ahead of AMD (e.g. 5nm for the M1 vs. 7nm for Zen 2).
What's _really_ impressive is how badly Intel fell behind and TSMC has been absolutely killing it.
On the flip side, if you look at servers and compare a 128+ core AMD server CPU vs. a large-core ARM option, AMD's perf/watt is much better.
I'd be happy to be corrected, but the empirical core counts seem to agree.
There are two entities allowed to make x86_64 chips (and that only because AMD won the 64 bit ISA competition, otherwise there'd be only Intel). They get to choose.
The rest will use arm because that's all they have access to.
Oh, and x86_64 will be as power efficient as ARM when one of the two entities stops competing on having larger numbers and actually worries about power management. Maybe provide a ?linux? optimized for power consumption.
I have used laptops with both Intel and AMD CPUs, and I read/watch a lot of reviews in the thin-and-light laptop space. Although AMD has become more power efficient than Intel in the last few years, the AMD alternative is only marginally more efficient (like 5-10%). And AMD is using TSMC fabs.
On the other hand, Qualcomm's recent Snapdragon X series CPUs are significantly more efficient than both Intel and AMD in most tests, while providing the same or sometimes even better performance.
Some people mention the efficiency gains of Intel's Lunar Lake as evidence that x86 is just as efficient, but Lunar Lake was still slightly behind in battery life and performance, while using a newer TSMC process node than the Snapdragon X series.
So, even though I see theoretical articles like this, the empirical evidence says otherwise. Qualcomm will release their second generation Snapdragon X series CPUs this month. My guess is that the performance/efficiency gap with Intel and AMD will get even bigger.
As all the fanbois in the thread have pointed out, Apple's M series is fast and efficient compared to x86 for desktop/server workloads. What no one seems to acknowledge is that Apple's A series is also fast and efficient compared to other ARM implementations in mobile workloads. Apple sees the need to maintain both M and A series CPUs for different workloads, which indicates there's a benefit to both.
This tells me the ISA decode hardware isn't the bottleneck, or at least isn't the only one.
exmadscientist•3h ago
In the past, x86 didn't dominate in low power because Intel had the resources to care but never did, and AMD never had the resources to try. Other companies stepped in to fill that niche, and had to use other ISAs. (If they could have used x86 legally, they might well have done so. Oops?) That may well be changing. Or perhaps AMD will let x86 fade away.
mananaysiempre•2h ago
Remember Atom tablets (and how they sucked)?
YetAnotherNick•2h ago
Care to elaborate? I had one of those 9" mini-laptop devices based on Atom and don't remember Atom being the issue.
mananaysiempre•2h ago
However, what I meant is Atom-based Android tablets. At about the same time as the netbook craze (late 2000s to early 2010s) there was a non-negligible number of Android tablets, and a noticeable fraction of them was not ARM- but Atom-based. (The x86 target in the Android SDK wasn’t only there to support emulators, originally.) Yet that stopped pretty quickly, and my impression is that that happened because, while Intel would certainly have liked to hitch itself to the Android train, they just couldn’t get Atoms fast enough at equivalent power levels (either at all or quickly enough). Could have been something else, e.g. perhaps they didn’t have the expertise to build SoCs with radios?
Either way, it’s not that Intel didn’t want to get into consumer mobile devices, it’s that they tried and did not succeed.
toast0•30m ago
IMHO, if Intel had kept trying for another year or two, it probably would have worked, but they gave up. They also canceled x86 for phones like the day before the Windows Mobile Continuum demo, which would have been a potentially much more compelling product with x86, especially if Microsoft had allowed running Win32 apps (which they probably wouldn't have, but the potential would be interesting).
cptskippy•2h ago
Atom wasn't about power efficiency or performance, it was about cost optimization.
Findecanor•2h ago
I have a ten-year-old Lenovo Yoga Tab 2 8" Windows tablet, which I still use at least once every week. It is still useful. Who can say that they are still using a ten-year-old Android tablet?
mmis1000•2h ago
(Also probably because it is a tablet, so it has reasonably fast storage instead of the HDDs that notebooks had in that era.)
yndoendo•2h ago
The DoD originally required all products to be sourced from at least three companies to prevent supply chain issues. This required Intel to allow AMD and VIA to produce products based on the x86 ISA.
For me this is a good indicator of whether someone who talks about national security knows what they are talking about or is just spewing bullshit and playing national-security theatre.
torginus•2h ago
https://web.archive.org/web/20210622080634/https://www.anand...
Basically the gist of it is that the difference between ARM/x86 mostly boils down to instruction decode, and:
- Most instructions end up being simple load/store/conditional branch etc. on both architectures, where there's literally no difference in encoding efficiency
- Variable-length instruction decoding has pretty much been figured out on x86, to the point that it's no longer a bottleneck
Also, my personal addendum is that today's Intel efficiency cores have more transistors and better perf than the big Intel cores of a decade ago
mort96•2h ago
I imagine that the difference is much greater for the tiny in-order CPUs we find in MCUs, though, just because an amd64 decoder would be a comparatively much larger fraction of the transistor budget.
whynotminot•2h ago
No, not really. The advantage is Apple prioritizing efficiency, something Intel never cared enough about.