> McAfee Corp. ... Intel Security Group from 2014 to 2017
Still, I agree. The 68K workstation was essentially obsolete by the time NeXT shipped. Sun was shipping early Sparc systems around the same time. The writing was on the wall. No wonder they didn't stick with their own hardware for very long.
Honestly, Motorola is entirely to blame for losing out on the workstation market. They iterated too slowly and never took Intel seriously enough. I say this as I wistfully eyeball the lonely 68060 CPU I have sitting on my desk for a future project...
Yeah, it seems Motorola lost their lead with the 68040. Intel was getting huge clock speed gains with the later 486/DX2, DX4, etc. From what I recall, a similarly clocked 040 was faster than a 486 on most benchmarks, but there was simply no way to compete with Intel's high clocks.
The NeXT hardware was massively underpowered for the software it ran. Other major workstation vendors like Sun were already moving to their own RISC hardware.
I don't mean to look down on this kind of group; I am probably one of them. There is nothing wrong with people enjoying a good work-life balance at a decent-paying job. However, I think the reality is that if one wants a world-best company creating world-best products, this is simply not good enough. Just like a team of weekend warriors would never win the Super Bowl (or even come anywhere close to an NFL team) - which is perfectly fine! - it's not fair to expect an average organization to perform world-champion feats.
Organisations fail when the ‘business’ people take over: people who let short-term money-thinking make the decisions, instead of good taste, vision or judgement.
Think Intel when they turned down making the iPhone chips because they didn’t think it’d be profitable enough, or Google’s head of advertising (same guy who killed yahoo search) degrading search results to improve ad revenue.
Apple have been remarkably immune to it post-Jobs, but it’s clear that’s on the way out with the recent revelations about in-app purchases.
The innovators in the company are likely correlated with doing more than 9-5. These people get frustrated that their ideas no longer get traction and leave the company.
Eventually what's left are the people happy to just deliver what they're told without much extra thought. These people are probably more likely to just clock in the hours. Any remaining innovators now have another reason to become even more frustrated and leave.
Companies die when the sort of managers take over who see their job as to manage, taking pride in not knowing about the product or customers, instead of caring deeply about delivering a good product. The company may continue for years afterwards, but it’s a zombie, decaying from the inside.
Maximally efficient is minimally robust.
Squeezing every penny out of something means optimizing perfectly for present conditions--no more, no less. As long as those conditions shift slowly, slight adjustments work.
If those conditions shift suddenly though, death ensues.
I got the original iPad as a graduation present, and as futuristic as it was, it quickly lost its lustre for me thanks to Apple's walled garden.
Took a few more years until I was rocking Debian via Crostini on the first Samsung ARM Chromebook to scratch that low cost Linux ultraportable itch again (with about triple the battery life and a third as thick as a bonus).
I don’t recall if there was ever a difference between “abort” and “fail.” I could choose to abort the operation, or tell it … to fail? That this is a failure?
¯\_(ツ)_/¯
Abort would cancel the entire file read.
Retry would attempt that sector again.
Fail would fail that sector, but the program might decide to keep trying to read the rest of the file.
In practice abort and fail were often the same.
Sorry for the confusion.
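If it helps to make it concrete, here's a rough C sketch (my own illustration, not actual DOS API calls) of the application-level difference: "Fail" lets the program note the bad sector and keep reading, while "Abort" throws away the whole operation.

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for a sector read; pretend sector 7 has a media error. */
static bool read_sector(int sector, unsigned char *buf) {
    (void)buf;
    return sector != 7;
}

int main(void) {
    unsigned char buf[512];
    int bad = 0;

    for (int sector = 0; sector < 16; sector++) {
        if (!read_sector(sector, buf)) {
            /* "Fail": record the bad sector and keep going with the rest. */
            bad++;
            continue;
            /* "Abort" would instead be: return 1;  -- give up entirely. */
        }
        /* ... use the sector's data ... */
    }
    printf("finished with %d unreadable sector(s)\n", bad);
    return 0;
}
```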
I'd love some discussion on why Intel left XScale and went to Atom, and I think Itanium is worthy of discussion in this era too. I don't really want a raw listing of [In year X Intel launched Y with SPEC_SHEET_LISTING features].
I thought it was pretty obvious. They didn't control the ARM ISA, and ARM Ltd designs had caught up to and surpassed XScale's innovations (superscalar, out-of-order pipelining, MIPS/W, etc.). So instead of innovating further, they decided to launch a competitor based on their own ISA.
IMO, Intel took us from common, affordable CPUs to high-priced, "Intel-only" CPUs. It was originally designed to use Rambus RAM, and it turned out Intel had a stake in that company. Intel got greedy and tried to force the market to go the way it wanted.
Honestly, AMD saved the x86 market for us common folks. Their approach of extending x86 to 64-bit and adopting DDR RAM allowed for the continuation of affordable, mainstream CPUs. This enabled companies to buy tons of servers for cheap.
Intel’s u-turn on x86-64 shows even they knew they couldn’t win.
AMD has saved Intel’s x86 platform more than once. The market wants a common, gradual upgrade path for the PC platform, not a sudden, expensive, single-vendor ecosystem.
Itanium was a massive technical failure but a massiver business success.
Intel spent a gigabuck and drove every single non-x86 competitor out of the server business with the exception of IBM.
About a year ago I looked into what practical benefits I'd gain if I upgraded the CPU and mobo to a more recent (but still used) spec from eBay. Using it mainly for retro game emulation and virtual pinball, I assessed single-core performance, and no CPU/mobo upgrade looked compelling in real-world performance until at least 2020-ish - which is pretty crazy. Even then, one of the primary benefits would be access to NVMe drives. It reminded me how much Intel under-performed and, more broadly, how the end of Moore's Law and Dennard scaling combined around roughly 2010 to end the 30+ year 'Golden Era' of scaling. That era gave us computers which often roughly doubled performance across a broad range of applications - a difference you could feel in everyday use - AND at >30% lower price, every three years or so.
Nowadays an 8% to 15% performance uplift across mainstream applications at the same price is considered good, and people are delighted if the performance gain is >15% OR if the price for the same performance drops >15%. If a generation delivered both >15% more performance AND a >15% lower price, it would be stop-the-presses newsworthy. Kind of sad how far our expectations have fallen compared to 1995-2005, when >30% perf at <30% price was considered baseline, >50% at <50% price was good, and ~double perf at around half price was "great deal, time to upgrade again boys!".
Intel constantly tried to bring in visionaries, but failed over and over. With the exception of Jim Keller, Intel was duped into believing in incompetent people. At a critical juncture during the smartphone revolution it was Mike Bell, a full-on Mr. Magoo. He never did anything worth mentioning after his stint with Intel - he was exposed as a pretender. Eric Kim would be another. Murthy Renduchintala is another. It goes on and on.
Also critical was the failure of an in-house exec named Anand Chandrasekher, who completely flubbed the mega-project cooperation between Intel and Nokia to bring about Moblin OS and create a third phone ecosystem in the marketplace. WHY would Anand be put in charge of such an important effort? In Intel's defense, this project was submarined by Nokia's Stephen Elop, who usurped their CEO and left Intel standing at the altar. (Elop was a former Microsoft exec, and Microsoft was also working on its own foray into smartphones at the time... very suspicious.)
XScale was mishandled. Intel had a working phone with XScale prior to the iPhone being released, but Intel was afraid of fostering a development community outside of x86 (Ballmer once chanted "developers, developers, developers"). My guess is that ultimately Intel suffers from the Kodak conundrum, i.e. they have probably rejected true visionaries because their ideas would always threaten the sacred cash cows. They have been afraid to innovate at the expense of profit margins (short-term thinkers).
Smacks of financialization and wall-street centric managerial groupthink, rather than having the talented engineers to fight the coming mobile wars which were already very very apparent (thus the Atom), or even the current war of failure in discrete graphics.
Once the MBAs gain control of a dynamic technology company (I saw it at Medtronic personally), the technology and talent soul of the company is on a ticking timer of death. Medtronic turned into an acquire-tech-and-products-via-buyout company rather than building in-house, and Intel was also on a treadmill of acquire-and-destroy. (At least from my perspective, Medtronic sometimes acquired companies that became successful product lines, but Intel always seemed clueless in executing its acquisitions.)
I look at all of Intel's 2000s acquisitions: they sure show Intel was "trying" at mobile, in the "signal Wall Street we are trying by acquiring companies so we keep our executive positions" sense, but doing nothing about actually chasing what mobile needed: low power, high performance.
He was a joke at Qualcomm before he went to Intel too. That Intel considered snagging him a coup was a consistent source of amusement.
It's classic Christensen "Innovator's Dilemma" disruption. Market-leading incumbents run by business managers won't assess emerging, unproven new opportunities as being worth serious sustained investment compared to the existing categories they're currently dominating.
So you're saying that they were somehow unaware of how new BIOS implementations were used in CP/M to port it to new systems?
And that they distributed the BIOS source code with every IBM PC... to make it harder for competitors to build compatible machines due to copyright claims?
And that they were somehow unaware DOS had largely reimplemented the CP/M design and API? (Though DOS's FAT filesystem was a successor to FAT8 from Microsoft's Disk BASIC rather than CP/M's filesystem.)
Yes, they eventually got it right post-Newton, after spending a lot of time outside the S&P 500.
You could be the greatest business leader in history but you cannot save Intel without making most of the company hate you, so it will not happen. Just look at the blame game being played in these threads where somehow it's always the fault of these newly found to be inept individuals, and never the blundering morass of the bureaucratic whole.
ah ... thank you!!
i was searching for exactly the interview with jim keller, which is referenced in the chips&cheese article.
again: thanks for posting the article ... i didn't remember, it was anandtech ;))
"[Arguing about instruction sets] is a very sad story."
* https://www.anandtech.com/show/16762/an-anandtech-interview-...
cheers a..z
This is deep. It also highlights why it is easier to hire somebody from outside the company rather than promote from within.
IMO, Intel (and AMD) did prove the impact of a legacy ISA was low enough to not be a competitive disadvantage. Not zero, but close enough for high-performance designs.
In fact, I actually think the need to continue supporting the legacy x86 ISA was a massive advantage to Intel. It forced them to go down the path of massively out-of-order μarches at a point in history where everyone else was reaping massive gains from following the RISC design philosophy.
If abstracting away the legacy ISA were all the massive out-of-order buffers did, then they would be nothing more than overhead. But the out-of-order μarch also had a secondary benefit of hiding memory latency, which was starting to become a massive issue at this point in history. The performance gains from this memory latency hiding were so much higher than the losses from translating x86 instructions that Intel/AMD x86 cores came to dominate the server, workstation and consumer computing markets in the late 90s and 2000s, killing off almost every competing RISC design (including Intel's own Itanium).
RISC designs only really held onto the low power markets (PDAs, cellphones), where simplicity and low power consumption still dominated the considerations.
------------------
What Intel might have missed is that x86 didn't hold a monopoly on massively out-of-order μarch. There was no reason you couldn't make a massively out-of-order μarch for a RISC ISA too.
And that's what eventually happened, starting in the mid-2010s. We started seeing ARM μarchs (especially from Apple) that looked suspiciously like Intel/AMD's designs, just with much simpler frontends. They could get the best of both worlds, taking advantage of simpler instruction decoding while still getting the advantages of being massively out-of-order.
------------------
You are right about Intel's arrogance, especially in assuming they could keep a process lead. But the "x86 tax" really isn't that high. It's worth noting that one of the CPUs they are losing ground to is also x86.
I think this is a myth that Intel (or somebody else) invented in an attempt to save face. Legacy x86 instructions could have been culled from the silicon and implemented in software as emulation traps - this has been done elsewhere nearly since the first revised CPU design came out. Since CPUs kept getting faster and faster, and legacy instructions were used less and less, the emulation overhead would have been negligible to the point that no one would even notice it.
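As a concrete (if oversimplified) illustration of the trap-and-emulate idea, here's a minimal Linux/x86-64 sketch of my own: an instruction the hardware no longer implements raises SIGILL, and a handler emulates it in software before resuming. Using UD2 as the stand-in "removed legacy instruction" is purely hypothetical.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <ucontext.h>
#include <unistd.h>

static volatile sig_atomic_t emulated = 0;

static void sigill_handler(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)info;
    ucontext_t *uc = ctx;
    uint8_t *ip = (uint8_t *)uc->uc_mcontext.gregs[REG_RIP];

    /* Pretend UD2 (0F 0B) is a culled legacy instruction: emulate its
     * (made-up) effect in software, then skip past the 2-byte opcode. */
    if (ip[0] == 0x0F && ip[1] == 0x0B) {
        emulated++;
        uc->uc_mcontext.gregs[REG_RIP] += 2;
        return;                         /* resume after the instruction */
    }
    _exit(1);                           /* genuinely unknown: give up   */
}

int main(void) {
    struct sigaction sa = {0};
    sa.sa_sigaction = sigill_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGILL, &sa, NULL);

    __asm__ volatile("ud2");            /* "execute" the removed opcode */
    printf("emulated %d instruction(s) in software\n", (int)emulated);
    return 0;
}
```

Whether the overhead stays negligible of course depends on the trapped instructions actually being rare, which is exactly the argument above.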
We are talking about how the whole ISA is legacy. How the basic structure of the encoding is complex and hard to decode. How the newer instructions get longer encodings. Or things which can be done with a single instruction on some RISC ISAs take 3 or 4 instructions.
x86 is not just a legacy, it is a legacy of legacies, as the x86 ISA traces all the way back to the 8008 via the 8080, at least as a spiritual predecessor, even if it can't directly execute 8008 binary code.
Intel also had their own, indigenous RISC design – the i960, which was a very good one. At some point – if I am not mistaken – Intel contemplated phasing out the x86 ISA and replacing it with the i960, but there was a change of plans and they went all in with the 80486 CPU. The i960 remained around in embedded and defence applications.
Intel also had their own hybrid VLIW/RISC design, the i860, which preceded Itanium and which they did not know what to do with. It faced the same issue: compilers of the day were not able to produce fast code for it.
I don't think there was ever any serious thought about replacing x86 with the i960 (at least nothing public). There was a serious plan to replace x86 with the iAPX 432, which is the predecessor to the i960, but those plans all predated x86 becoming a runaway success when the IBM PC became an industry standard. And "replace" is kind of overstating it; there was no plan for any compatibility, not even source compatibility. It's more that they were planning for the iAPX 432 to take the "workstation CPU" spot on their product chart that was then occupied by the 8086.
By the time the i960 was in development, x86 was so entrenched that I really doubt there could have been any serious thoughts of replacing x86 with something that wasn't fully backwards compatible.
And we know that when Intel did try to replace x86 with Itanium, they went with a hardware backwards compatibility mode.
> I was actually commenting on the need to support aspect being a myth.
Yes, you have a point that it should have been possible to replace x86 with a software emulation approach.
But the only person who can really do that is the platform owner. Apple were quite successful with their 68k to PowerPC transition. And the PowerPC to x86 transition. And the x86 to Aarch64 transition. But that transition really needs to be done by the platform owner.
But the PC didn't really have an owner. IBM had lost control of it. You could argue that Microsoft had control, but they didn't have enough control (especially with DOS - most DOS programs were bypassing DOS to some extent or another and directly accessing hardware). Intel certainly didn't have enough control to transition to another arch.
(The PC had such high demands for backwards compatibility that even the Pentium Pro ran into issues. It worked, but it simply wasn't fast enough when executing instructions with 16-bit operands. Attempting to run DOS or Win95 apps would be slower than on a 486. So the Pentium Pro was limited to the market of Windows NT workstations running apps with fully 32-bit code. Intel had to fix this with the Pentium II before they could sell the P6 arch outside of the workstation market.)
Intel didn't even have control over x86 itself. Other companies were already making competing CPU designs that were faster than Intel's own. If Intel didn't keep releasing faster x86 designs, then someone else would steal all of their market share. Intel were more or less forced to keep releasing faster native x86 designs, or they would lose what little control they did have.
«At the time, the 386 team felt that they were treated as the "stepchild" while the P7 [80960] project was the focus of Intel's attention. This would change as the sales of x86-based personal computers climbed and money poured into Intel. The 386 team would soon transform from stepchild to king».
And, yes, the histories of iAPX 432 and 80960 are so closely intertwined, that in many ways the 960 can be considered a design successor of the 432.
> But the only person who can really do that is the platform owner. Apple were quite successful with their 68k to PowerPC transition. And the PowerPC to x86 transition. And the x86 to Aarch64 transition. But that transition really needs to be done by the platform owner.
I wholeheartedly and vehemently agree with you on this – full platform ownership and control of the entire vertical is key to being able to successfully execute an ISA transition. Another success story is, of course, IBM with the iSeries (née AS/400) and zSeries (née 360/370/390), albeit their approach is rather different.
[0] https://www.righto.com/2023/07/the-complex-history-of-intel-...
That doesn't really suggest an intention to replace. To me that seems more of a hope that x86 would fade into irrelevance on its own, beaten down by superior RISC ISAs.
--------------
It is interesting to consider what a transition away from x86 would have looked like.
I think the best chance would have been something lead by Microsoft in the early 90s. The 386 version of Windows 3.0 was already virtualising both DOS and Win16 code into their own isolated VMs. If you added a translation layer for 16-bit x86 code to those VMs, then you could probably port windows to any host CPU arch.
I think we are talking about a world where 486-class CPUs never arrived, or they performed horribly and the Pentium was cancelled.
But it's a small window. In 1990, it was very rare to see 32-bit x86 code. 32-bit DOS extenders were only just starting to be a thing. Windows didn't support 32-bit userspace until 1993. The main 32-bit code anyone was running in 1990 was the Windows 3.0 kernel itself. By 1992, it was common for DOS games to use DOS extenders, and the transition would have required a 32-bit x86 translation layer too.
These RISC PC compatibles would have lost the ability to boot directly into real-mode DOS, but would have run DOS just fine inside a Windows DOS VM.
It should have been possible to get good hardware compatibility too. Windows 3.0 could already run DOS drivers inside a DOS VM; adding CPU translation shouldn't have caused issues. With motherboard support, it should have been possible to support most existing ISA/EISA/VLB cards.
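For a feel of what the simplest possible "16-bit x86 translation layer" looks like, here's a toy interpreter sketch of my own (a real layer would translate to native code and handle segmentation, flags, BIOS/DOS calls, and so on); it just executes three real 8086 opcodes on any host CPU:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* B8 34 12   mov ax, 0x1234
     * 40         inc ax
     * 40         inc ax
     * F4         hlt              */
    uint8_t code[] = {0xB8, 0x34, 0x12, 0x40, 0x40, 0xF4};
    uint16_t ax = 0, ip = 0;

    for (;;) {
        uint8_t op = code[ip++];
        if (op == 0xB8) {                        /* mov ax, imm16 */
            ax = code[ip] | (code[ip + 1] << 8);
            ip += 2;
        } else if (op == 0x40) {                 /* inc ax        */
            ax++;
        } else if (op == 0xF4) {                 /* hlt: stop toy */
            break;
        } else {
            fprintf(stderr, "unimplemented opcode %02X\n", op);
            return 1;
        }
    }
    printf("ax = 0x%04X\n", ax);                 /* prints 0x1236 */
    return 0;
}
```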
And Apple proved that in fact it was a significant problem once you factor in performance per watt, which allowed them to completely spank AMD and Intel once those hit a thermal limit. There's a benefit to being able to decode and dispatch multiple instructions in parallel versus having to emulate that by heuristically guessing at instruction boundaries and backtracking when you make a mistake (among other things).
Intel/AMD don't use heuristics-based decoding, or backtracking. They can decode 4 instructions in a single cycle. They implement this by starting a pre-decode at every single byte offset (within 16 bytes) and then resolving it to actual instructions at the end of the cycle.
The actual decode is then done the following cycle, but the pre-decoder has already moved up to 4 instructions forward, so the whole pipelined decoder can maintain 4 instructions per cycle on some code.
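A toy model of that pre-decode scheme (my own simplification; real x86 length decoding has to handle prefixes, ModRM, SIB and so on): guess a length at every byte offset of the fetch window "in parallel", then chain the guesses to find the real instruction starts.

```c
#include <stdint.h>
#include <stdio.h>

#define WINDOW 16

/* Length rule for a made-up variable-length ISA, standing in for the
 * much messier real x86 length decode. */
static int guess_length(uint8_t first_byte) {
    if (first_byte < 0x80) return 1;   /* short opcode              */
    if (first_byte < 0xC0) return 2;   /* opcode + 1 operand byte   */
    return 4;                          /* opcode + 3 operand bytes  */
}

int main(void) {
    uint8_t window[WINDOW] = {0x10, 0x90, 0x22, 0xC1, 0x01, 0x02, 0x03, 0x33,
                              0x80, 0x44, 0x11, 0xC5, 0x05, 0x06, 0x07, 0x12};

    /* Phase 1 (parallel in hardware): a tentative length at EVERY offset. */
    int len_at[WINDOW];
    for (int i = 0; i < WINDOW; i++)
        len_at[i] = guess_length(window[i]);

    /* Phase 2 (cheap serial resolve): chain lengths to mark real starts. */
    for (int pc = 0; pc < WINDOW; pc += len_at[pc])
        printf("instruction start at offset %2d, length %d\n", pc, len_at[pc]);

    return 0;
}
```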
This pre-decode approach does have limits. Due to propagation delays, 4 instructions over 16 bytes is probably the realistic limit that you can push it (while Apple can easily do 8 instructions over 32 bytes). Intel's Golden Cove did finally push it to 6 instructions over 32 bytes, but I'm not sure that's worth it.
Intel's Skymont shows the way forward. It only uses 3-wide decoders, but it has three of them running in parallel, leapfrogging over each other. They use the branch predictor to start each decoder at a future instruction boundary (inserting dummy branches to break up large branchless blocks). Skymont can maintain 9 instructions per cycle, which is more than the 8-wide decode Apple is currently using. And unlike the previous "parallel pre-decode in a single cycle" approach, this one is scalable: nothing stops Intel from adding a fourth decoder for 12 instructions per cycle, or a fifth for 15. AMD is showing signs of going down the same path; Zen 5 has two 4-wide decoders, though they can't work on the same thread yet.
> And 4 vs 8 is a pretty sizeable difference.
True, but x86 was doing four instructions per cycle 20 years ago. As I mentioned, the current state of the art (in a shipping product) is 9, and 9 is larger than 8. Importantly, this Skymont approach of leapfrogging decoders is scalable.
> whereas Apple can just decode each op independently & only decodes 8.
Apple isn't as free from serialisation as you suggest. Like X86, many instructions decode to multiple uops. According to research [1] instructions which decode to two uops are common and a few decode to as many as 12 uops.
It also does instruction fusion: two neighbouring instructions can sometimes decode into a single uop. All this means there is plenty of serialisation within Apple's decoder. And branching also creates serialisation.
It's just not as simple as independently decoding eight instructions into eight uops every cycle. Simpler than what x86 implementations need to do, but not as brain-dead simple as you suggest.
Actually, Skymont's approach has an advantage over Apple here, because it only needs to serialise within each 3-wide decoder.
First, it covers Intel Haswell, which is "not Ryzen" - it's not even AMD. Plus, Haswell is 12 years old at this point; how much relevance does it even have to modern Intel CPUs?
Second, the "instruction decoders" power zone was only 10%, not 20%. And still reported 3% on workloads that used very few instructions and always hit the uop cache. So really we are talking about 7% overhead for decoding instructions. They do speculate that other workloads use more power (they only tested two workloads), as the theoretical instruction throughput might be double. (which is where I suspect you got the 20% from), but they provide no evidence for that, and double the throughput doesn't mean double the power consumption. And double 7% + 3% base would be 17% at most.
Third, Intel doesn't publish any details about what this "instruction decoder" zone actually covers. It's almost certainly more than just the "decoding x86" part. Given there are only four zones, I'm almost certain this zone covers the entire frontend, which includes branch prediction, instruction fetch, and the stack engine. It might include register renaming too. Maybe instruction TLB lookups? I am reasonably sure it includes the (dynamic) power cost of accessing the L1i cache as well.
So this 7% power usage covers way more than just the decoding of x86 instructions. It's the entire frontend.
Finally, I haven't seen any power numbers for the frontend of an equivalent ARM processor (like Apple's M1). For all we know, they also use 7% of their power budget to fetch ARM instructions from the L1 cache, decode them, do branch prediction, and do all the other fancy frontend stuff. The 7% number isn't "x86 overhead" as many people imply; it's just the cost of running Haswell's frontend.
Without anything else to compare to, this 7% number is worthless. It's certainly an interesting paper, I don't have any major criticisms, but it simply cannot be used to support (or disprove) any arguments about the overhead of x86 decoding.
[1] https://www.usenix.org/system/files/conference/cooldc16/cool...
Unfortunately business management succumbs too easily to short term profit (b/c of a tunnel vision on shareholder return) and trendiness. There are people in fashion who say of business: geez you gotta stand for something for more than 10 minutes. Get some class!
Once instructions get past the instruction decoder, they have not been x86 since Pentium Pro on the server and Pentium II on the desktop. AMD has made great strides in optimising the instruction decoder performance to minimise the translation overhead on most frequently used instructions, and the pathological cases such as 15-byte-long instructions are no longer in active use anyway. There are legacy instructions that are still there, but I don't think they affect performance as they are mere dead silicon that is getting rationalised with X86S, which culls everything non-64-bit.
A more solid argument can be made that x86 is register-starved, with great implications for performance, and that is true, especially for 32 bits. It is true to a certain extent with the 64-bit ISA as well (32 GPRs is still better than x86-64’s 16 GPRs), but various SIMD extensions have ameliorated the pain substantially. The remaining stuff, such as legacy CISC direct memory access instructions… compilers have not been emitting that stuff for over twenty years, and it just takes up space in the dead silicon, lonely, waiting and yearning for a fateful moment of somebody finally giving it a tickle, which almost never comes, so the legacy instructions just cry and wail in deafening silence.
An ISA was a decisive and critical factor from the performance perspective in the 1970s-80s, and, due to advances in the last few decades, including in-core instruction fusion, register renaming, coupled with enormously large register files, out-of-order, as well as speculative execution, etc., it is no longer clear-cut or a defining feature. We now live in the post-RISC era where old and new approaches have coalesced into hybrid designs.
Personally, I have never been a fan of the x86 ISA, although less from the technical perspective[0] and for a completely different reason – the Wintel duopoly had completely obliterated CPU alternatives, leading the CPU industry to stagnate, which has now changed and has given Intel a headache of epic proportions and haemorrhoids.
[0] The post-AVX2 code modern compilers generate is pretty neat and is reasonably nice to look at and work with. Certainly not before.
Could you clarify? What is "ISA", and why do you think it would have an impact on performance?