> Modern CPUs measure their temperature and clock down if they get too hot, don't they?
Yes. It's rather complex now, and it involves the motherboard vendor's firmware. When (not if) they get that wrong, CPUs burn up. You're going to need some expertise to analyze this.
In general, PC enthusiasts have always treated these corporations a bit like sports teams.
He never really got over the stuff with Linus and doubled down on stupid things. I think they both have a great place in the tech scene, and LTT's recent videos have been much better produced and researched than in years past.
This was Gordon's style, and Steve is continuing it. He has the courage to show up at Bloomberg's offices with a cameraman, so I don't think his words ring hollow.
We need that kind of in-your-face, no-punches-pulled reporting, as opposed to just the "measured professionals".
That framing doesn't do him and the team justice. There is (or rather, was) a 3.5-hour story about NVIDIA GPUs finding their way illegally from the US to China, which got taken down by a malicious DMCA claim from Bloomberg. It is quite interesting to watch (it can be found on archive.org).
GN is one of the last pro-consumer outlets that keeps digging and shaking the tree the big companies are sitting on.
He's essentially a low-quality tabloid.
Not everywhere:
https://archive.org/details/the-nvidia-ai-gpu-black-market-i...
GN is unique in paying for silicon-level analysis of failures.
der8auer also contributes a lot to these stories.
I tend to wait for all 3 of their analyses, because each adds a different "hard-won" perspective.
I feel like if this were heat related, the overall CPU temperature should still creep up somewhat slowly, giving everything enough time for thermal throttling. But the discoloration sure looks like a thermal issue, so I wonder why the safety features of the CPU didn't catch this...
(And... 200 A is only the average current when dissipating 200 W. So how high are the switching currents? ;)
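To make that average explicit (a rough sketch, assuming a core voltage around 1.0 V, which is roughly what these parts run under full load):

    I_avg = P / V_core ≈ 200 W / 1.0 V = 200 A

And that's just the average; the transients the VRM has to absorb when that load switches on and off are considerably worse.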
My best understanding of the AVX-512 "power license" debacle on Intel CPUs was that the processor was actually watching the instruction stream and computing heuristics to lower the core frequency before executing AVX-512 or dense AVX2 instructions. I guessed they knew, or worried, that even a short large-vector stint would fry stuff...
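For illustration, this is the kind of dense 512-bit FMA loop that used to land Skylake-SP-era cores in the lowest "license" frequency band (a minimal sketch of the workload class, not Intel's detection heuristic; needs an AVX-512 part and something like gcc -O2 -mavx512f):

    #include <immintrin.h>
    #include <stdio.h>

    int main(void) {
        /* Back-to-back 512-bit FMAs: the instruction class that pushed
           cores down into the heaviest "license" frequency level. */
        __m512 acc = _mm512_set1_ps(1.0f);
        const __m512 a = _mm512_set1_ps(1.000001f);
        const __m512 b = _mm512_set1_ps(1e-7f);
        for (long i = 0; i < 1000000000L; i++)
            acc = _mm512_fmadd_ps(acc, a, b);   /* acc = acc*a + b */
        float out[16];
        _mm512_storeu_ps(out, acc);
        printf("%f\n", out[0]);  /* keep the loop from being optimized out */
        return 0;
    }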
Apparently voltage and thermal sensors have vastly improved, and looking at the crazy swings in NVIDIA GPUs' clocks seems to agree with this :-)
It doesn't strike me as odd that running an extremely power-heavy load continuously for months on such configurations eventually caused failures.
These big x86 CPUs in stock configuration can throttle down to speeds where they can function with entirely passive cooling, so even if the cooler was improperly mounted, they'd only throttle.
All that to say, if GMP is causing the CPU to fry itself, something went very wrong, and it is not user error or the room being too hot.
As in... what, the AMD K6 / early Pentium 4 days were the last time I remember hearing about a CPU cooler failing and frying a CPU?
I once worked on a piece of equipment that was running awfully slow. The CPU was just not budging from its base clock of 700 MHz. As I was removing the stock Intel cooler, I noticed it wasn't seated fully. Once I removed it and looked, I saw a perfectly clean CPU with no residue. On the HSF, the original thermal paste was in pristine condition.
I remounted the HSF and it worked great. It ran 100% throttled for seven years before I touched it.
I've heard some really wild noises coming out of my Zen 4 machine when I've had all cores loaded with what is best described as a "choppy" workload: repeatedly feeding something like a Parallel.ForEach into a single-threaded hot path of equal or shorter duration, as fast as possible. I've never had the machine survive this kind of workload for more than 48 hours without some kind of BSOD. I've not actually killed a CPU yet, though.
1. Evaluate population of candidates in parallel
2. Perform ranking, mutation, crossover, and objective selection in serial
3. Go to 1.
I can very accurately control the frequency of the audible PWM noise by adjusting the population size.
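The shape of that loop, roughly (my sketch, with OpenMP standing in for Parallel.ForEach; the work functions are stand-in FPU busywork and POP is the knob; build with something like gcc -O2 -fopenmp -lm):

    #include <math.h>
    #include <stdio.h>

    #define POP 4096   /* population size: the knob that sets the PWM pitch */

    static double fitness[POP];

    /* stand-in for evaluating one candidate */
    static double evaluate(int i) {
        double x = i * 1e-3 + 1.0;
        for (int k = 0; k < 200000; k++)
            x = sqrt(x) + 1.0;
        return x;
    }

    int main(void) {
        for (;;) {
            /* 1. all cores slam on: parallel evaluation */
            #pragma omp parallel for
            for (int i = 0; i < POP; i++)
                fitness[i] = evaluate(i);

            /* 2. all cores but one go idle: serial stand-in for
                  ranking/mutation/crossover */
            double sum = 0;
            for (long k = 0; k < (long)POP * 200000L; k++)
                sum += fitness[k % POP];

            fprintf(stderr, "%g\r", sum);  /* keep the compiler honest */
            /* 3. go to 1 -- the resulting load square wave is what
                  sings in the VRM inductors */
        }
    }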
Everything is offset towards one side, and the two CPU core clusters are way towards the edge; offset cooling makes sense regardless of usage.
Also, take a look at a delidded 9950; the two cpu chiplets are to one side, the i/o chiplet is in the middle, and the other side is a handful of passives. Offsetting the heatsink moves the center of the heatsink 7mm towards the chiplets (the socket is 40mm x 40mm), but there's still plenty of heatsink over the top of the i/o chiplet.
This article has some decent pictures of delidded processors https://www.tomshardware.com/pc-components/overclocking/deli...
TDP numbers are completely made up. They don't correspond to watts of heat, or to anything at all! They're just a marketing number. You can't use them to choose the right cooling system at all.
https://gamersnexus.net/guides/3525-amd-ryzen-tdp-explained-...
Couldn't this count as false/misleading advertising, though?
But yeah, TDP means nothing. If you stick plenty of cooling on it and run the right motherboard revision, your "TDP" can be whatever you want it to be until the thing melts.
Are you just describing product segmentation? I.e. how the Ryzen 5700X and 5800X are basically the same chip, down to the number of enabled cores, except for clocks and power limit ("TDP")?
I don't get it. Are you referring to the phenomenon that different workloads have different power consumption (e.g. a bunch of AVX-512 floating-point operations vs. a bunch of NOPs), and therefore TDP is totally made up? I agree that there are a lot of factors that impact power usage, and CPUs aren't like a space heater that always consumes the specified TDP when run at full blast, but that doesn't mean TDP numbers are made up. They still vaguely approximate power usage under some synthetic test conditions, or are at the very least vaguely correlated with some limit of the CPU (e.g. the PPT limit on AMD platforms).
That said, it's definitely very frustrating as someone who does the occasional server build. Not only does TDP not reflect minimum or maximum power draw for a CPU package itself, but it's also completely divorced from power draw for the chipset(s), NICs, BMCs (ugh), etc, not to mention how the vendor BIOS/firmware throttles everything, and so TDP can be wildly different from power draw at the outlet. The past 5 years have kind of sucked for homelab builders. The Xeon E3 years were probably peak CPU and full-system power efficiency when accounting for long idle times. Can you get there with modern AMD and Intel chips? Maybe. Depends on who you ask and when. Even with identical CPUs, differences in motherboard vendor, BIOS settings, and even kernel can result in drastically different (as in 2-3x) reported idle power draw.
But they don’t use real temperatures from real systems. They just make up a different set of temperatures for each CPU that they sell, so that the TDP comes out to the number that they want. The formula doesn’t even mean anything, in real physical terms.
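For reference, the formula the GamersNexus piece linked above dug out of AMD's documentation is, as I recall it (so treat the details as approximate):

    TDP (W) = (tCase °C − tAmbient °C) / θca (°C/W)

where θca, the thermal resistance from heatsink to ambient, is a per-SKU value AMD simply chooses — which is exactly the degree of freedom that lets the output land on whatever round marketing number they want.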
I agree that predicting power usage is far more difficult than it should be. The real power usage of the CPU depends on the temperature too, since the colder you can make the CPU, the more power it will voluntarily use (it just raises the clock multiplier until it sees the CPU temperature rising without leveling off). And as you said, there are a bunch of other factors as well.
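As a toy model of that opportunistic boost loop (entirely my sketch of the feedback shape, not AMD's actual algorithm; all constants are made up):

    #include <stdio.h>

    int main(void) {
        const double T_LIMIT   = 95.0;  /* throttle point, deg C */
        const double T_AMBIENT = 25.0;
        const double R_THERM   = 0.50;  /* cooler quality, deg C per W */
        double mult = 30.0;             /* clock multiplier */
        for (int step = 0; step < 60; step++) {
            double power = 3.0 * mult;                  /* crude power model */
            double temp  = T_AMBIENT + R_THERM * power; /* steady-state temp */
            if (temp < T_LIMIT - 1.0)
                mult += 0.5;   /* thermal headroom left: boost higher */
            else if (temp > T_LIMIT)
                mult -= 0.5;   /* too hot: back off */
            printf("mult=%4.1f  power=%5.1f W  temp=%4.1f C\n",
                   mult, power, temp);
        }
        return 0;
    }

Lower R_THERM (i.e. better cooling) and the multiplier, and therefore the power, settles higher — which is the "colder CPU voluntarily uses more power" behavior.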
For what, exactly? TDP stands for "thermal design power" - nothing in that means peak power or most power. It stopped being meaningful when CPUs learned to vary clock speeds and turbo boost - what is the thermal design target at that point, exactly? Sustained power virus load?
> The thermal solution bundled with the CPUs is not designed to handle the thermal output when all the cores are utilized 100%. For that kind of load, a different thermal solution is strongly recommended (paraphrased).
I never used the stock cooler bundled with the processor, but what kind of dark joke is this?
The Conroe Intel era was amazing for the time.
https://hwbusters.com/cpu/amd-ryzen-9-9950x-cpu-review-perfo...
The Asus Prime B650M motherboards they are using aren't exactly high end.
"According to new details from Tech Yes City, the problem stems from the amperage (current) supplied to the processor under AMD's PBO technology. Precision Boost Overdrive employs an algorithm that dynamically adjusts clock speeds for peak performance, based on factors like temperature, power, current, and workload. The issue is reportedly confined to ASRock's high-end and mid-range boards, as they were tuned far too aggressively for Ryzen 9000 CPUs."
https://www.tomshardware.com/pc-components/cpus/asrock-attri...
> What is GMP?
> The GNU Multiple Precision Arithmetic Library
> GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating-point numbers. There is no practical limit to the precision except the ones implied by the available memory in the machine GMP runs on. GMP has a rich set of functions, and the functions have a regular interface.
Many languages use it to implement long integers. Under the hood, they just call GMP.
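For a taste of what that looks like in C (a minimal sketch; link with -lgmp):

    #include <gmp.h>
    #include <stdio.h>

    int main(void) {
        mpz_t f;                        /* arbitrary-precision integer */
        mpz_init_set_ui(f, 1);
        for (unsigned long i = 2; i <= 100; i++)
            mpz_mul_ui(f, f, i);        /* f *= i */
        gmp_printf("100! = %Zd\n", f);  /* 158 digits, no overflow */
        mpz_clear(f);
        return 0;
    }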
IIUC the problem is related to the test suite, which is probably very handy if you ever want to fry an egg on top of your micro.
I just wanted to find out what GMP is.
Overall, there is a continued challenge with CPU temperatures that requires much tighter tolerances, both in the thermal solution and in its mounting. Torque specs need to be followed and verified as met in manufacturing.
The 9950X's TDP (Thermal Design Power) is 170 W, its default socket power is 200 W [2], and with PBO (Precision Boost Overdrive) enabled it's been reported to hit 235 W [3].
[1] https://www.overclockersclub.com/reviews/noctua_nh_u9s_cpu_c...
[2] https://hwbusters.com/cpu/amd-ryzen-9-9950x-cpu-review-perfo...
[3] https://www.tomshardware.com/pc-components/cpus/amd-ryzen-9-...
It sounds like the user likely did the opposite of the "offset seating" of the heatsink that Noctua recommended.
Take the AlphaServer DS25. It has wires going from the power supply harness to the motherboard that are thick enough to jump-start a car. The traces on the motherboard are so thick that the way light reflects off them looks nothing like a modern motherboard. The two CPUs take 64 watts each.
Now we have AMD CPUs that can take 170 watts? That's high, but if that's what the motherboards are supposed to be able to deliver, then the pins, socket and pads should have no problem with that.
Where's AMD's testing? Have they learned nothing watching Intel (almost literally) melt down?
A rule of thumb I use for cooling is, you can rarely have too much. You should over-engineer that aspect of your systems. That and the power supply.
I have a 7950X, with a water block capable of sinking up to 300 W. Under heavy load, I hear the radiator fans spinning up, and I see the CPU temp hover around 90-93 °C. That is OK, though cooler would be better. My next build (this one is two years old) will also use a water block, but with a higher flow rate and a better radiator system. I like silent systems, though I don't like the magic smoke being released from components.
That's when I discovered the genuinely ancient term "power virus". Anyway, after talking to different people I dismissed this weird behavior and moved on.
Reading this makes me worry that I actually burned my mobo in that testing.