> Modern CPUs measure their temperature and clock down if they get too hot, don't they?
Yes. It's rather complex now, and it involves the motherboard vendor's firmware. When (not if) they get that wrong, CPUs burn up. You're going to need some expertise to analyze this.
In general, PC enthusiasts have always treated these corporations a bit like sports teams.
He never really got over the stuff with Linus and doubled down on stupid things. I think they both have a great place in the tech scene, and LTT's recent videos have been much better produced and researched than in years past.
This was Gordon's style, and Steve is continuing it. He has the courage to show up at Bloomberg's offices with a cameraman, so I don't think his words ring hollow.
We need that kind of in-your-face, no-punches-pulled reporting, as opposed to just the "measured professionals".
That framing doesn't do him and the team justice. There is (or rather, was) a 3.5-hour story about NVIDIA GPUs finding their way illegally from the US to China, which got taken down by a malicious DMCA claim from Bloomberg. It is quite interesting to watch (it can be found on archive.org).
GN is one of the last pro-consumer outlets that keeps digging and shaking the tree the big companies are sitting on.
He's essentially a low-quality tabloid.
Not everywhere:
https://archive.org/details/the-nvidia-ai-gpu-black-market-i...
GN is unique in paying for silicon-level analysis of failures.
der8auer also contributes a lot to these stories.
I tend to wait for all 3 of their analyses, because each adds a different "hard-won" perspective.
I feel like if this were heat related, the overall CPU temperature should still creep up somewhat slowly, giving everything enough time for thermal throttling. But the discoloration sure looks like a thermal issue, so I wonder why the safety features of the CPU didn't catch this...
(And... 200 A is only the average current when dissipating 200 W. So how high are the switching currents? ;)
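To make that average explicit (a rough sketch, assuming a core voltage around 1.0 V, which is roughly what these parts run under full load):

    I_avg = P / V_core ≈ 200 W / 1.0 V = 200 A

And that's just the average; the transients the VRM has to absorb when that load switches on and off are considerably worse.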
My best understanding of the AVX-512 "power license" debacle on Intel CPUs was that the processor was actually watching the instruction stream and computing heuristics to lower the core frequency before executing AVX-512 or dense AVX2 instructions. I guessed they knew, or worried, that even a short large-vector stint would fry stuff...
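For illustration, this is the kind of dense 512-bit FMA loop that used to land Skylake-SP-era cores in the lowest "license" frequency band (a minimal sketch of the workload class, not Intel's detection heuristic; needs an AVX-512 part and something like gcc -O2 -mavx512f):

    #include <immintrin.h>
    #include <stdio.h>

    int main(void) {
        /* Back-to-back 512-bit FMAs: the instruction class that pushed
           cores down into the heaviest "license" frequency level. */
        __m512 acc = _mm512_set1_ps(1.0f);
        const __m512 a = _mm512_set1_ps(1.000001f);
        const __m512 b = _mm512_set1_ps(1e-7f);
        for (long i = 0; i < 1000000000L; i++)
            acc = _mm512_fmadd_ps(acc, a, b);   /* acc = acc*a + b */
        float out[16];
        _mm512_storeu_ps(out, acc);
        printf("%f\n", out[0]);  /* keep the loop from being optimized out */
        return 0;
    }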
Apparently voltage and thermal sensors have vastly improved, and looking at the crazy swings in NVIDIA GPUs' clocks seems to agree with this :-)
It doesn't strike me as odd that running an extremely power-heavy load continuously for months on such configurations eventually caused failures.
These big x86 CPUs in stock configuration can throttle down to speeds where they can function with entirely passive cooling, so even if the cooler was improperly mounted, they'd only throttle.
All that to say, if GMP is causing the CPU to fry itself, something went very wrong, and it is not user error or the room being too hot.
As in... what, the AMD K6 / early Pentium 4 days were the last time I remember hearing about a CPU cooler failing and frying a CPU?
I once worked on a piece of equipment that was running awfully slow. The CPU was just not budging from its base clock of 700 MHz. As I was removing the stock Intel cooler, I noticed it wasn't seated fully. Once I removed it and looked, I saw a perfectly clean CPU with no residue. On the HSF, the original thermal paste was in pristine condition.
I remounted the HSF and it worked great. It ran 100% throttled for seven years before I touched it.
I've heard some really wild noises coming out of my Zen 4 machine when I've had all cores loaded with what is best described as a "choppy" workload: repeatedly feeding something like a Parallel.ForEach into a single-threaded hot path of equal or shorter duration, as fast as possible. I've never had the machine survive this kind of workload for more than 48 hours without some kind of BSOD. I've not actually killed a CPU yet, though.
1. Evaluate population of candidates in parallel
2. Perform ranking, mutation, crossover, and objective selection in serial
3. Go to 1.
I can very accurately control the frequency of the audible PWM noise by adjusting the population size.
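The shape of that loop, roughly (my sketch, with OpenMP standing in for Parallel.ForEach; the work functions are stand-in FPU busywork and POP is the knob; build with something like gcc -O2 -fopenmp -lm):

    #include <math.h>
    #include <stdio.h>

    #define POP 4096   /* population size: the knob that sets the PWM pitch */

    static double fitness[POP];

    /* stand-in for evaluating one candidate */
    static double evaluate(int i) {
        double x = i * 1e-3 + 1.0;
        for (int k = 0; k < 200000; k++)
            x = sqrt(x) + 1.0;
        return x;
    }

    int main(void) {
        for (;;) {
            /* 1. all cores slam on: parallel evaluation */
            #pragma omp parallel for
            for (int i = 0; i < POP; i++)
                fitness[i] = evaluate(i);

            /* 2. all cores but one go idle: serial stand-in for
                  ranking/mutation/crossover */
            double sum = 0;
            for (long k = 0; k < (long)POP * 200000L; k++)
                sum += fitness[k % POP];

            fprintf(stderr, "%g\r", sum);  /* keep the compiler honest */
            /* 3. go to 1 -- the resulting load square wave is what
                  sings in the VRM inductors */
        }
    }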
Everything is offset towards one side, and the two CPU core clusters are way towards the edge; offset cooling makes sense regardless of usage.
Also, take a look at a delidded 9950; the two cpu chiplets are to one side, the i/o chiplet is in the middle, and the other side is a handful of passives. Offsetting the heatsink moves the center of the heatsink 7mm towards the chiplets (the socket is 40mm x 40mm), but there's still plenty of heatsink over the top of the i/o chiplet.
This article has some decent pictures of delidded processors https://www.tomshardware.com/pc-components/overclocking/deli...
TDP numbers are completely made up. They don't correspond to watts of heat, or to anything at all! They're just a marketing number. You can't use them to choose the right cooling system at all.
https://gamersnexus.net/guides/3525-amd-ryzen-tdp-explained-...
Couldn't this count as false/misleading advertising, though?
But yeah, TDP means nothing. If you stick plenty of cooling on it and run the right motherboard revision, your "TDP" can be whatever you want it to be until the thing melts.
Are you just describing product segmentation? I.e. how the Ryzen 5700X and 5800X are basically the same chip, down to the number of enabled cores, except for clocks and power limit ("TDP")?
I don't get it. Are you referring to the phenomenon that different workloads have different power consumption (e.g. a bunch of AVX-512 floating-point operations vs. a bunch of NOPs), and therefore TDP is totally made up? I agree that there are a lot of factors that impact power usage, and CPUs aren't like a space heater that always consumes the specified TDP when run at full blast, but that doesn't mean TDP numbers are made up. They still vaguely approximate power usage under some synthetic test conditions, or are at the very least vaguely correlated with some limit of the CPU (e.g. the PPT limit on AMD platforms).
That said, it's definitely very frustrating as someone who does the occasional server build. Not only does TDP not reflect minimum or maximum power draw for a CPU package itself, but it's also completely divorced from power draw for the chipset(s), NICs, BMCs (ugh), etc, not to mention how the vendor BIOS/firmware throttles everything, and so TDP can be wildly different from power draw at the outlet. The past 5 years have kind of sucked for homelab builders. The Xeon E3 years were probably peak CPU and full-system power efficiency when accounting for long idle times. Can you get there with modern AMD and Intel chips? Maybe. Depends on who you ask and when. Even with identical CPUs, differences in motherboard vendor, BIOS settings, and even kernel can result in drastically different (as in 2-3x) reported idle power draw.
But they don’t use real temperatures from real systems. They just make up a different set of temperatures for each CPU that they sell, so that the TDP comes out to the number that they want. The formula doesn’t even mean anything, in real physical terms.
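For reference, the formula the GamersNexus piece linked above dug out of AMD's documentation is, as I recall it (so treat the details as approximate):

    TDP (W) = (tCase °C − tAmbient °C) / θca (°C/W)

where θca, the thermal resistance from heatsink to ambient, is a per-SKU value AMD simply chooses — which is exactly the degree of freedom that lets the output land on whatever round marketing number they want.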
I agree that predicting power usage is far more difficult than it should be. The real power usage of the CPU depends on the temperature too, since the colder you can make the CPU, the more power it will voluntarily use (it just raises the clock multiplier until it sees the CPU temperature rising without leveling off). And as you said, there are a bunch of other factors as well.
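As a toy model of that opportunistic boost loop (entirely my sketch of the feedback shape, not AMD's actual algorithm; all constants are made up):

    #include <stdio.h>

    int main(void) {
        const double T_LIMIT   = 95.0;  /* throttle point, deg C */
        const double T_AMBIENT = 25.0;
        const double R_THERM   = 0.50;  /* cooler quality, deg C per W */
        double mult = 30.0;             /* clock multiplier */
        for (int step = 0; step < 60; step++) {
            double power = 3.0 * mult;                  /* crude power model */
            double temp  = T_AMBIENT + R_THERM * power; /* steady-state temp */
            if (temp < T_LIMIT - 1.0)
                mult += 0.5;   /* thermal headroom left: boost higher */
            else if (temp > T_LIMIT)
                mult -= 0.5;   /* too hot: back off */
            printf("mult=%4.1f  power=%5.1f W  temp=%4.1f C\n",
                   mult, power, temp);
        }
        return 0;
    }

Lower R_THERM (i.e. better cooling) and the multiplier, and therefore the power, settles higher — which is the "colder CPU voluntarily uses more power" behavior.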
For what, exactly? TDP stands for "thermal design power" - nothing in that means peak power or most power. It stopped being meaningful when CPUs learned to vary clock speeds and turbo boost - what is the thermal design target at that point, exactly? Sustained power virus load?
> The thermal solution bundled with the CPUs is not designed to handle the thermal output when all the cores are utilized 100%. For that kind of load, a different thermal solution is strongly recommended (paraphrased).
I never used the stock cooler bundled with the processor, but what kind of dark joke is this?
The Conroe Intel era was amazing for the time.
https://hwbusters.com/cpu/amd-ryzen-9-9950x-cpu-review-perfo...
The Asus Prime B650M motherboards they are using aren't exactly high end.
"According to new details from Tech Yes City, the problem stems from the amperage (current) supplied to the processor under AMD's PBO technology. Precision Boost Overdrive employs an algorithm that dynamically adjusts clock speeds for peak performance, based on factors like temperature, power, current, and workload. The issue is reportedly confined to ASRock's high-end and mid-range boards, as they were tuned far too aggressively for Ryzen 9000 CPUs."
https://www.tomshardware.com/pc-components/cpus/asrock-attri...
> What is GMP?
> The GNU Multiple Precision Arithmetic Library
> GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating-point numbers. There is no practical limit to the precision except the ones implied by the available memory in the machine GMP runs on. GMP has a rich set of functions, and the functions have a regular interface.
Many languages use it to implement long integers. Under the hood, they just call GMP.
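For a taste of what that looks like in C (a minimal sketch; link with -lgmp):

    #include <gmp.h>
    #include <stdio.h>

    int main(void) {
        mpz_t f;                        /* arbitrary-precision integer */
        mpz_init_set_ui(f, 1);
        for (unsigned long i = 2; i <= 100; i++)
            mpz_mul_ui(f, f, i);        /* f *= i */
        gmp_printf("100! = %Zd\n", f);  /* 158 digits, no overflow */
        mpz_clear(f);
        return 0;
    }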
IIUC the problem is related to the test suite, which is probably very handy if you ever want to fry an egg on top of your micro.
I just wanted to find out what GMP is.
Overall, there is a continued challenge with CPU temperatures that requires much tighter tolerances, both in the thermal solution and in its mounting. Torque specs need to be followed and verified as met in manufacturing.
The 9950X's TDP (Thermal Design Power) is 170 W, its default socket power is 200 W [2], and with PBO (Precision Boost Overdrive) enabled it's been reported to hit 235 W [3].
[1] https://www.overclockersclub.com/reviews/noctua_nh_u9s_cpu_c...
[2] https://hwbusters.com/cpu/amd-ryzen-9-9950x-cpu-review-perfo...
[3] https://www.tomshardware.com/pc-components/cpus/amd-ryzen-9-...
It sounds like the user likely did the opposite of the "offset seating" of the heatsink that Noctua recommended.
Take the AlphaServer DS25. It has wires going from the power supply harness to the motherboard that are thick enough to jump-start a car. The traces on the motherboard are so thick that the way light reflects off them looks nothing like a modern motherboard. The two CPUs take 64 watts each.
Now we have AMD CPUs that can take 170 watts? That's high, but if that's what the motherboards are supposed to be able to deliver, then the pins, socket and pads should have no problem with that.
Where's AMD's testing? Have they learned nothing watching Intel (almost literally) melt down?
A rule of thumb I use for cooling is, you can rarely have too much. You should over-engineer that aspect of your systems. That and the power supply.
I have a 7950X, with a water block capable of sinking up to 300 W. Under heavy load, I hear the radiator fans spinning up, and I see the CPU temp hover around 90-93 °C. That is OK, though cooler would be better. My next build (this one is two years old) will also use a water block, but with a higher flow rate and a better radiator system. I like silent systems, though I don't like the magic smoke being released from components.
That's when I discovered the genuinely ancient term "power virus". Anyway, after talking to different people I dismissed this weird behavior and moved on.
Reading this makes me worry that I actually burned my mobo in that testing.