
I replaced the front page with AI slop and honestly it's an improvement

https://slop-news.pages.dev/slop-news
1•keepamovin•1m ago•0 comments

Economists vs. Technologists on AI

https://ideasindevelopment.substack.com/p/economists-vs-technologists-on-ai
1•econlmics•3m ago•0 comments

Life at the Edge

https://asadk.com/p/edge
1•tosh•9m ago•0 comments

RISC-V Vector Primer

https://github.com/simplex-micro/riscv-vector-primer/blob/main/index.md
2•oxxoxoxooo•12m ago•1 comments

Show HN: Invoxo – Invoicing with automatic EU VAT for cross-border services

2•InvoxoEU•13m ago•0 comments

A Tale of Two Standards, POSIX and Win32 (2005)

https://www.samba.org/samba/news/articles/low_point/tale_two_stds_os2.html
2•goranmoomin•16m ago•0 comments

Ask HN: Has the Downfall of SaaS Started?

3•throwaw12•18m ago•0 comments

Flirt: The Native Backend

https://blog.buenzli.dev/flirt-native-backend/
2•senekor•19m ago•0 comments

OpenAI's Latest Platform Targets Enterprise Customers

https://aibusiness.com/agentic-ai/openai-s-latest-platform-targets-enterprise-customers
1•myk-e•22m ago•0 comments

Goldman Sachs taps Anthropic's Claude to automate accounting, compliance roles

https://www.cnbc.com/2026/02/06/anthropic-goldman-sachs-ai-model-accounting.html
2•myk-e•24m ago•3 comments

Ai.com bought by Crypto.com founder for $70M in biggest-ever website name deal

https://www.ft.com/content/83488628-8dfd-4060-a7b0-71b1bb012785
1•1vuio0pswjnm7•25m ago•1 comments

Big Tech's AI Push Is Costing More Than the Moon Landing

https://www.wsj.com/tech/ai/ai-spending-tech-companies-compared-02b90046
3•1vuio0pswjnm7•27m ago•0 comments

The AI boom is causing shortages everywhere else

https://www.washingtonpost.com/technology/2026/02/07/ai-spending-economy-shortages/
2•1vuio0pswjnm7•29m ago•0 comments

Suno, AI Music, and the Bad Future [video]

https://www.youtube.com/watch?v=U8dcFhF0Dlk
1•askl•31m ago•2 comments

Ask HN: How are researchers using AlphaFold in 2026?

1•jocho12•34m ago•0 comments

Running the "Reflections on Trusting Trust" Compiler

https://spawn-queue.acm.org/doi/10.1145/3786614
1•devooops•39m ago•0 comments

Watermark API – $0.01/image, 10x cheaper than Cloudinary

https://api-production-caa8.up.railway.app/docs
1•lembergs•40m ago•1 comments

Now send your marketing campaigns directly from ChatGPT

https://www.mail-o-mail.com/
1•avallark•44m ago•1 comments

Queueing Theory v2: DORA metrics, queue-of-queues, chi-alpha-beta-sigma notation

https://github.com/joelparkerhenderson/queueing-theory
1•jph•56m ago•0 comments

Show HN: Hibana – choreography-first protocol safety for Rust

https://hibanaworks.dev/
5•o8vm•57m ago•1 comments

Haniri: A live autonomous world where AI agents survive or collapse

https://www.haniri.com
1•donangrey•58m ago•1 comments

GPT-5.3-Codex System Card [pdf]

https://cdn.openai.com/pdf/23eca107-a9b1-4d2c-b156-7deb4fbc697c/GPT-5-3-Codex-System-Card-02.pdf
1•tosh•1h ago•0 comments

Atlas: Manage your database schema as code

https://github.com/ariga/atlas
1•quectophoton•1h ago•0 comments

Geist Pixel

https://vercel.com/blog/introducing-geist-pixel
2•helloplanets•1h ago•0 comments

Show HN: MCP to get latest dependency package and tool versions

https://github.com/MShekow/package-version-check-mcp
1•mshekow•1h ago•0 comments

The better you get at something, the harder it becomes to do

https://seekingtrust.substack.com/p/improving-at-writing-made-me-almost
2•FinnLobsien•1h ago•0 comments

Show HN: WP Float – Archive WordPress blogs to free static hosting

https://wpfloat.netlify.app/
1•zizoulegrande•1h ago•0 comments

Show HN: I Hacked My Family's Meal Planning with an App

https://mealjar.app
1•melvinzammit•1h ago•0 comments

Sony BMG copy protection rootkit scandal

https://en.wikipedia.org/wiki/Sony_BMG_copy_protection_rootkit_scandal
2•basilikum•1h ago•0 comments

The Future of Systems

https://novlabs.ai/mission/
2•tekbog•1h ago•1 comments

Test Results for AMD Zen 5

https://www.agner.org/forum/viewtopic.php?t=287&start=10
255•matt_d•6mo ago

Comments

eigenform•6mo ago
This reminds me: has anyone ever figured out why Zen 3 was missing memory renaming, but it came back in Zen 4 and Zen 5?
Tuna-Fish•6mo ago
AMD had two leapfrogging CPU design teams. Memory renaming was added by the team that did Zen2, presumably the Zen3 team couldn't import it in time for some reason.
JackYoustra•6mo ago
Any writeups on why they chose this system, whether it's still used today, etc.? I'm completely unfamiliar with this style of management.
throwaway81523•6mo ago
Dunno about writeups but I've worked in that system. Basically the product lifecycle is longer than one product generation. So you get to stay with it through the development, test/release, and maintenance phases, which are arranged to be 2 release cycles. It didn't seem paradoxical or anything. It just made sense.
Sesse__•6mo ago
It depends on having two CPU teams, though. There are not that many teams in the world that can design a high-performance microprocessor; I would assume that AMD has two and Apple has only one (which is why you got all these fillers with just larger and larger M1s in a trenchcoat, while the team was busy trying to make M3 happen).
alberth•6mo ago
While an interesting read, the title is a bit misleading since I didn’t see any actual “test results” in the post.
ooopdddddd•6mo ago
The detailed results are in the links at the bottom of the post.
Someone•6mo ago
AMD’s documentation for the CPU may or may not state such things as “There are six integer ALUs, four address generation units, three branch units, four vector ALUs, and two vector read/write units”, but even if it does, Agnes Fog runs actual code to check that, and often discovers corner cases that the official documentation doesn’t mention.

So, he black box tests the CPU to try and discover its innards.

titanomachy•6mo ago
> Agnes Fog

Agner

djoldman•6mo ago
They are linked at the bottom of Mr. Fog's post. For example on page 142 of this:

https://www.agner.org/optimize/instruction_tables.pdf

ashvardanian•6mo ago
> All vector units have full 512 bits capabilities except for memory writes. A 512-bit vector write instruction is executed as two 256-bit writes.

That sounds like a weird design choice. Curious if this will affect memcpy-heavy workloads.

Writes aside, Zen5 is taking much longer to roll out than I thought, and some of AMD's positioning is (almost expectedly) misleading, especially around AI.

AMD's website claims Zen5 is the "Leading CPU for AI" (<https://www.amd.com/en/products/processors/server/epyc/ai.ht...>), but I strongly doubt that. First, they compare Zen5 (9965), which is still largely unavailable, to Xeon2 (8280), a processor two generations older. Xeon4 is abundantly available and comes with AMX, a feature exclusive to Intel. I doubt AVX-512 support with a 512-bit physical path and even twice as many cores will be enough to compete with that (if we consider just the ALU throughput rather than the overall system & memory).

dragontamer•6mo ago
Well, when you consider that AVX-512 instructions have 2 or 3 reads per write, there's a degree of sense here.

Consider the standard matrix multiplication primitive, the FMAC (multiply-accumulate): 3 reads and one write, if I'm counting correctly (Output = A * B + C: three inputs, one output).
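
(An illustrative aside, not from the thread: a minimal C intrinsics sketch of that read/write asymmetry, assuming an AVX-512F target and a hypothetical helper name fma_step. One fused multiply-add reads three 512-bit registers and writes one.)

    #include <immintrin.h>

    /* One FMA: three 512-bit reads (a, b, c), one 512-bit write (the result). */
    __m512 fma_step(__m512 a, __m512 b, __m512 c) {
        return _mm512_fmadd_ps(a, b, c);  /* result = a * b + c */
    }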

rpiguy•6mo ago
It may be easier for the memory controller to schedule two narrower writes than waiting for one 512-bit block or perhaps they just didn't substantially update the memory controller and so it still has to operate as it did in Zen 4.
p_l•6mo ago
Zen 4 memory controllers preferably operate in multiples of 512 bits (a single burst on a DDR5 channel in 16n prefetch mode; 4 channels on consumer Zen 4 devices).
arrakark•6mo ago
Cache-line bursts/beats tend to be standardized to 64B in lots of NoC architectures.
Dylan16807•6mo ago
"Network on Chip" okay got it.
crest•6mo ago
A 64B cache-line is the same size as an AVX-512 register.
p_l•6mo ago
The 64-byte cache line size matches the 64-byte single-burst transaction on DDR3 through DDR5, and a ganged dual-channel transaction on DDR2. Matching those means you have a nice 1-to-1 relationship between filling a cache line and a single fast memory transaction.
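
(Editor's worked example, assuming a standard DDR5 DIMM split into two 32-bit subchannels: one burst is 32 bits × BL16 = 512 bits = 64 bytes, i.e. exactly one cache line per burst.)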
ryao•6mo ago
AMD CPUs tend to have more memory bandwidth than Intel CPUs and inference is CPU bound, so their claim seems accurate to me.

Whether the core does a 512-bit write in 1 cycle or 2 because it is two 256-bit writes is immaterial. Memory bandwidth is bottlenecked by 64GB/sec per CCX. You need to use cores from multiple CCXs to get full bandwidth.

That said, the EPYC 9175F has 614.4GB/sec memory bandwidth and should be able to use all of it. I have one, although the machine is not yet assembled (Supermicro took 7 weeks to send me a motherboard, which delayed assembly), so I have not confirmed that it can use all of it yet.
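
(Rough arithmetic, assuming the usual 12-channel DDR5-6400 configuration for that platform: 12 channels × 64 bits × 6400 MT/s ÷ 8 = 12 × 51.2 GB/s = 614.4 GB/s, which is where that headline figure comes from. At 64 GB/s per CCX, saturating it needs cores spread across at least 10 CCXs, which the 16-CCD 9175F has.)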

adgjlsfhk1•6mo ago
you can use higher write bandwidth than the CCX bandwidth by having multiple writes that go to the same L2 address before going out to RAM
ryao•6mo ago
> inference is CPU bound

This was a typo. It should have been “inference is memory bandwidth bound”.

menaerus•6mo ago
Interesting design. 16 CCDs / 16 CCXs / 16 cores: 1 core per CCD, 1 CCX per CCD. With 512MB of L3 cache this CPU should be able to use ~all of its ~10 TB/s of L3 MBW out of the box.

How much is it going to cost you to build the box?

vient•6mo ago
AMX is indeed a very strong feature for AI. I've compared Ryzen 9950X with w7-2495X using single-thread inference of some fp32/bf16 neural networks, and while Zen 5 is clearly better than Zen 4, Xeon is still a lot faster even considering that its frequency is almost 1GHz less.

Now, if we say "Zen5 is the leading consumer CPU for AI" then no objections can be made, consumer Intel models do not even support AVX-512.

Also, note that for inference they compare with Xeon 8592+ which is the top Emerald Rapids model. Not sure if comparison with Granite Rapids would have been more appropriate but they surely dodged the AMX bullet by testing FP32 precision instead of BF16.

reitzensteinm•6mo ago
This is a misreading of their website. On the left, they compare the EPYC 9965 (launched 10/10/24) with the Xeon Platinum 8280 (launched Q2 '19) and make a TCO argument for replacing outdated Intel servers with AMD.

On the right, they compare the EPYC 9965 (launched 10/10/24) with the Xeon Platinum 8592+ (launched Q4 '23), a like-for-like comparison against Intel's competition at launch.

The argument is essentially in two pieces - "If you're upgrading, you should pick AMD. If you're not upgrading, you should be."

ashvardanian•6mo ago
It’s true that they compare to different Intel CPUs in different parts of the webpage, and I don’t always understand the intentions behind those comparisons.

Still, if you decode the unreadable footnotes 2 & 3 at the bottom of the page, a few things stand out: avoiding AMX, using CPUs with different core counts & costs, and even running on a different Linux kernel version, which may affect scheduling…

bcrl•6mo ago
It's probably a design choice that is driven by power consumption. 512 bit writes are probably used rarely enough that the performance benefits do not outweigh the additional power consumption that would be borne by all memory writes.
pbsd•6mo ago
Vector ALU instruction latencies are understandably listed as 2 and higher, but this is not strictly the case. From AMD's Zen 5 optimization manual [1], we have

    The floating point schedulers have a slow region, in the oldest entries of a scheduler and only when the scheduler is full. If an operation is in the slow region and it is dependent on a 1-cycle latency operation, it will see a 1 cycle latency penalty.
    There is no penalty for operations in the slow region that depend on longer latency operations or loads.
    There is no penalty for any operations in the fast region.
    To write a latency test that does not see this penalty, the test needs to keep the FP schedulers from filling up.
    The latency test could interleave NOPs to prevent the scheduler from filling up.
Basically, short vector code sequences that don't fill up the scheduler will have better latency.

[1] https://www.amd.com/content/dam/amd/en/documents/processor-t...
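
A minimal sketch of such a latency test (editor's illustration, not from the manual or Agner's suite), assuming x86-64 with GCC or Clang, AVX2, and -O2: it times a dependent chain of vector adds and interleaves NOPs so the FP scheduler never fills up and stays out of the slow region.

    #include <stdint.h>
    #include <stdio.h>
    #include <immintrin.h>
    #include <x86intrin.h>

    int main(int argc, char **argv) {
        (void)argv;
        const uint64_t iters = 100000000ULL;
        __m256 x = _mm256_set1_ps(1.0f);
        __m256 y = _mm256_set1_ps((float)argc);     /* opaque to the optimizer */
        uint64_t t0 = __rdtsc();
        for (uint64_t i = 0; i < iters; i++) {
            x = _mm256_add_ps(x, y);                /* dependent chain: latency-bound */
            __asm__ volatile ("nop; nop; nop");     /* keeps the FP scheduler from filling */
        }
        uint64_t t1 = __rdtsc();
        volatile float sink = _mm256_cvtss_f32(x);  /* keep the chain live */
        (void)sink;
        printf("~%.2f reference cycles per dependent add\n", (double)(t1 - t0) / iters);
        return 0;
    }

Note that rdtsc counts reference cycles rather than core clocks, so this only approximates the true latency; Agner's own test programs read the hardware performance counters instead.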

Dylan16807•6mo ago
So if you fill up the scheduler with a long line of dependent instructions, you experience a significant slowdown? I wonder why they decided to make it do that instead of limiting the size/fill a bit, and what all the tradeoffs were.
vhcr•6mo ago
https://web.archive.org/web/20250726202105/https://www.agner...
londons_explore•6mo ago
> Integer vector instructions and floating point vector instructions now have the same latencies.

There is very little reason to use integers for anything anymore. Loop counter? Why not make it a double - you never know when you might need an extra 0.5 loops at the end!

bee_rider•6mo ago
Finally we can implement BiCGStab intuitively!
Intralexical•6mo ago
Integers aren't for performance. They're for precision (anything financial for example) and occasionally size.
crest•6mo ago
At least historically, integer operations also offered lower latency and higher throughput on CPUs. For decades, integer addition and bitwise logical operations have been the canonical single-cycle instructions that any microarchitecture could perform at least once per cycle without visible latency, while floating-point operations and integer multiplication had multi-cycle latency, if they were even fully pipelined.

Zen 5 breaks several performance "conventions" e.g. AMD went directly from one to three complex scalar integer units (multiplication, PDEP/PEXT, etc.).

Intel effectively has two vector pipelines and the shortest instruction latency is a single cycle, while Zen 5 has four pipelines with a two-cycle minimum latency. That's a *very* different optimisation target (aim for eight instead of two independent instructions in flight) for low-level SIMD code going forward, despite an identical instruction set.
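
(Editor's sketch of what that target means in practice, assuming AVX-512 is available and using a hypothetical helper sum8: with a 2-cycle add latency across 4 pipes, you want roughly latency × throughput = 8 independent accumulators in flight rather than 2.)

    #include <immintrin.h>
    #include <stddef.h>

    /* Sum n floats with 8 independent accumulators: enough in-flight adds to
       cover a 2-cycle latency on 4 vector pipes. */
    float sum8(const float *p, size_t n) {
        __m512 acc[8];
        for (int j = 0; j < 8; j++) acc[j] = _mm512_setzero_ps();
        size_t i = 0;
        for (; i + 8 * 16 <= n; i += 8 * 16)
            for (int j = 0; j < 8; j++)   /* 8 independent dependency chains */
                acc[j] = _mm512_add_ps(acc[j], _mm512_loadu_ps(p + i + 16 * j));
        __m512 total = acc[0];
        for (int j = 1; j < 8; j++) total = _mm512_add_ps(total, acc[j]);
        float s = _mm512_reduce_add_ps(total);
        for (; i < n; i++) s += p[i];     /* scalar tail */
        return s;
    }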

sushevff•6mo ago
Totally. Can’t wait to access the 18463.637th record in my database plus or minus a record or thousand.
vhcr•6mo ago
Doubles can represent integers exactly up to 2^52
mark-r•6mo ago
Actually because of the implied upper bit in the format, it can go to 2^53.
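
(A throwaway check of that boundary, assuming standard IEEE-754 doubles:)

    #include <stdio.h>

    int main(void) {
        double a = 9007199254740992.0;  /* 2^53, still exactly representable   */
        double b = a + 1.0;             /* 2^53 + 1 rounds back down to 2^53   */
        double c = a + 2.0;             /* 2^53 + 2 is representable again     */
        printf("%.0f %.0f %.0f  a==b: %d\n", a, b, c, a == b);  /* a==b prints 1 */
        return 0;
    }
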
varispeed•6mo ago
Is it better than M4?

If a laptop will need to be plugged in to deliver full performance, whilst blasting fans at full throttle, what is the point? (apart from server / workstation use, where you don't like MacOS or need different OS)

PixyMisa•6mo ago
Price.
heraldgeezer•6mo ago
Windows laptops?

Desktops for gaming? AMD makes the best gaming CPUs with the X3D series.

KetoManx64•6mo ago
What about actually doing something useful to be productive?
bitmasher9•6mo ago
If I’m being productive I’d rather have an AMD chip than M4 so I can run Linux comfortably.
adgjlsfhk1•6mo ago
Zen5 is a beast for compilation workloads.
heraldgeezer•6mo ago
AMD wins over Intel here too.

Most of the workforce use Windows.

You can also use Linux if you want, on Intel & AMD.

M CPUs are great but constrained by Apple.

crest•6mo ago
Depends on your usecase. For a thin 14" laptop an M4 is probably the closer sweet spot, but for CPU heavy workloads Apple doesn't offer anything comparable to Threadripper or EPYC (lots of fast cores, enough memory and I/O bandwidth).
menaerus•6mo ago
Actually the Apple M design can hit ~100GB/s of MBW with a single core, something that many other CPUs in the same range (or basically none?) can't.
mmis1000•6mo ago
Maybe wait for the next release of AMD mobile CPUs? I heard they put a 384-bit bus on the iGPU. While the main purpose is faster VRAM access, it will surely also benefit memory-bound CPU tasks.
menaerus•6mo ago
Server CPUs have been hitting those numbers for many years already, including AMD's. The thing here is that Apple optimized their core for a different workload than the rest. I don't think there's a secret sauce AMD isn't aware of, given their other line of CPUs - they know how to achieve it.

In multi-threaded scenarios, for example, M chips are not better at all and AFAICR are worse than Threadripper. So, a different trade-off really.

makeitdouble•6mo ago
Nowadays laptops are majorly used as desktop hybrids.

Getting near desktop performance when plugged but portability and lower consumption when unplugged is a pretty good tradeoff.

hulitu•6mo ago
> Nowadays laptops are majorly used as desktop hybrids.

And they suck big time. And, to add insult to injury, there are also desktops which use laptop CPUs, with the same (lack of) performance.

ksec•6mo ago
>And they suck big time.

What sort of workload sucks big time? Assuming the workload is even laptop-focused in the first place.

varispeed•6mo ago
Opening Word, more than a few tabs in the browser, that kind of heavy load.
varispeed•6mo ago
A friend of mine has a laptop with an Intel Ultra 9 185h. It is always plugged in because when you don't plug it in, it is crawling (like even struggles to open Word). Fans are always spinning and it is loud.

For doing any kind of work that requires focus it is an absolute nightmare.

But she needs a laptop to occasionally take to uni.

makeitdouble•6mo ago
> Intel Ultra 9 185h

The CPU in itself should be pretty good by modern standards: https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+Ultra+9+...

> (like even struggles to open Word).

Her issue is not the form factor. Is it the RAM? Did she activate all the marketing apps? Is a bitcoin farmer running in the background? I don't know, but it's worth looking into.

For comparison, I have at hand a Surface Pro 8 that should be 3x slower than hers on sheer CPU benchmarks, and I can throw any run-of-the-mill task at it (doing the taxes with 3~4 Word documents, Excel, dozens of tabs in Firefox and a call session in the background) and it's fine. It will burn through its battery within two or three hours under that load, and yes the fans will be running, but I have no issue with it crawling when unplugged.

chemmail•6mo ago
Could it be the low-power E-cores being in use somehow? Meteor Lake has 3 types of cores, with 2 LP E-cores in the SoC to try to keep the main P- and E-core tile turned off. Lunar Lake removed the LP E-cores and it does feel faster when surfing pages like Reddit than my 12th-gen and 5000-series laptops. I also tried the Ryzen AI and it is pretty close, but with 20% less battery life. They get a pretty crazy 15hr battery life now.
sliken•6mo ago
Sounds like my 16" MacBook with the Intel i9. With just about anything (a full-screen video call, backups, patching, etc.) it sounds like a hair dryer. I'm not surprised there are various docks, stands, etc. that include supplemental cooling.

I'm jealous of the M-series MacBooks: fast, quiet, and cool on wall power or battery.

kklisura•6mo ago
Are there any good resources on how one obtains all of this information?
rft•6mo ago
The linked PDF in the post contains a section on how the values are measured and a link to the test suite. Search in [1] for "How the values were measured". For another project that measures the same/very similar values you can check out [2]. They have a paper about the tool they are using [3].

There is also AMD's "Software Optimization Guide" that might contain some background information. [4] has many direct attachments; AMD tends to break direct links. Intel should have similar docs, but I am currently more focused on AMD, so I only have those links at hand.

[1] https://www.agner.org/optimize/instruction_tables.pdf

[2] https://www.uops.info/background.html

[3] https://arxiv.org/abs/1911.03282

[4] https://bugzilla.kernel.org/show_bug.cgi?id=206537

matt_d•6mo ago
See https://github.com/MattPD/cpplinks/blob/master/performance.t...
monster_truck•6mo ago
This matches my experience with Zen in basically any generation. Once you've used all of the tricks and exhausted all of the memory and storage bandwidth, you'll still have compute left.

It's often faster to use one core fewer than the point where you hit constraints, so that the processor can juggle threads between cores to balance the thermal load, as opposed to trying to keep it completely saturated.

tw1984•6mo ago
this is very interesting. any chance you have more concrete stats or results?

thanks

Sesse__•6mo ago
I had real code that ran with IPC > 6 on Zen 3; I think that's the first time I've seen a modern CPU _really_ be ALU-bound. :-) But it was very unusual, and when I vectorized it, it ran completely differently.
menaerus•6mo ago
Zen3 decode is 4-wide plus an 8-wide uop cache, and dispatch to the backend is 6 uops wide. Theoretically, it shouldn't be possible to have an IPC larger than 6.
Sesse__•6mo ago
I agree, the 6.02 or whatever I got was probably a perf monitoring artifact.
menaerus•6mo ago
It's interesting nonetheless. I wouldn't expect to measure such an IPC in the wild without crafting the code in an artificial way so that it hits that bound.
Sesse__•6mo ago
Me neither, especially since it was rather branchy (though almost all of the branches were, obviously, easily predictable). It was dominated by simple AND/OR/TEST, though, which I guess can go into a bazillion ports.

Perhaps instruction fusion somehow played into it?

menaerus•6mo ago
Probably everything there is, including instruction fusion, no decoding (i.e. serving from the uop cache), ideal port dispatching, no data dependencies, etc. If it was a loop then perhaps even the LSD.
Sesse__•6mo ago
It was a 20-way nested loop (!), but it probably spent all (>99%) of its time in a few of the depths. Pretty sure all of the actually executed code would fit into the LSD.

Then I moved stuff into huge precalced arrays instead, and it became intensely memory bound. :-)

menaerus•6mo ago
Yeah, it might be the LSD then: basically no frontend involved after the first loop iteration, and then no bottleneck in the backend either.

So, what did you end up having in the code? Ugly and fast or nice and slow? :)

Sesse__•6mo ago
It's essentially research code, so it's getting uglier and uglier and faster and faster :-) It has stuff like “if I remove this assert(), then Clang does something stupid and 30% of CPU time is spent stalling on this single instruction, so meh, leave it in”. It's not going to be maintained once it's done its computation job. (https://oeis.org/draft/A286874 if you're curious.)
menaerus•6mo ago
> if I remove this assert(), then Clang does something stupid and 30% of CPU time is spent stalling on this single instruction, so meh, leave it in

Classic compiler games. Something similar happened to me just recently when I wrote micro-optimized SIMD code for a monotonically increasing integer sequence utility that achieved something like 80% of the theoretical IPC (for Skylake-X) in microbenchmarks. However, once I moved the code from the microbenchmark into the production code, what I saw was surprising (or not really): the compiler merged my carefully optimized SIMD code with the surrounding code and largely nullified the optimizations I had done.

Sesse__•6mo ago
Haha, yes, autovectorization is so much in the way sometimes. I have a bunch of hard-coded AVX2/AVX512 intrinsics lying around since the compiler can do it fine on Compiler Explorer but not in context. Still, having a stall on a single 512-bit add like that suggests something very odd in the µarch. Perhaps something like “we're all out of physical registers and we're going into some kind of panic mode” that is avoided by inserting the assert() branches and slowing things down. No idea, I'm not a Zen microarchitecture expert.

Edit: I ran the code on an Intel CPU (Kaby Lake, on my laptop) and there's no slowdown when removing the assert(). So it really seems to be something Zen-specific and weird.

menaerus•6mo ago
I've started to appreciate that compilers can only do so much. From my experience, auto-vectorization doesn't really shine: it leaves a lot of performance on the table, and then it also messes with hand-optimized code.

> So it really seems to be something Zen-specific and weird.

Number and/or type of ports. Perhaps the code generation is different as well, so it could also be down to compiler backend differences between uarchs.

qwertox•6mo ago
At the bottom of the post is a link to a PDF of "The microarchitecture of Intel, AMD, and VIA CPUs - An optimization guide for assembly programmers and compiler makers" [0]

You might want to download it and just take a look at it so you know that this content exists.

[0] https://www.agner.org/optimize/microarchitecture.pdf

Sesse__•6mo ago
Great, now we just need the uops.info team to do the same. :-)
ksec•6mo ago
Given how Apple's M4 core can access all of the L2 cache (it is shared) and has an SLC (System Level Cache), one could argue it is better to compare it to AMD's X3D variant on cache size. However, on Geekbench 6 it is still off by 30-40% per clock. Even if we assume zero performance improvement from M5, it would be a large jump for Zen 6 to catch up.

And that is also the case with Qualcomm's Oryon and ARM's own Cortex X93x series.

Still really looking forward to Zen 6 on server, though. I can't wait to see 256 Zen 6c cores.

alberth•6mo ago
Isn't Zen fabbed on node sizes Apple used 2-3 years ago (since Apple pays TSMC for exclusive rights to the latest & greatest nodes)?
ksec•6mo ago
Yes. N4, or 5nm-class, compared to Apple's N3E, or 2nd-gen 3nm. But the gap in IPC remains the same regardless of node. Even if AMD could scale higher or had lower energy usage, it still wouldn't change the per-clock performance.

Not only is Zen 5 slower, it also uses more energy to achieve its results. Thinking about that, the gap is staggering.

kvemkon•6mo ago
> AMD chips don't have an equivalent to Intel PT. We'd love to add support as soon as they make one. (2022) [1]

> since 2013, Intel offers a feature called "intel processor tracing" [2]

> [not answered]

> When will AMD cpus introduce Intel-PT tech or the Intel branch trace store feature? (2024) [3]

> [not answered]

Is Intel-PT over-engineered and not really needed in practice?

[1] https://github.com/janestreet/magic-trace/wiki/How-could-mag...

[2] https://community.amd.com/t5/pc-processors/amd-ipt-intelpt-i...

[3] https://community.amd.com/t5/pc-processors/will-amd-cpus-hav...

Sesse__•6mo ago
I've used Intel PT several times; it's completely unbeatable for some things.

In general, Intel is _way_ ahead of AMD in the performance monitoring game. For instance, IBS is a really poor replacement for PEBS (it still hits the wrong instructions, it just re-weights them and this rarely goes well), which makes profiling anything branchy or memory-bound really hard. This is the only real reason why I prefer to buy Intel CPUs still myself (although I understand this is a niche use case!).