
Test Results for AMD Zen 5

https://www.agner.org/forum/viewtopic.php?t=287&start=10
212•matt_d•9h ago

Comments

eigenform•9h ago
This reminds me: has anyone ever figured out why Zen 3 was missing memory renaming, but it came back in Zen 4 and Zen 5?
Tuna-Fish•8h ago
AMD had two leapfrogging CPU design teams. Memory renaming was added by the team that did Zen2; presumably the Zen3 team couldn't import it in time for some reason.
alberth•9h ago
While an interesting read, the title is a bit misleading since I didn’t see any actual “test results” in the post.
ooopdddddd•9h ago
The detailed results are in the links at the bottom of the post.
Someone•8h ago
AMD’s documentation for the CPU may or may not state such things as “There are six integer ALUs, four address generation units, three branch units, four vector ALUs, and two vector read/write units”, but even if it does, Agnes Fog runs actual code to check that, and often discovers corner cases that the official documentation doesn’t mention.

So, he black box tests the CPU to try and discover its innards.

titanomachy•8h ago
> Agnes Fog

Agner

djoldman•8h ago
They are linked at the bottom of Mr. Fog's post. For example on page 142 of this:

https://www.agner.org/optimize/instruction_tables.pdf

ashvardanian•9h ago
> All vector units have full 512 bits capabilities except for memory writes. A 512-bit vector write instruction is executed as two 256-bit writes.

That sounds like a weird design choice. Curious if this will affect memcpy-heavy workloads.

Writes aside, Zen5 is taking much longer to roll out than I thought, and some of AMD's positioning is (almost expectedly) misleading, especially around AI.

AMD's website claims Zen5 is the "Leading CPU for AI" (<https://www.amd.com/en/products/processors/server/epyc/ai.ht...>), but I strongly doubt that. First, they compare Zen5 (9965), which is still largely unavailable, to Xeon2 (8280), a processor two generations older. Xeon4 is abundantly available and comes with AMX, a feature exclusive to Intel. I doubt AVX-512 support with a 512-bit physical path and even twice as many cores will be enough to compete with that (if we consider just the ALU throughput rather than the overall system & memory).

dragontamer•8h ago
Well, when you consider that AVX 512 instructions have 2 or 3 reads per 1 write, there's a degree of sense here.

Consider the standard matrix multiplication primitive, the FMAC (multiply and accumulate): three reads and one write, if I'm counting correctly (Output = A * B + C: three reads, one output).
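
The comment counts register operands per instruction. A similar imbalance shows up at the memory level. As a rough sketch (assuming a naive n x n matrix multiply whose accumulator stays in a register for the whole inner loop; the function name is made up for illustration), loads outnumber stores by a wide margin:

```python
def count_matmul_memory_ops(n):
    """Count element loads and stores for a naive n x n matmul.

    Assumption: the running sum for C[i][j] is held in a register,
    so it costs no memory traffic until the final store.
    """
    loads = 0
    stores = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                loads += 2   # read A[i][k] and B[k][j]
            stores += 1      # write C[i][j] once, after the inner loop
    return loads, stores


if __name__ == "__main__":
    loads, stores = count_matmul_memory_ops(8)
    print(f"loads={loads} stores={stores} ratio={loads / stores:.0f}:1")
```

This says nothing about how Zen 5 actually schedules stores; it only illustrates why a kernel built around multiply-accumulate tends to be read-heavy.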

rpiguy•8h ago
It may be easier for the memory controller to schedule two narrower writes than to wait for one 512-bit block, or perhaps they just didn't substantially update the memory controller and so it still has to operate as it did in Zen 4.
arrakark•7h ago
Cache-line bursts/beats tend to be standardized to 64B in lots of NoC architectures.
Dylan16807•6h ago
"Network on Chip" okay got it.
crest•5h ago
A 64B cache-line is the same size as an AVX-512 register.
ryao•6h ago
AMD CPUs tend to have more memory bandwidth than Intel CPUs and inference is CPU bound, so their claim seems accurate to me.

Whether the core does a 512-bit write in 1 cycle or 2 because it is two 256-bit writes is immaterial. Memory bandwidth is bottlenecked by 64GB/sec per CCX. You need to use cores from multiple CCXs to get full bandwidth.

That said, the EPYC 9175F has 614.4GB/sec memory bandwidth and should be able to use all of it. I have one, although the machine is not yet assembled (Supermicro took 7 weeks to send me a motherboard, which delayed assembly), so I have not confirmed that it can use all of it yet.
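
A back-of-the-envelope check on those figures (a sketch, not a measurement): 614.4 GB/s is what twelve 64-bit channels of DDR5-6400 would deliver in theory. The channel count and memory speed are assumptions on my part; the comment only gives the total. With the quoted limit of 64 GB/s per CCX, saturating that takes at least ten CCXs pulling at once.

```python
import math

# Assumed platform parameters (NOT stated in the thread):
# 12 memory channels of DDR5-6400, each 64 bits (8 bytes) wide.
CHANNELS = 12
TRANSFERS_PER_SEC = 6_400_000_000
BYTES_PER_TRANSFER = 8

# Per-CCX limit quoted in the comment above.
PER_CCX_GBPS = 64.0


def peak_bandwidth_gbps(channels=CHANNELS,
                        transfers_per_sec=TRANSFERS_PER_SEC,
                        bytes_per_transfer=BYTES_PER_TRANSFER):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * transfers_per_sec * bytes_per_transfer / 1e9


def ccxs_needed(total_gbps, per_ccx_gbps=PER_CCX_GBPS):
    """Smallest number of CCXs whose combined limit covers total_gbps."""
    return math.ceil(total_gbps / per_ccx_gbps)


if __name__ == "__main__":
    total = peak_bandwidth_gbps()
    print(f"theoretical peak: {total:.1f} GB/s")
    print(f"CCXs needed at {PER_CCX_GBPS:.0f} GB/s each: {ccxs_needed(total)}")
```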

adgjlsfhk1•3h ago
you can use higher write bandwidth than the CCX bandwidth by having multiple writes that go to the same L2 address before going out to RAM
ryao•1h ago
> inference is CPU bound

This was a typo. It should have been “inference is memory bandwidth bound”.

vient•5h ago
AMX is indeed a very strong feature for AI. I've compared Ryzen 9950X with w7-2495X using single-thread inference of some fp32/bf16 neural networks, and while Zen 5 is clearly better than Zen 4, Xeon is still a lot faster even considering that its frequency is almost 1GHz less.

Now, if we say "Zen5 is the leading consumer CPU for AI" then no objections can be made: consumer Intel models do not even support AVX-512.

Also, note that for inference they compare with Xeon 8592+ which is the top Emerald Rapids model. Not sure if comparison with Granite Rapids would have been more appropriate but they surely dodged the AMX bullet by testing FP32 precision instead of BF16.

reitzensteinm•3h ago
This is a misreading of their website. On the left, they compare the EPYC 9965 (launched 10/10/24) with the Xeon Platinum 8280 (launched Q2 '19) and make a TCO argument for replacing outdated Intel servers with AMD.

On the right, they compare the EPYC 9965 (launched 10/10/24) with the Xeon Platinum 8592+ (launched Q4 23), a like for like comparison against Intel's competition at launch.

The argument is essentially in two pieces - "If you're upgrading, you should pick AMD. If you're not upgrading, you should be."

pbsd•8h ago
Vector ALU instruction latencies are understandably listed as 2 and higher, but this is not strictly the case. From AMD's Zen 5 optimization manual [1], we have

    The floating point schedulers have a slow region, in the oldest entries of a scheduler and only when the scheduler is full. If an operation is in the slow region and it is dependent on a 1-cycle latency operation, it will see a 1 cycle latency penalty.
    There is no penalty for operations in the slow region that depend on longer latency operations or loads.
    There is no penalty for any operations in the fast region.
    To write a latency test that does not see this penalty, the test needs to keep the FP schedulers from filling up.
    The latency test could interleave NOPs to prevent the scheduler from filling up.
Basically, short vector code sequences that don't fill up the scheduler will have better latency.

[1] https://www.amd.com/content/dam/amd/en/documents/processor-t...

vhcr•7h ago
https://web.archive.org/web/20250726202105/https://www.agner...
londons_explore•6h ago
> Integer vector instructions and floating point vector instructions now have the same latencies.

There is very little reason to use integers for anything anymore. Loop counter? Why not make it a double - you never know when you might need an extra 0.5 loops at the end!

bee_rider•6h ago
Finally we can implement BiCGStab intuitively!
Intralexical•6h ago
Integers aren't for performance. They're for precision (anything financial for example) and occasionally size.
crest•5h ago
At least historically integer operations also offered lower latency and higher throughput on CPUs. For decades integer addition and bitwise logical operations have been the canonical single-cycle instructions that any microarchitecture could perform at least once per cycle without visible latency while floating point operations and integer multiplication had multi-cycle latency if it was even fully pipelined.

Zen 5 breaks several performance "conventions" e.g. AMD went directly from one to three complex scalar integer units (multiplication, PDEP/PEXT, etc.).

Intel effectively has two vector pipelines and the shortest instruction latency is a single cycle while Zen 5 has four pipelines with a two cycle minimum latency. That's a *very* different optimisation target (aim for eight instead of two independent instructions in flight) for low level SIMD code going forward despite an identical instruction set.
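
The "eight instead of two" figure is just pipelines times minimum latency: the number of independent instructions that must be in flight so that every pipeline can start a new one each cycle. A tiny sketch using only the pipeline counts and latencies quoted in the comment (the function name is illustrative):

```python
def independent_ops_needed(pipelines, latency_cycles):
    """Independent instructions required to keep all pipelines busy.

    Each pipeline can start one instruction per cycle, but a result is
    only available latency_cycles later, so each pipeline needs that
    many unrelated instructions queued to avoid stalling on dependencies.
    """
    return pipelines * latency_cycles


if __name__ == "__main__":
    # Figures as stated in the comment above.
    print("Intel:", independent_ops_needed(pipelines=2, latency_cycles=1))
    print("Zen 5:", independent_ops_needed(pipelines=4, latency_cycles=2))
```

In practice this usually means unrolling SIMD loops with that many separate accumulators rather than one long dependency chain.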

sushevff•6h ago
Totally. Can’t wait to access the 18463.637th record in my database plus or minus a record or thousand.
vhcr•5h ago
Doubles can represent integers exactly up to 2^52
mark-r•5h ago
Actually because of the implied upper bit in the format, it can go to 2^53.
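
This is easy to check. A minimal sketch in Python, whose `float` is an IEEE 754 double on all common platforms (the helper name is made up for illustration):

```python
import sys


def is_exact_as_double(n):
    """True if the integer n survives a round trip through a double."""
    return int(float(n)) == n


if __name__ == "__main__":
    limit = 2 ** 53
    # 52 stored mantissa bits plus the implied leading bit.
    print("mantissa digits:", sys.float_info.mant_dig)
    print("2^53 exact:    ", is_exact_as_double(limit))
    print("2^53 + 1 exact:", is_exact_as_double(limit + 1))
    print("2^53 + 2 exact:", is_exact_as_double(limit + 2))
```

Every integer up to 2^53 is exact; 2^53 + 1 is the first one that is not, although some larger integers (such as 2^53 + 2) still happen to be representable.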
varispeed•6h ago
Is it better than M4?

If a laptop will need to be plugged in to deliver full performance, whilst blasting fans at full throttle, what is the point? (apart from server / workstation use, where you don't like MacOS or need different OS)

PixyMisa•6h ago
Price.
heraldgeezer•5h ago
Windows laptops?

Desktops for gaming? AMD makes the best gaming CPUs with the X3D series.

KetoManx64•5h ago
What about actually doing something useful to be productive?
bitmasher9•4h ago
If I’m being productive I’d rather have an AMD chip than M4 so I can run Linux comfortably.
adgjlsfhk1•3h ago
Zen5 is a beast for compilation workloads
crest•5h ago
Depends on your usecase. For a thin 14" laptop an M4 is probably the closer sweet spot, but for CPU heavy workloads Apple doesn't offer anything comparable to Threadripper or EPYC (lots of fast cores, enough memory and I/O bandwidth).
makeitdouble•2h ago
Nowadays laptops are mostly used as desktop hybrids.

Getting near desktop performance when plugged in but portability and lower consumption when unplugged is a pretty good tradeoff.

kklisura•5h ago
Are there any good resources on how one obtains all of this information?
rft•5h ago
The linked PDF in the post contains a section on how the values are measured and a link to the test suite. Search in [1] for "How the values were measured". For another project that measures the same/very similar values you can check out [2]. They have a paper about the tool they are using [3].

There is also AMD's "Software Optimization Guide" that might contain some background information. [4] has many direct attachments, since AMD tends to break direct links. Intel should have similar docs, but I am currently more focused on AMD, so I only have those links at hand.

[1] https://www.agner.org/optimize/instruction_tables.pdf

[2] https://www.uops.info/background.html

[3] https://arxiv.org/abs/1911.03282

[4] https://bugzilla.kernel.org/show_bug.cgi?id=206537

matt_d•51m ago
See https://github.com/MattPD/cpplinks/blob/master/performance.t...
monster_truck•4h ago
This matches my experience with Zen in basically any generation. Once you've used all of the tricks and exhausted all of the memory and storage bandwidth, you'll still have compute left.

It's often faster to use one less core than you hit constraints at so that the processor can juggle them between cores to balance the thermal load as opposed to trying to keep it completely saturated.