[x86] AI Compute Extensions (ACE) Specification

https://x86ecosystem.org/resource/ai-compute-extensions-ace-specification/
19•matt_d•2h ago

Comments

dgoldstein0•1h ago
So how does this differ from available sse / avx instructions already in most x64 machines?
dmitrygr•1h ago
This also adds new registers to operate on (more state): at least 1 KB more (512 bits x 16).
anematode•1h ago
One thing that stuck out to me is that it deals with a lot more data formats, in particular low-precision formats like FP4, FP6 and FP8. Manipulating those formats can take a lot of annoying effort; in general, x86 (until AVX-512, at least) has unconvincing support for so-called "lane-crossing" instructions that move data across 16-byte boundaries within a vector. So you can imagine that unpacking, e.g., tightly packed 7-bit data to 8-bit data is a real slog.
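
To make the 7-bit example concrete, here is a minimal scalar sketch in Python of what such an unpack has to do. The bit order (least significant bit first, little-endian bytes) and the helper names `pack_7bit` / `unpack_7bit` are assumptions for illustration; they are not taken from the ACE specification, and a real SIMD version would do this with shuffles and shifts rather than a bit accumulator.

```python
def pack_7bit(values):
    """Pack integers in [0, 127] into a tight bitstream (LSB first; assumed order)."""
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    out = bytearray()
    for v in values:
        if not 0 <= v <= 127:
            raise ValueError("value does not fit in 7 bits")
        acc |= v << nbits
        nbits += 7
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)  # flush the final partial byte
    return bytes(out)


def unpack_7bit(data, count):
    """Expand `count` tightly packed 7-bit fields into one integer per field."""
    acc = 0
    nbits = 0
    pos = 0
    out = []
    for _ in range(count):
        while nbits < 7:
            if pos >= len(data):
                raise ValueError("not enough input bytes")
            acc |= data[pos] << nbits
            pos += 1
            nbits += 8
        out.append(acc & 0x7F)
        acc >>= 7
        nbits -= 7
    return out


values = [0, 127, 1, 64, 5, 99, 100, 3, 77]
packed = pack_7bit(values)          # 9 fields * 7 bits = 63 bits, stored in 8 bytes
assert unpack_7bit(packed, len(values)) == values
```

Each output byte depends on bits from up to two neighbouring input bytes at a shifting offset, which is why this is awkward to express with in-lane byte operations alone.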

I can already immediately think of a use case for vunpackb in some of the stuff I'm working on, where we'd like to efficiently unpack weights from the high half of a vector.

Separately, adding all signed–unsigned variants of the VNNI dot product instructions is a welcome (albeit niche) change. There was an annoying divergence here between major ISAs: x86 added vpdpbusd which computed a dot product between u8 and i8, while ARM added vdotq, which computes a dot product either between u8 and u8 elements, or i8 and i8. So for broad compatibility, you generally had to restrict one of your inputs to [0,127]. This difference shows in the design of (for example) WASM relaxed SIMD, where the result of wasm.dot.i8x16.i7x16.add.signed is implementation-defined if you exceed the [0,127] range. ARM later added mixed-sign variants, and now x86 consummates it.
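
A small scalar model in Python of why the [0,127] restriction works, as a sketch only: the function names are made up for illustration, each call models a single accumulator lane, and 32-bit wraparound and the saturating variants are ignored.

```python
def as_i8(byte):
    """Reinterpret a raw byte (0..255) as a signed 8-bit value."""
    return byte - 256 if byte >= 128 else byte


def dot_u8_i8(a_bytes, b_bytes, acc=0):
    """Mixed-sign model: first input read as unsigned, second as signed."""
    return acc + sum(a * as_i8(b) for a, b in zip(a_bytes, b_bytes))


def dot_i8_i8(a_bytes, b_bytes, acc=0):
    """Same-sign model: both inputs read as signed."""
    return acc + sum(as_i8(a) * as_i8(b) for a, b in zip(a_bytes, b_bytes))


b = [255, 2, 128, 10]            # raw bytes; as signed: -1, 2, -128, 10

a_safe = [1, 127, 0, 50]         # every byte is in [0, 127]
print(dot_u8_i8(a_safe, b))      # 753
print(dot_i8_i8(a_safe, b))      # 753, both readings of a_safe agree

a_wide = [200, 0, 0, 0]          # 200 is outside [0, 127]
print(dot_u8_i8(a_wide, [3, 0, 0, 0]))   # 600
print(dot_i8_i8(a_wide, [3, 0, 0, 0]))   # -168, because 200 reads as -56
```

A byte in [0,127] has the same value whether it is read as u8 or i8, so either flavour of instruction gives the same answer; once the top bit is set, the two interpretations diverge.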

adrian_b•14m ago
This is not a vector extension (like Intel AVX/AVX-512 or Arm SVE), but a matrix extension (like Intel AMX or Arm SME or the "tensor" operations of NVIDIA GPUs).

Some of the latest generations of Intel server CPUs with P-cores already have the AMX matrix extension, which can be used to implement fast AI inference.

AMD has not implemented AMX yet, and probably they will not implement it, because this new "AI Compute Extension", which has been defined by Intel and AMD together, is an alternative to AMX.

Matrix extensions are more efficient for AI inference than vector extensions, because they reduce the ratio between memory accesses and computation operations.
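
A back-of-the-envelope way to see this, as a simplified model rather than anything taken from the ACE specification: if a block of accumulators with `tile_rows` x `tile_cols` entries is kept in registers, each step of the inner loop loads one column slice of A and one row slice of B, then performs one multiply-accumulate per accumulator. The tile shapes below are hypothetical.

```python
from fractions import Fraction


def loads_per_mac(tile_rows, tile_cols):
    """Operand elements loaded per multiply-accumulate for one inner-loop step.

    Model: the accumulator tile stays in registers; each step loads
    tile_rows elements of A and tile_cols elements of B, and performs
    tile_rows * tile_cols multiply-accumulates.
    """
    loads = tile_rows + tile_cols
    macs = tile_rows * tile_cols
    return Fraction(loads, macs)


# Vector-style update: one broadcast scalar times a 16-element vector.
print(loads_per_mac(1, 16))    # 17/16
# Matrix-style update: a 16 x 16 accumulator tile (hypothetical size).
print(loads_per_mac(16, 16))   # 1/8
```

In this model the loads grow with the perimeter of the tile while the arithmetic grows with its area, so a square tile needs far fewer loads per operation than a single vector row.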

sorenjan•1h ago
AVX512 isn't available on most new CPUs. I'm guessing ACE will only be available on server CPUs for at least a couple of years after launch?
deadmutex•25m ago
> AVX512 isn't available on most new CPUs

Please define new. Also, I think AMD uses very similar cores in server and client. So, disabling AVX512 may be an Intel thing (my guess is that they can easily move threads between E & P cores).

murderfs•7m ago
They didn't disable it at first on their client CPUs, and it resulted in code randomly crashing depending on whether ifunc resolvers first ran on a big core or a little core.

It's pretty surprising that multiple CPU vendors have run into issues like this (some more than once, fucking Samsung), when it's pretty much the first thing that anyone on the toolchain side of things asks when they hear about heterogeneous cores on a CPU.

BobbyTables2•46m ago
Thank $ALL_DEITIES that the TCG wasn’t involved!

Midjourney Medical

https://www.midjourney.com/medical/blogpost
314•ricochet11•2h ago•259 comments

Lore – Open source version control system designed for scalability

https://lore.org/
1053•regnerba•14h ago•559 comments

Local Qwen isn't a worse Opus, it's a different tool

https://blog.alexellis.io/local-ai-is-not-opus/
19•alphabettsy•1h ago•1 comments

US holds off blacklisting DeepSeek, more than 100 firms deemed security risks

https://www.reuters.com/world/china/us-holds-off-blacklisting-chinas-deepseek-more-than-100-firms...
408•giuliomagnifico•1d ago•453 comments

Taxonomy of the Occlupanida (parasitoids on bread bag tags)

https://www.horg.com/horg/?page_id=921
103•beatthatflight•5h ago•18 comments

Storied Colors – a catalogue of named colors

https://storiedcolors.com/
121•susiecambria•6h ago•28 comments

Clojure Hosted on Go

https://github.com/glojurelang/glojure
65•dnlo•5h ago•9 comments

Show HN: Spin Lab

https://srijanshukla.com/artifacts/spin-lab/
21•srijanshukla18•1d ago•9 comments

Loreline – Tools for writing interactive fiction

https://loreline.app/en/
103•smartmic•8h ago•12 comments

Show HN: We built an 8-bit CPU as 2nd year EE students

https://github.com/c0rRupT9/STEPLA-1
50•CorRupT9•2d ago•11 comments

Launch HN: Adam (YC W25) – Open-Source AI CAD

https://github.com/Adam-CAD/CADAM
171•zachdive•12h ago•84 comments

How we run Firecracker VMs inside EC2 and start browsers in less than 1s

https://browser-use.com/posts/firecracker-browser-infra
232•gregpr07•1d ago•154 comments

How Madrid built its metro cheaply (2024)

https://worksinprogress.co/issue/how-madrid-built-its-metro-cheaply/
69•trymas•8h ago•27 comments

RFC 10008: The new HTTP Query Method

https://www.rfc-editor.org/info/rfc10008/
340•schappim•17h ago•147 comments

Biological evolution and information acquisition

https://www.construction-physics.com/p/biological-evolution-and-information
24•chmaynard•6d ago•2 comments

Nim Conf 2026 (Online, Sat June 20)

https://conf.nim-lang.org/
3•pietroppeter•1h ago•1 comments

Show HN: An 8-bit live gamecast for baseball

https://ribbie.tv/watch
217•brownrout•12h ago•120 comments

Tesco moving 40k server workloads off VMware amid Broadcom's abusive conduct

https://arstechnica.com/information-technology/2026/06/tesco-moving-40000-server-workloads-off-vm...
229•Bender•7h ago•121 comments

Why thinking out loud with someone beats thinking alone

https://www.thesignalist.io/s/the-dialogue-dividend/
205•kodesko•15h ago•95 comments

Volkswagen started blocking GrapheneOS users

https://discuss.grapheneos.org/d/35949-volkswagen-app?page=3
544•microtonal•13h ago•351 comments

Show HN: Inkwash, a watercolor sketching app and explanation

https://johnowhitaker.github.io/inkwash/about
186•Yenrabbit•4d ago•21 comments

GLM-5.2 is the new leading open weights model on Artificial Analysis

https://artificialanalysis.ai/articles/glm-5-2-is-the-new-leading-open-weights-model-on-the-artif...
820•himata4113•19h ago•395 comments

U.S. science is in chaos

https://www.scientificamerican.com/article/americas-compact-between-science-and-politics-is-broken/
749•presspot•18h ago•911 comments

The Return of Rigorous Full-System Timing Simulation

https://www.sigarch.org/the-return-of-rigorous-full-system-timing-simulation/
39•matt_d•1d ago•0 comments

MicroUI – A tiny, portable, immediate-mode UI library written in ANSI C

https://github.com/rxi/microui
211•peter_d_sherman•16h ago•73 comments

Image Compression

https://www.makingsoftware.com/chapters/image-compression
182•vinhnx•4d ago•28 comments

Want your images back? That'll be $5

https://www.lutr.dev/want-your-images-back-sure-that-ll-be-5-dollars
620•lutr•15h ago•257 comments

The founder's playbook: Building an AI-native startup

https://claude.com/blog/the-founders-playbook
220•e2e4•21h ago•156 comments

Trellis AI (YC W24) hiring a product lead to build agents for healthcare access

https://www.ycombinator.com/companies/trellis-ai/jobs/Cg94htp-product-lead
1•macklinkachorn•11h ago