Please define new. Also, I think AMD uses very similar cores in server and client. So, disabling AVX512 may be an Intel thing (my guess is that they can easily move threads between E & P cores).
It's pretty surprising that multiple CPU vendors have run into issues like this (some more than once, fucking Samsung), when it's pretty much the first thing that anyone on the toolchain side of thing asks when they hear about heterogenous cores on a CPU.
dgoldstein0•1h ago
dmitrygr•1h ago
anematode•1h ago
I can already immediately think of a use case for vunpackb in some of the stuff I'm working on, where we'd like to efficiently unpack weights from the high half of a vector.
Separately, adding all signed–unsigned variants of the VNNI dot product instructions is a welcome (albeit niche) change. There was an annoying divergence here between major ISAs: x86 added vpdpbusd which computed a dot product between u8 and i8, while ARM added vdotq, which computes a dot product either between u8 and u8 elements, or i8 and i8. So for broad compatibility, you generally had to restrict one of your inputs to [0,127]. This difference shows in the design of (for example) WASM relaxed SIMD, where the result of wasm.dot.i8x16.i7x16.add.signed is implementation-defined if you exceed the [0,127] range. ARM later added mixed-sign variants, and now x86 consummates it.
adrian_b•14m ago
Some of the latest generations of Intel server CPUs with P-cores already have the AMX matrix extension, which can be used to implement fast AI inference.
AMD has not implemented AMX yet, and probably they will not implement it, because this new "AI Compute Extension", which has been defined by Intel and AMD together, is an alternative to AMX.
Matrix extensions are more efficient for AI inference than vector extensions, because they reduce the ratio between memory accesses and computation operations.