
Matrix-vector multiplication implemented in off-the-shelf DRAM for Low-Bit LLMs

https://arxiv.org/abs/2503.23817
230•cpldcpu•12mo ago

Comments

Bolwin•12mo ago
They're doing matrix operations in the Dram itself? That sounds insane and also fascinating
summarity•12mo ago
Getting LLM inference running on anything is going to be the next “it runs Doom”
iszomer•12mo ago
I guess the more contextual nuance would be "..it runs Quake".
im3w1l•12mo ago
Well the goal here isn't to just run it. The goal is to run it at an attractive price/performance.
nkurz•12mo ago
Yup, and incredibly they are able to do this on standard RAM by "intentionally violating the timing parameters":

Processing-Using-DRAM (PUD) leverages the inherent analog operational characteristics of DRAM to enable highly parallel bit-serial computations directly within memory arrays. Prior research has demonstrated that commercial off-the-shelf DRAM can achieve PUD functionality without hardware modifications by intentionally violating the timing parameters.

These studies have established two fundamental PUD operations: RowCopy and majority-of-X (MAJX) (Fig. 1). The RowCopy operation facilitates data movement between different rows within a subarray by issuing a PRE command followed immediately by an ACT command before bitline precharging completes, enabling data transfer through the bitlines. This operation affects all cells along a row simultaneously, making it approximately 100 times faster than processor-mediated data movement. The MAJX operation performs a majority vote among X cells sharing the same bitline that are activated simultaneously, implemented in commercial DRAM by issuing ACT, PRE, and ACT commands in rapid succession without delays. This allows concurrent activation of 2∼32 rows. MAJX enables bit-serial computations that leverage the parallelism of subarrays with 65,536 columns, serving as the fundamental computational unit for PUD.
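
To make the MAJX description above concrete, here is a minimal software sketch of a column-wise majority vote. It is only a simulation under stated assumptions: rows are modelled as Python integers used as bitmasks, the row width is 8 bits instead of 65,536 columns, the vote is restricted to an odd number of rows so it is never tied, and the names `majority`, `row_and` and `row_or` are hypothetical. Pinning one input row to all zeros or all ones to obtain AND or OR is the usual majority-logic identity, not something taken from the quoted text.

```python
from typing import List

WIDTH = 8                      # toy row width; the quoted text mentions 65,536 columns
ALL_ZEROS = 0                  # constant row of 0s
ALL_ONES = (1 << WIDTH) - 1    # constant row of 1s


def majority(rows: List[int], width: int = WIDTH) -> int:
    """Column-wise majority vote over an odd number of rows (each row is a bitmask)."""
    if len(rows) % 2 == 0:
        raise ValueError("this sketch only handles an odd number of rows")
    threshold = len(rows) // 2 + 1
    result = 0
    for col in range(width):
        # Count how many of the activated rows hold a 1 in this column.
        ones = sum((row >> col) & 1 for row in rows)
        if ones >= threshold:
            result |= 1 << col
    return result


def row_and(a: int, b: int) -> int:
    """MAJ3(a, b, 0) is 1 only where both a and b are 1."""
    return majority([a, b, ALL_ZEROS])


def row_or(a: int, b: int) -> int:
    """MAJ3(a, b, 1) is 1 where at least one of a and b is 1."""
    return majority([a, b, ALL_ONES])


if __name__ == "__main__":
    a, b = 0b11001010, 0b10100110
    print(f"{row_and(a, b):08b}", f"{row_or(a, b):08b}")
```

Every column is evaluated independently, which is why the real operation acts on a whole row at once; the loop over `col` here is only a stand-in for that parallelism.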

nayuki•12mo ago
This kind of low-level protocol manipulation of DRAM has some similarities to rowhammer attacks.
gwern•12mo ago
Can it be used to covertly run computations invisible to the OS or CPU?
nsteel•12mo ago
This research requires a custom memory controller that's doing "weird" things; the CPU isn't really getting involved here. It's very different from rowhammer in my opinion. If you have a custom memory controller then I think all bets are off.
wtallis•12mo ago
Only to the same extent that any other co-processor add-in card can do stuff that's not observable by the CPU. Your CPU's RAM is managed by the CPU's memory controller hardware, and that memory controller does not give software the ability to issue individual DRAM commands like precharge. This research uses a memory controller implemented on a FPGA, talking to its own pool of RAM.
elcritch•12mo ago
I hope Micron or another commercial player builds a product on this!
tamlin•12mo ago
Samsung and SK-Hynix have had specs and papers for a few years already for HBM and GDDR. e.g.

* https://www.servethehome.com/sk-hynix-ai-memory-at-hot-chips...
* https://www.servethehome.com/samsung-processing-in-memory-te...

Not sure anyone has started using it in production.

nsteel•12mo ago
And as mentioned in a comment elsewhere, LPDDR6-PIM is coming along too https://wccftech.com/samsung-collaborates-with-sk-hynix-in-p...

We'll see that before anything built around HBM or GDDR.

robwwilliams•12mo ago
This is just mind-bendingly weird and wonderfully creative. It can pay to work in the weeds! Bravo.
userbinator•12mo ago
This behaviour has been around since the earliest DRAMs with multiplexed row/column addresses. The Mostek MK4096 of 1973 could probably do this. Only took about half a century for someone to figure it out.
walterbell•12mo ago
> By intentionally issuing DRAM commands that violate manufacturer-specified timing parameters.. [gaining] massive parallelism up to 65,536 bitwise operations in parallel.

Take that, binary blobs for DRAM training!

willvarfar•12mo ago
Can we expect to see matrix multiplication and perhaps other ops move from classic CPUs out into the DRAM, perhaps with deliberate hardware support?

And does such a processing shift give advantage to Samsung etc? Where does this leave NVIDIA etc?

imtringued•12mo ago
Your questions are kind of amusing since Apple will use LPDDR6-PIM on the next generation of iPhones.

https://www.patentlyapple.com/2024/12/apple-plans-to-transit...

nsteel•12mo ago
I don't get it, what's the joke?
userbinator•12mo ago
Did anyone else notice the absolutely insane author lists of references 1 and 3?

I was expecting to find this 2016 article in there: https://news.ycombinator.com/item?id=12469270

This 2019 one does show up: https://news.ycombinator.com/item?id=22712811

Of course, this "out of spec" behaviour of DRAM, more specifically the ability to do copying, is also implicated in this infamous bug: https://news.ycombinator.com/item?id=5314959

It seems more than one person independently observed such a thing, and thought "this might be a useful behaviour".

s-macke•12mo ago
This seems to be a formatting error. For such a huge author list, you usually write only the first name and then "et al." for "others".
tomsmeding•12mo ago
The 'et al.' is used for in-article citations, if done in author-year format; references in the reference list are, to the extent that I've seen, always written out in full. I guess Google just wanted to make the life of any academic citing their work miserable. There are (unfortunately) conferences that have page limits that include the reference list; I wonder if an exception would be made here.
esafak•12mo ago
They want authors to think twice before citing someone. A curious incentive!
throwaway519•12mo ago
One day, I'm going to credit my entire department, deli and everyone in the park at 2pm as contributors too.
swimwiththebeat•12mo ago
So is this a new technique of doing computations within existing DRAM to overcome the memory wall issue of modern computing?
cpldcpu•12mo ago
Some more background information:

One of the original proposals for in-DRAM compute: https://users.ece.cmu.edu/~omutlu/pub/in-DRAM-bulk-AND-OR-ie...

First demonstration with off-the-shelf parts: https://parallel.princeton.edu/papers/micro19-gao.pdf

DRAM Bender, the tool they are using to implement this: https://github.com/CMU-SAFARI/DRAM-Bender

Memory-Centric Computing: Recent Advances in Processing-in-DRAM: https://arxiv.org/abs/2412.19275

xhkkffbf•12mo ago
In-DRAM goes back a long time. There were plenty of papers in the 90s about various ideas for turning a bank of DRAM into a SIMD machine. They weren't as clever as some of these ideas or as well developed but these papers are just the latest versions of an old idea.
therealcamino•12mo ago
Do any of those techniques use unmodified DRAM or are you talking about processor-in-memory approaches?
dr_zoidberg•12mo ago
The abstract of OP's link mentions "Processing-Using-DRAM (PUD)" as exactly that, using off-the-shelf components. I do wonder how they achieve that; I guess by fiddling with the controller in ways that are not standard but get the job (processing data in memory) done.

Edit: Oh and cpldcpu linked the ComputeDRAM paper that explains how to do it with off the shelf parts.

jiveturkey•12mo ago
That context is very helpful. But you don't need to poo-poo the ideas as "just another iteration". Everything we have today is built on top of decades of prior work. The paper itself mentions a lot of prior work.
morphle•12mo ago
A bit unscientific that they don't cite the original Intelligent RAM (IRAM) sources from 1997:

https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=iram...

cpldcpu•12mo ago
I also strongly suspect that there are earlier sources.

However, IRAM looks like compute near memory, where they add an ALU to the memory chip. Compute in memory is about using the memory array itself.

To be fair, CIM looked much less appealing before the advent of deep-learning with crazy vector lengths. So people rather tried to build something that allows more fine grained control of the operations.

morphle•12mo ago
>I also strongly suspect that there are earlier sources.

You are right, I remember 1972-ish papers where they did compute in memory. I just couldn't locate links to these papers in a few minutes.

xiphias2•12mo ago
This would be a cool way to make a cheap inferencing device for the largest LLMs
protocolture•12mo ago
>General matrix-vector multiplication (GeMV)

Ok, so my math isn't great.

When I was studying Quaternions during my 3d math class (That I failed the first time, like I said, not a math guy) they briefly covered the history of matrix calculation in graphics development.

My understanding is that Quaternions became popular because they are almost as accurate as matrices but much less complex computationally.

Has anyone tried building an LLM using Quats instead of matrices?

Or are the optimisations with Quaternions more useful in realtime?

monocasa•12mo ago
My understanding was that the main benefit of quaternions in computer graphics was representing rotations in a way that doesn't result in gimbal lock.

And beyond that, for those rotations, a quaternion doesn't scale nearly as well as you add dimensions. Complex numbers are a complex representation of two space, quaternions are a complex representation of three space, and to go to four space you have octonions, which have eight elements.

eru•12mo ago
Quaternions have four dimensions.
thomaskoopman•12mo ago
Yes, but quaternions of unit length are a representation of the rotation group in 3D space ( https://en.wikipedia.org/wiki/Representation_theory_of_SU(2)... ), which is how they are used for rotations.
suspended_state•12mo ago
The original question was: can quaternions be used in place of matrices to perform LLM tasks, and the answer is: quaternions have 4 dimensions, with the implied meaning that matrices can cover different dimensionalities, which are needed for LLMs (and neural networks in general).
eru•11mo ago
Yes, if you have essentially 4d objects and you disable 1 dimension by requiring unit length, you end up with something that is effectively 3d.

Of course, that the 3d thing you end up with represents rotations in 3d space is extremely neat; and not something all 3d things do.

formerly_proven•12mo ago
Axis-angle also doesn't have gimbal lock - the main advantage of quaternions is that actually performing rotations with them only involves addition and multiplication, no trigonometry at all. The same is true for using proper rotation matrices, but those use a lot more memory. Plus, you can actually lerp between quaternions (more generally - they compose). That doesn't work with matrices (I think)
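
A minimal sketch of the "only addition and multiplication" point above, assuming the common (w, x, y, z) component order and the rotation formula v' = q v q*. The helper names `qmul`, `conjugate` and `rotate` are hypothetical. Building a unit quaternion from an arbitrary angle would still need a sine and cosine of the half angle; the sketch sidesteps that by hard-coding the quaternion for a 90 degree turn about the z axis.

```python
import math
from typing import Tuple

Quat = Tuple[float, float, float, float]   # (w, x, y, z)
Vec3 = Tuple[float, float, float]


def qmul(p: Quat, q: Quat) -> Quat:
    """Hamilton product p * q, using only multiplications and additions."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    )


def conjugate(q: Quat) -> Quat:
    """Conjugate of q; for a unit quaternion this is also its inverse."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate v by the unit quaternion q via q * (0, v) * conjugate(q)."""
    _, x, y, z = qmul(qmul(q, (0.0, *v)), conjugate(q))
    return (x, y, z)


# Unit quaternion for a 90 degree rotation about z: (cos 45deg, 0, 0, sin 45deg).
H = math.sqrt(0.5)
Q_Z90: Quat = (H, 0.0, 0.0, H)

if __name__ == "__main__":
    print(rotate(Q_Z90, (1.0, 0.0, 0.0)))   # the x axis ends up (approximately) on the y axis
```

Composing two rotations is a single `qmul` call, which is the "they compose" property mentioned above.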
monocasa•12mo ago
Axis angle has gimbal lock when composed.
thomaskoopman•12mo ago
A matrix is a representation of a linear function (i.e. a function that plays nice with + and scalar multiplication). A specific subset can be used to describe rotations in 3D space. Quaternions can (arguably) do this better. But quaternions cannot be used to describe any linear function. So I do not think this makes sense for LLMs.
tzs•12mo ago
> But quaternions cannot be used to describe any linear function

Does this mean all functions that can be described by quaternions are non-linear, or does it mean that quaternions can describe some linear functions, such as the ones associated with rotations in 3D space, but there are linear functions they cannot describe?

thomaskoopman•12mo ago
Quaternions (when viewed as vectors) are not linear functions, but the arguments to linear functions. You can add them: (a + bi + cj + dk) + (a' + b'i + c'j + d'k) = (a + a') + (b + b')i + (c + c')j + (d + d')k, and multiply them by a scalar: lambda * (a + bi + cj + dk)= (lambda * a) + (lambda * b)i + (lambda * c)j + (lambda * d)k. An example of a linear function on quaternions is the zero function. After all, zero(q + q') = 0 = 0 + 0 = zero(q) + zero(q'), and zero(lambda * q) = 0 = lambda * 0 = lambda * zero(q).

Matrices and quaternions take different approaches to describing rotations: a matrix sees a rotation as a linear function, and quaternions see rotations as a group (confusingly represented with matrices, this field is called representation theory if you want to know more).

So the answer to your question: there are linear functions that quaternions cannot describe. And quaternions can only describe a very specific class of linear functions (with some rather complicated maths behind them).
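
One way to see this in code, as a rough sketch restricted to a single class of maps: left-multiplication by a fixed quaternion q is linear on 4-vectors, so it can be written as a 4x4 matrix, but that matrix is fully determined by the four components of q, whereas a general 4x4 matrix has sixteen free entries. The names `left_mul_matrix` and `as_left_multiplication` are hypothetical, and the (w, x, y, z) ordering is an assumption. The example does not cover other quaternion-based maps such as p -> q p r.

```python
from typing import List, Optional, Tuple

Quat = Tuple[float, float, float, float]   # (w, x, y, z)
Matrix = List[List[float]]


def qmul(p: Quat, q: Quat) -> Quat:
    """Hamilton product p * q."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    )


def left_mul_matrix(q: Quat) -> Matrix:
    """4x4 matrix M such that M @ p equals the Hamilton product q * p."""
    w, x, y, z = q
    return [
        [w, -x, -y, -z],
        [x,  w, -z,  y],
        [y,  z,  w, -x],
        [z, -y,  x,  w],
    ]


def matvec(m: Matrix, v: Quat) -> Quat:
    """Plain matrix-vector product for a 4x4 matrix and a 4-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(4)) for i in range(4))


def as_left_multiplication(m: Matrix) -> Optional[Quat]:
    """Return q if m is exactly 'multiply on the left by q', else None."""
    # If m has that form, its first column must be (w, x, y, z).
    q = (m[0][0], m[1][0], m[2][0], m[3][0])
    return q if left_mul_matrix(q) == m else None


# A perfectly linear map (keep the first component, zero the rest) that is
# not left-multiplication by any quaternion.
PROJECTION: Matrix = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

if __name__ == "__main__":
    q, p = (1.0, 2.0, 3.0, 4.0), (5.0, 6.0, 7.0, 8.0)
    print(matvec(left_mul_matrix(q), p) == qmul(q, p))
    print(as_left_multiplication(PROJECTION))
```

The projection matrix fails the check because its first column would force q = (1, 0, 0, 0), whose left-multiplication matrix is the identity rather than the projection.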

eru•12mo ago
Quaternions only have four fixed dimensions. For neural networks you need many, many more dimensions.
benob•12mo ago
I think you are mixing things. Quaternions are in the same category as complex numbers. They can be represented as matrices, and there are probably nice uses of matrices where the element is a quaternion (such as QDNNs) instead of a real number. My experience is that in massive architectures such as LLMs, simpler forms are more successful unless there is a true benefit to representing things with more elaborate types of scalars (such as in physics, or 3d graphics).
chasd00•12mo ago
In the hardware world are there risks of taking advantage of a bug knowing that the manufacturer may someday fix the bug? I know in the software world it's a bad idea to leverage a bug in a platform to enable a feature (or fix another bug). The bug you're counting on being present may get fixed 15 years in the future and then your system explodes and no one knows why.

edit: seems like there was a recent discussion about something similar... undefined behavior in some C function iirc

vlovich123•12mo ago
Undefined behavior in C/C++ has been discussed for a very very long time. I'd say the impact of it when combined with optimizing compilers first came to broader public awareness around the 2010ish time frame (maybe 2013?) which is now about 12+ years old.

As for this paper, it's not about relying on a bug but rather presenting what might be possible with DRAM in the hopes of standardizing capabilities.

alexpotato•12mo ago
This pops up in low latency HFT specifically with networking cards.

Certain network cards have either a bug or combination of features that work in an interesting way to the benefit of the trading firm.

These bugs (and features too) sometimes get removed in favor of either getting rid of the bug or those features are seen as not needed for the larger market etc. Therefore, firms will sometimes attempt to buy up all available supply of certain models.

nomel•12mo ago
This usually falls under "interoperability testing", but usually mitigated through your firmware rather than hardware. In the worst types of cases, you need to make sure your hardware works with some popular defunct vendor from 15 years ago since some big customers have used that hardware for 15 years, without issue, and will see your hardware as the problem when they plug it in.

For communication equipment, this is super important, with all sorts of "quirks" put in for vendors that didn't follow the spec. And, that includes keeping quirks in your firmware, so you don't break anyone else's. Imagine entire walls of legacy and long-gone and current competitor equipment, with robot arms to plug things in, and you have an idea of what some hardware validation labs look like.

Motherboard manufacturer firmware is also filled with quirks for specific CPUs, chipsets, etc.

lolc•11mo ago
Funny hack. Without having read the paper I'd assume the operations to be thermally unstable. So LLM inference results will vary based on environmental temperature :-)
nsteel•11mo ago
Yes, but only a little. Read the paper (or just search for "temperature") to see details.
lolc•11mo ago
Ok interesting, yes then I have to read it!