
Brute Force Colors (2022)

https://arnaud-carre.github.io/2022-12-30-amiga-ham/
1•erickhill•40s ago•0 comments

Google Translate apparently vulnerable to prompt injection

https://www.lesswrong.com/posts/tAh2keDNEEHMXvLvz/prompt-injection-in-google-translate-reveals-ba...
1•julkali•49s ago•0 comments

(Bsky thread) "This turns the maintainer into an unwitting vibe coder"

https://bsky.app/profile/fullmoon.id/post/3meadfaulhk2s
1•todsacerdoti•1m ago•0 comments

Software development is undergoing a Renaissance in front of our eyes

https://twitter.com/gdb/status/2019566641491963946
1•tosh•2m ago•0 comments

Can you beat ensloppification? I made a quiz for Wikipedia's Signs of AI Writing

https://tryward.app/aiquiz
1•bennydog224•3m ago•1 comment

Spec-Driven Design with Kiro: Lessons from Seddle

https://medium.com/@dustin_44710/spec-driven-design-with-kiro-lessons-from-seddle-9320ef18a61f
1•nslog•3m ago•0 comments

Agents need good developer experience too

https://modal.com/blog/agents-devex
1•birdculture•4m ago•0 comments

The Dark Factory

https://twitter.com/i/status/2020161285376082326
1•Ozzie_osman•4m ago•0 comments

Free data transfer out to internet when moving out of AWS (2024)

https://aws.amazon.com/blogs/aws/free-data-transfer-out-to-internet-when-moving-out-of-aws/
1•tosh•5m ago•0 comments

Interop 2025: A Year of Convergence

https://webkit.org/blog/17808/interop-2025-review/
1•alwillis•7m ago•0 comments

Prejudice Against Leprosy

https://text.npr.org/g-s1-108321
1•hi41•7m ago•0 comments

Slint: Cross Platform UI Library

https://slint.dev/
1•Palmik•11m ago•0 comments

AI and Education: Generative AI and the Future of Critical Thinking

https://www.youtube.com/watch?v=k7PvscqGD24
1•nyc111•12m ago•0 comments

Maple Mono: Smooth your coding flow

https://font.subf.dev/en/
1•signa11•12m ago•0 comments

Moltbook isn't real but it can still hurt you

https://12gramsofcarbon.com/p/tech-things-moltbook-isnt-real-but
1•theahura•16m ago•0 comments

Take Back the Em Dash–and Your Voice

https://spin.atomicobject.com/take-back-em-dash/
1•ingve•17m ago•0 comments

Show HN: 289x speedup over MLP using Spectral Graphs

https://zenodo.org/login/?next=%2Fme%2Fuploads%3Fq%3D%26f%3Dshared_with_me%25253Afalse%26l%3Dlist...
1•andrespi•18m ago•0 comments

Teaching Mathematics

https://www.karlin.mff.cuni.cz/~spurny/doc/articles/arnold.htm
2•samuel246•20m ago•0 comments

3D Printed Microfluidic Multiplexing [video]

https://www.youtube.com/watch?v=VZ2ZcOzLnGg
2•downboots•20m ago•0 comments

Abstractions Are in the Eye of the Beholder

https://software.rajivprab.com/2019/08/29/abstractions-are-in-the-eye-of-the-beholder/
2•whack•21m ago•0 comments

Show HN: Routed Attention – 75-99% savings by routing between O(N) and O(N²)

https://zenodo.org/records/18518956
1•MikeBee•21m ago•0 comments

We didn't ask for this internet – Ezra Klein show [video]

https://www.youtube.com/shorts/ve02F0gyfjY
1•softwaredoug•22m ago•0 comments

The Real AI Talent War Is for Plumbers and Electricians

https://www.wired.com/story/why-there-arent-enough-electricians-and-plumbers-to-build-ai-data-cen...
2•geox•25m ago•0 comments

Show HN: MimiClaw, OpenClaw (Clawdbot) on $5 Chips

https://github.com/memovai/mimiclaw
1•ssslvky1•25m ago•0 comments

How I Maintain My Blog in the Age of Agents

https://www.jerpint.io/blog/2026-02-07-how-i-maintain-my-blog-in-the-age-of-agents/
3•jerpint•25m ago•0 comments

The Fall of the Nerds

https://www.noahpinion.blog/p/the-fall-of-the-nerds
1•otoolep•27m ago•0 comments

Show HN: I'm 15 and built a free tool for reading ancient texts.

https://the-lexicon-project.netlify.app/
3•breadwithjam•30m ago•1 comment

How close is AI to taking my job?

https://epoch.ai/gradient-updates/how-close-is-ai-to-taking-my-job
1•cjbarber•30m ago•0 comments

You are the reason I am not reviewing this PR

https://github.com/NixOS/nixpkgs/pull/479442
2•midzer•32m ago•1 comment

Show HN: FamilyMemories.video – Turn static old photos into 5s AI videos

https://familymemories.video
1•tareq_•33m ago•0 comments

Towards Memory Specialization: A Case for Long-Term and Short-Term RAM

https://arxiv.org/abs/2508.02992
57•PaulHoule•5mo ago

Comments

Animats•5mo ago
What they seem to want is fast-read, slow-write memory. "Primary applications include model weights in ML inference, code pages, hot instruction paths, and relatively static data pages". Is there device physics for cheaper, smaller fast-read, slow-write memory cells for that?

For "hot instruction paths", caching is already the answer. Not sure about locality of reference for model weights. Do LLMs blow the cache?

toast0•5mo ago
Probably not what they want, but NOR flash is generally directly addressable; it's commonly used to replace mask ROMs.
bobmcnamara•5mo ago
NOR is usually limited to <30MHz, but if you always want to fetch an entire cacheline and design the read port accordingly, you can fetch the whole cacheline at once, which is pretty neat.

I don't know if anyone has applied this to neural networks.

kimixa•5mo ago
And does so by being larger: the only real difference is more area (and gates, and often critical-path length and thus speed) spent on getting the signaling to the level where it can do word instead of page addressing. The actual flash cells themselves are functionally the same.

There's no fundamental difference in gate technology between the two, so a flash that is addressable at a finer granularity will always be larger than the coarser equivalent. That's the trade-off.

bobmcnamara•5mo ago
> Do LLMs blow the cache?

Sometimes very yes?

If you've got 1GB of weights, those are coming through the caches on their way to the execution units somehow.

Many caches are smart enough to recognize these accesses as a strided, streaming, heavily prefetchable, evictable read, and optimize for that.

Many models are now quantized, too, to reduce the overall memory bandwidth needed for execution, which also helps with caching.
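
For illustration, a minimal C sketch of that strided, streaming access pattern, assuming the GCC/Clang prefetch builtin; the lookahead distance and the dot-product loop are illustrative only:

    #include <stddef.h>

    /* Streaming dot product over one row of a large weight matrix.
     * Each weight is read once per pass, so the access pattern is a
     * perfectly strided stream: easy to prefetch, but it still pulls
     * every byte through the cache hierarchy, evicting other lines. */
    float dot_row(const float *w, const float *x, size_t n) {
        float acc = 0.0f;
        for (size_t i = 0; i < n; i++) {
            /* Hint: fetch one cache line (~16 floats) ahead.
             * Args: address, 0 = read, 0 = no temporal locality,
             * i.e. "use once, then evict". */
            __builtin_prefetch(&w[i + 16], 0, 0);
            acc += w[i] * x[i];
        }
        return acc;
    }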

photochemsyn•5mo ago
Yes, this from the paper:

> "The key insight motivating LtRAM is that long data lifetimes and read heavy access patterns allow optimizations that are unsuitable for general purpose memories. Primary applications include model weights in ML inference, code pages, hot instruction paths, and relatively static data pages—workloads that can tolerate higher write costs in exchange for lower read energy and improved cost per bit. This specialization addresses fundamental mismatches in current systems where read intensive data competes for the same resources as frequently modified data."

Essentially, I guess they're calling for more specialized hardware for LLM tasks, much like what was done in networking equipment for dedicated packet processing, with specialized SRAM/DRAM/TCAM tiers to keep latency to a minimum.

While there's an obvious need for that in internet traffic handling, the practical question is whether LLMs will really scale like that, or whether a massive AI/LLM bubble is about to pop. Who knows? The tea leaves are unclear.

gary_0•5mo ago
> device physics for cheaper, smaller

And lower power usage. Datacenters and mobile devices will always want that.

kayson•5mo ago
Cheaper / smaller? I would say not likely. There is already an enormous amount of market pressure to make SRAM and DRAM smaller.

Device physics-wise, you could probably make SRAM faster by dropping the transistor threshold voltage. It would also make it harder / slower to write. The bigger downside is that it would have higher leakage power, but if it's a small portion of all the SRAM, it might be worth the tradeoff.

For DRAM, there isn't as much "device" involved because the storage element isn't transistor-based. You could probably make some design tradeoff in the sense amplifier to reduce read times by trading off write times, but I doubt it would make a significant change.
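
For reference, the textbook subthreshold-conduction relation behind that leakage tradeoff (not from the thread; n is the subthreshold slope factor and V_T = kT/q the thermal voltage), in LaTeX:

    I_{\mathrm{sub}} \propto e^{(V_{GS} - V_{th})/(n V_T)} \left(1 - e^{-V_{DS}/V_T}\right)

Lowering V_th speeds up the cell but raises off-state leakage exponentially, which is why this only pays off for a small slice of the SRAM.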

kimixa•5mo ago
But much of the latency in cache is getting the signal to and from the cell, not the actual store threshold. And I can't see much difference in that unless you can actually eliminate gates (and so make it smaller, making it physically closer on average).
Grosvenor•5mo ago
I'll put the Tandem five-minute-rule paper here; it seems very relevant.

https://dsf.berkeley.edu/cs286/papers/fiveminute-tr1986.pdf

and a revisit of the rule 20 years later (it still held).

https://cs-people.bu.edu/mathan/reading-groups/papers-classi...
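
For reference, the rule's break-even interval in Gray and Putzolu's formulation; a page re-referenced more often than this is cheaper to keep in RAM than to re-fetch from disk:

    \text{BreakEvenInterval} =
        \frac{\text{PagesPerMBofRAM}}{\text{AccessesPerSecondPerDisk}}
        \times
        \frac{\text{PricePerDiskDrive}}{\text{PricePerMBofRAM}}

With mid-1980s prices the constants worked out to roughly five minutes; the later revisits mostly re-plug newer prices into the same formula.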

staindk•5mo ago
Sounds a bit like Intel's Optane, which seemed great in principle, but I never had a use for it.

https://www.intel.com/content/www/us/en/products/details/mem...

https://en.wikipedia.org/wiki/3D_XPoint

esseph•5mo ago
Used a lot with giant SAP HANA systems
dooglius•5mo ago
I'm not seeing the case for adding this to general-purpose CPUs/software. Only a small portion of software is going to be able to be properly annotated to take advantage of this, so it'd be a pointless cost for the rest of users. Normally, short-term access can easily become long-term in the tail: the process gets preempted by something higher priority, or spends a lot of time on an I/O operation. It's also not clear why, if you had an efficient solution for the short-term case, you wouldn't just add a refresh cycle and use it in place of normal SRAM as generic cache. These make a lot more sense in a dedicated hardware context -- like neural nets -- which I think is the authors' main target here.
gary_0•5mo ago
> Only a small portion of software is going to be able to be properly annotated to take advantage of this

The same could be said for, say, SIMD/vectorization, which 99% of ordinary application code has no use for, but it quietly provides big performance benefits whenever you resample an image, or use a media codec, or display 3D graphics, or run a small AI model on the CPU, etc. There are lots of performance microfeatures like this that may or may not be worth it to include in a system, but just because they are only useful in certain very specific cases does not mean they should be dismissed out of hand. Sometimes the juice is worth the squeeze (and sometimes not, but you can't know for sure unless you put it out into the world and see if people use it).

dooglius•5mo ago
That's fair, I'm implicitly assuming the area cost for this dedicated memory would be much larger than that of e.g. SIMD vector banks.
gary_0•5mo ago
The existence of SIMD has knock-on effects on the design of the execution unit and the FPUs, though, since it's usually the only way to fully utilize them for float/arithmetic workloads. And newer SIMD features like AVX/AVX2 have a pretty big effect on the whole CPU design; it was widely reported that Intel and AMD went to a lot of trouble to make it viable, even though most software probably isn't even compiled with AVX support enabled.

Also, SIMD is just one example. Modern DMA controllers are probably another good example, but I know less about them (although I did try some weird things with the one in the Raspberry Pi). Or niche OS features like shared memory: pipes are usually all you need, and don't break the multitasking paradigm, but in the few cases where shared memory is needed, it speeds things up tremendously.

gizmo686•5mo ago
Presumably, the cost would be roughly the cost of traditional memory. In most consumer devices, memory is bottlenecked by monetary cost, not space or thermal constraints.

However, dedicated read-optimized memory would be used instead of a comparable amount of general-purpose memory, as data stored in one need not be stored in the other. The only increase in memory used would be what is necessary to account for fragmentation overhead when your actual usage ratio differs from what the architect assumed. Even then, the OS could use the more plentiful form of memory as swap space for the more in-demand form (or just have low-priority memory regions use the less optimal form). This will open up a new and exciting class of resource-management problems for kernel developers to eke out a few extra percentage points of performance.

gizmo686•5mo ago
A bunch of applications should be able to annotate data as read-heavy. Even without any change to application code, operating systems can assume that pages mapped read-only belong in the read-optimized memory. This immediately gives you the majority of executable code and all data files that are mmapped read-only.

I'm not sure how good applications are at properly annotating it, but for most applications, assets are also effectively read-only.

You don't even need most of the RAM usage to be able to take advantage of this. If you can reasonably predict what portion of RAM usage will be heuristically read-heavy, then you can allocate your RAM budget accordingly, and probably eke out a measurable performance improvement. In a world with Moore's law, this type of heterogeneous architecture has proven to not really be worth it. However, that calculus changes once we lose the ability to throw more transistors at the problem.
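
A sketch of what that no-annotation path could look like at the syscall boundary. The mmap/madvise calls are standard POSIX/Linux; MADV_READMOSTLY is hypothetical, standing in for a placement hint that doesn't exist today:

    #include <stddef.h>
    #include <sys/mman.h>

    /* Map a data file read-only. The kernel already knows these pages
     * can never be dirtied through this mapping, so it could back them
     * with read-optimized memory with no help from the application. */
    void *map_readonly(int fd, size_t len) {
        void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED)
            return NULL;
    #ifdef MADV_READMOSTLY   /* hypothetical LtRAM placement hint */
        madvise(p, len, MADV_READMOSTLY);
    #endif
        return p;
    }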

laserbeam•5mo ago
I imagine it would be straightforward to support this for codebases which already define multiple allocators (game engines, programs written in Zig, a bunch of other examples I'm less familiar with). If you're already in memory-management land, you already have multiple implementations of malloc and free. Adding more of them is trivial.

If you’re not in manual memory management land, then you probably don’t care about this optimization just like you barely think of stack vs heap. Maybe the compiler could guess something for you, but I wouldn’t be worrying about it in that problem space.

dooglius•5mo ago
I'm totally in manual memory management land. But it's very difficult for me to think of a case where time-limited retention is something I'd feel safe with, and with limited-endurance memory I'd worry about wearing it out while iterating or debugging.
meling•5mo ago
Are there new physics on the horizon that could pave the way for new memory technologies?
addaon•5mo ago
In the microcontroller world, there's already asymmetric RAM like this, although it's all based on the same (SRAM) technology and the distinction is about topology. You have TCM directly coupled to the core; then you generally have a few SRAM blocks attached to an AXI crossbar (so that if software running on different µc cores doesn't simultaneously access the same block, you have non-interference on timing, but simultaneous access is still allowed at the cost of known timing); and then a few more SRAM blocks attached a couple of AXI bridges away, from the point of view of a core (for example, closer to a DMA engine, a low-power core, or another peripheral that masters the bus). You can choose to ignore this, but for maximum performance and (more importantly) maximum timing determinism, understanding what is in which block is key. And that's without getting into EMIFs and off-chip SRAM and DRAM, or XIP out of various NVM technologies...
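
Concretely, this placement is usually expressed through the linker; a sketch using the GCC section attribute, where the section names are examples that would have to match entries in the part's linker script:

    #include <stdint.h>

    /* Hot, timing-critical state goes in tightly coupled memory;
     * bulk DMA buffers live in a slower AXI-attached SRAM block.
     * ".dtcm" and ".sram2" are example linker-script section names. */
    __attribute__((section(".dtcm")))  static int32_t filter_state[64];
    __attribute__((section(".sram2"))) static uint8_t dma_buf[4096];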
pfdietz•5mo ago
Wouldn't a generational garbage collector automatically separate objects into appropriate lifetime categories?
gizmo686•5mo ago
Garbage collectors typically do not differentiate live-and-mutated from live-but-unmutated, which is what is needed here.
pfdietz•5mo ago
Generational collectors need to record when older generation objects are modified (by card marking, for example), so they can distinguish mutated from unmutated.
masklinn•5mo ago
Afaik card marking and other similar schemes do not care about (or track) mutated objects. They track cross-generational references, which have to have been caused by mutation but only a very selective subset thereof. And card marking does not even track at the object level, it tracks pages which have at some point contained a pointer to the newgen.
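
A minimal card-marking write-barrier sketch in C, to make the granularity point concrete: the barrier records which 512-byte card was stored into, not which object was mutated, and the collector later rescans whole dirty cards (sizes and names illustrative):

    #include <stddef.h>
    #include <stdint.h>

    #define HEAP_SIZE (1u << 20)        /* toy 1 MiB heap  */
    #define CARD_BITS 9                 /* 512-byte cards  */
    #define N_CARDS   (HEAP_SIZE >> CARD_BITS)

    static uint8_t heap[HEAP_SIZE];
    static uint8_t card_table[N_CARDS]; /* 1 = card may hold an
                                           old-to-young pointer */

    /* Store a reference into a heap object and dirty its card. The
     * GC only learns "something in this 512-byte region changed";
     * it never finds out which object, or (in this unfiltered form)
     * whether the new value even points into the young generation. */
    void write_ref(void **field, void *value) {
        *field = value;
        size_t off = (size_t)((uintptr_t)field - (uintptr_t)heap);
        card_table[off >> CARD_BITS] = 1;
    }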
imtringued•5mo ago
I don't know what the point of these fantasy-computer papers is if there's no hardware implementation, or even just a design, of their concepts. Even managed-retention memory is not a thing yet, so what's the point of all of this?