How LLMs work

https://www.0xkato.xyz/how-llms-actually-work/
265•0xkato•2d ago•75 comments

The intricacies of modern camera lens repair (2024)

https://salvagedcircuitry.com/sigma-45mm.html
155•transistor-man•7h ago•51 comments

S&P 500 rejects SpaceX, also blocking entry for OpenAI and Anthropic

https://arstechnica.com/tech-policy/2026/06/sp-500-blocks-fast-spacex-entry-wont-waive-rule-for-u...
318•maltalex•3h ago•94 comments

Pre-Modern Armies for Worldbuilders, Part I: Why They Fight

https://acoup.blog/2026/06/05/collections-pre-modern-armies-for-worldbuilders-part-i-why-they-fight/
59•gostsamo•4h ago•16 comments

Astronauts told to return to ISS after sheltering over air leak repairs

https://www.bbc.com/news/live/c4g44ew3g1kt
387•janpot•17h ago•249 comments

New method turns ocean water into drinking water, without waste

https://www.rochester.edu/newscenter/what-is-desalination-definition-ocean-water-704732/
346•speckx•17h ago•150 comments

The back cover of C++: The Language raises questions not answered by front cover

https://devblogs.microsoft.com/oldnewthing/20260605-01/?p=112391
79•paulmooreparks•4h ago•21 comments

pg_durable: Microsoft open sources in-database durable execution

https://github.com/microsoft/pg_durable
387•coffeemug•16h ago•88 comments

Ask HN: What was your "oh shit" moment with GenAI?

313•andrehacker•1d ago•572 comments

Ten Years of Franz

https://meetfranz.com/blog/ten-years-of-franz
27•tosh•3d ago•14 comments

Social Cache Busting

https://www.autodidacts.io/social-cache-busting/
14•surprisetalk•3d ago•2 comments

Gemma 4 QAT models: Optimizing compression for mobile and laptop efficiency

https://blog.google/innovation-and-ai/technology/developers-tools/quantization-aware-training-gem...
342•theanonymousone•16h ago•105 comments

Lockdown Mode

https://help.openai.com/en/articles/20001061-lockdown-mode
50•berlianta•4h ago•23 comments

Did Claude increase bugs in rsync?

https://alexispurslane.github.io/rsync-analysis/
397•logicprog•19h ago•401 comments

Mouseless – keyboard-driven control of macOS/Linux/Windows

https://mouseless.click
525•riddley•2d ago•213 comments

No Let, No Rec, No Problem: A Gentler Introduction to the Y and Z Combinators

https://irfanali.org/blog/zcom
38•sayyadirfanali•3d ago•6 comments

The perils of UUID primary keys in SQLite

https://andersmurphy.com/2026/06/05/the-perils-of-uuid-primary-keys-in-sqlite.html
79•emschwartz•9h ago•43 comments

My Agent Skill for Test-Driven Development

https://www.saturnci.com/my-agent-skill-for-test-driven-development.html
170•laxmena•1d ago•72 comments

We shrank our TimescaleDB chunks from 30 days to 7

https://tech.wmg.com/why-we-shrank-our-timescaledb-chunks-from-30-days-to-7-07cab8afefc5
4•yask123•2d ago•0 comments

Gov.uk has replaced Stripe with Dutch provider Adyen

https://www.theregister.com/public-sector/2026/06/04/govuk-goes-dutch-on-payments-as-it-dumps-str...
427•toomuchtodo•15h ago•149 comments

Nine Ways to Do Inheritance in Rust, a Language Without Inheritance

https://medium.com/@carlmkadie/nine-ways-to-do-inheritance-in-rust-a-language-without-inheritance...
37•pjmlp•2d ago•5 comments

Conventional Commits encourages focus on the wrong things

https://sumnerevans.com/posts/software-engineering/stop-using-conventional-commits/
298•jsve•16h ago•231 comments

The Quiet Numbers Station: Decoding Nineteen Years of GPS Cryptography

https://www.benthamsgaze.org/2026/06/02/the-quiet-numbers-station-decoding-nineteen-years-of-gps-...
85•lordgilman•19h ago•69 comments

Ask HN: Why is the HN crowd so anti-AI?

127•Ekami•5h ago•235 comments

Tracing a powerful GNSS interference source over Europe

https://arxiv.org/abs/2606.03673
391•mimorigasaka•23h ago•201 comments

Europe's largest Copper Age tomb: children's bones show ancient health crisis

https://phys.org/news/2026-05-europe-largest-copper-age-tomb.html
29•gmays•1d ago•5 comments

Transformers are inherently succinct

https://openreview.net/pdf?id=Yxz92UuPLQ
116•brandonb•13h ago•32 comments

India's surprise baby bust

https://www.economist.com/leaders/2026/06/04/indias-surprise-baby-bust-is-a-warning-to-the-world
175•hakonbogen•17h ago•752 comments

Three of our worst VC stories

https://twitter.com/eastdakota/status/2062860530360959273
225•orgonon•13h ago•111 comments

Cooldown Support for Ruby Bundler

https://blog.rubygems.org/2026/06/03/cooldown-let-new-gems-be-vetted.html
157•calyhre•3d ago•42 comments

EM-LLM: Human-Inspired Episodic Memory for Infinite Context LLMs

https://github.com/em-llm/EM-LLM-model
113•jbotz•1y ago

Comments

MacsHeadroom•1y ago
So, infinite context length by making it compute bound instead of memory bound. Curious how much longer this takes to run and when it makes sense to use vs RAG.
zfountas•1y ago
Hi MacsHeadroom, first author here. Thanks for the great questions about compute/memory trade-offs.

The quick take: To give you an example of processing speed, with a 7B model on an NVIDIA V100, EM-LLM processes (or generates) about 326 tokens/sec with a 51.2K context window (which is quite competitive for these old GPUs).

More broadly, EM-LLM is designed to make ultra-long contexts (memory-prohibitive for standard O(n^2) attention) computationally tractable. Appendix C of our paper https://openreview.net/pdf?id=BI2int5SAC details how: significantly better attention scaling, efficient O(nm) memory formation, and large KV cache management via CPU/disk offloading. While there is initially a slight per-chunk overhead compared to the simplest retrieval methods, the crucial part is our ability to handle sequences at scales infeasible for full-context models. For instance, we're successfully using 8B models with 10M token contexts on a single GPU without prohibitive delays.
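For a rough sense of why a KV cache at this scale has to be offloaded, here is a back-of-the-envelope estimate. It is only a sketch: the layer count, number of KV heads, head dimension, and fp16 storage are assumptions chosen to resemble a Llama-3-8B-style model, and none of them are stated in the comment above.

```python
def kv_cache_bytes(num_tokens, num_layers=32, num_kv_heads=8,
                   head_dim=128, bytes_per_value=2):
    """Bytes needed to hold keys and values for num_tokens tokens.

    All defaults are assumptions (Llama-3-8B-like shape, fp16 values),
    not figures taken from the thread or the paper.
    """
    # Factor 2: one key vector and one value vector per head, per layer.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return num_tokens * per_token


total = kv_cache_bytes(10_000_000)
print(f"{total / 1e12:.2f} TB for a 10M-token context")
```

Under these assumptions each token costs 128 KiB of cache, so 10M tokens come to roughly 1.3 TB, far more than a single GPU holds.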

Regarding RAG in particular, EM-LLM often shows significant gains on tasks needing deep understanding of a single, long, coherent context. A key reason is that EM-LLM allows each layer to retrieve and integrate relevant information from different "episodes" of the context independently, offering more nuance than a typical single RAG step, for similar overall resource use.

mountainriver•1y ago
TTT, Canon layers, and Titans seem like stronger approaches IMO.

Information needs to be compressed into latent space or it becomes computationally intractable.

searchguy•1y ago
Do you have references for

> TTT, Canon layers, and Titans

najarvg•1y ago
This was the nearest reference I could find. An unofficial PyTorch implementation on GitHub is also linked somewhere in the thread - https://www.reddit.com/r/LocalLLaMA/comments/1i0q8nw/titans_...
vessenes•1y ago
Has Titans been replicated? I feel like lucidrains couldn't replicate it.
logicchains•1y ago
I think something like Titans explains Gemini's excellent long-context performance. That would explain why the Titans team hasn't released the training code or the hyperparameters used, even though they said in the paper that they would, and why soon afterwards it came out that DeepMind would be holding off publishing new results for 6 months to avoid giving away competitive advantages.
p_v_doom•1y ago
Interesting. Before there even was attention, I was thinking that the episodic memory model offers something that could be very useful for neural nets, so it's cool to see people testing that.
killerstorm•1y ago
Note that this works within a single sequence of tokens. It might be consistent with the "episodic memory" metaphor if we consider a particular transformer run as its experience.

But this might be very different from what people expect from "memory" - i.e. the ability to learn vast amounts of information and retrieve it as necessary.

This is more like a refinement of transformer attention: instead of running attention over all tokens (which is very expensive as it's quadratic), it selects a subset of token spans and runs fine-grained attention only on those. So it essentially breaks transformer attention into two parts - coarse-grained (k-NN over token spans) and fine-grained (normal).
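The coarse/fine split described above can be sketched in a few lines. This is a toy illustration, not EM-LLM's actual code: fixed-length spans, mean-key span summaries, and the names `two_stage_attention`, `span_len`, and `top_k` are all assumptions made for the example.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def two_stage_attention(query, keys, values, span_len=4, top_k=2):
    """Coarse k-NN over span summaries, then softmax attention in chosen spans."""
    # Assumption: fixed-length spans; a real system may segment differently.
    spans = [list(range(i, min(i + span_len, len(keys))))
             for i in range(0, len(keys), span_len)]

    def mean_key(idx):
        # Assumption: a span is summarized by the mean of its keys.
        return [sum(keys[i][d] for i in idx) / len(idx)
                for d in range(len(keys[0]))]

    # Coarse stage: rank spans by similarity of their summary to the query.
    ranked = sorted(range(len(spans)),
                    key=lambda s: dot(query, mean_key(spans[s])),
                    reverse=True)
    selected = [i for s in sorted(ranked[:top_k]) for i in spans[s]]

    # Fine stage: ordinary scaled dot-product attention over selected tokens only.
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) / scale for i in selected])
    out = [sum(w * values[i][d] for w, i in zip(weights, selected))
           for d in range(len(values[0]))]
    return out, selected


# Tokens 0-3 are orthogonal to the query, tokens 4-7 align with it.
keys = [[0.0, 1.0]] * 4 + [[1.0, 0.0]] * 4
values = [[float(i)] for i in range(8)]
out, selected = two_stage_attention([1.0, 0.0], keys, values, span_len=4, top_k=1)
print(selected, out)
```

With `top_k=1` only the second span is attended to, so the fine stage touches 4 tokens instead of all 8; the saving grows with context length.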

It might be a great thing for long-context situations. But it doesn't make sense when you want millions of different facts to be considered - packing them into one long context is rather inefficient.

yorwba•1y ago
It would be inefficient if you had to do it from scratch for every query, but if you can do it once as a preprocessing step and reuse the prepared context for many queries, it might start to become more efficient than a shorter context that includes only some documents but has to be reprocessed because it's different every time.
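The amortization argument above boils down to a break-even count. A minimal sketch, with a hypothetical helper name and arbitrary cost units (the numbers in the usage line are illustrative, not measurements):

```python
import math


def break_even_queries(prep_cost, reuse_cost_per_query, fresh_cost_per_query):
    """Smallest query count at which preparing the long context once is cheaper.

    Compares prep_cost + n * reuse_cost_per_query against
    n * fresh_cost_per_query. Returns None if reuse never wins.
    """
    saving = fresh_cost_per_query - reuse_cost_per_query
    if saving <= 0:
        return None
    # Need n > prep_cost / saving, so take the next integer above it.
    return math.floor(prep_cost / saving) + 1


# Illustrative units only: one-time preparation 1000, reuse 2 per query,
# rebuilding a shorter context from scratch 12 per query.
n = break_even_queries(1000, 2, 12)
print(n)
```

Below that count the shorter, rebuilt context is cheaper; above it the prepared long context wins.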
killerstorm•1y ago
Yes, I think it might be a good solution where you have a context of up to 10M tokens and you make a lot of requests with that context. It might be relevant for agentic stuff, which tends to produce long chat logs - especially with some gadgets on top, e.g. some 'episodes' might be removed completely as obsolete.

But I don't think it's a good solution for larger amounts of data, since in that case it is more beneficial if the data can be formed into independent memories.