
The Road to a Billion-Token Context

https://cacm.acm.org/news/the-road-to-a-billion-token-context/
11•pseudolus•2d ago

Comments

schnitzelstoat•1h ago
Is such a large context window even desirable? It seems like it would consume an awful lot of tokens and, unless one was very careful to curate the context, could even result in worse performance.
withinboredom•1h ago
For larger codebases ... maybe it will cut down on "let me create a random number wrapper for the 15th time" type problems.
Weryj•56m ago
You should already have skills which mention these utilities.

But maybe that's enough tokens to feed in an entire lifetime of user behaviour for the digital twin dystopia?

withinboredom•53m ago
"type problems" was doing the heavy lifting there, not literally "this utility".
AureliusMA•52m ago
I remember when a large context was 8k! Nowadays that would seem extremely small, because we have new use-cases that require much larger context sizes. Maybe in the future, we will invent ways to use inference on very large contexts that we cannot even imagine today.
faangguyindia•27m ago
Imagine if you were building database software and you could fit the source code of all existing databases, and their GitHub issues, in context.
AntiUSAbah•25m ago
That's either the R&D part of this chip, or Nvidia has the use case.

Nvidia uses ML for fine-tuning and architecting their chips; this might be one use case.

Another one would be to put EVERYTHING from your company into this context window. It would be easier to create 'THE' model for every company or person. It might also be safer than having a model trained on your data, because you don't end up with a model that contains all your data, only memory.

__alexs•1h ago
Does having 1 billion tokens mean more of the tokens in the context window are actually good quality, or do we just get more dumb tokens?
RugnirViking•1h ago
The article is almost entirely about this, yes.

Current approaches require fancy tricks to fit the tokens into memory, and they spread attention thinner over larger numbers of tokens. The new approach tries to keep everything in a single shared memory and process the tokens in parallel using multiple GPUs.
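
To make "process the tokens in parallel using multiple GPUs" concrete: attention over a KV cache sharded across devices can still be computed exactly, because each shard's partial softmax result can be merged afterwards with a log-sum-exp correction. A minimal sketch; the four-shard split and all shapes are made-up toy values, not details from the article:

    import numpy as np

    d = 64                                      # head dimension (made up)
    rng = np.random.default_rng(0)
    q = rng.standard_normal(d)

    # Pretend four devices each hold 1,000 tokens of the KV cache.
    shards = [(rng.standard_normal((1000, d)),  # keys on device i
               rng.standard_normal((1000, d)))  # values on device i
              for _ in range(4)]

    partials = []
    for K, V in shards:
        z = K @ q / np.sqrt(d)                  # scores for this shard
        m = z.max()                             # shard-local max (stability)
        w = np.exp(z - m)
        partials.append((m, w.sum(), w @ V))    # enough stats to merge

    # Merge the shard results; this reproduces full softmax attention
    # over the whole cache exactly, not approximately.
    m_all = max(m for m, _, _ in partials)
    denom = sum(np.exp(m - m_all) * l for m, l, _ in partials)
    out = sum(np.exp(m - m_all) * o for m, _, o in partials) / denom

    # Sanity check against a single-device computation.
    K = np.concatenate([k for k, _ in shards])
    V = np.concatenate([v for _, v in shards])
    z = K @ q / np.sqrt(d)
    w = np.exp(z - z.max())
    assert np.allclose(out, (w / w.sum()) @ V)

This is essentially the same merge identity that online-softmax methods such as FlashAttention and ring-style context parallelism rely on.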

AureliusMA•59m ago
How large would a 1-billion-token KV cache even be?!
AntiUSAbah•23m ago
30 TB at 4-bit, 60 TB at 8-bit resolution.
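
For a rough sense of where numbers like that come from: per-token KV size is 2 (keys and values) x layers x KV heads x head dim x bytes per element. A back-of-envelope sketch; the model dimensions below are assumptions picked for illustration, not taken from the article or the comment:

    # Back-of-envelope KV-cache size for a 1-billion-token context.
    layers, kv_heads, head_dim = 60, 4, 128   # hypothetical GQA model
    tokens = 1_000_000_000

    elems_per_token = 2 * layers * kv_heads * head_dim   # K and V
    for bits in (4, 8):
        total_tb = elems_per_token * (bits / 8) * tokens / 1e12
        print(f"{bits}-bit: {total_tb:.0f} TB")   # 4-bit: 31 TB, 8-bit: 61 TB

Different layer and head counts shift the total by small factors, but tens of terabytes is the right order of magnitude.
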
Schlagbohrer•53m ago
What does this mean: "In addition, because most AI models are not trained uniformly across their maximum context length, their reasoning quality tends to degrade gradually near the limit rather than fail abruptly."

Models aren't trained across their context; their context is their short-term memory at runtime, right? Nothing to do with training. They are trained on a static dataset.

andai•43m ago
Not sure how it is now, but a while back most of the training data was short interactions.

I noticed that the longer a chat gets, the more unpredictable the model's behavior becomes (and I think that's still a common jailbreak technique too).

(I think it might also have something to do with RoPE, but that's beyond me.)
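
On the RoPE point: rotary position embeddings encode position by rotating each pair of query/key dimensions through an angle proportional to the token index, so positions far beyond the lengths seen in training produce rotations the model has never encountered. A minimal sketch; base=10000 is the common default, but treat the exact numbers as illustrative:

    import numpy as np

    def rope(x, pos, base=10000.0):
        """Rotate each (even, odd) dim pair of x by position-dependent angles."""
        d = x.shape[-1]
        freqs = base ** (-np.arange(0, d, 2) / d)   # per-pair frequencies
        angles = pos * freqs
        cos, sin = np.cos(angles), np.sin(angles)
        x1, x2 = x[..., 0::2], x[..., 1::2]
        out = np.empty_like(x)
        out[..., 0::2] = x1 * cos - x2 * sin        # 2-D rotation per pair
        out[..., 1::2] = x1 * sin + x2 * cos
        return out

    q = np.ones(8)
    # The angles grow with position, so a model trained on, say,
    # 8k-token sequences has never seen the rotations at 1M+.
    for pos in (0, 8_000, 1_000_000):
        print(pos, rope(q, pos)[:4].round(3))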

AntiUSAbah•26m ago
So for the context to work well, you need some attention mechanism that makes sure details don't get lost as the amount of context grows.

Or, to put it differently: the LLM is trained on static data, but also on the capability of handling context in itself.

Kimi introduced this, https://github.com/MoonshotAI/Attention-Residuals, but I'm pretty sure closed labs like Google have had something like this for a while.

yorwba•21m ago
The attention residuals paper uses attention across layers for the same token, in addition to the usual case of attention across tokens within the same layer, but it doesn't do anything to address the "lost in too much context" problem. At least the number of layers is currently still low enough that there's probably no equivalent "lost in too many layers" problem yet.
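
As a toy illustration of "attention across layers for the same token": a sketch of the general idea as described here only, not the actual MoonshotAI implementation, with all dimensions made up:

    import numpy as np

    def cross_layer_attend(layer_states, q):
        """One token's query attends over that same token's hidden
        states from all layers so far (toy cross-layer attention)."""
        d = q.shape[-1]
        scores = layer_states @ q / np.sqrt(d)   # one score per layer
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ layer_states                  # mixture of layer states

    rng = np.random.default_rng(0)
    states = rng.standard_normal((12, 64))       # 12 layers, d=64 (made up)
    print(cross_layer_attend(states, rng.standard_normal(64)).shape)  # (64,)
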
smallerize•25m ago
I think it means most of the training data is short. And a lot of the long-context examples are conversations where the middle turns are less important.
Schlagbohrer•51m ago
Amazing that they are trying to solve this with hardware rather than with a new software architecture. But I suppose the current technology underlying LLM software must be far and away the best theoretically, or the most established, or the time it would take to seek out a new model isn't worth it for the big companies.

I know Yann LeCun is trying to build a completely different architecture, and I think that's expected to take 2-3 years before showing commercial results, right? Is that why they're finding it quicker to change the hardware?

AntiUSAbah•38m ago
Nvidia has so much money that it would be a waste if they didn't attack current problems at multiple points at once.

People, researchers, investors, etc. probably also want to see what would be possible, and someone has to do it.

I can also imagine that an inference-optimized system like this could split the context across different requests when it doesn't need to use the full context.

Could also be that they have internal use cases which require this amount of context.

GameStop makes $55.5B takeover offer for eBay

https://www.bbc.co.uk/news/articles/cn0p8yled1do
131•n1b0m•1h ago•85 comments

ASML's Best Selling Product Isn't What You Think It Is

https://www.siliconimist.com/p/asmls-best-selling-product
7•johncole•17m ago•3 comments

Trademark violation: Fake Notepad++ for Mac

https://notepad-plus-plus.org/news/npp-trademark-infringement/
185•maxloh•1h ago•64 comments

Debunking the CIA's “magic” heartbeat sensor [video]

https://www.youtube.com/watch?v=SVTPv4sI_Jc
39•areoform•11h ago•32 comments

Using “underdrawings” for accurate text and numbers

https://samcollins.blog/underdrawings/
264•samcollins•2d ago•86 comments

Texico: Learn the principles of programming without even touching a computer

https://www3.nhk.or.jp/nhkworld/en/shows/texico/
61•o4c•2d ago•3 comments

BYOMesh – New LoRa mesh radio offers 100x the bandwidth

https://partyon.xyz/@nullagent/116499715071759135
391•nullagent•17h ago•126 comments

DeepClaude – Claude Code agent loop with DeepSeek V4 Pro

https://github.com/aattaran/deepclaude
496•alattaran•13h ago•198 comments

Discovering hard disk physical geometry through microbenchmarking (2019)

https://blog.stuffedcow.net/2019/09/hard-disk-geometry-microbenchmarking/
101•TapamN•3d ago•5 comments

World's biggest RC A380 [video]

https://www.youtube.com/watch?v=wr9YLGbhxng
11•NaOH•1d ago•0 comments

A treasure trove of fossils rewrites the story of early life

https://www.quantamagazine.org/a-treasure-trove-of-cambrian-fossils-rewrites-the-story-of-early-l...
41•worldvoyageur•2d ago•6 comments

Stitch together lots of little HTML pages with navigations for interactions

https://blog.jim-nielsen.com/2026/small-html-pages/
60•OuterVale•6h ago•31 comments

Southwest Headquarters Tour

https://katherinemichel.github.io/blog/travel/southwest-headquarters-tour-2026.html
259•KatiMichel•18h ago•80 comments

The 'Hidden' Costs of Great Abstractions

https://jdgr.net/the-hidden-costs-of-great-abstractions
181•jdgr•12h ago•79 comments

Let's Buy Spirit Air

https://letsbuyspiritair.com/
370•bjhess•11h ago•351 comments

Over 8M Thermos jars and bottles recalled after 3 people lost vision

https://www.goodmorningamerica.com/living/story/8-million-thermos-jars-bottles-recalled-after-3-1...
64•taubek•2h ago•47 comments

A desktop made for one

https://isene.org/2026/05/Audience-of-One.html
359•xngbuilds•19h ago•193 comments

US–Indian space mission maps extreme subsidence in Mexico City

https://phys.org/news/2026-04-usindian-space-mission-extreme-subsidence.html
160•leopoldj•2d ago•62 comments

The Road to a Billion-Token Context

https://cacm.acm.org/news/the-road-to-a-billion-token-context/
11•pseudolus•2d ago•18 comments

K3sup – bootstrap K3s over SSH in < 60s

https://github.com/alexellis/k3sup
59•rickcarlino•2d ago•22 comments

United flight collides with truck and light pole as it lands at Newark airport

https://www.cnn.com/2026/05/03/us/united-airlines-newark-truck-streetlight
14•blendergeek•1h ago•1 comment

OpenAI’s o1 correctly diagnosed 67% of ER patients vs. 50-55% by triage doctors

https://www.theguardian.com/technology/2026/apr/30/ai-outperforms-doctors-in-harvard-trial-of-eme...
406•donsupreme•1d ago•363 comments

Introduction to Atom

https://validator.w3.org/feed/docs/atom.html
102•susam•13h ago•36 comments

Bad Connection: Global telecom exploitation by covert surveillance actors

https://citizenlab.ca/research/uncovering-global-telecom-exploitation-by-covert-surveillance-actors/
158•miohtama•19h ago•13 comments

Fun with polynomials and linear algebra; or, slight abstract nonsense

https://guille.site/posts/abstract-nonsense/
22•LolWolf•2d ago•0 comments

Humanoid Robot Actuators

https://www.firgelli.com/pages/humanoid-robot-actuators
137•ofrzeta•7h ago•59 comments

Tar Files Created on macOS Display Errors When Extracting on Linux (2024)

https://aruljohn.com/blog/macos-created-tar-files-linux-errors/
113•heresie-dabord•3d ago•80 comments

New statue in London, attributed to Banksy, of a suited man, blinded by a flag

https://www.smithsonianmag.com/smart-news/attributed-to-banksy-a-new-statue-of-a-suited-man-blind...
421•dryadin•16h ago•383 comments

Mercedes-Benz commits to bringing back physical buttons

https://www.drive.com.au/news/mercedes-benz-commits-to-bringing-back-phycial-buttons/
728•teleforce•20h ago•404 comments

Text-to-CAD

https://github.com/earthtojake/text-to-cad
123•softservo•3d ago•33 comments