
Context Rot: How increasing input tokens impacts LLM performance

https://research.trychroma.com/context-rot
260•kellyhongsn•6mo ago
I work on research at Chroma, and I just published our latest technical report on context rot.

TLDR: Model performance is non-uniform across context lengths, including state-of-the-art GPT-4.1, Claude 4, Gemini 2.5, and Qwen3 models.

This highlights the need for context engineering. Whether relevant information is present in a model’s context is not all that matters; what matters more is how that information is presented.

Here is the complete open-source codebase to replicate our results: https://github.com/chroma-core/context-rot
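
To make the setup concrete, here is a toy sketch of the kind of long-context probe discussed in the report: hide one relevant sentence in growing amounts of filler and check whether the model still surfaces it. This is only an illustration, not the repo's evaluation harness; `ask` stands in for whatever model call you use.

    def build_prompt(needle: str, filler_sentences: int) -> str:
        # Pad a single relevant sentence ("needle") with irrelevant filler.
        filler = " ".join("The sky was a pale grey that afternoon." for _ in range(filler_sentences))
        return (f"{filler}\n{needle}\n{filler}\n\n"
                "Question: What is the magic number mentioned above? Answer with the number only.")

    def probe(ask, needle="The magic number is 7481.", sizes=(10, 100, 1000, 5000)):
        # Same question, same needle, increasingly long context.
        for n in sizes:
            answer = ask(build_prompt(needle, n))
            print(f"{n:>5} filler sentences -> correct: {'7481' in answer}")

    # Stand-in "model" that degrades past ~50k characters, just so the loop runs:
    probe(ask=lambda p: "7481" if len(p) < 50_000 else "I'm not sure.")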

Comments

tjkrusinski•6mo ago
Interesting report. Are there recommended sizes for different models? How do I know what works or doesn't for my use case?
posnet•6mo ago
I've definitely noticed this anecdotally.

Especially with Gemini Pro when providing long-form textual references: providing many documents in a single context window gives worse answers than having it summarize the documents first, asking a question about the summaries only, then providing the full text of the sub-documents on request (RAG-style, or just a simple agent loop).

Similarly, I've personally noticed that Claude Code with Opus or Sonnet gets worse the more compactions happen. It's unclear to me whether it's just that the summary gets worse, or that the context window ends up with a higher percentage of less relevant data, but even clearing the context and asking it to re-read the relevant files (even if they were mentioned and summarized in the compaction) gives better results.
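
A minimal sketch of the summarize-first flow described above, assuming a generic `complete(prompt)` callable in place of any particular model API (the helper names are made up for illustration):

    def summarize_docs(complete, docs):
        # One short summary per document, each produced in its own small context.
        return {name: complete(f"Summarize the key points of this document:\n\n{text}")
                for name, text in docs.items()}

    def answer(complete, docs, question):
        summaries = summarize_docs(complete, docs)
        listing = "\n".join(f"- {name}: {s}" for name, s in summaries.items())
        pick = complete(f"Document summaries:\n{listing}\n\n"
                        f"Question: {question}\n"
                        "Reply with the name of the single document most likely to contain the answer.").strip()
        # Only the chosen document's full text enters the final, small context.
        chosen = docs.get(pick, next(iter(docs.values())))
        return complete(f"Document:\n{chosen}\n\nQuestion: {question}")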

tough•6mo ago
Have you tried NotebookLM, which basically does this in the background (chunking and summarising many docs) and lets you chat with the full corpus using RAG?
zwaps•6mo ago
Gemini loses coherence and reasoning ability well before the chat hits the context limitations, and according to this report, it is the best model on several dimensions.

Long story short: Context engineering is still king, RAG is not dead

risyachka•6mo ago
Yep. The easiest way to tell someone has no experience with LLMs is if they say “RAG is dead”
apwell23•6mo ago
> someone has no experience with LLMs

That's 99% of coders. No need to gatekeep.

tvshtr•6mo ago
Yep, it can decohere really badly with a bigger context. It's not only context-related, though. Sometimes it can lose focus early on in a way that makes it impossible to get it back on track.
deadbabe•6mo ago
RAG was never going away; the people who say it was are the same types who say software engineers will be totally replaced with AI.

LLMs will need RAG one way or another, you can hide it from the user, but it still must be there.

Inviz•6mo ago
Cursor lifted the "Start a new chat" limitation on Gemini, and I'm actually now enjoying keeping longer sessions within one window, because it's still very reasonable at recall but doesn't need to restate everything each time.
Xmd5a•6mo ago
Gemini loses the notion of context the longer its context gets: I often ask it to provide a summary of our discussion for the outside world, and it will reference ideas or documents without introducing them, via anaphora, as if the outside world had knowledge of the context.
darepublic•6mo ago
Can you elaborate on how prompts enhanced with RAG avoid this context pollution? I don't understand why that would be.
bayesianbot•6mo ago
I feel like the optimal coding agent would do this automatically - collect and (sometimes) summarize the required parts of code, MCP responses, repo maps etc., then combine the results into a new message in a new 'chat' that would contain all the required parts and nothing else. It's basically what I already do with aider, and I feel the performance (in situations with a lot of context) is way better than any agentic / more automated workflow I've tried so far, but it is a lot of work.
OccamsMirror•6mo ago
Claude Code tries, and it seems to be OK at it. It's hard to tell though and it definitely feels like sometimes you absolutely have to quit out and start again.
doctorhandshake•6mo ago
Try using /clear instead of quitting. Doesn’t clear scrollback buffer but does clear context
gonzric1•6mo ago
AppMap's AI agent does this very well.
irskep•6mo ago
"Compactions" are just reducing the transcript to a summary of the transcript, right? So it makes sense that it would get worse because the agent is literally losing information, but it wouldn't be due to context rot.

The thing that would signal context rot is when you approach the auto-compact threshold. Am I thinking about this right?
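
For readers unfamiliar with the mechanics, a rough sketch of what auto-compaction amounts to; the threshold, the 4-characters-per-token estimate, and `complete` are placeholders, not any agent's actual implementation:

    def estimate_tokens(messages):
        return sum(len(m["content"]) for m in messages) // 4  # crude heuristic

    def maybe_compact(complete, messages, limit=200_000, keep_recent=10):
        if estimate_tokens(messages) < int(limit * 0.9):  # auto-compact threshold
            return messages
        old, recent = messages[:-keep_recent], messages[-keep_recent:]
        transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
        summary = complete("Summarize this conversation so the work can continue:\n\n" + transcript)
        # Anything the summary drops is gone for good: the lossy step the comment
        # above distinguishes from plain long-context degradation.
        return [{"role": "system", "content": "Summary of earlier conversation:\n" + summary}] + recent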

0x457•6mo ago
Yes, but in agentic workflows it's possible to do more intelligent compaction.
zwaps•6mo ago
Very cool results, very comprehensive article, many insights!

Media literacy disclaimer: Chroma is a vectorDB company.

philip1209•6mo ago
Chroma does vector, full-text, and regex search. And it's designed for multitenant workloads typical of AI applications. So, not just a "vectorDB company".
firejake308•6mo ago
Yeah, but they benefit from convincing people not to dump everything in context, because the alternative is to dump everything in a db (like Chroma) and then retrieve only the relevant parts (whether that's using vector search or regex search or full-text search or whatever). I still think their thesis is correct, but readers should be aware of the author's bias and make their own judgment.
tough•6mo ago
This felt intuitively true; great to see some research putting hard numbers on it.
lukev•6mo ago
This effect is well known but not well documented so far, so great job here.

It's actually even more significant than it's possible to benchmark easily (though I'm glad this paper has done so.)

Truly useful LLM applications live at the boundaries of what the model can do. That is, attending to some aspect of the context that might be several logical "hops" away from the actual question or task.

I suspect that the context rot problem gets much worse for these more complex tasks... in fact, exponentially so for each logical "hop" which is required to answer successfully. Each hop compounds the "attention difficulty" which is increased by long/distracting contexts.

magicalhippo•6mo ago
Is this due to a lack of specific long-context training, or is it more a limitation of the encoding or something similar?

I've noticed this issue as well with smaller local models that have relatively long contexts, say an 8B model with a 128k context.

I imagined they performed special recall training for these long context models, but the results seem... not so great.

jpcompartir•6mo ago
Good question, I was wondering the same.

My hunch would be that even if we had a lot more annotated examples of reasoning and retrieval over 10,000+ tokens, the architectures we have today would still be limited.

namibj•6mo ago
It's inherent; see https://arxiv.org/abs/2002.07028 (as I detailed in my sibling comment to yours just now, but before I saw yours). That said, there are architecture-sizing choices that allow much better long-context performance at the cost of some short-context performance for a given parameter count and inference compute budget.
magicalhippo•6mo ago
Much appreciated, will read the paper tonight.

Having an LLM recall something in exact detail from some 100k tokens ago sounds a bit like the ADHD test Cartman got in South Park. We don't recall things exactly, but rather a summarized version.

On the other hand, computers recall exactly when asked directly (RAM access), so in that sense it seems natural to want that from an LLM.

One thing we can do which current LLMs can't, at least directly as far as I'm aware, is to go back and re-read a section. Like on-demand RAG, or something.

In the meantime, good to know it's not terribly useful to have the full 128k context, as it usually is too much for my GPU anyway.

namibj•6mo ago
> One thing we can do which current LLMs can't, at least directly as far as I'm aware, is to go back and re-read a section. Like on-demand RAG, or something.

Encoders can do that. And we can use them with diffusion to generate text [0].

This works because you don't impose masked self-attention for autoregressive decoding in the encoder, so subsequent layers can re-focus their key/query vector space to steer "backwards" information flow.

Happy reading. Feel free to get back!

[0]: https://arxiv.org/abs/2211.15029

lifthrasiir•6mo ago
I recently wrote several novels using Gemini 2.5 Flash, and the context rot is noticeable but happens far later than this report implies. In my experience, 50K to 100K tokens were required for it to start to disregard the initial context (e.g. the output language). Maybe a complex task like creative writing makes the impact harder to measure or observe; in any case it remained okay enough for me, as long as I supplied missing context from time to time.
elevaet•6mo ago
Let's hear about these novels - are they good? Are you publishing them?
lifthrasiir•6mo ago
If you are interested, one of the novels is in fact open to the public: https://w.mearie.org/maidens/. But it's written in Korean and no English version is available yet.
Workaccount2•6mo ago
What's really needed is a way to easily prune context. If I could go and manually manage the entire chat with a model, I could squeeze way more juice out of a typical ~200k token coding session.

Instead I have a good instance going, but the model fumbles for 20k tokens and then that session is heavily rotted. Let me cut it out!
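
A sketch of the manual pruning being asked for here, under the assumption that the transcript is just a list of messages you control; the indices would be chosen by hand, and nothing like this is a built-in feature of the hosted tools:

    def prune(messages, start, end, note=""):
        # Drop messages[start:end], e.g. the 20k tokens of fumbling, and optionally
        # leave a one-line marker so the model knows a detour was removed.
        kept = messages[:start] + messages[end:]
        if note:
            kept.insert(start, {"role": "user", "content": f"(Earlier detour removed: {note})"})
        return kept

    # session = prune(session, start=14, end=38, note="failed attempt at the migration script")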

aaronblohowiak•6mo ago
Even just a rollback to a previous checkpoint would be a killer feature.
sevenseacat•6mo ago
Zed's agent mode lets you do this, don't know about others
t55•6mo ago
That's a standard feature in Cursor, Windsurf, etc.
lordswork•6mo ago
/compress is the command to do this in most CLI agents.
sevenseacat•6mo ago
That will reduce the context to a summary, not prune a bunch of irrelevant stuff
snickerdoodle12•6mo ago
Local LLMs let you edit the context however you want, including the responses generated by the LLM, so it will later think it said what you wanted it to say, which can help put it back on the right track.

LLMs-as-a-service don't offer this because it makes it trivial to bypass their censoring.

chrisweekly•6mo ago
I've heard it repeated so many times that once things start to go sideways, trying to get back on track is a mistake. Have you had real-world success hacking context using rewritten responses?
steveklabnik•6mo ago
I have experimented with "hey claude i am about to reset your context, please give me a prompt that will allow you to continue your work" and then reviewing that and tweaking it before feeding it back in.
jsemrau•6mo ago
Once you are working with local LLMs you quickly run into CUDA out-of-memory errors. Managing input context sizes in prompts is really critical. It also keeps cost down.
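
One common way to hold that line with local models is a hard token budget: keep the system prompt and evict the oldest turns until the prompt fits. A minimal sketch, with a 4-characters-per-token estimate standing in for a real tokenizer:

    def fit_to_budget(messages, budget_tokens=8192):
        # Assumes messages[0] is the system prompt; everything else is fair game.
        system, rest = messages[:1], messages[1:]
        cost = lambda msgs: sum(len(m["content"]) for m in msgs) // 4
        while rest and cost(system + rest) > budget_tokens:
            rest.pop(0)  # evict the oldest non-system turn first
        return system + rest
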
kbelder•6mo ago
If you're working with local LLMs, why do you care about cost?
jsemrau•6mo ago
You can use a lower-end GPU (like the RTX 3060), which also uses less energy. But you are right, you won't be encountering model API costs when running it locally.
blixt•6mo ago
This is one type of information-retrieval problem, but I think the change in performance with context length may be different for non-retrieval answers (such as “what is the edited code for making this button red?” or “which of the above categories does the sentence ‘…’ fall under?”).

One paper that stood out to me a while back was Many-Shot In-Context Learning[1] which showed large positive jumps in performance from filling the context with examples.

As always, it’s important to test one’s problem to know how the LLM changes in behavior for different context contents/lengths — I wouldn’t assume a longer context is always worse.

[1] https://arxiv.org/pdf/2404.11018
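
A tiny illustration of the many-shot setup from [1], where packing more labeled examples into the prompt is the point rather than the problem; the example data and the `complete` callable are placeholders:

    def many_shot_prompt(examples, query):
        shots = "\n".join(f"Input: {x}\nLabel: {y}" for x, y in examples)
        return f"{shots}\nInput: {query}\nLabel:"

    def classify(complete, examples, query, n_shots):
        # Sweeping n_shots (tens to hundreds in the paper) is the experiment:
        # here a longer context is what is being tested, not assumed harmful.
        return complete(many_shot_prompt(examples[:n_shots], query)).strip()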

orbital-decay•6mo ago
My intuition is that questions that require reasoning always perform worse than direct retrieval questions, without exception, especially when negatives are involved or distractors are present. You're right though: intuition is not measurement, and some relevant numbers would be nice to see.

ICL is a phenomenon separate from long-context performance degradation; they can coexist, similarly to how lost-in-the-middle affects the performance of examples in different positions just the same.

blixt•6mo ago
Yeah, ultimately it depends on the problem. Reading an article like this, it's easy to conclude that the context should always be reduced, all context relegated to a vector database[1] and retrieved on demand, so that the context is as small as possible. That's what makes me want to point to situations where, conversely, growing the context helps a lot.

It really depends on the task, but I imagine most real world scenarios have a mixed bag of requirements, such that it's not a needle-in-a-haystack problem, but closer to ICL. Even memory retrieval (an example given in the post) can be tricky because you cannot always trust cosine similarity on short text snippets to cleanly map to relevant memories, and so you may end up omitting good data and including bad data (which heavily skews the LLM the wrong way).

[1]: Coincidentally what the post author is selling

msgodel•6mo ago
I always disable reasoning when I can. It got overhyped because of DeepSeek, when the short, one-sentence chain of thought most conversational models were trained to do seemed to be enough.
orbital-decay•6mo ago
That's not what I mean. "Questions that require reasoning", i.e. indirect questions that require picking out a fact in the context and processing it somehow, not necessarily related to the reasoning chains models are natively trained to do. That's what the GP is talking about.

A built-in reasoning chain certainly helps in long-context tasks, especially when it's largely trained to summarize the context and deconstruct the problem, like in Gemini 2.5 (you can easily jailbreak it to see the native reasoning chain that is normally hidden between system delimiters) and DeepSeek R1-0528, or when you're forcing it to summarize with a custom prompt/prefill. The article seems to agree.

milchek•6mo ago
Anecdotally, my experience has been that the longer a conversation goes on in Cursor about a new feature or code change, the worse the output gets.

The best results seem to come from clear, explicit instructions and a plan up front for a discrete change or feature, with the relevant files to edit dragged into the context prompt.

elmean•6mo ago
Agreed. The flow of explore -> plan -> code -> test -> commit has made things better, along with clearing the context between steps when it makes sense.
chrisweekly•6mo ago
I liked this blog post, which underscores the benefits of creating an explicit plan or "specs" up front:

https://lukebechtel.com/blog/vibe-speccing

0x457•6mo ago
Yeah, that's why I often save the context once there is enough information for the work to be done. Then, once I notice a regression in quality, I do a summary of the work done (it could still be low quality) and add it on top of the previous checkpoint.
jgalt212•6mo ago
The industry will fight context-rot mitigation efforts. Smaller context windows mean less need for thousands of GPUs. Less need for hyperscalers. The up-and-to-the-right narrative falls apart.
namibj•6mo ago
On this note I want to point to "Low-Rank Bottleneck in Multi-head Attention Models" [0], which details how attention inherently needs the query dimension to match or exceed the sequence length to allow precise (and especially sharp) targeting.

It may be that dimension-starved pretrained transformer models rely heavily on context being correctly "tagged" in all relevant aspects the very instant it's inserted into the KV cache, e.g. requiring negation to be prefixed to a fact rather than allowing postfix negation. The common LLM chat case is telling the model it just spewed hallucinated/wrong claims, and hoping this will help rather than hurt downstream performance as the chat continues. There, specifically, the negation is very delayed, and thus not present in most of the tokens that encode the hallucinated claims in the KV cache; and so, for lack of sufficient positional precision due to insufficient dimensionality, the transformer can't retroactively attribute the "that was wrong" claim in a retrievable manner to the hallucination tokens.

The result, of course, being the behavior we experience: hallucinations are corrected by editing the message that triggered them to include discouraging words, as otherwise the thread becomes near-useless from the hallucination's context pollution.

I do wonder if we have maybe figured out how to do this more scalably than just naively raising the query dimension to get (back?) closer to sequence length.

[0]: https://arxiv.org/abs/2002.07028
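
For readers who skip the paper, a compressed sketch of the low-rank argument as I read it, per attention head with sequence length n and query/key dimension d_k:

    % Per-head attention logits have rank at most d_k:
    \[
    A = \operatorname{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right),
    \qquad Q, K \in \mathbb{R}^{n \times d_k},
    \qquad \operatorname{rank}\!\left(QK^{\top}\right) \le \min(n, d_k).
    \]
    % When d_k < n, the n-by-n logit matrix is confined to a rank-d_k family,
    % so not every sharp (near one-hot per row) attention pattern over the n
    % positions is representable; hence the "query dimension should match or
    % exceed sequence length" condition referenced above.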

mikeve•6mo ago
I've experienced this as well. I'm working on a project in which I want to search through video transcripts, which are often very long texts. I figured that since models like the GPT-4.1 series have very large context windows, RAG was not needed, but I definitely notice some strange issues, especially on the smaller models: things like not answering the question that was asked but returning a generic summary of the content.
psadri•6mo ago
I have collected some of the techniques I have developed/used in reducing LLM context size:

https://www.notion.so/LLM-Context-Engineering-21b814d6a64980...

Some of these are in use in an in-house AI chat application that has a heavy emphasis on tool calls.

boesboes•6mo ago
Claude Code loses the ability to distinguish between its own mistakes and my instructions. Once it gets confused, start over. The longer the session, the more it starts to go in loops, or it just decides that the test was already broken (despite having broken it in this session) and that it will just ignore it.

I'm sure it's all my poor prompting and context, but it really seems like Claude has lost 30 IQ points in the last few weeks.

SketchySeaBeast•6mo ago
> I'm sure it's all my poor prompting and context,

Does this not feel like gaslighting we've all now internalized?

vevoe•6mo ago
No, I feel the same way. I'm on the Max plan and I swear it has good days and bad days.
kelsey98765431•6mo ago
Free hint: a model can be trained to prune or clean up context in a multi-shot conversation. The final number of removed tokens plus the final verifiable reward is itself a verifiable signal. Cheers.
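
One possible reading of that signal, sketched with made-up names and weighting: reward the pruned conversation only if the task still verifiably passes, plus a bonus proportional to the tokens removed.

    def pruning_reward(tokens_before, tokens_after, task_passed, alpha=0.001):
        # The verifiable outcome gates everything; token savings only count on
        # success, so the policy can't game the reward by deleting context it
        # still needed to finish the task.
        if not task_passed:
            return 0.0
        return 1.0 + alpha * max(tokens_before - tokens_after, 0)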