
Context Rot: How increasing input tokens impacts LLM performance

https://research.trychroma.com/context-rot
260•kellyhongsn•6mo ago
I work on research at Chroma, and I just published our latest technical report on context rot.

TLDR: Model performance is non-uniform across context lengths, including state-of-the-art GPT-4.1, Claude 4, Gemini 2.5, and Qwen3 models.

This highlights the need for context engineering. Whether relevant information is present in a model’s context is not all that matters; what matters more is how that information is presented.

Here is the complete open-source codebase to replicate our results: https://github.com/chroma-core/context-rot
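
To make the setup concrete, here is a toy sketch of the kind of long-context probe discussed in the report: hide one relevant sentence in growing amounts of filler and check whether the model still surfaces it. This is only an illustration, not the repo's evaluation harness; `ask` stands in for whatever model call you use.

    def build_prompt(needle: str, filler_sentences: int) -> str:
        # Pad a single relevant sentence ("needle") with irrelevant filler.
        filler = " ".join("The sky was a pale grey that afternoon." for _ in range(filler_sentences))
        return (f"{filler}\n{needle}\n{filler}\n\n"
                "Question: What is the magic number mentioned above? Answer with the number only.")

    def probe(ask, needle="The magic number is 7481.", sizes=(10, 100, 1000, 5000)):
        # Same question, same needle, increasingly long context.
        for n in sizes:
            answer = ask(build_prompt(needle, n))
            print(f"{n:>5} filler sentences -> correct: {'7481' in answer}")

    # Stand-in "model" that degrades past ~50k characters, just so the loop runs:
    probe(ask=lambda p: "7481" if len(p) < 50_000 else "I'm not sure.")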

Comments

tjkrusinski•6mo ago
Interesting report. Are there recommended sizes for different models? How do I know what works or doesn't for my use case?
posnet•6mo ago
I've definitely noticed this anecdotally.

Especially with Gemini Pro when providing long-form textual references: providing many documents in a single context window gives worse answers than having it summarize the documents first, asking a question about the summaries only, then providing the full text of the sub-documents on request (RAG-style, or just a simple agent loop).

Similarly, I've personally noticed that Claude Code with Opus or Sonnet gets worse the more compactions happen. It's unclear to me whether it's just that the summary gets worse, or that the context window ends up with a higher percentage of less relevant data, but even clearing the context and asking it to re-read the relevant files (even if they were mentioned and summarized in the compaction) gives better results.
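
A minimal sketch of the summarize-first flow described above, assuming a generic `complete(prompt)` callable in place of any particular model API (the helper names are made up for illustration):

    def summarize_docs(complete, docs):
        # One short summary per document, each produced in its own small context.
        return {name: complete(f"Summarize the key points of this document:\n\n{text}")
                for name, text in docs.items()}

    def answer(complete, docs, question):
        summaries = summarize_docs(complete, docs)
        listing = "\n".join(f"- {name}: {s}" for name, s in summaries.items())
        pick = complete(f"Document summaries:\n{listing}\n\n"
                        f"Question: {question}\n"
                        "Reply with the name of the single document most likely to contain the answer.").strip()
        # Only the chosen document's full text enters the final, small context.
        chosen = docs.get(pick, next(iter(docs.values())))
        return complete(f"Document:\n{chosen}\n\nQuestion: {question}")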

tough•6mo ago
Have you tried NotebookLM, which basically does this in the background (chunking and summarising many docs) and lets you chat with the full corpus using RAG?
zwaps•6mo ago
Gemini loses coherence and reasoning ability well before the chat hits the context limitations, and according to this report, it is the best model on several dimensions.

Long story short: Context engineering is still king, RAG is not dead

risyachka•6mo ago
Yep. The easiest way to tell someone has no experience with LLMs is if they say “RAG is dead”
apwell23•6mo ago
> someone has no experience with LLMs

That's 99% of coders. No need to gatekeep.

tvshtr•6mo ago
Yep, it can decohere really badly with a bigger context. It's not only context-related, though. Sometimes it can lose focus early on in a way that makes it impossible to get it back on track.
deadbabe•6mo ago
RAG was never going away; the people who say it was are the same types who say software engineers will be totally replaced with AI.

LLMs will need RAG one way or another, you can hide it from the user, but it still must be there.

Inviz•6mo ago
Cursor lifted the "Start a new chat" limitation on Gemini, and I'm actually now enjoying keeping longer sessions within one window, because it's still very reasonable at recall but doesn't need to restate everything each time.
Xmd5a•6mo ago
Gemini loses the notion of context the longer its context gets: I often ask it to provide a summary of our discussion for the outside world, and it will reference ideas or documents without introducing them, via anaphora, as if the outside world had knowledge of the context.
darepublic•6mo ago
Can you elaborate on how prompts enhanced with RAG avoid this context pollution? I don't understand why that would be.
bayesianbot•6mo ago
I feel like the optimal coding agent would do this automatically - collect and (sometimes) summarize the required parts of code, MCP responses, repo maps etc., then combine the results into a new message in a new 'chat' that would contain all the required parts and nothing else. It's basically what I already do with aider, and I feel the performance (in situations with a lot of context) is way better than any agentic / more automated workflow I've tried so far, but it is a lot of work.
OccamsMirror•6mo ago
Claude Code tries, and it seems to be OK at it. It's hard to tell though and it definitely feels like sometimes you absolutely have to quit out and start again.
doctorhandshake•6mo ago
Try using /clear instead of quitting. Doesn’t clear scrollback buffer but does clear context
gonzric1•6mo ago
AppMap's AI agent does this very well.
irskep•6mo ago
"Compactions" are just reducing the transcript to a summary of the transcript, right? So it makes sense that it would get worse because the agent is literally losing information, but it wouldn't be due to context rot.

The thing that would signal context rot is when you approach the auto-compact threshold. Am I thinking about this right?
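
For readers unfamiliar with the mechanics, a rough sketch of what auto-compaction amounts to; the threshold, the 4-characters-per-token estimate, and `complete` are placeholders, not any agent's actual implementation:

    def estimate_tokens(messages):
        return sum(len(m["content"]) for m in messages) // 4  # crude heuristic

    def maybe_compact(complete, messages, limit=200_000, keep_recent=10):
        if estimate_tokens(messages) < int(limit * 0.9):  # auto-compact threshold
            return messages
        old, recent = messages[:-keep_recent], messages[-keep_recent:]
        transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
        summary = complete("Summarize this conversation so the work can continue:\n\n" + transcript)
        # Anything the summary drops is gone for good: the lossy step the comment
        # above distinguishes from plain long-context degradation.
        return [{"role": "system", "content": "Summary of earlier conversation:\n" + summary}] + recent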

0x457•6mo ago
Yes, but in agentic workflows it's possible to do more intelligent compaction.
zwaps•6mo ago
Very cool results, very comprehensive article, many insights!

Media literacy disclaimer: Chroma is a vectorDB company.

philip1209•6mo ago
Chroma does vector, full-text, and regex search. And it's designed for multitenant workloads typical of AI applications. So, not just a "vectorDB company".
firejake308•6mo ago
Yeah, but they benefit from convincing people not to dump everything in context, because the alternative is to dump everything in a db (like Chroma) and then retrieve only the relevant parts (whether that's using vector search or regex search or full-text search or whatever). I still think their thesis is correct, but readers should be aware of the author's bias and make their own judgment.
tough•6mo ago
This felt intuitively true; great to see some research putting hard numbers on it.
lukev•6mo ago
This effect is well known but not well documented so far, so great job here.

It's actually even more significant than it's possible to benchmark easily (though I'm glad this paper has done so.)

Truly useful LLM applications live at the boundaries of what the model can do. That is, attending to some aspect of the context that might be several logical "hops" away from the actual question or task.

I suspect that the context rot problem gets much worse for these more complex tasks... in fact, exponentially so for each logical "hop" which is required to answer successfully. Each hop compounds the "attention difficulty" which is increased by long/distracting contexts.

magicalhippo•6mo ago
Is this due to a lack of specific long-context training, or is it more a limitation of the encoding or something similar?

I've noticed this issue as well with smaller local models that have relatively long contexts, say an 8B model with a 128k context.

I imagined they performed special recall training for these long context models, but the results seem... not so great.

jpcompartir•6mo ago
Good question, I was wondering the same.

My hunch would be that even if we had a lot more annotated examples of reasoning and retrieval over 10,000+ tokens, the architectures we have today would still be limited.

namibj•6mo ago
It's inherent; see https://arxiv.org/abs/2002.07028 (as I detailed in my sibling comment to yours just now, but before I saw yours). That said, there are architecture-sizing choices that allow much better long-context performance at the cost of some short-context performance for a given parameter count and inference compute budget.
magicalhippo•6mo ago
Much appreciated, will read the paper tonight.

Having an LLM recall something in exact detail from some 100k tokens ago sounds a bit like the ADHD test Cartman got in South Park. We don't recall things exactly, but rather a summarized version.

On the other hand, computers recall exactly when asked directly (RAM access), so in that sense it seems natural to want that from an LLM.

One thing we can do which current LLMs can't, at least directly as far as I'm aware, is to go back and re-read a section. Like on-demand RAG, or something.

In the meantime, good to know it's not terribly useful to have the full 128k context, as it usually is too much for my GPU anyway.

namibj•6mo ago
> One thing we can do which current LLMs can't, at least directly as far as I'm aware, is to go back and re-read a section. Like on-demand RAG, or something.

Encoders can do that. And we can use them with diffusion to generate text [0].

This works because you don't impose masked self-attention for autoregressive decoding in the encoder, so subsequent layers can re-focus their key/query vector space to steer "backwards" information flow.

Happy reading. Feel free to get back!

[0]: https://arxiv.org/abs/2211.15029

lifthrasiir•6mo ago
I recently wrote several novels using Gemini 2.5 Flash, and the context rot is noticeable but happens far later than this report implies. In my experience, 50K to 100K tokens were required for it to start to disregard the initial context (e.g. the output language). Maybe a complex task like creative writing makes the impact harder to measure or observe; in any case it remained okay enough for me, as long as I supplied missing context from time to time.
elevaet•6mo ago
Let's hear about these novels - are they good? Are you publishing them?
lifthrasiir•6mo ago
If you are interested, one of the novels is in fact open to the public: https://w.mearie.org/maidens/. But it's written in Korean and no English version is available yet.
Workaccount2•6mo ago
What's really needed is a way to easily prune context. If I could go and manually manage the entire chat with a model, I could squeeze way more juice out of a typical ~200k token coding session.

Instead I have a good instance going, but the model fumbles for 20k tokens and then that session is heavily rotted. Let me cut it out!
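
A sketch of the manual pruning being asked for here, under the assumption that the transcript is just a list of messages you control; the indices would be chosen by hand, and nothing like this is a built-in feature of the hosted tools:

    def prune(messages, start, end, note=""):
        # Drop messages[start:end], e.g. the 20k tokens of fumbling, and optionally
        # leave a one-line marker so the model knows a detour was removed.
        kept = messages[:start] + messages[end:]
        if note:
            kept.insert(start, {"role": "user", "content": f"(Earlier detour removed: {note})"})
        return kept

    # session = prune(session, start=14, end=38, note="failed attempt at the migration script")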

aaronblohowiak•6mo ago
Even just a rollback to a previous checkpoint would be a killer feature.
sevenseacat•6mo ago
Zed's agent mode lets you do this, don't know about others
t55•6mo ago
That's a standard feature in Cursor, Windsurf, etc.
lordswork•6mo ago
/compress is the command to do this in most CLI agents.
sevenseacat•6mo ago
That will reduce the context to a summary, not prune a bunch of irrelevant stuff
snickerdoodle12•6mo ago
Local LLMs let you edit the context however you want, including the responses generated by the LLM, so it will later think it said what you wanted it to say, which can help put it back on the right track.

LLMs-as-a-service don't offer this because it makes it trivial to bypass their censoring.

chrisweekly•6mo ago
I've heard it repeated so many times that once things start to go sideways, trying to get back on track is a mistake. Have you had real-world success hacking context using rewritten responses?
steveklabnik•6mo ago
I have experimented with "hey claude i am about to reset your context, please give me a prompt that will allow you to continue your work" and then reviewing that and tweaking it before feeding it back in.
jsemrau•6mo ago
Once you are working with local LLMs you quickly run into CUDA out-of-memory errors. Managing input context sizes in prompts is really critical. It also keeps cost down.
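
One common way to hold that line with local models is a hard token budget: keep the system prompt and evict the oldest turns until the prompt fits. A minimal sketch, with a 4-characters-per-token estimate standing in for a real tokenizer:

    def fit_to_budget(messages, budget_tokens=8192):
        # Assumes messages[0] is the system prompt; everything else is fair game.
        system, rest = messages[:1], messages[1:]
        cost = lambda msgs: sum(len(m["content"]) for m in msgs) // 4
        while rest and cost(system + rest) > budget_tokens:
            rest.pop(0)  # evict the oldest non-system turn first
        return system + rest
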
kbelder•6mo ago
If you're working with local LLMs, why do you care about cost?
jsemrau•6mo ago
You can use a lower-end GPU (like the RTX 3060), which also uses less energy. But you are right, you won't be encountering model API costs when running it locally.
blixt•6mo ago
This is one type of information-retrieval problem, but I think the change in performance with context length may be different for non-retrieval answers (such as “what is the edited code for making this button red?” or “which of the above categories does the sentence ‘…’ fall under?”).

One paper that stood out to me a while back was Many-Shot In-Context Learning[1] which showed large positive jumps in performance from filling the context with examples.

As always, it’s important to test one’s problem to know how the LLM changes in behavior for different context contents/lengths — I wouldn’t assume a longer context is always worse.

[1] https://arxiv.org/pdf/2404.11018
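
A tiny illustration of the many-shot setup from [1], where packing more labeled examples into the prompt is the point rather than the problem; the example data and the `complete` callable are placeholders:

    def many_shot_prompt(examples, query):
        shots = "\n".join(f"Input: {x}\nLabel: {y}" for x, y in examples)
        return f"{shots}\nInput: {query}\nLabel:"

    def classify(complete, examples, query, n_shots):
        # Sweeping n_shots (tens to hundreds in the paper) is the experiment:
        # here a longer context is what is being tested, not assumed harmful.
        return complete(many_shot_prompt(examples[:n_shots], query)).strip()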

orbital-decay•6mo ago
My intuition is that questions that require reasoning always perform worse than direct retrieval questions, without exception, especially when negatives are involved or distractors are present. You're right though: intuition is not measurement, and some relevant numbers would be nice to see.

ICL is a phenomenon separate from long-context performance degradation; they can coexist, similarly to how lost-in-the-middle affects the performance of examples in different positions just the same.

blixt•6mo ago
Yeah, ultimately it depends on the problem. Reading an article like this, it's easy to conclude that the context should always be reduced, all context relegated to a vector database[1] and retrieved on demand, so that the context is as small as possible. That's what makes me want to point to situations where, conversely, growing the context helps a lot.

It really depends on the task, but I imagine most real world scenarios have a mixed bag of requirements, such that it's not a needle-in-a-haystack problem, but closer to ICL. Even memory retrieval (an example given in the post) can be tricky because you cannot always trust cosine similarity on short text snippets to cleanly map to relevant memories, and so you may end up omitting good data and including bad data (which heavily skews the LLM the wrong way).

[1]: Coincidentally what the post author is selling

msgodel•6mo ago
I always disable reasoning when I can. It got overhyped because of DeepSeek, when the short, one-sentence chain of thought most conversational models were trained to do seemed to be enough.
orbital-decay•6mo ago
That's not what I mean. "Questions that require reasoning", i.e. indirect questions that require picking out a fact in the context and processing it somehow, not necessarily related to the reasoning chains models are natively trained to do. That's what the GP is talking about.

A built-in reasoning chain certainly helps in long-context tasks, especially when it's largely trained to summarize the context and deconstruct the problem, like in Gemini 2.5 (you can easily jailbreak it to see the native reasoning chain that is normally hidden between system delimiters) and DeepSeek R1-0528, or when you're forcing it to summarize with a custom prompt/prefill. The article seems to agree.

milchek•6mo ago
Anecdotally, my experience has been that the longer a conversation goes on in Cursor about a new feature or code change, the worse the output gets.

The best results seem to come from clear, explicit instructions and a plan up front for a discrete change or feature, with the relevant files to edit dragged into the context prompt.

elmean•6mo ago
Agreed. The flow of explore -> plan -> code -> test -> commit has made things better, along with clearing the context between steps when it makes sense.
chrisweekly•6mo ago
I liked this blog post, which underscores the benefits of creating an explicit plan or "specs" up front:

https://lukebechtel.com/blog/vibe-speccing

0x457•6mo ago
Yeah, that's why I often save the context once there is enough information for the work to be done. Then, once I notice a regression in quality, I do a summary of the work done (it could still be low quality) and add it on top of the previous checkpoint.
jgalt212•6mo ago
The industry will fight context-rot mitigation efforts. Smaller context windows mean less need for thousands of GPUs. Less need for hyperscalers. The up-and-to-the-right narrative falls apart.
namibj•6mo ago
On this note I want to point to "Low-Rank Bottleneck in Multi-head Attention Models" [0], which details how attention inherently needs the query dimension to match or exceed the sequence length to allow precise (and especially sharp) targeting.

It may be that dimension-starved pretrained transformer models rely heavily on context being correctly "tagged" in all relevant aspects the very instant it's inserted into the KV cache, e.g. requiring negation to be prefixed to a fact rather than allowing postfix negation. The common LLM chat case is telling the model it just spewed hallucinated/wrong claims, and hoping this will help rather than hurt downstream performance as the chat continues. There, specifically, the negation is very delayed, and thus not present in most of the tokens that encode the hallucinated claims in the KV cache; and so, for lack of sufficient positional precision due to insufficient dimensionality, the transformer can't retroactively attribute the "that was wrong" claim in a retrievable manner to the hallucination tokens.

The result, of course, being the behavior we experience: hallucinations are corrected by editing the message that triggered them to include discouraging words, as otherwise the thread becomes near-useless from the hallucination's context pollution.

I do wonder if we have maybe figured out how to do this more scalably than just naively raising the query dimension to get (back?) closer to sequence length.

[0]: https://arxiv.org/abs/2002.07028
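
For readers who skip the paper, a compressed sketch of the low-rank argument as I read it, per attention head with sequence length n and query/key dimension d_k:

    % Per-head attention logits have rank at most d_k:
    \[
    A = \operatorname{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right),
    \qquad Q, K \in \mathbb{R}^{n \times d_k},
    \qquad \operatorname{rank}\!\left(QK^{\top}\right) \le \min(n, d_k).
    \]
    % When d_k < n, the n-by-n logit matrix is confined to a rank-d_k family,
    % so not every sharp (near one-hot per row) attention pattern over the n
    % positions is representable; hence the "query dimension should match or
    % exceed sequence length" condition referenced above.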

mikeve•6mo ago
I've experienced this as well. I'm working on a project in which I want to search through video transcripts, which are often very long texts. I figured that since models like the GPT-4.1 series have very large context windows, RAG was not needed, but I definitely notice some strange issues, especially on the smaller models: things like not answering the question that was asked but returning a generic summary of the content.
psadri•6mo ago
I have collected some of the techniques I have developed/used in reducing LLM context size:

https://www.notion.so/LLM-Context-Engineering-21b814d6a64980...

Some of these are in use in an in-house AI chat application that has a heavy emphasis on tool calls.

boesboes•6mo ago
Claude Code loses the ability to distinguish between its own mistakes and my instructions. Once it gets confused, start over. The longer the session, the more it starts to go in loops, or it just decides that the test was already broken (despite having broken it in this session) and that it will just ignore it.

I'm sure it's all my poor prompting and context, but it really seems like Claude has lost 30 IQ points in the last few weeks.

SketchySeaBeast•6mo ago
> I'm sure it's all my poor prompting and context,

Does this not feel like gaslighting we've all now internalized?

vevoe•6mo ago
No, I feel the same way. I'm on the Max plan and I swear it has good days and bad days.
kelsey98765431•6mo ago
Free hint: a model can be trained to prune or clean up context in a multi-shot conversation. The final number of removed tokens plus the final verifiable reward is itself a verifiable signal. Cheers.
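
One possible reading of that signal, sketched with made-up names and weighting: reward the pruned conversation only if the task still verifiably passes, plus a bonus proportional to the tokens removed.

    def pruning_reward(tokens_before, tokens_after, task_passed, alpha=0.001):
        # The verifiable outcome gates everything; token savings only count on
        # success, so the policy can't game the reward by deleting context it
        # still needed to finish the task.
        if not task_passed:
            return 0.0
        return 1.0 + alpha * max(tokens_before - tokens_after, 0)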