Stop Burning Your Context Window – How We Cut MCP Output by 98% in Claude Code

https://mksg.lu/blog/context-mode
84•mksglu•8h ago

Comments

mksglu•8h ago
Author here. I shared the GitHub repo a few days ago (https://news.ycombinator.com/item?id=47148025) and got great feedback. This is the writeup explaining the architecture.

The core idea: every MCP tool call dumps raw data into your 200K context window. Context Mode spawns isolated subprocesses — only stdout enters context. No LLM calls, purely algorithmic: SQLite FTS5 with BM25 ranking and Porter stemming.

Since the last post we've seen 228 stars and some real-world usage data. The biggest surprise was how much subagent routing matters — auto-upgrading Bash subagents to general-purpose so they can use batch_execute instead of flooding context with raw output.

Source: https://github.com/mksglu/claude-context-mode Happy to answer any architecture questions.
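
For readers unfamiliar with the search pieces named above, here is a minimal sketch of FTS5 with BM25 ranking and Porter stemming, using Python's standard `sqlite3` module. It is not the project's code: the table name `chunks`, the helper names, and the sample log lines are all made up for illustration, and it assumes the local SQLite build includes FTS5.

```python
import sqlite3


def build_index(chunks):
    """Index text chunks in an in-memory FTS5 table that uses Porter stemming."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE chunks USING fts5(body, tokenize='porter unicode61')"
    )
    conn.executemany("INSERT INTO chunks(body) VALUES (?)", [(c,) for c in chunks])
    return conn


def search(conn, query, limit=3):
    """Return matching chunks, best first. bm25() scores better matches lower."""
    rows = conn.execute(
        "SELECT body FROM chunks WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?",
        (query, limit),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical raw tool output that would otherwise land in the context window.
raw_output = [
    "GET /health 200 OK in 3ms",
    "connection pool resized to 20",
    "ERROR: request failed while connecting to database",
    "GET /users 200 OK in 12ms",
]

conn = build_index(raw_output)
# Stemming lets "failing connections" match "failed ... connecting".
hits = search(conn, "failing connections")
print(hits)
```

Only the matching line comes back, so one line rather than four would enter the context. The query terms are combined with an implicit AND, which is why the "connection pool" line is not returned.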

re5i5tor•1h ago
Really intrigued and def will try, thanks for this.

In connecting the dots (and help me make sure I'm connecting them correctly), context-mode _does not address MCP context usage at all_, correct? You are instead suggesting we refactor or eliminate MCP tools, or apply concepts similar to context_mode in our MCPs where possible?

Context-mode is still very much of value, even if the answer is "no," just want to make sure I understand. Also interested in your thoughts about the above.

I write a number of MCPs that work across all Claude surfaces; so the usual "CLI!" isn't as viable an answer (though with code execution it sometimes can be) ...

jamiecode•7h ago
The 98% reduction is the real story here, but the systemic problem you're solving is even bigger than individual tool calls blowing up context. When you're orchestrating multi-step workflows, each tool output becomes part of the conversation state that carries forward to the next step. A Playwright snapshot at step 1 is 56 KB. It still counts at step 3 when you've moved on to something completely different.

The subprocess isolation is smart - stdout-only is the right constraint. I've been running multi-agent workflows where the cost of tool output accumulation forces you to make bad decisions: either summarise outputs manually (defeating the purpose of tool calls), truncate logs (information loss), or cap the workflow depth. None of them good.

The search ranking piece is worth noting. Most people just grep logs or dump chunks and let the LLM sort it out. BM25 + FTS5 means you're pre-filtering at index time, not letting the model do relevance ranking on the full noise. That's the difference between usable and unusable context at scale.

Only question: how does credential passthrough work with MCP's protocol boundaries? If gh/aws/gcloud run in the subprocess, how does the auth state persist between tool calls, or does each call reinit?
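
The carry-forward effect described in this comment can be put into rough numbers. The sketch below assumes the simplest possible model, in which every step re-reads all tool output accumulated so far. Only the 56 KB snapshot size comes from the comment; the other sizes and the function name are hypothetical.

```python
def cumulative_context_kb(output_sizes_kb):
    """Total KB of tool output read across all steps when nothing is pruned."""
    total = 0
    carried = 0
    for size in output_sizes_kb:
        carried += size   # this step's output joins the conversation state
        total += carried  # each step re-reads everything carried so far
    return total


# Step 1 returns a 56 KB snapshot; steps 2 and 3 return 2 KB each (assumed sizes).
raw = cumulative_context_kb([56, 2, 2])
# Same workflow if step 1 had returned a 1 KB filtered result instead.
filtered = cumulative_context_kb([1, 2, 2])
print(raw, filtered)
```

Under this model the snapshot is paid for three times, not once, which is the point the comment makes about step 3.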

mksglu•7h ago
No magic — standard Unix process inheritance. Each execute() spawns a child process via Node's child_process.spawn() with a curated env built by #buildSafeEnv (https://github.com/mksglu/claude-context-mode/blob/main/cont...). It passes through an explicit allowlist of auth vars (GH_TOKEN, AWS_ACCESS_KEY_ID, GOOGLE_APPLICATION_CREDENTIALS, KUBECONFIG, etc.) plus HOME and XDG paths so CLI tools find their config files on disk. No state persists between calls — each subprocess inherits credentials from the MCP server's environment, runs, and exits. This works because tools like gh and aws resolve auth on every invocation anyway (env vars or ~/.config files). The tradeoff is intentional: allowlist over full process.env so the sandbox doesn't leak unrelated vars.
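
A rough Python analogue of the pattern described in this reply, for illustration only. The project itself uses Node's child_process.spawn() and its own #buildSafeEnv; the allowlist contents, function names, and timeout below are assumptions, and the sketch assumes a POSIX-like system.

```python
import os
import subprocess
import sys

# Hypothetical allowlist; the project's real list lives in #buildSafeEnv.
ALLOWED_ENV = (
    "PATH", "HOME", "XDG_CONFIG_HOME",
    "GH_TOKEN", "AWS_ACCESS_KEY_ID",
    "GOOGLE_APPLICATION_CREDENTIALS", "KUBECONFIG",
)


def build_safe_env(parent_env):
    """Copy only allowlisted variables that are actually set in the parent."""
    return {key: parent_env[key] for key in ALLOWED_ENV if key in parent_env}


def execute(argv, parent_env=None):
    """Run argv in a fresh child process and return only its stdout."""
    env = build_safe_env(os.environ if parent_env is None else parent_env)
    result = subprocess.run(
        argv, env=env, capture_output=True, text=True, timeout=30
    )
    return result.stdout  # stderr and the exit code are dropped in this sketch


# Demo: the child sees GH_TOKEN but not an unrelated variable.
parent = dict(os.environ, GH_TOKEN="example-token", UNRELATED_SECRET="hidden")
child_code = (
    "import os; "
    "print(os.environ.get('GH_TOKEN'), os.environ.get('UNRELATED_SECRET'))"
)
output = execute([sys.executable, "-c", child_code], parent)
print(output)
```

Nothing is cached between calls here either: each `execute` builds the environment again from the parent and the child exits when it is done.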

poly2it•3h ago
Two LLMs speaking with each other on HN? Amusing!

tyre•1h ago
Why are you assuming they’re an LLM? And please don’t say “em dash”.

Note: you’re replying to the library’s author.

polski-g•28m ago
The first two sentences of the first two paragraphs of OP are a dead giveaway.

mvkel•4h ago
Excited to try this. Is this not in effect a kind of "pre-compaction," deciding ahead of time what's relevant? Are there edge cases where it is unaware of, say, a utility function that it coincidentally picks up when it just dumps everything?

nr378•4h ago
Nice work.

It strikes me there's more low hanging fruit to pluck re. context window management. Backtracking strikes me as another promising direction to avoid context bloat and compaction (i.e. when a model takes a few attempts to do the right thing, once it's done the right thing, prune the failed attempts out of the context).

jonnycoder•2h ago
It feels like the late 1990s all over again, but instead of HTML and SQL, it’s coding agents. This time around, a lot of us are well experienced at software engineering and so we can find optimizations simply by using Claude Code all day long. We get an idea, we work with AI to help create a detailed design, and then let it develop it for us.

elephanlemon•1h ago
Agree. I’d like more fine grained control of context and compaction. If you spend time debugging in the middle of a session, once you’ve fixed the bugs you ought to be able to remove everything related to fixing them out of context and continue as you had before you encountered them. (Right now depending on your IDE this can be quite annoying to do manually. And I’m not aware of any that allow you to snip it out if you’ve worked with the agent on other tasks afterwards.)

I think agents should manage their own context too. For example, if you’re working with a tool that dumps a lot of logged information into context, those logs should get pruned out after one or two more prompts.

Context should be thought of as something that can be freely manipulated, rather than a stack that can only have things appended or removed from the end.

nr378•29m ago
Oh that's quite a nice idea - agentic context management (riffing on agentic memory management).

There are some challenges around the LLM having enough output tokens to easily specify what it wants its next input tokens to be, but "snips" should be able to be expressed concisely (i.e. the next input should include everything sent previously except the chunk that starts XXX and ends YYY). The upside is tighter context, the downside is it'll bust the prompt cache (perhaps the optimal trade-off is to batch the snips).
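
A minimal sketch of how such a "snip" might be applied, assuming the context is a single string and each snip is a (start marker, end marker) pair. The function names and the sample context are hypothetical.

```python
def apply_snip(context, start_marker, end_marker):
    """Remove the first span that starts with start_marker and ends with end_marker.

    The context is returned unchanged if either marker cannot be found.
    """
    start = context.find(start_marker)
    if start == -1:
        return context
    end = context.find(end_marker, start + len(start_marker))
    if end == -1:
        return context
    return context[:start] + context[end + len(end_marker):]


def apply_snips(context, snips):
    """Apply a batch of snips in one pass, so the context is rewritten only once."""
    for start_marker, end_marker in snips:
        context = apply_snip(context, start_marker, end_marker)
    return context


context = (
    "plan: fix login\n"
    "[try 1] patch A failed\n"
    "[try 2] patch B failed\n"
    "fix: patch C works\n"
)

# The model only has to emit two short markers, not the text it wants removed.
trimmed = apply_snips(context, [("[try 1]", "patch B failed\n")])
print(trimmed)
```

Batching matters for the cache concern raised above: every rewrite changes the prefix the provider has cached, so collecting several snips and applying them together should invalidate it once rather than once per snip.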

ip26•1h ago
Maybe the right answer is “why not both”, but subagents can also be used for that problem. That is, when something isn’t going as expected, fork a subagent to solve the problem and return with the answer.

It’s interesting to imagine a single model deciding to wipe its own memory though, and roll back in time to a past version of itself (only, with the answer to a vexing problem)

jon-wood•45m ago
I forget where now but I'm sure I read an article from one of the coding harness companies talking about how they'd done just that. Effectively it could pass a note to its past self saying "Path X doesn't work", and otherwise reset the context to any previous point.

I could see this working like some sort of undo tree, with multiple branches you can jump back and forth between.
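
The undo-tree idea could be modelled roughly as below. This is a sketch of the data structure only, under the assumption that a context is a list of messages; `Checkpoint`, `extend`, and `rewind` are hypothetical names, not taken from any coding harness.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Checkpoint:
    """A saved context; children are alternative continuations from this point."""
    messages: list
    parent: Optional["Checkpoint"] = None
    children: list = field(default_factory=list)

    def extend(self, message):
        """Create a child checkpoint with one more message appended."""
        child = Checkpoint(self.messages + [message], parent=self)
        self.children.append(child)
        return child


def rewind(target, note):
    """Start a new branch from target, carrying only a note from the failed one."""
    return target.extend(f"note from abandoned branch: {note}")


root = Checkpoint(["task: port the build"])
attempt = root.extend("try path X")
dead_end = attempt.extend("path X fails at the link step")

# Jump back to the root, keeping only the lesson learned.
retry = rewind(root, "Path X doesn't work")
print(retry.messages)
```

The abandoned branch is still reachable through `root.children`, so it stays possible to jump back to it later, as the comment suggests.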

agrippanux•1h ago
I am a happy user of this and have recommended my team also install it. It’s made a sizable reduction in my token use.

formvoltron•1h ago
this is going to crash the AI economy. nvda down 20 percent monday. lol

unxmaal•12m ago
I did this accidentally while porting Go to IRIX: https://github.com/unxmaal/mogrix/blob/main/tools/knowledge-...

Cognitive Debt: When Velocity Exceeds Comprehension

https://www.rockoder.com/beyondthecode/cognitive-debt-when-velocity-exceeds-comprehension/
262•pagade•2h ago•110 comments

Obsidian Sync now has a headless client

https://help.obsidian.md/sync/headless
129•adilmoujahid•2h ago•50 comments

Addressing Antigravity Bans and Reinstating Access

https://github.com/google-gemini/gemini-cli/discussions/20632
128•RyanShook•4h ago•99 comments

Verified Spec-Driven Development (VSDD)

https://gist.github.com/dollspace-gay/d8d3bc3ecf4188df049d7a4726bb2a00
34•todsacerdoti•1h ago•12 comments

Woxi: Wolfram Mathematica Reimplementation in Rust

https://github.com/ad-si/Woxi
149•adamnemecek•3d ago•63 comments

New evidence that Cantor plagiarized Dedekind?

https://www.quantamagazine.org/the-man-who-stole-infinity-20260225/
38•rbanffy•3d ago•26 comments

Show HN: Now I Get It – Translate scientific papers into interactive webpages

https://nowigetit.us
88•jbdamask•5h ago•68 comments

Ghosts'n Goblins – “Worse danger is ahead”

https://superchartisland.com/ghostsn-goblins/
23•elvis70•3d ago•5 comments

Werner Herzog Between Fact and Fiction

https://www.thenation.com/article/culture/werner-herzog-future-truth/
6•Hooke•1d ago•0 comments

747s and Coding Agents

https://carlkolon.com/2026/02/27/engineering-747-coding-agents/
59•cckolon•1d ago•17 comments

How Long Is the Coast of Britain? (1967)

https://www.jstor.org/stable/1721427
6•Hooke•3d ago•0 comments

We Will Not Be Divided

https://notdivided.org
2341•BloondAndDoom•17h ago•743 comments

Show HN: Tomoshibi – A writing app where your words fade by firelight

https://tomoshibi.in-hakumei.com/
5•hakumei•1h ago•0 comments

Show HN: SQLite for Rivet Actors – one database per agent, tenant, or document

https://github.com/rivet-dev/rivet
21•NathanFlurry•2h ago•5 comments

The whole thing was a scam

https://garymarcus.substack.com/p/the-whole-thing-was-scam
67•guilamu•1h ago•10 comments

OpenAI fires an employee for prediction market insider trading

https://www.wired.com/story/openai-fires-employee-insider-trading-polymarket-kalshi/
168•bookofjoe•4h ago•102 comments

CSP for Pentesters: Understanding the Fundamentals

https://www.kayssel.com/newsletter/issue-20/
5•zdw•50m ago•0 comments

From Noise to Image – interactive guide to diffusion

https://lighthousesoftware.co.uk/projects/from-noise-to-image/
39•simedw•2d ago•8 comments

The Life Cycle of Money

https://doap.metal.bohyen.space/blog/post/complete-life-cycle-of-money/
51•nanacnote•5h ago•8 comments

Unsloth Dynamic 2.0 GGUFs

https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs
145•tosh•9h ago•46 comments

The Eternal Promise: A History of Attempts to Eliminate Programmers

https://www.ivanturkovic.com/2026/01/22/history-software-simplification-cobol-ai-hype/
163•dinvlad•3d ago•118 comments

A new California law says all operating systems need to have age verification

https://www.pcgamer.com/software/operating-systems/a-new-california-law-says-all-operating-system...
759•WalterSobchak•1d ago•645 comments

Seeing Like a Sedan

https://asteriskmag.com/issues/13/seeing-like-a-sedan
9•surprisetalk•3d ago•0 comments

The Future of AI

https://lucijagregov.com/2026/02/26/the-future-of-ai/
58•BerislavLopac•7h ago•57 comments

Stop Burning Your Context Window – How We Cut MCP Output by 98% in Claude Code

https://mksg.lu/blog/context-mode
84•mksglu•8h ago•17 comments

The United States and Israel have launched a major attack on Iran

https://www.cnn.com/2026/02/28/middleeast/israel-attack-iran-intl-hnk
753•lavp•11h ago•1788 comments

OpenAI agrees with Dept. of War to deploy models in their classified network

https://twitter.com/sama/status/2027578652477821175
1208•eoskx•15h ago•577 comments

Don't trust AI agents

https://nanoclaw.dev/blog/nanoclaw-security-model
245•gronky_•5h ago•132 comments

Latency numbers every programmer should know

https://cheat.sh/latency
53•ksec•5h ago•16 comments

Don't use passkeys for encrypting user data

https://blog.timcappalli.me/p/passkeys-prf-warning/
201•zdw•15h ago•170 comments