
Show HN: Context Gateway – Compress agent context before it hits the LLM

https://github.com/Compresr-ai/Context-Gateway
29•ivzak•2h ago
We built an open-source proxy that sits between coding agents (Claude Code, OpenClaw, etc.) and the LLM, compressing tool outputs before they enter the context window.

Demo: https://www.youtube.com/watch?v=-vFZ6MPrwjw#t=9s.

Motivation: Agents are terrible at managing context. A single file read or grep can dump thousands of tokens into the window, most of it noise. This isn't just expensive — it actively degrades quality. Long-context benchmarks consistently show steep accuracy drops as context grows (OpenAI's GPT-5.4 eval goes from 97.2% at 32k to 36.6% at 1M https://openai.com/index/introducing-gpt-5-4/).

Our solution uses small language models (SLMs): we look at model internals and train classifiers to detect which parts of the context carry the most signal. When a tool returns output, we compress it conditioned on the intent of the tool call—so if the agent called grep looking for error handling patterns, the SLM keeps the relevant matches and strips the rest.
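A minimal sketch of what intent-conditioned compression could look like. The real gateway trains classifiers on model internals; here `score()` is a naive keyword-overlap stand-in, and all names are made up for illustration:

```python
def compress_tool_output(output: str, intent: str, keep_ratio: float = 0.3) -> str:
    """Keep only the chunks of a tool output most relevant to the call's intent.

    score() stands in for the trained SLM classifier; this keyword-overlap
    version is purely illustrative.
    """
    def score(chunk: str) -> float:
        intent_words = set(intent.lower().split())
        chunk_words = set(chunk.lower().split())
        return len(intent_words & chunk_words) / (len(intent_words) or 1)

    chunks = [c for c in output.split("\n") if c.strip()]
    ranked = sorted(chunks, key=score, reverse=True)
    n_keep = max(1, int(len(chunks) * keep_ratio))
    kept = set(ranked[:n_keep])
    # Preserve the original ordering of the surviving lines.
    return "\n".join(c for c in chunks if c in kept)
```

The key difference from generic summarization is that the same grep output compresses differently depending on what the agent was looking for.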

If the model later needs something we removed, it calls expand() to fetch the original output. We also do background compaction at 85% window capacity and lazy-load tool descriptions so the model only sees tools relevant to the current step.
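The expand() round-trip and the 85% compaction trigger could be sketched like this. The gateway's actual storage and API are not shown in the post, so these names are hypothetical:

```python
import uuid

class OutputStore:
    """Keeps original tool outputs so the model can recover stripped content.

    A sketch of the expand() mechanism described above; names are invented,
    not the gateway's real API.
    """
    def __init__(self) -> None:
        self._originals: dict[str, str] = {}

    def register(self, original: str, compressed: str) -> str:
        ref = uuid.uuid4().hex[:8]
        self._originals[ref] = original
        # The model sees the compressed text plus a handle it can expand later.
        return f"{compressed}\n[truncated; call expand('{ref}') for full output]"

    def expand(self, ref: str) -> str:
        return self._originals[ref]

def needs_compaction(used_tokens: int, window: int, threshold: float = 0.85) -> bool:
    """Background compaction kicks in at 85% of window capacity."""
    return used_tokens >= window * threshold
```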

The proxy also gives you spending caps, a dashboard for tracking running and past sessions, and Slack pings when an agent is sitting there waiting on you.

Repo is here: https://github.com/Compresr-ai/Context-Gateway. You can try it with:

  curl -fsSL https://compresr.ai/api/install | sh
Happy to go deep on any of it: the compression model, how the lazy tool loading works, or anything else about the gateway. Try it out and let us know how you like it!

Comments

verdverm•2h ago
I don't want some other tooling messing with my context. It's too important to leave to something that needs to optimize across many users, thereby not being the best for my specifics.

The framework I use (ADK) already handles this; it's very low-hanging fruit that should be part of any framework, not something external. In ADK, this is a boolean you can turn on per tool or subagent, and you can even decide turn by turn, or based on any criteria you see fit, by supplying a function.

YC over-indexed on AI startups too early, not realizing how trivial these startup "products" are; most are a line item in the feature list of a mature agent framework.

I've also seen dozens of this same project submitted by the claws that led to our new rule addition this week. If your project can be vibe-coded by dozens of people in mere hours...

thesiti92•2h ago
Do you have any stats on how much faster this is than Claude's or Codex's compression? Claude's is super slow, but Codex feels like an acceptable amount of time? Looks cool though, I'll have to try it out and see if it messes with outputs or not.
jameschaearley•1h ago

  The intent-conditioned compression is the interesting part here. Most context management I've seen is either naive truncation or generic summarization that doesn't account for why the tool was called. Training classifiers on model internals to figure out which tokens carry signal for a given task -- that's doing something different from what frameworks offer out of the box.

  I poked around the repo and didn't see any evals measuring compression quality. You cite the GPT-5.4 long-context accuracy drop as motivation, which makes sense -- but the natural follow-up is: does your compression actually recover that accuracy? Something like SWE-bench pass rates with and without the gateway at various context lengths would go a long way. Without that, it's hard to tell if the SLM is making good decisions or just making the context shorter.

  A few other things I'm curious about:

  • How does the SLM handle ambiguous tool calls? E.g., a broad grep where the agent isn't sure what it's looking for yet -- does the compressor tend to be too aggressive in those cases?
  • What's the latency overhead per tool call? If the SLM inference adds even 200-300ms per compression step, that compounds fast in agentic loops with dozens of tool calls.
  • How often does expand() get triggered in practice? If the agent frequently needs to recover stripped content, that's a signal the compression is too lossy.
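Back-of-envelope on that latency point, with assumed numbers (250ms overhead and 40 tool calls are illustrative, not measured):

```python
# Rough cost of per-call SLM compression latency in an agentic loop.
# Both figures below are assumptions, not measurements from the gateway.
per_call_overhead_s = 0.25   # assumed SLM inference time per compression
tool_calls = 40              # assumed tool calls in one agentic session

added_latency_s = per_call_overhead_s * tool_calls
print(added_latency_s)  # 10.0 seconds of pure overhead across the loop
```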
metadat•1h ago
Don't post generated/AI-edited comments. HN is for conversation between humans https://news.ycombinator.com/item?id=47340079 - 1 day ago, 1700 comments
PufPufPuf•1h ago
That comment reads pretty normal to me, and it raises valid points
altruios•1h ago
Regardless, these appear to be valid/sound questions, with answers to which I am interested.
linkregister•4m ago
How do you know this comment is created using generative AI?
uaghazade•1h ago
ok, it's great
esafak•1h ago
I can already prevent context pollution with subagents. How is this better?
root_axis•50m ago
Funny enough, Anthropic just went GA with a 1M-context Claude that has supposedly solved the lost-in-the-middle problem.
SyneRyder•27m ago
Just for anyone else who hadn't seen the announcement yet, this Anthropic 1M context is now the same price as the previous 256K context - not the beta where Anthropic charged extra for the 1M window:

https://x.com/claudeai/status/2032509548297343196

As for retrieval, the post shows Opus 4.6 at 78.3% needle retrieval success in 1M window (compared with 91.9% in 256K), and Sonnet 4.6 at 65.1% needle retrieval in 1M (compared with 90.6% in 256K).

siva7•20m ago
now that's major news
BloondAndDoom•11m ago
In addition to context rot, cost matters. I think lots of people use token compression tools for that, not because of context rot.
kuboble•44m ago
I wonder what the business model is.

It seems like a tool for a problem that won't last longer than a couple of months, and something that e.g. Claude Code can and probably will tackle itself soon.

tontinton•38m ago
Is it similar to rtk? Where the output of tool calls is compressed? Or does it actively compress your history once in a while?

If it's the latter, then users will pay uncached prices for the entire token history after every rewrite: https://platform.claude.com/docs/en/build-with-claude/prompt...

How is this better?
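Rough arithmetic on the cache point (prices are placeholders per million tokens, not current Anthropic rates):

```python
# Illustrative prompt-cache arithmetic for rewriting history mid-session.
# Rewriting earlier turns invalidates the cache prefix, so the whole
# history re-bills at the uncached rate on the next request.
history_tokens = 200_000    # assumed session history size
price_uncached = 3.00       # placeholder $/Mtok, cache miss
price_cached = 0.30         # placeholder $/Mtok, cache hit (often ~10x cheaper)

cost_cached = history_tokens / 1e6 * price_cached
cost_after_rewrite = history_tokens / 1e6 * price_uncached  # cache invalidated

print(round(cost_cached, 3), round(cost_after_rewrite, 3))
```

With these placeholder numbers, one history rewrite turns a $0.06 request into a $0.60 one, so compaction only pays off if it happens rarely relative to the tokens it removes.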

BloondAndDoom•12m ago
This is a bit more akin to distill - https://github.com/samuelfaj/distill

The advantage of an SLM in between is that some outputs cannot be compressed without losing context, so a small model does that job. It works, but most of these solutions still have some tradeoff in real-world applications.

lambdaone•27m ago
This company sounds like it has months to live; at most, until the VC money runs out. If this idea is good, Anthropic et al. will roll it into their own products, eliminating any purpose for it to exist as an independent product. And if it isn't any good, the company won't get traction.
sethcronin•3m ago
I guess I'm skeptical that this actually improves performance. I'm worried that the middleman compressing tool outputs can strip useful context that the agent actually needs to diagnose the problem.

Show HN: Channel Surfer – Watch YouTube like it’s cable TV

https://channelsurfer.tv
207•kilroy123•2d ago•91 comments

Can I run AI locally?

https://www.canirun.ai/
504•ricardbejarano•7h ago•138 comments

Hammerspoon

https://github.com/Hammerspoon/hammerspoon
57•tosh•1h ago•23 comments

Stanford researchers report first recording of a blue whale's heart rate (2019)

https://news.stanford.edu/stories/2019/11/first-ever-recording-blue-whales-heart-rate
9•eatonphil•56m ago•1 comment

Show HN: Context Gateway – Compress agent context before it hits the LLM

https://github.com/Compresr-ai/Context-Gateway
30•ivzak•2h ago•19 comments

Qatar helium shutdown puts chip supply chain on a two-week clock

https://www.tomshardware.com/tech-industry/qatar-helium-shutdown-puts-chip-supply-chain-on-a-two-...
199•johnbarron•7h ago•184 comments

TUI Studio – visual terminal UI design tool

https://tui.studio/
451•mipselaer•9h ago•254 comments

Parallels confirms MacBook Neo can run Windows in a virtual machine

https://www.macrumors.com/2026/03/13/macbook-neo-runs-windows-11-vm/
90•tosh•6h ago•108 comments

Your phone is an entire computer

https://medhir.com/blog/your-phone-is-an-entire-computer
132•medhir•2h ago•126 comments

Elon Musk pushes out more xAI founders as AI coding effort falters

https://www.ft.com/content/e5fbc6c2-d5a6-4b97-a105-6a96ea849de5
73•merksittich•3h ago•55 comments

Launch HN: Captain (YC W26) – Automated RAG for Files

https://www.runcaptain.com/
36•CMLewis•4h ago•13 comments

Mouser: An open source alternative to Logi-Plus mouse software

https://github.com/TomBadash/MouseControl
12•avionics-guy•1h ago•3 comments

The Wyden Siren Goes Off Again: We'll Be "Stunned" by NSA Under Section 702

https://www.techdirt.com/2026/03/12/the-wyden-siren-goes-off-again-well-be-stunned-by-what-the-ns...
221•cf100clunk•3h ago•77 comments

Launch HN: Spine Swarm (YC S23) – AI agents that collaborate on a visual canvas

https://www.getspine.ai/
70•a24venka•6h ago•59 comments

The wild six weeks for NanoClaw's creator that led to a deal with Docker

https://techcrunch.com/2026/03/13/the-wild-six-weeks-for-nanoclaws-creator-that-led-to-a-deal-wit...
24•wateroo•36m ago•0 comments

John Carmack about open source and anti-AI activists

https://twitter.com/id_aa_carmack/status/2032460578669691171
107•tzury•2h ago•151 comments

Coding after coders: The end of computer programming as we know it

https://www.nytimes.com/2026/03/12/magazine/ai-coding-programming-jobs-claude-chatgpt.html?smid=u...
53•angst•1d ago•21 comments

Bucketsquatting is (finally) dead

https://onecloudplease.com/blog/bucketsquatting-is-finally-dead
278•boyter•11h ago•149 comments

Lost Doctor Who Episodes Found

https://www.bbc.co.uk/news/articles/c4g7kwq1k11o
147•edent•14h ago•43 comments

Using Thunderbird for RSS

https://rubenerd.com/using-thunderbird-for-rss/
13•ingve•3d ago•0 comments

The Accidental Room (2018)

https://99percentinvisible.org/episode/the-accidental-room/
16•blewboarwastake•2h ago•1 comment

Meta Platforms: Lobbying, dark money, and the App Store Accountability Act

https://github.com/upper-up/meta-lobbying-and-other-findings
1081•shaicoleman•9h ago•464 comments

Okmain: How to pick an OK main colour of an image

https://dgroshev.com/blog/okmain/
203•dgroshev•4d ago•40 comments

How do you capture WHY engineering decisions were made, not just what?

7•zain__t•23m ago•6 comments

Militaries are scrambling to create their own Starlink

https://www.newscientist.com/article/2517766-why-the-worlds-militaries-are-scrambling-to-create-t...
43•mooreds•2h ago•67 comments

E2E encrypted messaging on Instagram will no longer be supported after 8 May

https://help.instagram.com/491565145294150
313•mindracer•7h ago•166 comments

Gvisor on Raspbian

https://nubificus.co.uk/blog/gvisor-rpi5/
55•_ananos_•10h ago•11 comments

Digg is gone again

https://digg.com/
21•hammerbrostime•1h ago•8 comments

The Mrs Fractal: Mirror, Rotate, Scale (2025)

https://www.4rknova.com//blog/2025/06/22/mrs-fractal
36•ibobev•4d ago•3 comments

Removing recursion via explicit callstack simulation

https://jnkr.tech/blog/removing-recursion
17•todsacerdoti•4d ago•2 comments