WarpGrep – RL Subagent for Fast Context (Like SWE-Grep)

1•bhaktatejas922•1mo ago

Comments

bhaktatejas922•1mo ago

Hello HN,

We’re the team behind WarpGrep, a fast context retrieval subagent. It targets two problems: coding agents spend ~60% of their time searching for context, and long sessions suffer badly from context rot.

We built this because we found that standard RAG or naive context stuffing leads to "context rot", where irrelevant files poison the model’s reasoning on long-horizon tasks. Inspired by Cognition’s SWE-Grep, we wanted to build an accessible version that integrates via MCP (Model Context Protocol) or an SDK.

How it works: Instead of a single prompt trying to do everything, WarpGrep treats context retrieval as a distinct, RL-trained system. We reward correct context retrieval and penalize irrelevant lines.
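To make "reward correct retrieval, penalize irrelevant lines" concrete, here is a minimal sketch of one way such a reward could be scored. This is not WarpGrep's actual reward: the function name `retrieval_reward`, the (file, line) representation, and the F-beta formulation with `beta=0.5` are all assumptions chosen for illustration.

```python
def retrieval_reward(retrieved, relevant, beta=0.5):
    """Hypothetical reward: F-beta over (file, line_number) pairs.

    beta < 1 weights precision above recall, so returning
    irrelevant lines lowers the score more than missing a few.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved or not relevant:
        return 0.0
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)  # drops as irrelevant lines are added
    recall = hits / len(relevant)      # drops as relevant lines are missed
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative ground truth: two relevant lines in a made-up file.
relevant = {("auth.py", 10), ("auth.py", 11)}
exact = retrieval_reward(relevant, relevant)
noisy = retrieval_reward(relevant | {("util.py", 1), ("util.py", 2)}, relevant)
```

Under these assumptions, the exact retrieval scores higher than the one padded with two irrelevant lines, even though both contain every relevant line.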

Constraints: It operates on a strict budget of 4 turns.

Parallelism: It executes up to 8 parallel tool calls per turn (grep, list, read, etc.).
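The turn budget and per-turn parallelism can be sketched as a small control loop. This is an illustration only, not WarpGrep's implementation: `run_search`, the `policy` callable (standing in for the trained model), and the `tools` mapping are hypothetical names, and the sketch assumes the tools are plain Python callables run on a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_TURNS = 4           # turn budget stated in the post
MAX_PARALLEL_CALLS = 8  # tool calls per turn stated in the post


def run_search(policy, tools, query):
    """Run up to MAX_TURNS turns, each with up to MAX_PARALLEL_CALLS tool calls.

    policy(query, history) is a hypothetical stand-in for the model: it
    returns a list of (tool_name, argument) pairs, or an empty list to stop.
    """
    history = []
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL_CALLS) as pool:
        for _ in range(MAX_TURNS):
            # Enforce the per-turn cap even if the policy asks for more.
            calls = policy(query, history)[:MAX_PARALLEL_CALLS]
            if not calls:
                break
            # Submit every call for this turn, then wait for all of them.
            futures = [pool.submit(tools[name], arg) for name, arg in calls]
            results = [f.result() for f in futures]
            history.append(list(zip(calls, results)))
    return history


# Toy usage: a fake "grep" tool and a policy that always asks for 10 calls.
toy_tools = {"grep": lambda pattern: pattern.upper()}
greedy_policy = lambda query, history: [("grep", f"{query}{i}") for i in range(10)]
greedy_history = run_search(greedy_policy, toy_tools, "foo")
```

Capping the loop in the harness rather than trusting the model keeps latency bounded regardless of what the policy requests.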

Inference: We worked with NVIDIA to optimize this on B200s. We are hitting ~900 tokens/sec (compared to SWE-Grep’s ~650 t/s). The heavy prefill optimization was critical here because grep operations are read-heavy.

The Results: In our internal benchmarks, offloading retrieval to this subagent speeds up tasks by 40% and reduces token usage by roughly the same amount. More importantly, it seems to reduce "context rot" by ~70% on longer tasks because the agent isn't distracted by irrelevant file headers. On SWE-Bench Pro we see a 5-12% improvement on long-horizon tasks, and chats stay stable for 2-3x more user messages.

It works with every coding agent, including Claude Code, Codex, and OpenCode. We’re curious to see how it handles your edge cases (especially huge repos).

There is a free tier, but if you want to push it hard, you can use the code BF16 for 40M tokens of credit to test the API limits. We do recommend adding a payment method to get around the rate limits, but you won't be charged until December 14th, at which point it will still be almost 10x cheaper than Claude Haiku.

Happy to answer questions about the CUDA optimizations or the RL training process!
