
Show HN: We cut >60% of tokens from agentic tasks by removing repeated context

https://parcle.ai/

1•longtermop•13h ago

Every agentic system I see has the same hidden tax: the model keeps rereading the same context.

Tickets, Slack threads, docs, customer history, database notes, runbooks, logs, prior decisions. You can cache static prefixes, route to cheaper models, or set team budgets, but none of those fixes the underlying behavior: agents start most tasks trying to re-explore everything.

We built Parcle as a shared memory layer for AI agents. It ingests operational context, indexes what happened, and lets agents retrieve a small, relevant memory set for the next step instead of pasting everything back into the prompt or, worse, letting the agent explore on its own and burn tokens.
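To make "retrieve a small, relevant memory set" concrete, here is a minimal sketch of the general idea. It is not Parcle's implementation or API: `Memory`, `retrieve`, the keyword-overlap scoring, and the word-count token estimate are all hypothetical stand-ins chosen to keep the example self-contained. A real system would likely use embeddings and a real tokenizer.

```python
import re
from dataclasses import dataclass


@dataclass
class Memory:
    source: str  # hypothetical label, e.g. "ticket" or "runbook"
    text: str


def tokenize(text: str) -> set[str]:
    # Lowercased alphanumeric words; a stand-in for real indexing.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def approx_tokens(text: str) -> int:
    # Assumption: whitespace word count as a crude proxy for model tokens.
    return len(text.split())


def retrieve(query: str, memories: list[Memory], budget: int) -> list[Memory]:
    """Return the most query-relevant memories that fit within `budget`."""
    q = tokenize(query)
    scored = [(len(q & tokenize(m.text)), i, m) for i, m in enumerate(memories)]
    # Drop memories with no overlap, then rank by overlap (ties keep input order).
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))
    picked, used = [], 0
    for _, _, m in scored:
        cost = approx_tokens(m.text)
        if used + cost <= budget:
            picked.append(m)
            used += cost
    return picked


MEMORIES = [
    Memory("ticket", "Customer Acme reported login timeout after SSO change"),
    Memory("runbook", "Restart the billing worker when the invoice queue stalls"),
    Memory("decision", "We agreed to extend the Acme SSO session timeout to 60 minutes"),
    Memory("log", "Nightly backup completed without errors"),
]

if __name__ == "__main__":
    for m in retrieve("Acme SSO login timeout", MEMORIES, budget=20):
        print(m.source, "|", m.text)
```

The point of the budget is that the prompt grows with what is relevant to the current step, not with everything the system has ever ingested.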

We started tracking tokens consumed on tasks with and without our memory layer, using only an index of local files. In our deployments and evals, the largest reduction we’ve seen is 70% lower token spend on agentic tasks, with roughly 2x faster task completion. The median reduction was ~30%. The biggest savings tend to come from data- and context-heavy workflows, where the agent needs to retrieve data and context from multiple locations and sources. The best cases so far are support, ops, research, sales, and finance workflows, where the agent otherwise reloads the same account/workflow/history context again and again.

Why I think this matters now:

Pylon’s AI cost post made us ask the question:

How much are companies paying because their agents keep looking for the same context? Is this a hidden tax that memory could solve?

We built Parcle to make agents remember. The surprise was that memory does not just make agents more useful. It also cuts down on tokens consumed: fewer tokens are spent figuring out where things are, and more time is spent doing actually productive work.

- Anthropic says agents use about 4x more tokens than chat. We think this is an understatement.
- OpenAI and Anthropic both have prompt caching because repeated prompt context is expensive, but caching mostly helps when the reusable content is stable enough to hit the cache. It also does not help once the cache expires, which happens after 5 to 15 minutes of inactivity.
- “Lost in the Middle” and Chroma’s “context rot” work both point at the same issue: more context is not the same thing as usable memory.
- The context-engineering crowd seems to be converging on this: the hard part is deciding what the model should see at each step.
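The cache-expiry point above can be illustrated with a rough model. This is a sketch under stated assumptions, not any provider's documented behavior: it assumes every request with the same prefix re-warms the cache and that the cache is lost once the idle gap exceeds the TTL. `count_cache_hits` and the sample timestamps are hypothetical.

```python
def count_cache_hits(request_times_s: list[float], ttl_s: float) -> int:
    """Count requests that arrive while the shared prefix is still cached."""
    hits = 0
    last = None
    for t in sorted(request_times_s):
        if last is not None and t - last <= ttl_s:
            hits += 1
        # Assumption: every request (hit or miss) re-warms the cache.
        last = t
    return hits


if __name__ == "__main__":
    # Hypothetical agent activity: a burst, a long pause, then another burst.
    times = [0, 60, 200, 900, 1000]
    print(count_cache_hits(times, ttl_s=300))
```

With a 5-minute TTL, the request that follows the long pause pays full price for the prefix again, which is the pattern bursty agent workloads tend to produce.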

Parcle is our attempt at making that operational: memory outside the model, selected into context only when useful.

I’d love feedback from people running real agents in production:

1. Where are your tokens actually going: repeated input context, tool traces, retries, output, evals, or something else?
2. Have prompt caching and model routing been enough?
3. What would you need to trust an external memory layer inside an agent loop?
