
Show HN: An agent that tunes its own cache

6•kaliades•1h ago

Last weekend I built chat.betterdb.com as a RAG app over the Valkey/Redis/Dragonfly docs. The goal was to eat our own dog food and test our caching libraries in public. It also saved me from having to come up with separate demo and test scenarios, since building in public doubles as the demo.

A tool-result cache sits between the SDK and the tools. Each call is normalized and then checked against the cache before executing. If it hits, we return the cached result. If not, we check the semantic cache, which embeds the prompt and runs a KNN search via valkey-search. If the cosine distance is close enough, we again skip the LLM and stream the cached response. In both cases, on a miss we store the prompt embedding, the actual model, and the input and output token counts from OpenAI's usage report, so a future hit carries the dollars avoided as data.
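The semantic tier described above can be sketched roughly as follows. This is not the BetterDB library code: the in-memory list stands in for the valkey-search KNN index, the `max_distance` threshold and the per-token prices are made-up values, and the class and function names are hypothetical. Embeddings are passed in by the caller and are assumed to be non-zero vectors.

```python
import math
from dataclasses import dataclass

# Hypothetical prices in USD per 1M tokens; real prices depend on the model.
PRICE_PER_M_INPUT = 1.0
PRICE_PER_M_OUTPUT = 2.0


@dataclass
class Entry:
    embedding: list[float]
    response: str
    model: str
    input_tokens: int
    output_tokens: int


def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def dollars_avoided(entry):
    """Cost of the original LLM call that a cache hit did not have to repeat."""
    return (entry.input_tokens * PRICE_PER_M_INPUT
            + entry.output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000


class SemanticCache:
    def __init__(self, max_distance=0.1):
        self.max_distance = max_distance  # assumed threshold, tune per workload
        self.entries = []

    def lookup(self, embedding):
        """Return (entry, distance) for the nearest entry if close enough, else None."""
        if not self.entries:
            return None
        # Brute-force nearest neighbour; a real index would do this server-side.
        best = min(self.entries, key=lambda e: cosine_distance(embedding, e.embedding))
        distance = cosine_distance(embedding, best.embedding)
        return (best, distance) if distance <= self.max_distance else None

    def store(self, entry):
        """Called on a miss, after the model has answered."""
        self.entries.append(entry)
```

On a hit, the caller would stream `entry.response` and could log `dollars_avoided(entry)` as the saving for that request.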

The two tiers handle different shapes of input. Predefined questions, copy-pasted questions, and checking the same thing again after some time all produce byte-identical strings, which the tool cache catches. Human paraphrase is what the semantic tier exists for.

This Wednesday was a bank holiday where I live, so I used it to extend the project further. The libraries the chat relies on now store metadata in the Valkey (or Redis, if that's your preference) instance, and our monitoring reads and analyzes that data and suggests improvements. These suggestions are also exposed through our MCP server, so the chat's agent can check and create suggestions as well, and since this is just a demo, it can also approve its own suggestions (do not do this in a real production environment, unless you are a true LLM believer). The libs also read their config from the Valkey instance, so no restart is needed. I hooked it up to a cron job on Vercel and let it run overnight and through the next day.
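To illustrate the "no restart needed" part, here is a minimal sketch of a cache that reads its TTL from the store at lookup time instead of at startup. `FakeStore` is an in-memory stand-in for a Valkey/Redis client, and the key name, default TTL, and class names are all hypothetical, not what the libraries actually use.

```python
import time


class FakeStore:
    """In-memory stand-in for a Valkey/Redis client; only get/set are modeled."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


CONFIG_KEY = "demo:cache:config:ttl_seconds"  # hypothetical key name
DEFAULT_TTL = 300  # assumed fallback, in seconds


def current_ttl(store):
    """Read the TTL from the store, falling back to the default on bad input."""
    raw = store.get(CONFIG_KEY)
    try:
        return int(raw) if raw is not None else DEFAULT_TTL
    except ValueError:
        return DEFAULT_TTL


class ToolCache:
    def __init__(self, store, clock=time.monotonic):
        self.store = store
        self.clock = clock
        self._entries = {}  # key -> (value, stored_at)

    def put(self, key, value):
        self._entries[key] = (value, self.clock())

    def get(self, key):
        item = self._entries.get(key)
        if item is None:
            return None
        value, stored_at = item
        # The TTL is looked up on every read, so an approved config change
        # applies to the next request without redeploying or restarting.
        if self.clock() - stored_at > current_ttl(self.store):
            del self._entries[key]
            return None
        return value
```

The trade-off in this sketch is one extra read per lookup; a real implementation might cache the config for a few seconds instead.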

Between Run 1 and Run 3, it started making fewer tool calls. In the first run it suggested several different TTL changes and applied them. Run 2 made similar suggestions to Run 1, because TTL is the wrong point of control: the tools in question take natural language input (`How fast is XADD?` vs `XADD performance` are two different strings that "mean" the same thing), so the exact-match tool cache doesn't fire, and this kind of input is what the semantic cache covers. An actual fix would be to move these tools from the exact-match checks into the semantic cache checks - a code change, not a config change. It was an indicator of a problem the system can't fix on its own. In the future the routing might also become configurable, so this can be solved without redeploying and tested and verified in quicker loops. Run 3 just didn't propose anything new. Tool calls went 15 -> 13 -> 8 across the three runs.
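A small sketch of why no TTL value helps here, assuming the tool cache key is a hash of the normalized call (the tool name `search_docs`, its arguments, and the exact normalization are hypothetical; the real library may normalize differently):

```python
import hashlib
import json


def tool_cache_key(tool_name, args):
    """Exact-match key: canonical JSON of the call, hashed with SHA-256."""
    canonical = json.dumps(
        {"tool": tool_name, "args": args},
        sort_keys=True,          # argument order must not change the key
        separators=(",", ":"),   # no whitespace differences either
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same call with arguments in a different order: same key, cache hit.
a = tool_cache_key("search_docs", {"query": "How fast is XADD?", "limit": 5})
b = tool_cache_key("search_docs", {"limit": 5, "query": "How fast is XADD?"})

# Paraphrase of the same question: different key, cache miss,
# no matter how long the entry for `a` is kept alive.
c = tool_cache_key("search_docs", {"query": "XADD performance", "limit": 5})
```

Normalization removes incidental differences such as key order, but it cannot make two different natural-language strings collide, which is why this class of miss needs routing to the semantic tier rather than a longer TTL.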

Curious how others running similar loops decide what the agent can touch. Am I too skeptical of hallucinations and overly cautious?

The chat can be found at https://chat.betterdb.com (it links to all of the repos), and a more detailed write-up is at https://www.betterdb.com/blog/cache-that-tunes-itself
