Show HN: An agent that tunes its own cache

6•kaliades•1h ago
Last weekend I built chat.betterdb.com as a RAG over the Valkey/Redis/Dragonfly docs. The goal was to eat our own dog food and test our caching libraries in public. It also saved me from having to invent demo and test scenarios, since building in public doubles as the demo.

A tool-result cache sits between the SDK and the tools. Each call is normalized and checked against that cache before it executes. On a hit we return the cached result; on a miss we check the semantic cache, which embeds the prompt and runs a KNN query via valkey-search. If the cosine distance is small enough, we again skip the LLM and stream the cached response. In both cases, on a miss we store the prompt embedding, the actual model, and the input and output token counts from OpenAI's usage report, so a future hit carries the dollars it avoided as data.
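A minimal sketch of the semantic tier described above, not the actual library code. Everything here is an assumption for illustration: the bag-of-words `embed` stands in for a real embedding model, the brute-force nearest-neighbour scan stands in for a KNN query in valkey-search, and the `max_distance` threshold and per-1k-token prices are made-up numbers.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedAnswer:
    prompt: str
    response: str
    model: str
    input_tokens: int
    output_tokens: int


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().replace("?", "").split())


def cosine_distance(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return 1.0 - (dot / norm if norm else 0.0)


class SemanticCache:
    def __init__(self, max_distance: float = 0.35):
        self.max_distance = max_distance  # hypothetical threshold
        self.entries = []  # list of (embedding, CachedAnswer) pairs

    def lookup(self, prompt: str) -> Optional[CachedAnswer]:
        if not self.entries:
            return None
        query = embed(prompt)
        # Brute-force nearest neighbour (k=1); a vector index would do this server-side.
        distance, answer = min(
            ((cosine_distance(query, vec), ans) for vec, ans in self.entries),
            key=lambda pair: pair[0],
        )
        return answer if distance <= self.max_distance else None

    def store(self, answer: CachedAnswer) -> None:
        # Called on a miss, after the model has produced a real response.
        self.entries.append((embed(answer.prompt), answer))


def dollars_avoided(answer: CachedAnswer, usd_per_1k_in: float, usd_per_1k_out: float) -> float:
    # Cost the original call incurred, which a cache hit avoids paying again.
    return (answer.input_tokens / 1000) * usd_per_1k_in + (answer.output_tokens / 1000) * usd_per_1k_out
```

Storing the token counts next to the embedding is what lets a later hit be priced. Note that the toy embedding only matches near-identical wording; catching a real paraphrase depends on a real embedding model.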

The two tiers handle different shapes of input. Predefined questions, copy-pasted questions, and re-checking the same thing later all produce byte-identical strings, which the tool cache catches. Human paraphrase is what the semantic tier exists for.
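To illustrate why the exact-match tier only catches byte-identical input, here is a rough sketch of a normalized cache key. The post does not say how calls are normalized, so the canonical-JSON-plus-SHA-256 scheme, the `toolcache:` prefix, and the `search_docs` tool name are all assumptions.

```python
import hashlib
import json


def tool_cache_key(tool_name: str, args: dict) -> str:
    # Canonical JSON: sorted keys and fixed separators, so argument order
    # and whitespace in the serialized call do not change the key.
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(f"{tool_name}:{canonical}".encode("utf-8")).hexdigest()
    return f"toolcache:{tool_name}:{digest}"


# Same call, different argument order: same key, so the cache hits.
key_a = tool_cache_key("search_docs", {"query": "How fast is XADD?", "limit": 5})
key_b = tool_cache_key("search_docs", {"limit": 5, "query": "How fast is XADD?"})

# A paraphrase is a different string, so it gets a different key and misses.
key_c = tool_cache_key("search_docs", {"query": "XADD performance", "limit": 5})
```

Normalization of this kind removes incidental differences in how a call is serialized, but it cannot make two differently worded questions collide.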

This Wednesday was a bank holiday where I live, so I used it to extend the project further. The libraries the chat relies on now store metadata in the Valkey (or Redis, if that's your preference) instance; our monitoring then reads and analyzes that data and suggests improvements. The suggestions are also exposed through our MCP server, so the chat's agent can check and create suggestions as well, and since this is just a demo, it can also approve its own suggestions (do not do this in a real production environment unless you are a true LLM believer). The libs also read their config from the Valkey instance, so no restart is needed. I hooked it up to a cron job on Vercel and let it run overnight and through the next day.
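The "no restart needed" part can be sketched roughly like this: the TTL is looked up in the store on each call instead of being read once at startup. `FakeStore` is an in-memory stand-in for a Valkey hash, and the `cache:config:ttl` key, the tool name, and the default value are hypothetical, not the libraries' real schema.

```python
DEFAULT_TTL_SECONDS = 300  # hypothetical fallback when nothing is configured


class FakeStore:
    """In-memory stand-in for a Valkey hash; only hset/hget are modelled."""

    def __init__(self):
        self._hashes = {}

    def hset(self, key: str, field: str, value) -> None:
        self._hashes.setdefault(key, {})[field] = str(value)

    def hget(self, key: str, field: str):
        return self._hashes.get(key, {}).get(field)


def ttl_for(store, tool_name: str) -> int:
    # Read on every call, so an approved suggestion takes effect immediately.
    raw = store.hget("cache:config:ttl", tool_name)
    return int(raw) if raw is not None else DEFAULT_TTL_SECONDS


store = FakeStore()
before = ttl_for(store, "search_docs")          # falls back to the default
store.hset("cache:config:ttl", "search_docs", 900)  # e.g. an applied suggestion
after = ttl_for(store, "search_docs")           # picks up the new value, no restart
```

A real deployment would probably cache the config briefly to avoid a round trip per call; that trade-off is left out here.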

Between Run 1 and Run 3 it started making fewer tool calls. In the first run it suggested several different TTL changes and applied them. Runs 1 and 2 produced similar suggestions, because TTL is the wrong point of control: the tools in question take natural-language input (`How fast is XADD?` and `XADD performance` are two different strings that "mean" the same thing), so the exact-match tool cache doesn't fire; that kind of input is what the semantic cache covers. An actual fix would be to move these tools from the exact-match checks to the semantic cache checks - a code change, not a config change. That was an indicator of a problem the system can't fix on its own. In the future the routing might also become configurable, so this could be solved without redeploying and tested and verified in quicker loops. Run 3 simply didn't propose anything new. Tool calls went 15 -> 13 -> 8 across the three runs.
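If routing did become configurable, it could look something like the sketch below: a per-tool mapping that decides which cache tier a call goes through. This is speculative and not how the libraries work today; the `Tier` enum, the tool names, and the fallback behaviour are all invented for illustration.

```python
from enum import Enum


class Tier(Enum):
    EXACT = "exact"        # byte-identical match on the normalized call
    SEMANTIC = "semantic"  # embedding + nearest-neighbour match


DEFAULT_TIER = Tier.EXACT


def tier_for(tool_name: str, routing: dict) -> Tier:
    # Unknown tools and unrecognized values fall back to the exact-match tier.
    raw = routing.get(tool_name)
    if raw is None:
        return DEFAULT_TIER
    try:
        return Tier(raw)
    except ValueError:
        return DEFAULT_TIER


# Hypothetical routing config: tools that take free-text questions go to the
# semantic tier, tools with structured arguments stay on exact match.
routing = {"search_docs": "semantic", "get_command_info": "exact"}
```

With a table like this stored next to the TTL config, moving a tool between tiers would be a config change the agent could suggest, instead of a code change.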

Curious how others running similar loops decide what the agent can touch. Am I too skeptical of hallucinations and overly cautious?

The chat is at https://chat.betterdb.com (it links to all of the repos), and a more detailed write-up is at https://www.betterdb.com/blog/cache-that-tunes-itself
