ContextCache compiles tool schemas into a KV cache once and reuses it across all requests. Only the user query goes through prefill.
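The caching pattern can be sketched in a few lines. This is a toy illustration of the idea only, not the real implementation: `prefill` stands in for the expensive transformer pass over the tool schemas (which would really return KV tensors), and the dict stands in for the compiled cache.

```python
import hashlib

PREFILL_CALLS = 0  # counts how often the expensive prefix pass runs

def prefill(prefix: str) -> dict:
    """Stand-in for the transformer prefill pass over the tool schemas.
    In the real system this would return the KV cache tensors."""
    global PREFILL_CALLS
    PREFILL_CALLS += 1
    return {"prefix_tokens": prefix.split()}  # placeholder for KV tensors

_kv_cache: dict[str, dict] = {}

def cached_prefill(prefix: str) -> dict:
    key = hashlib.sha256(prefix.encode()).hexdigest()
    if key not in _kv_cache:           # compile once...
        _kv_cache[key] = prefill(prefix)
    return _kv_cache[key]              # ...reuse on every request

def answer(prefix: str, query: str) -> int:
    kv = cached_prefill(prefix)
    # Only the query tokens are processed per request.
    return len(kv["prefix_tokens"]) + len(query.split())

schemas = "tool: get_weather(city) tool: get_time(tz)"
answer(schemas, "weather in Paris?")
answer(schemas, "time in Tokyo?")
assert PREFILL_CALLS == 1  # schema prefix processed exactly once
```

The second request skips the schema pass entirely, which is where the prefill savings come from.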
Results (Qwen3-8B, RTX 3090 Ti):

- 50 tools: 5,625ms → 193ms (29.2x speedup)
- Zero quality degradation (TSA 0.850 matches full prefill exactly)
Also includes a CPU-only orchestrator (no GPU needed) using llama.cpp + Qwen3.5-2B that routes queries to the right tool in ~550ms. Works with any LLM backend — Ollama, Claude, OpenAI, xAI, DeepSeek, Groq, or self-hosted.
Two products from one project:

- Route-only (~500ms): tool detection only, no LLM needed
- Full pipeline (~3s): route → extract params → execute → synthesize
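The full pipeline's four stages can be sketched end to end. Every function here is a stub so the control flow is visible: in the real project, route, extract, and synthesize are LLM calls and execute runs the actual tool; the tool names and canned results below are illustrative.

```python
# Toy sketch of the full pipeline: route -> extract params -> execute ->
# synthesize. All stages are stand-ins for the real LLM/tool calls.

def route(query: str) -> str:
    return "get_weather" if "weather" in query.lower() else "web_search"

def extract_params(query: str, tool: str) -> dict:
    # Stand-in for LLM parameter extraction.
    if tool == "get_weather":
        return {"city": query.split()[-1].strip("?")}
    return {"q": query}

def execute(tool: str, params: dict) -> str:
    fake_results = {"get_weather": "18C and cloudy", "web_search": "3 hits"}
    return fake_results[tool]

def synthesize(query: str, result: str) -> str:
    # Stand-in for the final LLM answer written over the tool result.
    return f"Answer: {result}"

def pipeline(query: str) -> str:
    tool = route(query)
    params = extract_params(query, tool)
    result = execute(tool, params)
    return synthesize(query, result)

assert pipeline("weather in Paris?") == "Answer: 18C and cloudy"
```

Route-only mode stops after the first stage, which is why it needs no LLM at all.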
Open source (CC BY 4.0), paper included.