InferShrink wraps your existing OpenAI/Anthropic/Google client in 3 lines. It classifies prompt complexity and routes to the cheapest model that can handle it. Same provider, no surprise switches.
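To make the "wrap in 3 lines" idea concrete, here is a hypothetical sketch: the post doesn't show the real API, so `shrink`, the method name `complete`, and the model names below are all assumptions, and the client is a stub so the example runs offline without the package.

```python
# Hypothetical usage sketch. `shrink` is an assumed wrapper name (not
# necessarily InferShrink's real API); the client is a stub standing in
# for a real OpenAI client so nothing here touches the network.

class FakeOpenAIClient:
    def complete(self, model: str, prompt: str) -> str:
        return f"[{model}] reply"

def shrink(client):
    """Assumed wrapper: intercepts calls and downgrades the model for
    simple prompts (the real routing logic lives inside InferShrink)."""
    inner = client.complete            # keep the original bound method
    def routed(model: str, prompt: str) -> str:
        if len(prompt) < 100:          # stand-in complexity check
            model = "gpt-4o-mini"      # cheaper tier, same provider
        return inner(model, prompt)
    client.complete = routed
    return client

client = shrink(FakeOpenAIClient())
print(client.complete("gpt-4o", "hi"))  # → [gpt-4o-mini] reply
```

The caller keeps using the same client object and method; only the model argument is rewritten on the way through.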
The pipeline: classify → compress (LLMLingua, optional) → retrieve (FAISS, optional) → route → track. With all stages combined, it delivers a 10x+ cost reduction on mixed workloads.
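The five stages can be sketched as stubs to show the control flow; everything below is illustrative (the real LLMLingua and FAISS calls are elided, and the model names are placeholders):

```python
# Stub sketch of the classify → compress → retrieve → route → track
# pipeline described above. Each function stands in for the real stage.

def classify(prompt: str) -> str:
    """Cheap stand-in complexity check (the real classifier is not shown)."""
    return "complex" if len(prompt) > 2000 else "simple"

def compress(prompt: str) -> str:
    """Optional LLMLingua stage, stubbed as a no-op here."""
    return prompt

def retrieve(prompt: str) -> list:
    """Optional FAISS retrieval stage, stubbed to return no context."""
    return []

def route(label: str) -> str:
    """Map the complexity label to the cheapest adequate model tier."""
    return {"simple": "cheap-model", "complex": "strong-model"}[label]

def track(model: str, prompt: str) -> None:
    """Usage/cost accounting, stubbed out."""
    pass

def handle(prompt: str):
    label = classify(prompt)
    prompt = compress(prompt)
    context = retrieve(prompt)
    model = route(label)
    track(model, prompt)
    return model, context

print(handle("Translate 'hello' to French"))  # → ('cheap-model', [])
```

Compression and retrieval only change the prompt and context; the cost savings come from the classify → route pair picking a cheaper tier.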
Key design decisions:
• Same-provider routing only: if you use OpenAI, it stays on OpenAI. No cross-provider surprises.
• Sub-millisecond classification overhead.
• Optional FAISS retrieval and LLMLingua compression for RAG pipelines.
• 539 tests; scanned with Semgrep and Trivy.
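The same-provider guarantee amounts to routing only within the caller's own tier table. A minimal sketch, assuming hypothetical tier tables (the model names are placeholders, not InferShrink's actual routing data):

```python
# Illustrative tier tables, cheapest model first. The invariant shown:
# the router indexes into the caller's own provider list and never
# crosses the provider boundary.

TIERS = {
    "openai":    ["gpt-4o-mini", "gpt-4o"],
    "anthropic": ["claude-haiku", "claude-sonnet"],
    "google":    ["gemini-flash", "gemini-pro"],
}

def pick_model(provider: str, complexity: int) -> str:
    """Pick the cheapest adequate model, clamped to the top tier."""
    tiers = TIERS[provider]
    return tiers[min(complexity, len(tiers) - 1)]

print(pick_model("anthropic", 0))  # → claude-haiku
print(pick_model("openai", 5))     # clamps to top tier → gpt-4o
```

Because `TIERS[provider]` is the only lookup, a misclassified prompt can at worst be sent to a pricier tier of the same provider, never to a different vendor.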
pip install infershrink
Blog post with the reasoning: https://musashimiyamoto1-cloud.github.io/infershrink-site/bl...
Happy to answer questions about the routing heuristics or compression tradeoffs.