So I built NadirClaw. It's a Python proxy that sits between your app and your LLM providers. It classifies each prompt in about 10ms and routes simple ones to Gemini Flash, Ollama, or whatever cheap/local model you want. Only the complex prompts hit your premium API.
It's OpenAI-compatible, so you just point your existing tools at it. Works with OpenClaw, Cursor, Claude Code, or anything that talks to the OpenAI API.
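Since it speaks the OpenAI API, wiring up an existing tool is usually just a matter of overriding the base URL. A hypothetical example, assuming the proxy listens on localhost:8000 (check the README for the actual default port and path):

```shell
# Point any OpenAI-compatible tool at the proxy instead of api.openai.com.
export OPENAI_BASE_URL="http://localhost:8000/v1"
# The key can be a placeholder; the proxy holds the real upstream keys.
export OPENAI_API_KEY="placeholder"
```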
In practice I went from burning through my Claude quota in 2 days to having it last the full week. Costs dropped around 60%.
curl -fsSL https://raw.githubusercontent.com/doramirdor/NadirClaw/main/... | sh
Still early. The classifier is simple (token count + pattern matching + optional embeddings), and I'm sure there are edge cases I'm missing. Curious what breaks first, and whether the routing logic makes sense to others.
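For anyone curious what "token count + pattern matching" can look like in practice, here's a rough sketch of that style of heuristic. The names, patterns, and threshold are mine for illustration, not NadirClaw's actual code:

```python
import re

# Illustrative patterns that tend to signal heavyweight requests.
# (Hypothetical list -- not NadirClaw's real rules.)
COMPLEX_PATTERNS = [
    re.compile(r"\b(refactor|architect|prove|debug|optimi[sz]e)\b", re.I),
    re.compile(r"```"),  # inline code blocks often mean real work
]

def route(prompt: str, token_budget: int = 400) -> str:
    """Return 'premium' for complex prompts, 'cheap' otherwise."""
    # Whitespace split is a crude stand-in for a real tokenizer.
    tokens = len(prompt.split())
    if tokens > token_budget:
        return "premium"
    if any(p.search(prompt) for p in COMPLEX_PATTERNS):
        return "premium"
    return "cheap"
```

A classifier like this is fast because it's just a length check plus a handful of regexes; the optional embeddings step would catch the prompts that are short but still hard.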
amirdor•1h ago
pip install nadirclaw