One of my side projects is an overnight content pipeline for my business. It pulls RSS feeds, fetches source articles, generates posts with AI, scores them, and publishes them to WordPress without supervision.
The content is a bit niche: cybersecurity incidents for Japanese manufacturing companies in Aichi Prefecture — Toyota's home region — where older workflows like fax and password-protected ZIPs still haven't fully disappeared.
The physical setup is also a little ridiculous: a SwitchBot turns the PC on at 3am, Windows Task Scheduler starts the Python pipeline, and another scheduled task shuts the machine down when it's done.
Originally this was only meant to solve my own problem. But the more failure modes I found, the more features I kept adding.
What pushed me to build qzira was cost control.
The first lesson was operational: alerts don't help at 3am. What I needed wasn't another notification, but a kill switch outside the application — something that could stop requests before they reached the provider, regardless of what the agent decided to do.
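The idea is easiest to see in code. This is not qzira's actual implementation, just a minimal sketch of the principle: the budget and the kill flag live in front of the provider, so a runaway 3am pipeline gets blocked before its request ever leaves the gateway.

```python
# Minimal sketch of a gateway-side kill switch (illustrative, not qzira's code).
# The flag and budget live outside the application, so enforcement happens
# regardless of what the pipeline decides to do.

class BudgetExceeded(Exception):
    pass

class Gateway:
    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0
        self.killed = False  # flipped by an operator, a cron job, a dashboard...

    def forward(self, request: dict, estimated_cost_usd: float) -> dict:
        # Hard stop: checked before the upstream call is ever made.
        if self.killed or self.spent_usd + estimated_cost_usd > self.budget_usd:
            raise BudgetExceeded("request blocked at the gateway")
        self.spent_usd += estimated_cost_usd
        return {"status": "forwarded", "request": request}

gw = Gateway(budget_usd=1.00)
gw.forward({"model": "claude"}, estimated_cost_usd=0.40)      # ok
gw.forward({"model": "claude"}, estimated_cost_usd=0.40)      # ok
try:
    gw.forward({"model": "claude"}, estimated_cost_usd=0.40)  # over budget
except BudgetExceeded:
    pass
```

The point is that the check sits outside the application's control flow: an alert asks a human to intervene, while the gateway just refuses.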
The second lesson was more embarrassing: I had miscalculated Cloudflare KV write costs by 100x. Every request was triggering a KV put. Rewriting that path to batch via cron jobs reduced writes by about 99% and fixed the unit economics.
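The shape of the fix, sketched in plain Python (the actual Worker code isn't shown here): requests only bump a pending counter, and a cron-triggered flush does one aggregated write per distinct key, so billed writes scale with the number of keys per tick instead of with traffic.

```python
# Sketch of the batching fix (illustrative, not the actual Worker code).
# Before: one KV put per request -> writes scale with traffic.
# After: requests bump a pending counter; a cron job flushes one
# aggregated put per key per tick.
from collections import defaultdict

class UsageBatcher:
    def __init__(self):
        self.pending = defaultdict(int)  # key -> usage since last flush
        self.kv_writes = 0               # what the provider actually bills

    def record(self, key: str, amount: int = 1):
        self.pending[key] += amount      # no KV write on the hot path

    def flush(self):
        # Called from a scheduled (cron) handler: one write per distinct key.
        for key, amount in self.pending.items():
            self.kv_writes += 1          # stand-in for a real KV put
        self.pending.clear()

batcher = UsageBatcher()
for _ in range(1000):                    # 1000 requests against one key
    batcher.record("user:123")
batcher.flush()
# naive design: 1000 billed writes; batched: 1
```

With a few hundred active keys and a cron tick every few minutes, this is where the roughly 99% write reduction comes from.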
I also became much more conservative about model choice for production content after running a simple comparison on my own pipeline.
I ran 10 articles through Claude and didn't find any hallucinations.
Then I ran 1 article through gpt-4o-mini, and it immediately inverted the meaning of the source: it wrote "operations were suspended" where the original said "no impact on operations was confirmed."
To be fair, the pipeline was tuned around Claude, so I don't take this as a general statement about model quality. It may simply have been a prompt/model fit issue. But for me, it was enough to become much more conservative about where lower-cost models are allowed to touch production content.
Both problems pointed to the same conclusion: cost and policy enforcement belong at the infrastructure layer, not inside the application.
So I built qzira, a BYOK AI gateway that sits in front of OpenAI, Anthropic, and Google AI. It adds gateway-level budget controls, hard stops, and provider routing, and you adopt it just by changing the base_url in tools like Claude Code or Cursor.
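To make the base_url idea concrete, here's a hedged sketch of what routing behind a single endpoint can look like. These are not qzira's actual routing rules; the model-prefix mapping below is a hypothetical example of how one gateway URL can fan out to multiple upstreams.

```python
# Illustrative only: how one base_url can front multiple providers.
# The prefix -> upstream mapping is hypothetical, not qzira's real config.
UPSTREAMS = {
    "gpt": "https://api.openai.com/v1",
    "claude": "https://api.anthropic.com/v1",
    "gemini": "https://generativelanguage.googleapis.com",
}

def route(model: str) -> str:
    """Pick an upstream base URL from the requested model name."""
    for prefix, base in UPSTREAMS.items():
        if model.startswith(prefix):
            return base
    raise ValueError(f"no upstream configured for model {model!r}")

print(route("claude-sonnet-4"))  # -> https://api.anthropic.com/v1
```

On the client side nothing else changes: tools built on the official SDKs typically honor a base URL override (an environment variable or client option), so the gateway can enforce budgets and pick the upstream transparently.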
Stack: Cloudflare Workers, Hono, D1, KV, and Vectorize.
There's a free tier.
Happy to answer questions about the architecture, the cost mistake, the overnight pipeline, or the slightly absurd physical setup behind it.