The stack? It’s messy. We started on Netlify because it was cheap. Hit that stupid 10-second serverless timeout after a week when I started customizing the bot. The bot would just… freeze. Had to implement a clunky polling loop that felt like a band-aid. Eventually split the agent into 3 pieces just to keep it responsive:
The Brain (Edge) figures out what you want and shoots a JSON signal to the browser.
The Hands (Browser) runs the actual tool—like pulling a code reference—via a separate serverless function.
The Voice (Edge) feeds the result back to talk.
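For the curious, the handoff looks roughly like this. Everything below is a toy sketch: the names, message shapes, and stubbed tool are mine for illustration, not the actual code.

```typescript
// Rough sketch of the Brain -> Hands -> Voice handoff. Names and message
// shapes are illustrative; the real pieces run on edge functions and in the browser.

type BrainSignal = { tool: string; args: Record<string, string> };
type ToolResult = { ok: boolean; payload: string };

// "Brain" (edge): turn user intent into a JSON signal for the browser.
function brain(userText: string): BrainSignal {
  // Toy intent routing; the real engine is an LLM call.
  if (userText.toLowerCase().includes("code")) {
    return { tool: "lookupCodeReference", args: { query: userText } };
  }
  return { tool: "smallTalk", args: { query: userText } };
}

// "Hands" (browser): execute the tool via a separate serverless function (stubbed here).
async function hands(signal: BrainSignal): Promise<ToolResult> {
  if (signal.tool === "lookupCodeReference") {
    return { ok: true, payload: `code reference for: ${signal.args.query}` };
  }
  return { ok: true, payload: "small talk" };
}

// "Voice" (edge): turn the tool result back into something speakable.
function voice(result: ToolResult): string {
  return result.ok ? `Here is what I found: ${result.payload}` : "Sorry, that failed.";
}

async function run(userText: string): Promise<string> {
  return voice(await hands(brain(userText)));
}
```

The point of the split is that no single function has to stay alive longer than its own step, which is what keeps us under the serverless timeout.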
It works, but it’s not elegant. It’s duct tape. The last time I coded was over 30 years ago. AI helped a lot, but it’s three steps forward, two steps back whenever it starts obsessing over one area and wrecks something that was done five refactors ago.
DeepSeek-R1 is the reasoning engine. We tried Gemini and a bunch of others initially; they were either too chatty or bone dry. DeepSeek stays on script. Interestingly, a couple of weeks ago a traditional architect (probably not thrilled about me putting a dent in his ‘protected’ professional business) tried to pick a fight with the bot, wanting to prove it couldn’t handle “professional nuance.” DeepSeek shut him down without breaking a sweat. The log is public if you want to see it.
Fallback is MiniMax M2.5. Triggers on rate limits. Sometimes it feels a bit slower, but it hasn’t failed yet.
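The fallback logic is simple enough to sketch. The two model names are real; the client interface and error type below are my invention, just to show the shape of it.

```typescript
// Sketch of the rate-limit fallback, assuming both providers sit behind the
// same async interface. Client wiring and error class are hypothetical.

type Ask = (prompt: string) => Promise<string>;

class RateLimitError extends Error {}

// Try the primary model (DeepSeek-R1); on a rate limit only, retry the same
// prompt against the fallback (MiniMax M2.5). Any other error propagates.
function withFallback(primary: Ask, fallback: Ask): Ask {
  return async (prompt) => {
    try {
      return await primary(prompt);
    } catch (err) {
      if (err instanceof RateLimitError) {
        return fallback(prompt);
      }
      throw err;
    }
  };
}

// Usage with stubbed clients:
const deepseek: Ask = async () => { throw new RateLimitError("429"); };
const minimax: Ask = async (prompt) => `minimax answered: ${prompt}`;
const ask = withFallback(deepseek, minimax);
```

Only rate limits trigger the switch; a genuine failure should surface, not get papered over by a second model.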
Voice input via Web Speech API. That was a mistake. Transcription hallucinations are a nightmare. “Setback” becomes “set back” becomes “backset.” We added a dual-path validation layer. Speech-to-text gets audited before the reasoning models even see it. Still not perfect.
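The audit step is the part worth showing. The Web Speech API really does hand back ranked alternatives with confidence scores; the homophone table and the threshold below are invented for illustration.

```typescript
// Sketch of the dual-path transcript audit. The alternative shape mirrors the
// Web Speech API's SpeechRecognitionAlternative (transcript + confidence);
// the canonicalization table and cutoff are illustrative, not our real config.

type Alternative = { transcript: string; confidence: number };

// Path 1: canonicalize known mis-transcriptions from our domain.
const CANONICAL: Record<string, string> = {
  "set back": "setback",
  "backset": "setback",
};

function normalize(text: string): string {
  let out = text.toLowerCase();
  for (const [wrong, right] of Object.entries(CANONICAL)) {
    out = out.split(wrong).join(right);
  }
  return out;
}

// Path 2: gate on confidence. Below the bar we return null and ask the user
// to repeat, rather than letting the reasoning models see garbage.
function audit(alternatives: Alternative[], minConfidence = 0.8): string | null {
  const best = [...alternatives].sort((a, b) => b.confidence - a.confidence)[0];
  if (!best || best.confidence < minConfidence) return null;
  return normalize(best.transcript);
}
```

Rejecting a low-confidence transcript outright feels worse in the moment, but it beats reasoning confidently about the wrong word.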
Liability is the real killer. In our world, if the AI hallucinates a building code clause, we’re done. Insurance won’t touch us. We’ve gone through four architectural versions. Started with 30k lines of spaghetti. Now it’s decoupled, but still fragile. We publish the crash logs publicly because hiding them feels dishonest. Also, it keeps us honest.
The hardest part wasn’t the infrastructure; it was the intent engineering. Making an LLM sound like a seasoned principal when talking to a first-time homeowner, then pivoting instantly to defending our business model against an angry architect who wants to punch holes in it… that took two whole months.
Speed was killing us. So I came up with a hack I call ‘Eager RAG’: it guesses what you’re about to ask and checks whether our database of controlled responses already has something similar, which cuts response time. It burns through tokens like crazy, but man, it makes the thing feel instant. We also ripped out the persistent databases. Turns out 19 out of 20 visitors never come back, so why bother? Feels wasteful until you realize you’re not building for the 5% that return.
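If it helps, here’s roughly what an eager lookup does, with a toy similarity score. The store, the scoring, and the example entries are all made up; the real matching is different.

```typescript
// Hypothetical sketch of the "Eager RAG" idea: before the reasoning model is
// even called, probe a store of pre-approved ("controlled") responses for a
// near-match to the query. Entries, scoring, and threshold are illustrative.

const controlled: Array<{ question: string; answer: string }> = [
  { question: "what is the setback rule for residential lots", answer: "Typical side setbacks are..." },
  { question: "do i need a permit for a deck", answer: "In most jurisdictions..." },
];

const words = (s: string) => new Set(s.toLowerCase().split(/\s+/).filter(Boolean));

// Jaccard similarity between two word sets: |A ∩ B| / |A ∪ B|.
function similarity(a: Set<string>, b: Set<string>): number {
  let inter = 0;
  for (const w of a) if (b.has(w)) inter++;
  return inter / (a.size + b.size - inter);
}

// Eagerly probe the cache with whatever query text we have so far; null means
// "no safe canned answer, fall through to the reasoning model".
function eagerLookup(partialQuery: string, threshold = 0.5): string | null {
  const q = words(partialQuery);
  let best: { score: number; answer: string } | null = null;
  for (const entry of controlled) {
    const score = similarity(q, words(entry.question));
    if (!best || score > best.score) best = { score, answer: entry.answer };
  }
  return best && best.score >= threshold ? best.answer : null;
}
```

A cache hit skips the model entirely, which is where the “instant” feel comes from; a miss costs you the speculative lookups you already burned.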
It’s not flawless. No server-side job queue yet. If a client drops mid-query, the result vanishes. But it lets me operate without junior staff. That’s the point.
The terminal is live at https://axoworks.com. Mic optional. No signup. Try to break it. Ask a complex zoning question. See if it over-promises.
Logs:
Architect vs. DeepSeek: https://logs.axoworks.com/chat-architect-vs-concierge-v147.h...
System Audit: https://logs.axoworks.com/audit-2026-02-19-v148.html
I’ll be in the comments. Thanks. Kee