From what we’ve seen (and experienced ourselves), it’s relatively easy to get an agent prototype working with tools like LangChain, AutoGen, or CrewAI, but much harder to move that into something reliable and trustworthy enough for real use.
Some of the issues we’ve run into:
- Agents making different decisions from the same input
- Opaque reasoning that’s hard to debug or trust
- Tool use that works in demos but fails under edge cases
- Hallucinated or incomplete decisions that don’t stand up in production
- Limited ability to gather missing info before acting
It’s got us thinking: what if an agent could collate data and then call a tool (our system) running a bespoke symbolic model (one you created) that can reason, ask follow-up questions (for an AI agent or a human to answer), and return results that are deterministic, explainable, and repeatable? Would that help bridge the gap to production? Would it be more trustworthy?
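For concreteness, here's a minimal sketch of the interaction we have in mind, using a toy refund-policy model with made-up names (`Result`, `evaluate_refund`); the real tool would host a bespoke symbolic model you define rather than hard-coded rules:

```python
# Hypothetical sketch of the tool interface: the agent hands over collated facts,
# and a deterministic rule model either returns a decision with an explanation
# trail or asks for the missing inputs. All names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Result:
    decision: str | None = None                                    # final answer, if reachable
    follow_up_questions: list[str] = field(default_factory=list)   # missing inputs to collect
    explanation: list[str] = field(default_factory=list)           # rules that fired, in order

def evaluate_refund(facts: dict) -> Result:
    """Toy symbolic model: same facts in, same decision and explanation out."""
    required = ["days_since_purchase", "item_condition"]
    missing = [f for f in required if f not in facts]
    if missing:
        # Instead of guessing, ask the calling agent (or a human) for what's missing.
        return Result(follow_up_questions=[f"What is the value of '{m}'?" for m in missing])

    trail = []
    if facts["days_since_purchase"] > 30:
        trail.append("Rule 1: purchase older than 30 days -> refund denied")
        return Result(decision="deny", explanation=trail)
    if facts["item_condition"] != "unopened":
        trail.append("Rule 2: item not unopened -> store credit only")
        return Result(decision="store_credit", explanation=trail)
    trail.append("Rule 3: within 30 days and unopened -> full refund")
    return Result(decision="full_refund", explanation=trail)

# The calling agent answers the follow-ups and re-invokes the tool.
print(evaluate_refund({"days_since_purchase": 10}))   # asks for item_condition
print(evaluate_refund({"days_since_purchase": 10, "item_condition": "unopened"}))  # full_refund
```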
We’re trying to understand whether this kind of approach would actually be useful in real-world agent implementations, and if so, for what kinds of decisions or workflows.
Would really appreciate hearing from anyone who’s been working on agent-based systems:
- What have you built?
- Have you shipped anything to production?
- What’s been hardest about that process?
- Where do you think determinism, consistency, or explainability would matter most?
We're not selling anything; we’d have plenty of work to do to make the product more developer-friendly anyway. We just want to know whether the idea has legs and to learn from people building agents.
Thanks in advance to anyone willing to share.
hammyhavoc•3h ago
With the amount of fucking around required trying to correct an LLM, you may as well just write the code to do your task properly.