LLM logs are crushing my application logging system.
We recently launched AI features on our app and went from ~100 MB/month of normal website logs to 3 GB/month of LLM conversation logs, and growing. Our existing logging system was overwhelmed (queries timing out, etc.), and costs started increasing. We’re considering how to re-architect our LLM logs specifically so we can handle more users plus the increasing token use from things like reasoning models, tool calling, and multi-agent systems. I’m not selling any solutions here, genuinely curious what others are doing. Do you store them alongside APM logs? Use a dedicated LLM logging service? Build it yourself with open source tools?
Comments
barbazoo•10h ago
Lots of LLM observability tools out there. Lots of them integrate with model SDKs or LLM frameworks and send traces via OpenTelemetry to providers like Braintrust, that's the one I know. You shouldn't have to build all this yourself if you're willing to spend some money.
platypii•9h ago
We're willing to spend money, but I've had the "Datadog billing problem" before where it starts reasonable and then grows to a non-trivial percent of the SaaS budget, and then there's a scramble to refactor. Trying to get ahead of that, as the LLM logs are MUCH larger than my APM logs.
barbazoo•8h ago
Then I'd try to integrate via a standard connector, OTel for example. Then the cost of switching is much lower. But yeah, I'm not sure myself how this will scale, or how expensive or even useful it will be.
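To make the "standard connector" point concrete, here is a rough sketch of the idea, not a real OTel integration: the app writes to one small interface, and the backend behind it can be swapped without touching call sites. Everything named here (`LLMSpan`, `SpanExporter`, `InMemoryExporter`, `record_llm_call`, the field names) is hypothetical and only for illustration; a real setup would use the OpenTelemetry SDK and its exporters instead.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Protocol


@dataclass
class LLMSpan:
    """One LLM call in a backend-agnostic shape (hypothetical schema)."""
    name: str
    model: str
    input_tokens: int
    output_tokens: int
    attributes: dict = field(default_factory=dict)


class SpanExporter(Protocol):
    """Anything with an export() method can act as a backend."""
    def export(self, span: LLMSpan) -> None: ...


class InMemoryExporter:
    """Stand-in for a real backend; keeps each span as one JSON line."""
    def __init__(self) -> None:
        self.lines = []

    def export(self, span: LLMSpan) -> None:
        self.lines.append(json.dumps(asdict(span), sort_keys=True))


def record_llm_call(exporter: SpanExporter, model: str,
                    prompt: str, completion: str) -> LLMSpan:
    # Assumption: token counts are approximated by whitespace splitting.
    # A real system would use the usage numbers the provider returns.
    span = LLMSpan(
        name="llm.chat",
        model=model,
        input_tokens=len(prompt.split()),
        output_tokens=len(completion.split()),
    )
    exporter.export(span)
    return span
```

Call sites only ever see `record_llm_call(exporter, ...)`, so moving to a different vendor means writing one new exporter class rather than refactoring the app.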