
Ask HN: How are you controlling costs and enforcing limits for LLM calls?

3•8dazo•21h ago
I’ve been running into an issue with LLM/agent systems where unexpected loops or repeated calls can quickly drive up costs.

Most tools I’ve seen focus on observability (logs, traces, dashboards), but not actual enforcement at runtime.

Curious how people here are handling this in production:

- Are you enforcing hard limits (budget, rate, etc.) or just monitoring?

- Do you handle this at the app level or via some middleware/proxy?

- Have you built something in-house for this?

Feels like an unsolved problem, especially with agents.

Would love to hear how others are dealing with it.

Comments

jackycufe•19h ago
Certainly. I use LiteLLM to get more cache hits and save money.

brandonharwood•13h ago
It’s a bit of a chicken-and-egg thing and depends a ton on how the LLM is applied within an app. I always start at the core design of the integration and focus hard on the problem it solves. Why are you using an LLM in the first place? What function(s) does it need to perform in the context of the user interaction? These are the kinds of questions that help you understand the constraints you need to implement. For example, a project I’m working on is a diagramming tool, and I’m implementing an AI layer on top of it so users can refine/edit/generate diagrams. The tool creates maps structured into a JSON schema, but these can get really long, sometimes thousands of lines depending on the complexity of the diagram. Obviously, feeding in an entire diagram or having the AI generate an entire diagram is expensive here, so the fix was building a deterministic translation layer that compresses the diagram into a compact semantic model for the LLM: stripping visual noise (x/y coordinates), deduplicating relationships, resolving references, etc.
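
A minimal sketch of what such a deterministic translation layer might look like. The diagram schema here (`nodes` with `id`/`label`/`x`/`y`, `edges` with `from`/`to`/`kind`) and the function name `compact_diagram` are assumptions for illustration, not the actual schema of the tool described above.

```python
from typing import Any


def compact_diagram(diagram: dict[str, Any]) -> dict[str, Any]:
    """Reduce a verbose diagram to a compact model suitable for a prompt.

    Assumes a hypothetical schema: nodes carry "id" and "label" plus visual
    fields; edges carry "from", "to", and an optional "kind".
    """
    # Keep only the semantic fields of each node; x/y and styling are dropped.
    nodes = [
        {"id": n["id"], "label": n.get("label", "")}
        for n in diagram.get("nodes", [])
    ]
    known_ids = {n["id"] for n in nodes}

    seen: set[tuple[str, str, str]] = set()
    edges: list[list[str]] = []
    for e in diagram.get("edges", []):
        key = (e["from"], e["to"], e.get("kind", "link"))
        # Skip duplicate relationships and edges that point at unknown nodes.
        if key in seen or key[0] not in known_ids or key[1] not in known_ids:
            continue
        seen.add(key)
        edges.append(list(key))

    return {"nodes": nodes, "edges": edges}
```

Because the function is pure and deterministic, the same diagram always produces the same compact model, which also makes the prompt easier to cache.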

With this and keeping the interact, we cut token usage by ~75% across the app. On the output side, the LLM only produces the changes needed, not the full diagram. Layout, validation, and rendering are computed client-side for free, so costs only scale with what the user asks for. With good UX as well, we can pay attention to what users ask for and create “quick actions” that use the LLM within closed-loop subsystems. Since we assign a credit system for AI tool usage, we can assign credit costs to quick actions more accurately, because each action has a defined scope.
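
One way the "changes only" output side could work, as a rough sketch: the LLM returns a short list of change operations, and the app validates and applies them locally. The operation names, the `max_changes` cap, and the diagram shape are all hypothetical.

```python
from typing import Any

# Hypothetical closed set of operations the model is allowed to emit.
ALLOWED_OPS = {"add_node", "rename_node", "remove_node", "add_edge"}


def apply_changes(
    diagram: dict[str, Any],
    changes: list[dict[str, Any]],
    max_changes: int = 20,
) -> dict[str, Any]:
    """Apply a small list of LLM-proposed changes to a copy of the diagram."""
    if len(changes) > max_changes:
        raise ValueError(f"too many changes: {len(changes)} > {max_changes}")

    # Work on copies so a rejected change list leaves the original untouched.
    nodes = {n["id"]: dict(n) for n in diagram["nodes"]}
    edges = [dict(e) for e in diagram["edges"]]

    for change in changes:
        op = change.get("op")
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported op: {op!r}")
        if op == "add_node":
            nodes[change["id"]] = {"id": change["id"], "label": change["label"]}
        elif op == "rename_node":
            if change["id"] not in nodes:
                raise ValueError(f"unknown node: {change['id']!r}")
            nodes[change["id"]]["label"] = change["label"]
        elif op == "remove_node":
            nodes.pop(change["id"], None)
            # Drop edges that referenced the removed node.
            edges = [e for e in edges if change["id"] not in (e["from"], e["to"])]
        else:  # add_edge
            if change["from"] not in nodes or change["to"] not in nodes:
                raise ValueError("edge references an unknown node")
            edges.append({"from": change["from"], "to": change["to"]})

    return {"nodes": list(nodes.values()), "edges": edges}
```

Output tokens then scale with the size of the edit rather than the size of the diagram, and anything outside the allowed operations is rejected before it touches user data.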

TLDR: make the LLM do less, then put hard limits around the smaller set of things it’s allowed to do
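
For the "hard limits" half, a minimal app-level sketch of runtime enforcement: every call must be charged against a budget before it is made, so a runaway loop fails fast instead of running up a bill. `CallBudget`, `BudgetExceeded`, `run_agent`, and the flat per-step cost estimate are illustrative assumptions, not any particular library's API.

```python
from dataclasses import dataclass
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised when another LLM call would break a hard limit."""


@dataclass
class CallBudget:
    max_calls: int
    max_cost_usd: float
    calls: int = 0
    spent_usd: float = 0.0

    def charge(self, estimated_cost_usd: float) -> None:
        """Reserve one call up front; raise instead of overspending."""
        if self.calls + 1 > self.max_calls:
            raise BudgetExceeded(f"call limit of {self.max_calls} reached")
        if self.spent_usd + estimated_cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"cost limit of ${self.max_cost_usd:.2f} reached")
        self.calls += 1
        self.spent_usd += estimated_cost_usd


def run_agent(
    step: Callable[[], bool],
    budget: CallBudget,
    cost_per_step_usd: float,
) -> int:
    """Run `step` until it reports completion or the budget trips.

    `step` stands in for one LLM/agent call and returns True when done.
    Returns the number of steps executed.
    """
    executed = 0
    while True:
        budget.charge(cost_per_step_usd)  # enforce before calling, not after
        executed += 1
        if step():
            return executed
```

The same check could live in a proxy or middleware instead of the app; the point of the sketch is only that the limit is enforced before the call rather than observed afterwards.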
