frontpage.

Made with ♥ by @iamnishanth · Open Source @Github

Ask HN: Why isn't using AI in production considered stupid?

6•spl757•27m ago•3 comments

Ask HN: Founders of Estonian e-businesses – is it worth it?

161•udl•4d ago•101 comments

Ask HN: How are you keeping AI coding agents from burning money?

3•bhaviav100•6h ago•8 comments

The risk of AI isn't making us lazy, but making "lazy" look productive

65•acmerfight•16h ago•68 comments

Ask HN: What's the latest consensus on OpenAI vs. Anthropic $20/month tier?

5•whatarethembits•7h ago•2 comments

Ask HN: Is it just me?

10•twoelf•13h ago•13 comments

I built an AI that tailors your CV to every job in seconds

2•alebarbon•16h ago•1 comment

Ask HN: Anyone using Meshtastic/LoRa for non-chat applications?

11•redgridtactical•1d ago•0 comments

Claude API Error: 529

25•anujbans•1d ago•13 comments

Ask HN: Anybody tried to cheat AI-HR-system with hidden/white sentences?

4•KellyCriterion•23h ago•3 comments

Repsy – A lightweight, open-source alternative to Nexus/Artifactory

6•nuricanozturk•1d ago•0 comments

GitHub has been sending me an email every two seconds.

15•colonelspace•1d ago•3 comments

Fear of Missing Code

5•lukol•1d ago•7 comments

LLMs learn what programmers create, not how programmers work

41•noemit•5d ago•20 comments

Tell HN: Pangram is easily-defeatable with Claude

3•nunez•1d ago•4 comments

GitHub now requiring 2FA for all contributors, what authenticator apps you using?

13•nickcageinacage•2d ago•38 comments

Ask HN: What do you use for normative specs to drive AI agents?

4•midnight_eclair•1d ago•4 comments

Rses – cross-resume between Claude Code, Codex, and OpenCode

13•plawlost•2d ago•5 comments

Lazy Tmux – Lazy-loading tmux sessions with a tree view

8•Alchemmist•3d ago•1 comment

Ask HN: How do you deal with obvious AI assistant usage in interviews

9•stackdestroyer•2d ago•19 comments

Ask HN: Is anyone here also developing "perpetual AI psychosis" like Karpathy?

32•jawerty•5d ago•29 comments

Ask HN: Is using AI tooling for a PhD literature review dishonest?

10•latand6•5d ago•29 comments

Ask HN: Is Antigravity code search dropping results recently?

7•sankalpnarula•3d ago•0 comments


Ask HN: How are you keeping AI coding agents from burning money?

3•bhaviav100•6h ago
My agents retry more than they should, and my bill goes through the roof. I tried figuring out what was causing this, but none of the tools helped much.

And the worst thing for me is that everything shows up as aggregate usage: total tokens, total cost, maybe per model.

So I ended up hacking together a thin layer in front of OpenAI where every request is forced to carry some context (agent, task, user, team), and then just logging and calculating cost per call and putting some basic limits on top so you can actually block something if it starts going off the rails. It’s very barebones, but even just seeing “this agent + this task = this cost” was a big relief.

It uses your own OpenAI key, so it’s not doing anything magical on the execution side, just observing and enforcing.

I want to know how you guys are dealing with this right now. Are you just watching aggregate usage and trusting it, or have you built something to break it down per agent / task?

If useful, here is the rough version I’m using: https://authority.bhaviavelayudhan.com/
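A minimal sketch of the tagging-and-limits layer the post describes (every name, the price table, and the budget numbers here are hypothetical and illustrative only, not real OpenAI pricing or the linked tool's actual code):

```python
from collections import defaultdict

# Illustrative per-1K-token prices; real model pricing differs.
PRICES = {"small-model": {"in": 0.00015, "out": 0.0006}}

class BudgetExceeded(Exception):
    pass

class CostGate:
    """Force every call to carry context (agent, task), meter cost
    per (agent, task) pair, and block once a budget is exhausted."""

    def __init__(self, budgets):
        self.budgets = budgets                # {(agent, task): dollars}
        self.spent = defaultdict(float)

    def record(self, *, agent, task, model, tokens_in, tokens_out):
        key = (agent, task)
        p = PRICES[model]
        cost = tokens_in / 1000 * p["in"] + tokens_out / 1000 * p["out"]
        self.spent[key] += cost
        if self.spent[key] > self.budgets.get(key, float("inf")):
            raise BudgetExceeded(f"{agent}/{task} is over budget")
        return cost

gate = CostGate(budgets={("researcher", "summarize"): 0.01})
gate.record(agent="researcher", task="summarize",
            model="small-model", tokens_in=2000, tokens_out=500)
```

In a real proxy, `record` would wrap the actual API request, rejecting any call that arrives without the context fields and cutting an agent off mid-loop once its budget is gone.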

Comments

rox_kd•6h ago
In what settings do you mean? There are multiple strategies. Building your own layer in front seems a bit overkill, no? Have you considered implementing some cache strategy, or otherwise summary pipelines? I once made an agent that, based on the messages, routed things to a smaller model for compaction / summaries to bring down the context for the main agent.

But also ensure you start new, fresh context threads instead of banging through a single one until your whole feature is done. Working in small atomic increments works pretty well.
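The smaller-model compaction pipeline described here could look roughly like this (a sketch; `summarize` stands in for a call to a cheap model, and all names are made up):

```python
def compact(messages, summarize, max_keep=4):
    """Once the thread grows past max_keep messages, route the older
    ones to a cheap model (via `summarize`) and keep only the
    resulting summary plus the most recent turns."""
    if len(messages) <= max_keep:
        return messages
    old, recent = messages[:-max_keep], messages[-max_keep:]
    summary = summarize("\n".join(m["content"] for m in old))
    return [{"role": "system", "content": f"Summary so far: {summary}"}] + recent

# Stand-in for the small-model call:
fake_small_model = lambda text: text[:40] + "..."
history = [{"role": "user", "content": f"step {i}"} for i in range(10)]
assert len(compact(history, fake_small_model)) == 5  # summary + 4 recent turns
```

The main agent then only ever sees a bounded context, and the summarization cost lands on the cheap model.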

bhaviav100•5h ago
Yes, compaction and smaller models help with cost per step.

But my issue wasn’t just inefficiency, it was agents retrying when they shouldn’t.

I needed visibility + limits per agent/task, and the ability to cut it off, not just optimize it.

DarthCeltic85•5h ago
I had gotten a student/ultra code for the antigravity promo for three months, so I was using that, but it finally ran out this month. Currently I'm using windstream and flipping between Claude as my left brain and code extraction and the higher-context but cheaper-ish models there.

Honestly though, I'm getting to the point where I'm running custom project MDs that flip between different models for different things, using list outputs depending on what it finds and runs. (I have two monorepo projects, and one that's a polyglot microengine that jumps between services using gRPC communication.)

The MDs are highly specialized for each project, as each project deals with vastly different issues. Cycling through the different Pro accounts and keeping the MDs in place over it all is helping me not kill my wallet.

bhaviav100•5h ago
Hmm, interesting. Model routing + specialized MDs makes sense for cost efficiency.

I'm seeing a different failure mode, though: even with good routing, agents loop or retry and burn my money.

maxbeech•1h ago
The retry-loop problem is distinct from the cost-per-token problem, and most tooling conflates them. What's helped: track wall-clock time per meaningful step, not turn count. If the agent hasn't produced a file change or git diff in N minutes, it's stuck, not thinking. Token count just tells you it's running; it doesn't tell you it's looping.

The other piece: the watchdog lives outside the agent process, so a hung agent can't block its own shutdown. A hung agent shouldn't be able to veto its own termination.

I built something for this, actually: openhelm.ai schedules and runs Claude Code jobs with per-run timeouts and status checkpoints. The checkpoint pattern (agent writes state at each major step) is what gives you "where did it get stuck," not just "it failed at some point."
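
The external-watchdog pattern could be sketched like this (hypothetical, not openhelm.ai's actual implementation; it assumes the agent touches `progress_file` at each checkpoint, while the watchdog runs in a separate process so the agent cannot block its own shutdown):

```python
import os
import signal
import subprocess
import time

def run_with_watchdog(cmd, progress_file, stall_seconds):
    """Run the agent as a child process and kill it from the outside
    if its progress file hasn't been touched within stall_seconds."""
    proc = subprocess.Popen(cmd)
    while proc.poll() is None:
        time.sleep(1)
        try:
            idle = time.time() - os.path.getmtime(progress_file)
        except FileNotFoundError:
            idle = stall_seconds + 1    # never checkpointed: treat as stuck
        if idle > stall_seconds:
            proc.send_signal(signal.SIGTERM)  # the agent can't veto this
            proc.wait()
            return "stalled"
    return "finished"
```

This measures wall-clock progress rather than tokens or turns: a looping agent keeps emitting tokens but stops touching the checkpoint file, which is exactly the signal that separates "stuck" from "thinking."
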
jerome_mc•1h ago
AI outputs often feel like a gacha game. Paradoxically, the 'expensive' tokens are sometimes the cheapest in the long run. In my experience, higher-end models have a much higher 'one-shot' success rate. You aren't just saving on total token count by avoiding loops; you’re saving engineering time, which is always the most expensive resource anyway.
spl757•47m ago
By not using it. The tech is flawed. It hallucinates. It's not production ready. I've said it before, and I will say it again. Anyone using AI in a production environment is a fucking idiot.
spl757•43m ago
Don't use tech with deep, unresolved flaws and you won't get fucked.

Would you find it acceptable if PostgreSQL occasionally hallucinated and returned gibberish? Fuck no.

Why is this okay with ANY software? Answer: it's not. AI IS NOT READY.