Expensively Quadratic: The LLM Agent Cost Curve

https://blog.exe.dev/expensively-quadratic
19•luu•3d ago

Comments

stuxf•1h ago
> Some coding agents (Shelley included!) refuse to return a large tool output back to the agent after some threshold. This is a mistake: it's going to read the whole file, and it may as well do it in one call rather than five.

Disagree with this. IMO the primary reason these limits still need to exist is for when the agent messes up (e.g., reads a file that is too large, like a bundle file), or when you run a grep command in a large codebase and end up matching far too many files, overloading the context.

Otherwise, lots of interesting stuff in this article! Having a precise calculator was very useful for working out how much we should be putting into an agent loop to get a cost optimum (and not just a performance optimum) for our tasks, which is something that's been pretty underserved.

Areena_28•1h ago
Classic trap! Reminds me of accidental quadraticism in Python list comprehensions. Have you benchmarked Rust's iterators vs. JS for these cases?

jauntywundrkind•1h ago
Very awesome to see these numbers, to see this explored so. Nice job exe.dev.

TZubiri•39m ago
I'm not sure, but I think cached read costs are not the most accurately priced. If you consider your costs to be what you pay when consuming an API endpoint, then the answer will be 50k tokens, sure. But if you consider how much it costs the provider, cached tokens probably have a much higher margin than the (probably negative-margin) input and output inference tokens.

Most caching is done without hints from the application at this point, but I think some APIs are starting to take hints or explicit controls for keeping state associated with specific input tokens in memory, so these costs will go down. In essence, you don't really reprocess the input tokens at inference. If you own the hardware, it's quite trivial to infer one output token at a time at no additional cost: if you have 50k input tokens and you generate 1 output token, it's not like you have to "reinfer" the 50k input tokens before you output the second token.

To put it in simple terms, the time it takes to generate the millionth output token is the same as the first output token.

This is relevant in an application I'm working on where I check the logprobs and do not always choose the most likely token (for example, by implementing a custom logit_bias mechanism client-side), which means inferring one output token at a time. This is not quite possible with most APIs, but if you control the hardware and use (virtually) zero-cost cached tokens, you can do it.
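
For illustration only, a client-side bias of this kind could look roughly like the sketch below. It assumes each decoding step hands back a mapping from candidate token to log probability; the function `pick_token` and both dictionaries are hypothetical and not tied to any particular API.

```python
def pick_token(top_logprobs: dict[str, float], bias: dict[str, float]) -> str:
    """Add a per-token bias to the returned logprobs and pick the best candidate.

    Tokens missing from `bias` are left unchanged.
    """
    adjusted = {tok: lp + bias.get(tok, 0.0) for tok, lp in top_logprobs.items()}
    return max(adjusted, key=adjusted.get)


# Hypothetical step: the model prefers "yes", but the client penalizes it.
step = {"yes": -0.1, "no": -2.0}
choice = pick_token(step, {"yes": -5.0})
```

Each chosen token is appended to the prompt and the next step is requested, which is why the per-step price of re-reading the cached prefix matters so much for this pattern.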

So, bottom line: cached input tokens are naturally almost free (unless you hold them for a long period of time); the price of cached input APIs is probably due to the lack of API negotiation as to what inputs you want to cache. As APIs and self-hosted solutions evolve, we will likely see the cost of cached inputs drop massively, to almost zero. With efficient application programming, the only accounting should be for output tokens and system prompts. Your output tokens shouldn't be charged again as inputs, at least not more than once.
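
To make the accounting concrete, here is a small sketch of a cost model for an agent loop in which every turn re-reads the whole prior context from cache. The `Prices` class, the `loop_cost` function, and all the numbers are placeholders chosen for the example; they are not any provider's actual prices and this is not the calculator from the article.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    """Dollars per million tokens. All values passed in are placeholders."""
    fresh_input: float
    cached_read: float
    output: float


def loop_cost(turns: int, input_per_turn: int, output_per_turn: int, prices: Prices) -> dict:
    """Sum token counts and dollars for a loop where each turn re-reads prior context."""
    context = 0
    cached = fresh = out = 0
    for _ in range(turns):
        cached += context          # whole prior context is read from cache
        fresh += input_per_turn    # new tool result enters as uncached input
        out += output_per_turn     # model's reply for this turn
        context += input_per_turn + output_per_turn
    dollars = (
        cached * prices.cached_read
        + fresh * prices.fresh_input
        + out * prices.output
    ) / 1_000_000
    return {"cached_tokens": cached, "fresh_tokens": fresh, "output_tokens": out, "dollars": dollars}


# Doubling the turns roughly quadruples cached reads (900 -> 4200 tokens here).
free_cache = Prices(fresh_input=1.0, cached_read=0.0, output=2.0)
short_run = loop_cost(4, 100, 50, free_cache)
long_run = loop_cost(8, 100, 50, free_cache)
```

With `cached_read` set to zero, as in this example, the dollar total grows linearly with the number of turns; with any nonzero cached price, the cached term grows with the square of the turn count and eventually dominates.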

eshaham78•34m ago
This matches my experience running coding agents at scale. The cached token pricing is indeed somewhat artificial - in practice, for agent workflows with repeated context (like reading the same codebase across multiple tasks), you can achieve near-zero input costs through strategic caching. The real cost optimization isn't just token pricing but minimizing the total tokens flowing through the loop through better tool design.
