I'm a product manager and I was talking to my dev lead yesterday about this very thing. If PMs are like headlights on a car and devs are like the engine, then we're going from cars that max at 80mph to cars that push past 600mph, and we're headed toward much faster than that. The headlights need to extend much further into the future to be able to keep the car from repeatedly running into things.
That's the challenge we face. To paraphrase Ian Malcolm, we need to think beyond what we can build to consider more deeply what we should build.
I have yet to see evidence that this is really the case. Already 15 years ago, people were creating impressive software over the course of a hack day by gluing open source repos together in a high-level language. Now that process has been sped up even more, but does it matter that much if the prototype takes 4 or 24 hours to make? The real value is in well-thought-out, highly polished apps, and AFAICT those still take person-years to complete.
An example: with digital cameras I tend to take a lot of photos and cull afterwards, whereas with film I used to take far fewer. So now there's a whole culling phase that needs different tools and different skills, and you risk putting less effort into each photo in the first place and ending up with far more photos, none of them good (the so-called "slop"). Still, it is possible to take more and better pictures with digital cameras than before.
To your point: I think we're a fair bit away from Roomba-style (run in a single direction until you bump into something, then back up, turn, and repeat) development, but it doesn't seem impossible.
Development is nothing like driving a car, and it makes no sense to liken it to one. There's no set route, no road to follow, and no single person driving.
Quick math on the environmental impact of this assuming 18.35Wh/1000 tokens:
Total energy: 4.73GWh, equivalent of powering 450 average US homes annually
Carbon footprint: ~1822 metric tons of CO2, equivalent of driving 4.56 million miles in a gas powered car
Water consumption: 4.5 million litres, recommended daily water intake for 4000 people for a full year
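For anyone who wants to check the arithmetic, here's a quick sketch of the conversions. The per-home, grid-intensity, per-mile, and daily-intake factors are my own assumed round numbers, back-solved to roughly match the figures above:

```python
# Assumed conversion factors (not from the original comment):
# 10,500 kWh/yr per average US home, 385 g CO2/kWh grid intensity,
# 400 g CO2/mile for a gas car, 3 L/day recommended water intake.
WH_PER_1K_TOKENS = 18.35

tokens = 4.73e9 / WH_PER_1K_TOKENS * 1000   # ~258 billion tokens
energy_kwh = 4.73e6                          # 4.73 GWh expressed in kWh

homes = energy_kwh / 10_500                  # ~450 average US homes for a year
co2_tonnes = energy_kwh * 385 / 1e6          # ~1,820 metric tons of CO2
miles = co2_tonnes * 1e6 / 400               # ~4.55 million gas-car miles
people = 4.5e6 / (3 * 365)                   # ~4,100 people's yearly intake
```

With those factors the stated numbers all line up to within rounding.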
Yet they're on twitter bragging...
EDIT: Typos
I look around and everything seems to be... the same? Apart from the availability of these AI tools, what has meaningfully changed since 2020?
Hell, the GP spent more than $50,000 this year on API calls alone and the results are... what again? Where is the innovation? Where are the tools that wouldn't have been possible to build pre-ChatGPT?
I'm constantly reminded of the Feynman quote: "The first principle is that you must not fool yourself, and you are the easiest person to fool."
https://www.tomshardware.com/tech-industry/artificial-intell...
https://app.powerbi.com/view?r=eyJrIjoiZjVmOTI0MmMtY2U2Mi00Z...
Adjusted beef consumption: 4.5 million litres of water can be used to produce 300kg of beef -> the US (the highest per-capita beef consumer) consumes 23.3kg of beef per person, so that's enough to feed ~13 Americans (~30 Brits, ~43 Japanese) yummy, delicious grass-fed beef for a full year!
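A sketch of that arithmetic; note the ~15,000 litres-per-kg water footprint is implied by the comment's own figures rather than taken from an external source:

```python
water_litres = 4.5e6
beef_kg = 300                                # stated yield for that water
litres_per_kg = water_litres / beef_kg       # 15,000 L per kg of beef

us_per_capita_kg = 23.3                      # stated US consumption
americans_fed = beef_kg / us_per_capita_kg   # ~13 Americans for a year
```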
Neither the cow nor the cow's food retains much water; the water is merely delayed a little in its journey to the local watershed, and in vast parts of the US, local rainfall is adequate for this purpose (power irrigation isn't required for the crops, and cattle may drink from a pond.) Even if a cow drinks pumped well water, the majority of its nourishment will have been itself sustained by local natural rainfall.
A datacenter's use of water over any timescale can hardly be compared with a cow's.
Sounds like that's more than a junior or an intern, who would have cost twice as much in fully loaded cost.
It's also more than hiring someone overseas, especially just for a few months. Honestly, it's more than most interns are paid for 3 months outside FAANG (considering housing is paid for there, etc.)
2. Yeah I mentioned that also.
3. It's still more expensive than hiring a contractor, especially abroad, even all in.
You're not addressing the idea in the post, but the person. What the author achieved personally is irrelevant. What's important is that a very large and important industry is being completely transformed.
Add: if I had done it without LLMs it would have taken me a year, and it would have been less complete.
Is a really weird way to say that you built a native compiler for TypeScript.
Well, the TypeScript team is rewriting their compiler in Go (landing in v7), and some people call it the native compiler. I think my statement is clearer? But then English is not my first language, so there's that.
I do want better workflows where the AI's thinking, the transcript, is captured. Going back to understand what just happened is the major delay, and that cost increases day by day and week by week, especially if the session where the generation was done is lost.
This is my experience now too. The degree to which we are bottlenecks comes down to how good we are at finding the right balance between micromanaging the models (doesn't work well; it's a massive waste of time, and most of the issues you spend time correcting are things the models can correct themselves) vs. abandoning all oversight (also doesn't work well; it will entrench major architectural problems that take lots of effort to fix).
I spend a fairly significant amount of time revising agents, skills, etc. to take myself out of the loop as much as possible, by reviewing what has worked and what hasn't, and letting the model fix what it can before I have to review its code. My experience is that this time has a high ROI.
It doesn't matter if the steps I add waste lots of the model's time cleaning up code I ultimately end up rejecting, because its time is cheap and mine is not, and the cleanups also tend to shorten the time it takes for it to realise it's done something stupid.
Getting to a point where I'm comfortable "letting go" (letting the model write stupid code and fix it itself before I even look at it) has been the hardest part of accelerating my AI use.
If I keep reading as Claude Code runs, the model often infuriates me, and I end up starting to type a message telling it to fix something tremendously idiotic it has just done, only to have it realise and fix it before I get to pressing enter. There's no point doing that, so increasingly I put my sessions on other virtual desktops and try to forget about them while they're working.
It still does stupid stuff, but the proportion of stupid stuff I need to manually review and reject keeps dropping.
Modern LLMs are amazing for writing small, self-contained tools/apps and adding isolated features to larger code bases, especially when the problem can be solved by composing existing open source libraries.
Where they fall flat is their lack of long term memory and inability to learn from mistakes and gain new insider knowledge/experience over time.
The other area where they fall flat is that they rush to achieve their immediate goal and tick functional boxes without considering wider issues such as security, performance, and maintainability. I suspect this is an artefact of the reinforcement learning process: it's relatively easy to assess whether a functional outcome has been achieved, while assessing secondary outcomes (is this code secure, bug-free, maintainable, and performant?) is much harder.
As a developer you would take that and break it down to a design and smaller tasks that can show incremental progress and give yourself a chance to build feature Foo, assess the situation and refactor or move forward with feature Bar.
Working with an LLM to build a full-featured application is no different. You need to design the system and break down the work into smaller tasks for it to consume and build. Both it and you can verify the completed work and keep track of things to improve, and not repeat, as it moves forward with new tasks.
Keeping full guard rails in place, like linters, static analysis, and code coverage, further helps ensure what is produced is better quality code. At some point, are you babysitting the LLM so much that you could write it by hand? Maybe, but I generally think not. While I can get deeply focused and write lots of code, LLMs can still generate code and accompanying documentation, fix static analysis issues, and write/run the unit tests without taking breaks or getting distracted. And for some series of tasks, it can do them in parallel in separate worktrees, further reducing the aggregate time to complete.
I don’t expect a developer to build something fully without working on it incrementally with feedback; it's not much different with an LLM if you want meaningful results.
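The parallel-worktree step can be sketched like this; everything here (the temp repo, branch names, where lint/tests would run) is illustrative, not the commenter's actual setup:

```python
import os
import subprocess
import tempfile

def git(*args, cwd):
    # Thin wrapper so any git failure raises instead of passing silently.
    subprocess.run(["git", *args], cwd=cwd, check=True, capture_output=True)

root = tempfile.mkdtemp()
repo = os.path.join(root, "repo")
os.makedirs(repo)
git("init", "-q", cwd=repo)
git("-c", "user.name=x", "-c", "user.email=x@x.invalid",
    "commit", "--allow-empty", "-m", "init", cwd=repo)

# One worktree per in-flight task, so parallel agent sessions don't collide.
# Guard rails (linters, static analysis, tests) would run per-tree.
for task in ("feature-foo", "feature-bar"):
    git("worktree", "add", "-q", "-b", task, os.path.join(root, task), cwd=repo)
```

Each worktree gets its own checked-out branch and working directory, so independent agent sessions can commit, lint, and test without stepping on each other.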
It's an assistant building itself live on Discord. It's really fun to watch.
The author loves vibe coding because... it lets them vibe code even more:
"One of my early intense projects was VibeTunnel. A terminal-multiplexer so you can code on-the-go. I poured pretty much all my time into this earlier this year, and after 2 months it was so good that I caught myself coding from my phone while out with friends… and decided that this is something I should stop, more for mental health than anything."
It's unclear whether the "all my time" here is "all my waking hours" or "all my time outside of my job, family duties, and other hobbies", but it's still a bit puzzling.
And so anyway, what is it that they want to code on the go so much?
"an AI assistant that has full access to everything on all my computers, messages, emails, home automation, cameras, lights, music, heck it can even control the temperature of my bed."
I guess everyone's free to get their kicks however they feel like, but - paying thousands of dollars in API fees to control your music and the temperature of your bed? Why is that so exciting?
[0] https://www.reddit.com/r/slatestarcodex/comments/9rvroo/most...
This guy is clearly an outlier and spending far more than most, but from personal experience you can extract an enormous amount of value from a $20/month ChatGPT subscription, especially when paired with Codex.
I am learning Greek at the moment; with Codex I was able to produce a microsite and a prompt that generates a lesson per day. It's pretty cool, as it does both TTS and speech-to-text, so I can actually practice generated conversations with myself.
Calorie Tracking:
Now I just send a picture or text into a Telegram channel; an agent picks it up, classifies it as "food info", and sends it to another agent to calculate calories, either as a best-effort guess or, if I've sent nutritional information in the pic, by reading it and following up to ask for portion size.
Workout Tracking:
Same Telegram channel: again, just free text of what I've lifted / exercises I've done, and it all gets stored. There's then an agent that uses this, plus the calories submitted, to see if I am on track to reach my goals, or offers tweaks as I go.
Reminders:
Same Telegram channel (there's a theme here): send reminders in, they're stored, and a scheduler runs that sends me a push notification when the event is due. It's simple, but just way better than Google's offering.
Then there's some other personal assistant stuff. For example, I get a lot of emails from my kids' school that contain important dates and requests for money; before, this was a PITA to extract, but now I just have an agent that reads the emails, extracts the documents, scans them for relevant information, and adds it to my calendar, or sends me payment reminder requests until I've paid them.
I'm pretty early days but I can just see this list expanding.
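A minimal sketch of the classify-then-route pattern described above. `classify()` stands in for the LLM classifier agent, and all names, keywords, and handler behaviors here are invented for illustration:

```python
def classify(message: str) -> str:
    # Stand-in for an LLM classifier agent; returns a routing tag.
    text = message.lower()
    if any(w in text for w in ("kcal", "ate ", "meal", "food")):
        return "food_info"
    if any(w in text for w in ("lifted", "squat", "workout", "ran ")):
        return "workout"
    if "remind" in text:
        return "reminder"
    return "unknown"

# Each tag maps to a downstream agent; real handlers would call an LLM
# and persist to storage, these just return strings.
HANDLERS = {
    "food_info": lambda m: f"calorie agent: estimating calories for {m!r}",
    "workout":   lambda m: f"workout agent: logging {m!r}",
    "reminder":  lambda m: f"scheduler: will push a notification for {m!r}",
}

def route(message: str) -> str:
    # Classify, then hand off to the matching agent (or ask to clarify).
    tag = classify(message)
    return HANDLERS.get(tag, lambda m: "follow-up: please clarify")(message)
```

The appeal of the setup is that one inbox (the Telegram channel) fans out to many single-purpose agents, so adding a new workflow is just a new tag and handler.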
For the Duolingo replacement, you still need to pay for the tokens the app consumes when it generates your daily lesson, yes? I'm sure it's still less than paying for Duolingo; just wanted to confirm.
Calorie tracking is nice also! Combined with workout tracking it's pretty good! I get workout tracking free with Garmin + Strava.
I like the email additions too! I think Gmail does something similar but this feels like on steroids. Wow all this feels like I'm in school again learning coding for the first time :)
The daily lessons are vibe coded as part of the Codex/ChatGPT $20-a-month subscription cost.
For example, instead of:

Duolingo - I practice with my friends

Calorie tracking - I have planned meals from my dietitian

Workout tracking - I have WhatsApp with my PT, who adjusts my next workouts from our conversations

Reminders - A combo of Siri + Fantastical + my wife
I'm sure my way is more expensive, but I don't know; there is also the intangible cost of not having friends/personal connections.
I wasn’t swapping human connection for LLMs. These workflows already existed; I’ve simply used newer tools to make them better aligned to my needs and more cost-effective for me.
Working with startups, I meet a LOT of people who obsessively cannot stop using LLMs. People who jump on MAX plans to produce as much as possible- and in the startup scene it's often for the worst ideas.
LLMs are slot machines- it's fun to pull the lever and see what you get. But the hard problem of knowing what is actually needed gets harder as we sift through ten-thousand almost-useful outputs.
If you study computer tech history, this has happened a few times: with PHP (omg! You mean I can use the same language as the back end right in the web page?), Visual Basic (omgf! You mean I can just "draw" a program for a computer?), node.js (OMFG!! You mean I can be considered a "full stack developer" after a 1 hour lesson?), voice assistants with ML (OMFGcopter!!Q1! You mean I can ask my house what the weather outside my house is and it knows? And it knows how many tablespoons are in a cup?), and now we are at gen AI (I'm sorry Dave, I can't generate a website for you that rug-pulls MethCoin©, however I can create the electric-powered umbrella 3D models to print, along with electronics diagrams, PCB layouts, and a mobile app for monthly subscriptions. Do you like purple? I love purple, Dave. You have great ideas, Dave)