Ask HN: How expensive are LLMs to query, really?

5•teach•1d ago

I'm starting to see things pop up from well-meaning people worried about the environmental cost of large language models. Just yesterday I saw a meme on social media that suggested that "ChatGPT uses 1-3 bottles of water for cooling for every query you put into it."

This seems unlikely to me, but what is the truth?

I understand that _training_ an LLM is very, very expensive. (Although so is spinning up a fab for a new CPU.) But it seems to me the incremental cost to query a model should be relatively low.

I'd love to see your back-of-the-envelope calculations for how much water and especially how much electricity it takes to "answer a single query" from, say, ChatGPT, Claude-3.7-Sonnet or Gemini Flash. Bonus points if you compare it to watching five minutes of a YouTube video or doing a Google search.

Links to sources would also be appreciated.

Comments

serendipty01•1d ago

Some links:

https://www.sustainabilitybynumbers.com/p/carbon-footprint-c...

https://andymasley.substack.com/p/a-cheat-sheet-for-conversa...

(discussion on lobste.rs - https://lobste.rs/s/bxixuu/cheat_sheet_for_why_using_chatgpt...)

(discussion on HN, 320 comments: https://news.ycombinator.com/item?id=42745847)

teach•1d ago

These are excellent, thank you!

a_conservative•1d ago

My M4 Max MacBook can run local inference on a medium-ish gemini model (32b IIRC). The power consumption spikes by about 120 watts over idle (with multiple Electron apps, Docker, etc. running). It runs at about 70 tokens/sec and usually responds within 10 to 20 seconds.

So, picking some numbers for the calculation: 4 answers per minute @ 120 watts is about 0.5 watt-hours per answer (120 W for 15 seconds). ~200 responses would be enough to drain the (normally quite long-lasting) battery.
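For anyone who wants to plug in their own numbers, here is a minimal Python sketch of the same back-of-the-envelope arithmetic. The 120 W and 4 answers per minute come from the figures above; the 100 Wh battery capacity is an assumption (it is roughly what the ~200 responses estimate implies), and the function name is purely illustrative.

```python
def watt_hours_per_answer(extra_watts: float, seconds_per_answer: float) -> float:
    """Energy for one answer: power (W) x time (s), converted from joules to Wh."""
    return extra_watts * seconds_per_answer / 3600.0


# Figures taken from the comment above.
EXTRA_WATTS = 120.0          # power draw over idle during inference
ANSWERS_PER_MINUTE = 4       # i.e. about 15 seconds per answer

# Assumption: battery capacity is not stated in the comment; 100 Wh is a guess
# consistent with "~200 responses drains the battery".
BATTERY_WH = 100.0

seconds_per_answer = 60.0 / ANSWERS_PER_MINUTE
wh_per_answer = watt_hours_per_answer(EXTRA_WATTS, seconds_per_answer)
answers_per_charge = BATTERY_WH / wh_per_answer

print(f"{wh_per_answer:.2f} Wh per answer")
print(f"{answers_per_charge:.0f} answers per full charge")
```

Note that this only counts the extra draw over idle on one laptop, so it says nothing about datacenter overhead such as cooling or networking.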

How does that compare to the more common nvidia GPUs? I don't know.
