Given a 128k-token context window, and that the stated energy usage range (0.3 to 40 Wh) spans a factor of ~133 from "single query" to "worst case," this suggests energy usage scales roughly linearly with context length.
Given the common user pattern of "chat with the LLM," the 0.3 Wh figure seems reductive. As the chat grows, the effective "query" becomes the entire chat history.
Assuming context compression for long chats, the per-question figure we're looking for should be the energy use of a query at half the maximum context length.
For GPT-4o: 20 Wh
https://www.timdavis.com/blog/scale-or-surrender-when-watts-...
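The arithmetic behind that 20 Wh estimate can be sketched as a linear interpolation between the two endpoints quoted above. This is an assumed model, not a measurement; the constants are the figures from the thread, and the function name is mine:

```python
# Sketch of the linear-scaling estimate above (assumed, not measured).
# Assumes energy grows linearly from the single-query floor to the
# worst-case full-context figure.

E_MIN_WH = 0.3      # "single query" energy (Wh), from the thread
E_MAX_WH = 40.0     # "worst case" full-context energy (Wh), from the thread
MAX_CTX = 128_000   # GPT-4o context window, tokens

def energy_wh(context_tokens: int) -> float:
    """Linearly interpolate energy use for a query of a given context length."""
    frac = context_tokens / MAX_CTX
    return E_MIN_WH + (E_MAX_WH - E_MIN_WH) * frac

# A long chat compressed to half the context window:
print(energy_wh(MAX_CTX // 2))  # roughly 20 Wh
```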
The good news is that GPT-4o's average per-query energy use would then be lower than 20 Wh.
The bad news is that energy usage increases quadratically with a model's maximum context window. GPT-3.5 -> GPT-4 was an increase from thousands of tokens to hundreds of thousands of tokens.
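The quadratic worry above comes from the O(n²) cost of attention over the context. A rough sketch of the implied compute ratio, using illustrative window sizes (real serving stacks use optimizations like KV caching and fused attention kernels, so this is an upper bound on scaling, not a measurement):

```python
# Illustration of the quadratic-attention concern: naive attention compute
# grows with the square of the context length. Window sizes are illustrative.

def attention_cost_ratio(old_ctx: int, new_ctx: int) -> float:
    """Ratio of O(n^2) attention compute when the context window grows."""
    return (new_ctx / old_ctx) ** 2

# A GPT-3.5-era ~4k window vs. a 128k window:
print(attention_cost_ratio(4_000, 128_000))  # → 1024.0
```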
Is this information readily available somewhere?
"Maybe"
What about the definition of "AGI" being changed yet again to mean something else? What does "AGI" actually mean anymore, or has it been hijacked again for the purpose of manipulation?
But this is the most important sentence of the entire article:
> Tech leaders simply pay no cost for misprediction.
They don't care if they are wrong in their predictions. So focus on what they are doing rather than what they are saying.