State of AI: An Empirical 100T Token Study with OpenRouter

207•anjneymidha•2mo ago

Comments

themanmaran•2mo ago

> The metric reflects the proportion of all tokens served by reasoning models, not the share of "reasoning tokens" within model outputs.

I'd be interested in a clarification on the reasoning vs non-reasoning metric.

Does this mean the reasoning total is (input + reasoning + output) tokens? Or is it just (input + output).

Obviously the reasoning tokens would add a ton to the overall count. So it would be interesting to see it on an apples to apples comparison with non reasoning models.

reeeli•2mo ago

I'm out of time but "reasoning input tokens" from fortune 5000 engineers sounds like a lobotomized LSD dream, would you care on elaborating how you distinguish between reasoning and non-reasoning? vs "question on duty"?

typs•2mo ago

I believe they’re just classifying all models into “reasoning models” eg o3 vs “non reasoning models” eg 4o and just doing a comparison of total tokens (input tokens + hidden reasoning output tokens + shown output tokens)
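
To make that accounting concrete, here is a minimal sketch. The request records, field names, and token counts are invented for illustration; none of it comes from the report's data or from any OpenRouter API.

```python
from dataclasses import dataclass


@dataclass
class Request:
    model: str
    is_reasoning_model: bool  # model-level label, e.g. o3 vs 4o
    input_tokens: int
    reasoning_tokens: int     # hidden reasoning output; 0 for non-reasoning models
    output_tokens: int        # shown output

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.reasoning_tokens + self.output_tokens


def reasoning_model_share(requests: list[Request]) -> float:
    """Share of all tokens that were served by reasoning models."""
    total = sum(r.total_tokens for r in requests)
    if total == 0:
        return 0.0
    served = sum(r.total_tokens for r in requests if r.is_reasoning_model)
    return served / total


# Hypothetical traffic sample: same prompt and answer size, one model also "thinks".
sample = [
    Request("o3", True, 1000, 3000, 500),
    Request("4o", False, 1000, 0, 500),
]
share = reasoning_model_share(sample)
print(f"{share:.2f}")
```

In this toy sample the two models see identical input and visible output, yet the reasoning model accounts for most of the tokens, which is why the comparison is not quite apples to apples.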

maikakz•2mo ago

that's exactly right!

DIAexitNode•2mo ago

hell yeah, 109 out of 10 doors opened! 99 bonus doors! what are you talking about, man?

themanmaran•2mo ago

"reasoning" models like GPT 5 et al do a pre-generation step where they:

- Take in the user query (input tokens)

- Break that into a game plan. Ex: "Based on user query: {query} generate a plan of action." (reasoning tokens)

- Answer (output tokens)

Because the reasoning step runs in a loop until it's run through its action plan, it frequently uses way more tokens than the input/output step.

reeeli•2mo ago

that was useful, thank you.

I have sooo many issues with the naming scheme of this """""AI"""" industry", it's crazy!

So the LLM gets a prompt, then creates a scheme to pull pre-weighted tokens post-user-phrasing, the constituents of which (the scheme) are called reasoning tokens, which it only explicitly distinguishes as such because there are hundreds or even thousands of output tokens to the hundreds and/or thousands of potential reasoning input tokens that were (almost) equal to the actually chosen reasoning input tokens based on the more or less adequately phrased question/prompt given ... as input ... by the user ...

IgorPartola•2mo ago

You can call them planning if you want or pre-planning. But I would encourage you to play with the API version of your model of choice to see exactly what this looks like. It’s kind of like a human’s internal monologue: “got an email from my boss asking to write unit tests for the analytics API. First I have to look at the implementation to know how exactly it actually functions, then write out what kinds of tests make sense, then implement the tests. I should write a TODO list of these steps.”

It is essentially a way to expand the prompt further. You can achieve the same exact thing by turning off the “thinking” feature and just being more detailed and step by step in your prompt but this is faster.

My guess is that the next evolution of this will be models that do an edit or review step after to catch if any of the constraints were broken. But best I can tell a reasoning model can be approximated by doing two passes of a non-reasoning model: first pass you give it the user prompt with instructions that boil down to “make sense of this prompt and formulate a plan” and the second pass you give it the original prompt, the plan, and an explanation that the plan is to implement the original prompt using the plan.
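
A rough sketch of that two-pass approximation, assuming nothing about any particular provider: `complete` is a hypothetical stand-in for whatever non-reasoning completion call you use, and `fake_complete` is a stub so the example runs without an API.

```python
from typing import Callable

# Hypothetical signature: takes a prompt string, returns the model's text.
Complete = Callable[[str], str]


def two_pass_answer(user_prompt: str, complete: Complete) -> str:
    # Pass 1: ask the model to interpret the prompt and formulate a plan.
    plan = complete(
        "Make sense of this prompt and formulate a plan.\n\n"
        f"Prompt:\n{user_prompt}"
    )
    # Pass 2: hand back the original prompt plus the plan and ask it to execute.
    return complete(
        "Carry out the original prompt by following the plan.\n\n"
        f"Original prompt:\n{user_prompt}\n\nPlan:\n{plan}"
    )


# Stub model for demonstration only; a real call would go to an LLM.
def fake_complete(prompt: str) -> str:
    if prompt.startswith("Make sense"):
        return "1. read code 2. list tests 3. write tests"
    return "DONE using plan: " + prompt.split("Plan:\n", 1)[1]


result = two_pass_answer("Write unit tests for the analytics API", fake_complete)
print(result)
```

The point is only the shape of the flow: the plan from the first call becomes extra prompt text for the second, which is the "expand the prompt further" idea described above.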

ribosometronome•2mo ago

As would models that are overly verbose. My experience is that Claude tends to do more than is asked for (e.g. immediately moving on to creating tests and documentation) while other models like Gemini tend to be more concise in what they do.

typs•2mo ago

This is really amazing data. Super interesting read

syspec•2mo ago

According to the report, 52% of all open-source AI is used for *roleplaying*. They attribute it to fewer content filters and higher creativity.

I'm pretty surprised by that, but I guess that also selects for people who would use openrouter

raincole•2mo ago

If you rely on AI to write most of your code (instead of using it like Stackoverflow), Claude Code/OpenAI Codex subscription are cheaper than buying tokens. So those users are not on openrouter.

djfergus•2mo ago

I'm curious what percentage of claude/codex users this is true for - I assumed their business models rely on this not being true for the majority.

bakugo•2mo ago

Both Claude Code and Codex steer you towards the monthly subscription. Last time I tried Codex, I remember several aspects of it being straight up broken if used with an API key instead of a subscription account.

The business model is likely built upon the assumption that most people aren't going to max out their limits every day, because if they were, it likely wouldn't be profitable.

UltraSane•2mo ago

I got $250 free Claude Code credit and I was surprised by how hard it was to actually use it all before it expired.

djfergus•2mo ago

Openrouter has an apps tab. If you look at the free, non-coding models, some apps that feature are: janitor.ai, sillytavern, chub.ai. I'd never heard of them but people seem to be burning millions of tokens enjoying them.

bakugo•2mo ago

> I guess that also selects for people who would use openrouter

It definitely does. OpenRouter is pretty popular among roleplayers and creative writers due to having a wide variety of models available, sometimes providing free access to quality models such as DeepSeek, and lacking any sort of rules against generating "adult" content.

IMTDb•2mo ago

Or maybe it’s just strange classification. I see a lot of prompts on the internet looking like “act as a senior xxx expert with over 15 years of industry experience and answer the following: [insert simple question]”

I hope those are not classified as “roleplaying”. The “roleplay” here is just a trick to get a better answer from the model, often in a professional setting that has nothing to do with creative writing or NSFW stuff.

Windchaser•2mo ago

I strongly bet that this is it.

susanthenerd•2mo ago

OpenRouter classifies content by the app that's used to interact with the llm. https://openrouter.ai/docs/app-attribution

mike_hearn•2mo ago

This paper says they classify it by feeding tokens to a Google model.

sysguest•2mo ago

"act as a senior xxx expert with over 15 years of industry experience"

... I just don't get why LLMs are affected by this kind of nonsense -- is it due to training rewards?

mondojesus•2mo ago

The way I think about it, the training data (i.e. the internet) has X% of people asking something like "explain it to me like I'm five years old" and Y% of people framing it like "I'm technical, explain this to me in detail". You use the "act as a senior XXX" when you want to bias the output towards something more detailed.

est•2mo ago

imagine how to prepare for an interview? Act confident lol

CJefferson•2mo ago

I can't be sure, but this sounds entirely possible to me.

There are many, many people, and websites, dedicated to roleplaying, and those people will often have conversations lasting thousands of messages with different characters. I know people whose personal 'roleplay AI' budget is $1,000/month, as they want the best quality AIs.

KronisLV•2mo ago

Would be good to look into those particular statistics, then. Seems like the category could include all sorts of stuff:

> This indicates that users turn to open models primarily for creative interactive dialogues (such as storytelling, character roleplay, and gaming scenarios) and for coding-related tasks. The dominance of roleplay (hovering at more than 50% of all OSS tokens) underscores a use case where open models have an edge: they can be utilized for creativity and are often less constrained by content filters, making them attractive for fantasy or entertainment applications. Roleplay tasks require flexible responses, context retention, and emotional nuance - attributes that open models can deliver effectively without being heavily restricted by commercial safety or moderation layers. This makes them particularly appealing for communities experimenting with character-driven experiences, fan fiction, interactive games, and simulation environments.

I could imagine something like D&D or other types of narrative adventures on demand with a machine that never tires of exploring subplots or rewriting sections to be a bit different is a pretty cool thing to have. Either that, or writing fiction, albeit hopefully not entire slop books that are sold, but something to draw inspiration from and do a back and forth.

In regards to NSFW stuff, a while back people were clowning on OpenAI for suggesting that they'd provide adult writing content to adults, but it might as well be a bunch of money that's otherwise left on the table. Note: I'm all for personal freedom, though one also has to wonder about the longer term impact of those "AI girlfriend/boyfriend" trends, you sometimes see people making videos about those subreddits. Oh well, not my place to judge.

Edit: oh hey, there is more data there after all

> Among the highest-volume categories, roleplay stands out for its consistency and specialization. Nearly 60% of roleplay tokens fall under Games/Roleplaying Games, suggesting that users treat LLMs less as casual chatbots and more as structured roleplaying or character engines. This is further reinforced by the presence of Writers Resources (15.6%) and Adult content (15.4%), pointing to a blend of interactive fiction, scenario generation, and personal fantasy. Contrary to assumptions that roleplay is mostly informal dialogue, the data show a well-defined and replicable genre-based use case.

ceroxylon•2mo ago

That also stuck out for me, I was wondering if it was video games using openrouter for uptime / inference switching, video games would use a lot of tokens generating dialogue for a few programmer's villages.

cess11•2mo ago

Sex- and spambots are likely the most common applications of these things.

lm28469•2mo ago

I'm not surprised at all. The HN crowd thinks LLMs are mostly used for engineering because they live in a multi-layer bubble. Real people in the real world do all kinds of shit with LLMs that isn't productivity or even work related.

veunes•2mo ago

I'm not surprised. Roleplay means endless sessions with huge context (character history, world, previous dialogues). On commercial APIs (OpenAI/Anthropic), that long-context costs a fortune. On OpenRouter, many OSS models, especially via providers like DeepInfra or Fireworks, cost pennies or are even free, like some Free-tier models. The RP community is very price-sensitive, so they massively migrate to cheap OSS models via aggregators. It skews the stats but highlights a real niche for cheap inference

lukev•2mo ago

Super interesting data.

I do question this finding:

> the small model category as a whole is seeing its share of usage decline.

It's important to remember that this data is from OpenRouter... an API service. Small models are exactly those that can be self-hosted.

It could be the case that total small model usage has actually grown, but people are self-hosting rather than using an API. OpenRouter would not be in a position to determine this.

maikakz•2mo ago

Thank you & totally agree! The findings are purely observational through OpenRouter’s lens, so they naturally reflect usage on the platform, not the entire ecosystem.

mzl•2mo ago

While it is possible to self-host small models, it is not easy to host them at high speeds. Many small-model use cases involve large batches of work (processing large amounts of documents, agentic workflows, ...), and for those it makes sense to use a provider with high tps numbers.

Still, I agree that self-hosting is probably a part of the decrease.

YetAnotherNick•2mo ago

The bigger issue is that they define "small" by a fixed total parameter count rather than by active parameters for MoE models, and they don't account for hardware improvements etc. If they had defined small by price or computational cost, I think they would have seen an increase in small models.

Alex-Programs•2mo ago

I think using total parameters is fair, it correlates well with the RAM prerequisites to run it. Otherwise Kimi K2 would be "small" despite being a trillion parameters!

YetAnotherNick•2mo ago

VRAM doesn't matter if you are using API. Price and performance is what matters.

veunes•2mo ago

Yeah, using an API aggregator to run a 7B model is economically strange if you have even a consumer GPU. OpenRouter captures the cream of complex requests (Claude 3.5, o1) that you can't run at home. But even for local hosting, medium models are starting to displace small ones because quantization lets you run them on accessible hardware, and the quality boost there is massive. So the "Medium is the new Small" trend likely holds true for the self-hosted segment as well.

asadm•2mo ago

Who is using grok code and why?

djfergus•2mo ago

It's a 1.7 trillion token free model. Why wouldn't you try it?

I've been testing free models for coding hobby projects after I burnt through way too many expensive tokens on Replit and Claude. Grok wasn't great, kept getting into loops for me. I had better results using KAT coder on opencode (also free).

verdverm•2mo ago

> Why wouldn't you try it?

Because of the people behind it, and because I have at least some standards.

joshuamcginnis•2mo ago

According to https://openrouter.ai/rankings, lots of people are using it - presumably because it performs well and provides value.

bakugo•2mo ago

Kilo Code lets people use Grok Code Fast 1 for free, using OpenRouter as the provider. And Grok 4.1 Fast was completely free directly on OpenRouter for some time after its release.

So yeah, their statistics are inflated quite a bit, since most of that usage was not paid for, or at least not by the end user.

btbuildem•2mo ago

It was (is?) free with eg. opencode -- so, open-source coding agent + free sota model, it's hard to resist. That said, grok fast is fast, but not that great when compared to the other top tier models.

sosodev•2mo ago

The open weight model data is very interesting. I missed the release of Minimax M2. The benchmarks seem insanely impressive for its size. I would suspect benchmaxing but why would people be using it if it wasn’t useful?

pestaa•2mo ago

I used it for a couple of days and was very impressed. Definitely a hidden gem, and super cheap.

skywhopper•2mo ago

This is interesting, but I found it moderately disturbing that they spend a LOT of effort up front talking about how they don’t have any access to the prompts or responses. And then they reveal that they did actually have access to the text and they spend 80% of the rest of the paper analyzing the content.

charcircuit•2mo ago

>And then they reveal that they did actually have access to the text

I'm not seeing that. All I'm seeing is them analyzing metadata.

slack2450•2mo ago

>All I'm seeing is them analyzing metadata

Read the section about how they achieve classifications for prompts (hint: they read the prompts).

Argonaut998•2mo ago

I didn’t read the paper but I know OR has an option to opt-in to reading/training off the prompts for a discount. Some free models also log, but I’m not sure if that is just the provider, or OR too

charcircuit•2mo ago

From what I see the researchers aren't running a classifier on prompts they've acquired.

>The classifier is deployed within OpenRouter's infrastructure, ensuring that classifications remain anonymous and are not linked to individual customers.

OpenRouter has to have access to your prompts in order to route it somewhere else. The researchers don't get access to these prompts. They only get access to the metadata being generated from routing a prompt.

majdalsado•2mo ago

Very interesting how Singapore ranks 2nd in terms of token volume. I wonder if this is potentially Chinese usage via VPN, or if Singaporean consumers and firms are dominating in AI adoption.

Also interesting how the 'roleplaying' category is so dominant, makes me wonder if Google's classifier sees a system prompt with "Act as a X" and classifies that as roleplay vs the specific industry the roleplay was intended to serve.

olalonde•2mo ago

Almost certainly VPN traffic. Most major LLMs block both China and Hong Kong (surprisingly, not the other way around), so Singapore ends up being the fastest nearby endpoint that isn't restricted.

m3h•2mo ago

Why do major LLMs block china? Isn't that a potentially huge market for them?

olalonde•2mo ago

I'm not sure, but my guess is that it's due to pressure (or perceived pressure) from the U.S. government.

orbital-decay•2mo ago

It's their own decision, made long before the controls and pressure. Besides being in bed with the US gov, people that run big AI shops tend to be fervently nationalistic and politically ambitious on their own. Leopold Aschenbrenner's dystopian rant [1] or Dario Amodei's [2] [3] are pretty representative.

[1] https://situational-awareness.ai/

[2] https://www.darioamodei.com/essay/machines-of-loving-grace

[3] https://www.darioamodei.com/post/on-deepseek-and-export-cont...

mike_hearn•2mo ago

Early on there was a lot of distillation going on, apparently. Note that OpenAI introduced ID verification for high volume accounts and I think it was for that reason. It does raise questions about how much of the Chinese model's performance is entirely home grown. At least historically, it was quite hard to crawl the English web from behind the Great Firewall.

slack2450•2mo ago

It’s not VPN traffic. All data is aggregated by billing payment information, so it’s Singaporean billing details.

olalonde•2mo ago

Ah, you're right. Still, I wonder if it's because of Chinese people and companies using Singaporean bank accounts. It just seems odd that such a small country is so overrepresented here.

m0rde•2mo ago

> The noticeable spike [~20 percentage points] in May in the figure above [tool invocations] was largely attributable to one sizable account whose activity briefly lifted overall volumes.

The fact that one account can have such a noticeable effect on token usage is kind of insane. And also raises the question of how much token usage is coming from just one or five or ten sizeable accounts.

nhaehnle•2mo ago

It is quite interesting to ponder these usage statistics, isn't it?

According to their charts they're at a throughput of something like 7T tok/week total now. At 1$/Mtok, that's 7M$ per week. Less than half a billion per year. How much is that compared to the total inference market? And yet again, their throughput went like 20x in one year, who knows what's to come...
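
Spelling out that back-of-envelope arithmetic (the 7T tokens/week figure is read off the charts and the $1 per million tokens is an assumed blended price, so treat the result as an order-of-magnitude estimate only):

```python
tokens_per_week = 7e12   # ~7T tokens/week, eyeballed from the charts
price_per_mtok = 1.0     # assumed blended price, USD per million tokens

weekly_usd = tokens_per_week / 1e6 * price_per_mtok
yearly_usd = weekly_usd * 52

print(f"weekly: ${weekly_usd:,.0f}")
print(f"yearly: ${yearly_usd:,.0f}")
```

That comes to roughly $7M per week and about $364M per year, consistent with "less than half a billion".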

mike_hearn•2mo ago

Yes, but that token growth chart looks linear to me. There's the usual summer slump and then growth catches up once the autumn begins, but if you plot a line from the winter growth period at the start of 2025 you end up roughly in the right place except for an unusual spike in the most recent month (maybe another big user).

I'd have liked to see a chart of all tokens broken down by category rather than just percentages, but what this data seems to be saying is that growth isn't exponential, and is being dominated by growth in programming. A lot of the spending in AI is being driven by the assumption that it'll be used for everything everywhere. Perhaps it's just OpenRouter's user base, but if this data is representative then it implies AI adoption isn't growing all that fast outside of the tech industry (especially as "science" is nearly all AI related discussion).

This feels intuitively likely. I haven't seen many obvious signs of AI adoption around me once I leave the office. Microsoft has been struggling to sell its Copilot offerings to ordinary MS Office users, who apparently aren't that keen. The big wins are going to be existing apps and data pipelines calling out to AI, and it'll just take time to figure out what those use cases are and integrate them. Integrating even present-day AI into the long tail of non-tech industries is probably going to take decades.

Also odd: no category for students cheating on homework? I notice that "editing services" is a big chunk of the "academia" category. Probably most of that traffic goes direct to chatgpt.com and bypasses OpenRouter entirely.

nextworddev•2mo ago

*State of non-enterprise, indie AI

All this data confirms that OpenRouter’s enterprise ambitions will fail. It’s a nice product for running Chinese models tho

IgorPartola•2mo ago

They have SOTA models from OpenAI and Anthropic and Google and you can access them at a 5.5% premium. What you get is the ability to seamlessly switch between them. And also when one is down you can instantly switch to another. Whether that is valuable to you or not is use case dependent. But it isn’t without value.

What it does have I think is a problem that TaskRabbit had: you can hire a house cleaner through TR but once you find a good one you can just work directly with them and save the middleman fee. So OR is great for experimenting with a ton of models to see what is the cheapest one that still performs the tasks you need but then you no longer need OR unless it is for reliability.

nextworddev•2mo ago

Use LiteLLM for model routing

meander_water•2mo ago

Overall really interesting read, but I'm having trouble processing this:

> OpenRouter performs internal categorization on a random sample comprising approximately 0.25% of all prompts

How can you arrive at any conclusion with such a small random sample size?

jfrbfbreudh•2mo ago

with enough samples

piskov•2mo ago

https://en.wikipedia.org/wiki/Central_limit_theorem

For example, even 300 truly random people are enough to correctly ascertain the distribution of the population for some measurement (say, some personality feature).

That’s the basis of all polls and what have you

gerdesj•2mo ago

I think you might be thrashing around 30 samples for a normal distribution and the Central Limit Theorem and accidentally added a zero!

(OK, on rereading, you did link to a WP article about CLT, so 30 it is!)

piskov•2mo ago

You’re absolutely right! (c)

300 — I had in memory as a safe bet in a case of some skewed stuff like log-normal, exponential, etc.

abdullahkhalids•2mo ago

Because the accuracy of an estimated quantity mostly depends on the size of the sample, not on the size of the population [1]. This does require assumptions like somewhat homogenous population and normal distributions etc. However, these assumptions often hold.

[1] https://stats.stackexchange.com/questions/166/how-do-you-dec...

hoppoli•2mo ago

Statistical significance comes mostly from N (number of samples) and the variance on the dimension you're trying to measure[1]. If the variance is high, you'll need higher N. If the variance is low, you'll need a lower N. The percentage of the population is not relevant (N = 1000 might be significant and it doesn't matter if it's 1% or 30% of the population)

[1] This is a simplification. I should say that it depends on the standard error of your statistic, i.e., the thing you're trying to measure (if you're estimating the max of a population, that's going to require more samples than if you're estimating the mean). This standard error, in turn, will depend on the standard deviation of the dimension you're measuring. For example, if you're estimating the mean height, the relevant quantity is the standard deviation of height in the population.
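
A small sketch of why the sampled fraction matters so little, using the textbook normal approximation for the margin of error of a proportion. The sample sizes and population sizes below are hypothetical (the report does not say how many prompts the 0.25% sample contains), and the whole thing assumes a simple random sample.

```python
import math
from typing import Optional


def margin_of_error(p: float, n: int, population: Optional[int] = None,
                    z: float = 1.96) -> float:
    """Approximate 95% margin of error for an estimated proportion p
    from a simple random sample of size n (normal approximation)."""
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction; very close to 1 when n << population.
        se *= math.sqrt((population - n) / (population - 1))
    return z * se


# Worst case p = 0.5: the error shrinks with n, regardless of population size.
for n in (300, 1_000, 100_000):
    print(n, round(margin_of_error(0.5, n), 4))

# Same n drawn from very different population sizes: the result barely moves.
small_pop = margin_of_error(0.5, 1_000, population=100_000)
huge_pop = margin_of_error(0.5, 1_000, population=100_000_000)
print(round(small_pop, 4), round(huge_pop, 4))
```

So a sample that is 1% of one population and 0.001% of another gives nearly the same precision; what 0.25% buys you depends on the absolute count behind it.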

paulirish•2mo ago

I worry that OpenRouter's Apps leaderboard incentivizes tools (e.g. Cline/Kilo) to burn through tokens to climb the ranks, meanwhile penalizing being context-efficient.

https://openrouter.ai/rankings#apps

trebligdivad•2mo ago

The 'Glass slipper' idea makes sense to me; people have a bunch of different ideas to try on AIs, and try it as new models come out, and once a model does it well they stick with it for a while.

greatgib•2mo ago

I like to see stats like that, but I find it very concerning that OpenRouter don't mind inspecting its user/customer data without shame.

Even if you pretend that the classifier respect anonymity, if I pay for the inference, I would expect that it would be a closed tube with my privacy respected. If at least it was for "safety" checks, I don't like that but I would almost understand, now it is for them to have "marketing data".

Imagine, and regarding the state of the world it might come soon, that you have whatsapp or telegram that inspect all the messages that you send to give reports like:

- 20% of our users speak about their health issues

- 30% of messages are about annoying coworkers

- 15% are messages comparing dick sizes

stingraycharles•2mo ago

They explicitly give you a discount if you opt in to allowing your data to be used for (anonymized) analytics. That’s pretty fair imho.

echelon•2mo ago

Cynical take: they could look at everyone and give a discount for optics.

I'd feel a lot better if "OpenRouter" were open source.

scirob•2mo ago

litellm is basically an open source version https://www.litellm.ai although openrouter being a hosted service is kinda the point. Unless the whole industry decides to do this over e2ee, you can't get any guarantees about an intermediary aggregator.

stingraycharles•2mo ago

Nobody is forcing you to use it. It’s a service for convenience: you just pay one provider instead of having to create a bazillion accounts.

If you don’t like any middle men, just go to one of the providers directly.

heliumtera•2mo ago

>I would expect that it would be a closed tube with my privacy respected

Lol hahaha

otabdeveloper4•2mo ago

Yeah, welcome to "AI".

mosselman•2mo ago

Comments such as these are what allows developers to justify ignoring our privacy.

"Everyone knows in {{BUSINESS_TYPE}} there is no real privacy".

Be it fintech, AI or social media. You give them a free pass by being flippant about companies respecting privacy.

Being flippant about anyone being careless about our privacy is doing us as a society an injustice. We should demand privacy, not laugh at the notion of privacy.

homakov•2mo ago

The parent comment is exactly right. "LOL" is the best possible response.

>We should demand privacy, not laugh at the notion of privacy.

Recently got m3 ultra 512gb studio. LM Studio runs frontier models routinely. Going local is the ONLY way. That's all you can do. "Demanding privacy" is security theater. Act accordingly.

est•2mo ago

> inspecting its user/customer data without shame

Even better: they send all the data to GoogleTagClassifier, which means Google now has a copy of the sample as well

homakov•2mo ago

> Imagine

why imagine? The world already functions exactly like that. Talk on Tg like every chat is summarized every 24hrs and monthly (with cheap LLM and then with strong ones if signals found), and it reports to all kinds of interested intel agencies.

Same for openrouter. everything that leaves your device plaintext = public. Period. No hopes.

veunes•2mo ago

This is the inevitable evil of the man in the middle. OpenRouter by definition decrypts your traffic to route it to the provider (OpenAI, Anthropic). Technically, they can read everything. The problem is that for the Enterprise segment, this is a showstopper. No bank or hospital will route data through an aggregator that openly states it classifies prompts via Google API (even sampled ones). This confirms that OpenRouter remains a tool for indie hackers and researchers, not for serious B2B

xanderatallah•2mo ago

Alex here from OpenRouter. We take privacy really seriously, and how we manage prompts and completions is described in our terms of service: https://openrouter.ai/terms.

We don’t retain any customer prompts or completions by default. As others here mentioned, you can opt-in for a 1% discount. Prompt classification is performed using a zero-data-retention and zero-training service, just like our own.

IgorPartola•2mo ago

Here is the thing: they made good enough open weight models available and affordable, then found that people used them more than before. I am not trying to diminish the value here but I don’t think this is the headline.

swyx•2mo ago

my highlights of this report: https://news.smol.ai/issues/25-12-04-openrouter

shubhamjain•2mo ago

I am a person who wants to maintain a distance from the AI-hype train, but seeing a chart like this [1], I can't help think that we are nowhere near the peak. The weekly token consumption keeps on rising, and it's already in trillions, and this ignores a lot of consumption happening directly through APIs.

Nvidia could keep delivering record-breaking numbers, and we may well see multiple companies hit six, seven, or even eight trillion dollars in market cap within a couple of years. While I am skeptical of claims like "AI will make programming obsolete", it’s clear that adoption is still going like crazy and it's hard to anticipate when the plateau happens.

[1]: https://openrouter.ai/state-of-ai#open-vs_-closed-source-mod...

hattmall•2mo ago

When it's as cheap as 5 cents per million tokens I don't see "trillions" as being a particularly large number. Even at the most expensive level ($120/1M for 5 Pro), 100 trillion tokens is only about $12 billion.

mike_hearn•2mo ago

Problem is that most of that growth is in models being underpriced. We don't know what the demand curve looks like when tokens are priced to cover the full operating costs of the companies making them.

Also, growth seems to be linear, not exponential.

adamraudonis•2mo ago

Very cool study

armcat•2mo ago

These are fantastic insights! I work in legaltech space so something to keep in mind is that legal space is very sensitive to data storage and security (apart from this of course: https://alexschapiro.com/security/vulnerability/2025/12/02/f...). So models hosted in e.g. Azure, or on-prem deployments are more common. I have friends in health space and similar story there. Finance (banking especially) is the same. Hence why those categories look more or less constant over time, and have smallest contributions in this study.

adidoit•2mo ago

With studies like these it's important to keep in mind selection effects.

Most of the high volume enterprise use cases use their cloud providers (e.g., azure)

What we have here is mostly from smaller players. Good data but obviously a subset of the inference universe.

veunes•2mo ago

The 4x growth in prompt length is a fundamental shift. We've quickly moved from "Q&A" mode to "upload full context and analyze" mode.

This completely changes infrastructure requirements: KV-caching becomes a necessity, and prefill time becomes a critical metric, often more important than generation speed. That's exactly why models with cheap long context (Gemini, DeepSeek) are winning the race against "smarter" but expensive models. Inference economics are now dictated by context length
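
To put a rough number on the context-length point, here is a sketch of KV-cache size for a standard transformer with no cache compression. The formula is the usual one (keys plus values, per layer, per KV head, per token); the model config is hypothetical and not any specific model.

```python
def kv_cache_bytes(
    seq_len: int,
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    bytes_per_value: int = 2,  # fp16/bf16
    batch_size: int = 1,
) -> int:
    # Factor of 2 covers keys and values, stored per layer, per KV head, per token.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value * batch_size


# Hypothetical config for illustration only.
cfg = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for tokens in (2_000, 8_000, 128_000):
    gib = kv_cache_bytes(tokens, **cfg) / 2**30
    print(f"{tokens:>7} tokens -> {gib:.2f} GiB")
```

The cache grows linearly with sequence length, so a 4x longer prompt means 4x the cache memory per request, on top of the extra prefill compute.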
