$0.28/M input ($0.028/M cache hit), $0.42/M output
(Their inference costs now scale better as context grows because of the sparse attention mechanism.)
Output: $1.68 per million tokens.
I think this is just as important to the distribution of AI as model intelligence is.
AFAIK there are no fundamental "laws" that prevent the price from continuing to fall, at least in step with Moore's law (or whatever the current AI/Nvidia chip development cycle is called right now). Each new generation of hardware is significantly faster/cheaper than the last, so will we see a ChatGPT-5 model at half the price in a year? (Yes, I know that thinking models cost more, but just on a per-token basis.)
Price deflation isn't tied to Moore's law right now, because much of the performance gain comes from model optimization, high-bandwidth memory supply chains, and electrical capacity build-out, not FLOP density.
Part of me is optimistic that when the AI bubble bursts the excess data center capacity is going to be another force driving the cost of inference down.
Yeppers, when that bubble bursts, that's going to be hilarious. This is the kinda stuff grandkids won't believe someday.
Performance gained from model improvements has outpaced performance gained from hardware improvements for decades.
I believe you but that's not exactly an unbiased source of information.
This is usually not the case for paid models -- is OpenRouter just marking this model incorrectly, or does DeepSeek actually train on submitted data?
I guess I'll wait for a third-party provider on OpenRouter that doesn't log DS 3.2.
Is it just the API client bindings that are open, while the core routing service is closed?
If they lead the market, they'll extract value in lots of ways that an open company could at least be compelled not to. Plus there won't be competition.
They're probably selling your data to LLM companies and you don't even see what they're doing.
Without competition, they'll raise their rates.
If they were open, you could potentially run the offering on-prem. You could bolt on new providers or use it internally for your own routing.
Lots of reasons.
I think it's just called OpenRouter because the founder previously started OpenSea (an NFT marketplace), and also probably to sound a bit similar to OpenAI. It's like companies calling their products "natural" or "organic" or "artisan" when they can get away with it, just a marketing strategy of using words that conjure up vaguely positive connotations in your mind.
It's a frictionless marketplace connecting inference providers and customers, creating a more competitive market. Or a more open market, if you play a bit fast and loose with terminology.
Input and output costs are peanuts compared to the order-of-magnitude (or more) larger volume of tokens that hit the cache.
At that point you might as well use GPT-5. It will be the same price or cheaper, and more capable.
The DeepSeek API supports caching; stop manufacturing problems where there are none.
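For reference, a minimal sketch of using that cache through DeepSeek's OpenAI-compatible endpoint (base URL and model name are from their public docs; the cache-hit usage field names are an assumption based on those docs, so double-check them):

    from openai import OpenAI

    # DeepSeek's endpoint is OpenAI-compatible; prefix caching is automatic,
    # no flag required (per their docs).
    client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")

    shared_prefix = [{"role": "system",
                      "content": "You are a code reviewer. <long shared rubric>"}]

    def ask(question):
        resp = client.chat.completions.create(
            model="deepseek-chat",
            messages=shared_prefix + [{"role": "user", "content": question}],
        )
        u = resp.usage
        # Assumed field names; cache hits bill at the discounted rate.
        print(u.prompt_cache_hit_tokens, "cached /",
              u.prompt_cache_miss_tokens, "uncached")
        return resp.choices[0].message.content

    ask("Review diff A: ...")  # first call: mostly cache misses
    ask("Review diff B: ...")  # same prefix: should mostly hit the cache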
OpenRouter says they might use your data for training.
If you read my post carefully, you will realize that I did not make any contradictory statements.
My wife is Chinese.
DeepSeek supports caching and cache hits are a tenth of the cost.
$0.028/M for cache hit
$0.28/M for cache miss
$0.42/M for output
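Quick back-of-envelope in Python using those rates (the 90%-cached request is just an illustrative scenario):

    # USD per 1M tokens, from the list above.
    CACHE_HIT, CACHE_MISS, OUTPUT = 0.028, 0.28, 0.42

    def request_cost(hit_tok, miss_tok, out_tok):
        return (hit_tok * CACHE_HIT + miss_tok * CACHE_MISS + out_tok * OUTPUT) / 1e6

    # 100k-token context, 90% cached, 1k-token reply:
    print(f"${request_cost(90_000, 10_000, 1_000):.4f}")  # -> $0.0057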
If they are okay for you, then sure go ahead. Enjoy the caching.
What other provider is going to support it?
Why?
They trained a lightweight "indexer" to mimic the full attention distribution, but it keeps only the top-k (k=2048) most important tokens for each query. So as the context window grows, the compute for the main query-key attention stays roughly constant instead of growing with context length. (Total cost still grows linearly in their graph, because the indexer must still coarsely scan the entire context window, which is O(L), but that coarse pass is much cheaper than full attention.)
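A minimal numpy sketch of that top-k idea (not DeepSeek's actual kernels; the "indexer" here is just a linear scorer standing in for the trained one, and all names are illustrative):

    import numpy as np

    def sparse_attention(q, K, V, index_w, k=2048):
        """q: (d,) query; K, V: (L, d) keys/values; index_w: (d,) indexer weights."""
        L, d = K.shape
        # 1) Coarse O(L) pass: score every position with the cheap indexer.
        coarse = K @ index_w
        top = np.argsort(coarse)[-min(k, L):]   # indices of the top-k tokens
        # 2) Exact attention over only those k tokens: cost is O(k), not O(L).
        logits = K[top] @ q / np.sqrt(d)
        w = np.exp(logits - logits.max())
        w /= w.sum()
        return w @ V[top]

    # Toy usage: 8192-token context, but exact attention touches only 2048 tokens.
    rng = np.random.default_rng(0)
    d, L = 64, 8192
    out = sparse_attention(rng.normal(size=d), rng.normal(size=(L, d)),
                           rng.normal(size=(L, d)), rng.normal(size=d))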