A node of 8 H100s will run you $31.40/hr on AWS, so for all 96 you're looking at $376.80/hr. With 188 million input tokens/hr and 80 million output tokens/hr, that comes out to around $2/million input tokens, and $4.70/million output tokens.
This is actually a lot more than DeepSeek R1's rates of $0.10-$0.60/million input and $2/million output, but I'm sure major providers are not paying AWS p5 on-demand pricing.
Inference is more profitable than I thought.
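The arithmetic in the parent comment can be sketched as a quick back-of-envelope check (the rates and token throughputs are the commenter's figures; note this attributes the full cluster cost to input and output tokens independently, matching the comment's framing):

```python
# Back-of-envelope check of the per-token cost figures above.
# Assumed inputs (from the comment): $31.40/hr per 8xH100 p5 node,
# 96 GPUs total, 188M input and 80M output tokens per hour.
node_rate = 31.40                   # $/hr for one 8xH100 node
nodes = 96 // 8                     # 12 nodes for 96 GPUs
cluster_rate = node_rate * nodes    # total $/hr for the cluster

input_tokens_per_hr = 188e6
output_tokens_per_hr = 80e6

cost_per_m_input = cluster_rate / (input_tokens_per_hr / 1e6)
cost_per_m_output = cluster_rate / (output_tokens_per_hr / 1e6)

print(f"${cluster_rate:.2f}/hr cluster")           # $376.80/hr
print(f"${cost_per_m_input:.2f}/M input tokens")   # $2.00/M
print(f"${cost_per_m_output:.2f}/M output tokens") # $4.71/M
```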
34679•1h ago
Is that just the cost of electricity, or does it include the cost of the GPUs spread out over their predicted lifetime?
dragonslayer56•1h ago
Maybe the cost of renting?
mwcz•10m ago
That's silly, but the idea that "local" is not the opposite of remote is even sillier.
ffsm8•5m ago
Lots of people were advocating for running their k8s on bare-metal servers to maximize the performance of their containers.
Now, how that applies to your conversation... I've no clue, too little context ( 。 ŏ ﹏ ŏ )