Ask HN: Why is LLM training still GPU-hungry despite DeepSeek?
3•takinola•7h ago
When DeepSeek released R1, everyone thought it signaled the end of the GPU-intensive approach to LLM training. It does not appear to have worked out that way: GPU demand continues to grow unabated. What happened? Is the DeepSeek training method unreproducible or impractical in some way?
Comments
cratermoon•7h ago
The DeepSeek method requires spending money on very good programmers and giving them the tools and time to build out optimizations.
The hype-driven LLM cycle means companies with multi-billion-dollar valuations prioritize time-to-market and throw money at more and bigger GPUs to solve performance bottlenecks.
It's "impractical" if the goal is to make as much money as possible before the bubble pops.