Ask HN: What makes it so hard to keep LLMs online?
2•realberkeaslan•1h ago
It feels like every few days one of the big AI services is down, degraded, or just slow. I don't mean this as a complaint. I'm just genuinely curious. These are well-funded companies with smart people. What is it about running these models that makes reliability so elusive? Is it just demand nobody predicted, or is there something fundamentally different about serving AI vs. a normal web app?
Comments
zipy124•1h ago
Likely one large contributor is that for a normal service, an outage is as simple to handle as re-routing to another instance, and there is an effectively unlimited supply of CPU servers around the world to spin up on demand. GPU servers are much harder to spin up on demand, since supply is so constrained.
Another factor is that it's a new field where "move fast and break things" is still the go-to approach, since competition is fierce and the monetary stakes are incredibly high.
A pessimistic, but perhaps true, theory is that vibe-coding/slop is reducing their reliability.
A counterpoint is that regular services like GitHub seem to go down almost as frequently.
angarrido•34m ago
Most people think it's just GPU cost. In practice it's coordination: model latency variance + queueing + retries under load. You don't scale linearly; you get cascading slowdowns.
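The retry feedback loop described above can be seen in a toy model (my own sketch, not anything from a real serving stack): failed requests get retried, retries add load, more load raises the failure rate, which creates more retries. A fixed-point iteration over that loop shows effective load settling well above the real demand.

```python
def steady_state_load(offered, capacity, max_retries, iters=50):
    """Toy fixed-point model of retry amplification.

    `offered` is demand per unit time without retries; `capacity` is
    what the service can serve. Each failed request is retried up to
    `max_retries` times, and those retries add load, which raises the
    failure rate, which adds more retries: a cascading-slowdown loop.
    """
    load = float(offered)
    for _ in range(iters):
        # fraction of attempts that fail at the current load level
        p_fail = max(0.0, 1.0 - capacity / load)
        if p_fail == 0.0:
            attempts = 1.0  # under capacity: one attempt per request
        else:
            # expected attempts per request, up to 1 + max_retries tries:
            # 1 + p + p^2 + ... + p^max_retries
            attempts = (1.0 - p_fail ** (max_retries + 1)) / (1.0 - p_fail)
        load = offered * attempts
    return load
```

With offered demand 100 and capacity 120 the load stays at 100 (no failures, no retries). Drop capacity to 80 with 3 retries and the effective load converges to roughly 240, well over twice the real demand, which is why a modest capacity shortfall can look like a full outage.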