Ask HN: What makes it so hard to keep LLMs online?

2•realberkeaslan•3h ago
It feels like every few days one of the big AI services is down, degraded, or just slow. I don't mean this as a complaint. I'm just genuinely curious. These are well-funded companies with smart people. What is it about running these models that makes reliability so elusive? Is it just demand nobody predicted, or is there something fundamentally different about serving AI vs. a normal web app?

Comments

zipy124•2h ago
Likely one large contributor is that for a normal service, if it's down, recovery is as simple as re-routing to another service, and there is a practically unlimited number of CPU servers around the world to spin up on demand. GPU servers are much harder to spin up on demand, as supply is so constrained.

Another factor is that it's a new field, and "move fast and break things" is still the go-to approach because competition is high and the monetary stakes are incredibly high.

A pessimistic, but perhaps true, theory is that vibe-coding/slop is reducing their reliability.

A counterpoint is that regular services like GitHub seem to go down almost as frequently.

arc_light•2h ago
A piece that often gets overlooked: unlike a web app, where you can cache aggressively and serve millions of users from relatively few servers, LLM inference is stateless in a weird way: each request is expensive, responses generally can't be served from a cache, and requests can't be batched as freely as traditional workloads can. A spike in traffic doesn't just slow you down linearly; it creates queuing effects that cascade fast.

There's also the memory side. A large model has to live entirely in GPU VRAM to run efficiently. You can't just "add more RAM" on the fly the way you can with CPU workloads. Scaling means acquiring, provisioning, and loading entirely new physical machines — which takes minutes to hours, not seconds.
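
To put rough numbers on the memory point, here is a back-of-the-envelope sketch. The 70-billion-parameter model, the 2 bytes per parameter (16-bit weights), and the 80 GB of VRAM per GPU are illustrative assumptions, not figures from this thread, and the estimate counts only the weights (no KV cache or activations).

```python
import math


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(num_params: float, gpu_vram_gb: float,
                         bytes_per_param: int = 2) -> int:
    """Lower bound on GPU count: weights only, ignoring all runtime overhead."""
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / gpu_vram_gb)


if __name__ == "__main__":
    # Hypothetical model and hardware sizes, chosen only for illustration.
    params = 70e9
    print(weight_memory_gb(params))          # GB for the weights alone
    print(min_gpus_for_weights(params, 80))  # GPUs needed before serving anything
```

Under those assumptions the weights alone do not fit on a single device, so a new replica means several GPUs that all have to be acquired and loaded before it can take traffic.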

So you end up with a system that's simultaneously very expensive per-request, very hard to scale horizontally in real time, and very sensitive to traffic spikes. That's a reliability engineer's nightmare even before you factor in the supply constraints the sibling comment mentioned.
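
A minimal sketch of why latency does not degrade linearly, using the textbook M/M/1 queue (one server, random arrivals, random service times). This is a deliberate simplification and an assumption on my part: real inference fleets have many servers and do batch requests, but the shape of the curve is the point.

```python
def mean_latency(service_time_s: float, utilization: float) -> float:
    """Mean time in system for an M/M/1 queue: W = service_time / (1 - rho)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1); at 1 the queue never drains")
    return service_time_s / (1.0 - utilization)


if __name__ == "__main__":
    # Hypothetical 1-second service time; only the ratios matter.
    for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
        print(f"utilization {rho:.2f}: mean latency {mean_latency(1.0, rho):.1f}s")
```

Going from 50% to 90% utilization multiplies mean latency by about five in this model, and the last few percent before saturation are far worse, which is why a modest traffic spike on an already busy fleet looks like an outage.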

andyjohnson0•24m ago
Not sure why this was apparently flagged to death. Vouched.

angarrido•1h ago
Most people think it’s just GPU cost. In practice it’s coordination: model latency variance + queueing + retries under load. You don’t scale linearly, you get cascading slowdowns.
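
A rough sketch of the retry part, assuming each attempt fails independently with a fixed probability and clients retry up to a fixed number of times. Both assumptions are hypothetical simplifications: in practice the failure rate rises with load, which makes the feedback loop worse than this shows.

```python
def expected_attempts(failure_prob: float, max_retries: int) -> float:
    """Expected attempts per request: 1 + p + p^2 + ... + p^max_retries."""
    if not 0 <= failure_prob <= 1:
        raise ValueError("failure_prob must be in [0, 1]")
    return sum(failure_prob ** k for k in range(max_retries + 1))


def offered_load(base_rps: float, failure_prob: float, max_retries: int) -> float:
    """Requests per second the backend actually sees once retries are included."""
    return base_rps * expected_attempts(failure_prob, max_retries)


if __name__ == "__main__":
    # Hypothetical numbers: 1000 req/s of real demand, up to 3 retries each.
    for p in (0.0, 0.1, 0.5):
        print(f"failure rate {p:.1f}: backend sees {offered_load(1000, p, 3):.0f} req/s")
```

The design point is that retries add load exactly when the system is least able to absorb it, so the extra traffic pushes utilization up and latency follows.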
