Think of backend pipelines like: step 1 → LLM → step 2 → LLM → step 3, where users depend on the output and nothing technically “crashes.”
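For concreteness, roughly the shape of pipeline meant here (a minimal sketch; the step names, the `call_llm` wrapper, and the refund constraint are placeholders, not the actual system):

```python
def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around whatever model API is in use."""
    raise NotImplementedError("wire this to your provider's client")


def step_1_extract(ticket: str) -> str:
    # Step 1: plain code, no model involved.
    return ticket.strip()


def step_2_classify(extracted: str) -> str:
    # First LLM call: classify the request.
    return call_llm("Classify this request into exactly one category:\n" + extracted)


def step_3_draft(extracted: str, category: str) -> str:
    # Second LLM call: must stay consistent with step 2 and honour a constraint.
    return call_llm(
        "Draft a reply for a '" + category + "' request. Do not promise refunds.\n"
        + extracted
    )


def run_pipeline(ticket: str) -> str:
    extracted = step_1_extract(ticket)
    category = step_2_classify(extracted)
    return step_3_draft(extracted, category)
```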
We’ve seen a recurring pattern:
- Same input, same prompt, same model
- Works reliably for weeks
- Then a constraint is ignored, or a later step contradicts an earlier one
- Retries don’t reliably fix it
- Logs don’t explain what changed (sketched below)
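To make “logs don’t explain what changed” concrete, this is roughly the per-step record you’d want in order to diff a good run against a bad one weeks later. A sketch only; names like `log_step` and `check_constraints` are made up, not an existing framework:

```python
import hashlib
import json
import time


def fingerprint(text: str) -> str:
    # Short content hash so identical prompts/outputs are easy to spot across runs.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def check_constraints(output: str, banned_phrases: list[str]) -> list[str]:
    # Cheap, deterministic per-step checks; anything fancier drifts into
    # eval-framework territory, which isn't the question here.
    return [p for p in banned_phrases if p.lower() in output.lower()]


def log_step(step: str, model: str, params: dict, prompt: str, output: str) -> dict:
    record = {
        "ts": time.time(),
        "step": step,
        "model": model,            # exact model/version string the API reports back
        "params": params,          # temperature, max_tokens, etc.
        "prompt_sha": fingerprint(prompt),
        "output_sha": fingerprint(output),
        "violations": check_constraints(output, ["refund"]),
    }
    print(json.dumps(record))      # stand-in for a real structured logger
    return record
```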
The hardest part isn’t the bad output itself; it’s not being able to explain failures to PMs or stakeholders when nothing obviously broke.
Curious how others operating LLM-backed workflows in production are diagnosing or containing this kind of behavior over time.
(Not looking for prompt advice or eval frameworks. Interested in operational experiences.)
chrisjj•52m ago
Try: “The known unreliability of stochastic LLM tech caused an obviously predictable failure of output the user depends on.”
Perhaps present the analogy of a random number generator feeding the calculation of a company's statutory financial accounts.