From specification to stress test: a weekend with Claude

https://www.juxt.pro/blog/from-specification-to-stress-test/

26•henrygarner•1h ago

Comments

henrygarner•1h ago

Over a weekend, between board games and time with my kids, Claude and I built a distributed system with Byzantine fault tolerance, strong consistency and crash recovery under arbitrary failures. I described the behaviour I wanted in Allium, worked through the bugs conversationally and didn't write a line of implementation code.

SPICLK2•27m ago

I don't see why you need to bring your kids into this, and as a parent any suggestion of being distracted by tech during time with the kids raises my suspicions.

We have a strict no laptops and no phones rule when the kids are around (unless we're specifically doing something with them using those tools - looking at the weather forecast, or looking up some information).

"I can prompt AI while playing with the kids" is not a future I want.

altmanaltman•28m ago

This blog post is a sophisticated piece of content marketing for a company called JUXT and their proprietary tool, "Allium." While the technical achievement is plausible, the framing is heavily distorted to sell a product.

Here is the breakdown of the flaws and the "BS" in the narrative.

1. The "I Didn't Write Code" Lie

The author claims, "I didn't write a line of implementation code."

The Flaw: He wrote 3,000 lines of "Allium behavioural specification."

The BS: Writing 3,000 lines of a formal specification language is coding. It's just coding in a proprietary, high-level language instead of Kotlin.

The Ratio is Terrible: The post admits the output was ~5,500 lines of Kotlin. That means for every 1 line of spec, he got roughly 1.8 lines of code.

Why this matters: True "low-code" or "no-code" leverage is usually 1:10 or 1:100. If you have to write 3,000 lines of strict logic to get a 5,500-line program, you haven't saved much effort; you've just swapped languages.
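The arithmetic behind that ratio can be checked in a few lines. This is only a sketch using the approximate line counts quoted in the comment; the `leverage` helper and the variable names are made up for illustration.

```python
# Approximate line counts as quoted in the comment above.
spec_lines = 3_000     # Allium behavioural specification
kotlin_lines = 5_500   # generated Kotlin


def leverage(spec: int, generated: int) -> float:
    """Generated lines produced per line of specification (hypothetical helper)."""
    return generated / spec


ratio = leverage(spec_lines, kotlin_lines)
print(f"1 spec line -> about {ratio:.1f} generated lines")

# Spec size that the 1:10 and 1:100 ratios cited above would imply
# for the same amount of generated code.
for target in (10, 100):
    print(f"1:{target} would need about {kotlin_lines // target} spec lines")
```

Line counts are a crude proxy for effort, so this only restates the comment's comparison; it does not measure how hard either artifact was to write.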

2. The "Weekend Project" Myth

The post frames this as a casual project done "between board games and time with my kids."

The Flaw: This timeline ignores the massive "pre-computation" done by the human.

The BS: To write 3,000 lines of coherent, bug-free specifications for a Byzantine Fault Tolerant (BFT) system, you need to have the entire architecture fully resolved in your head before you start typing. The author is an expert (CTO level) who likely spent weeks or years thinking about these problems. The "48 hours" only counts the typing time, not the engineering time.

3. The "Byzantine Fault Tolerance" (BFT) Bait-and-Switch

The headline claims "Byzantine fault tolerance," which implies a system that continues to operate correctly even if nodes lie or act maliciously (extremely hard to build).

The Flaw: A "Resolved Question" block in the text admits: "The system's goal is Byzantine fault detection, not classical BFT consensus."

The BS: Real BFT (like PBFT or Raft with signatures) is mathematically rigorous and keeps the system running. "Fault Detection" just means "if the two copies don't match, stop." That is significantly easier to build. Calling it "BFT" in the intro is a massive overstatement of the system's resilience.
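To make the detection-versus-tolerance distinction concrete, here is a toy sketch. It is not the system from the blog post and not PBFT: `detect`, `tolerate` and `DivergenceDetected` are hypothetical names, and the voting rule only illustrates the general idea that with at least 3f + 1 replies, a value reported by f + 1 replicas has at least one honest reporter.

```python
from collections import Counter


class DivergenceDetected(Exception):
    """Raised when replicas disagree and no answer can be trusted."""


def detect(replies: list[str]) -> str:
    # Fault detection: any disagreement stops processing.
    if len(set(replies)) != 1:
        raise DivergenceDetected(f"replicas disagree: {replies}")
    return replies[0]


def tolerate(replies: list[str], f: int) -> str:
    # Fault masking by voting (illustrative only, not a consensus protocol):
    # requires at least 3f + 1 replies and accepts the most common value
    # if at least f + 1 replicas report it.
    if len(replies) < 3 * f + 1:
        raise ValueError("need at least 3f + 1 replies to mask f faults")
    value, count = Counter(replies).most_common(1)[0]
    if count < f + 1:
        raise DivergenceDetected("no value has f + 1 matching replies")
    return value


replies = ["ok", "ok", "ok", "bad"]  # one replica is lying

print(tolerate(replies, f=1))  # voting masks the single bad reply

try:
    detect(replies)
except DivergenceDetected as err:
    print(f"halted: {err}")  # detection stops instead of continuing
```

The detector is a single equality check, while the tolerant path needs more replicas and a quorum rule, and a real protocol would also need the replicas to agree on request ordering. That gap is the sense in which detection is the easier problem.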

4. The "Maintenance Nightmare" (The Vendor Lock-in Trap)

The post glosses over how this system is maintained.

The Flaw: You now have 5,500 lines of Kotlin that no human wrote.

The BS: This is the "Model Driven Architecture" (MDA) trap from the early 2000s.

Scenario: You find a bug in the Kotlin code.

Option A: You fix the Kotlin. Result: Your code is now out of sync with the Spec. You can never regenerate from Spec again without losing your fix.

Option B: You fix the Spec. Result: You hope the AI generates the exact Kotlin fix you need without breaking 10 other things.

The Reality: You are now 100% dependent on the "Allium" tool and Claude. If you stop paying for Allium, you have a pile of unmaintainable machine-generated code.

5. The Performance "Turning Point"

The dramatic story about 10,000 Requests Per Second (RPS) has a hole in it.

The Flaw: The "bottleneck" wasn't the code; it was a Docker proxy setting (gvproxy).

The BS: This is a standard "gotcha" for anyone using Docker on Mac. Framing this as a triumph of AI debugging is a stretch: any senior engineer would check network topology when seeing high latency but low CPU usage. 10k RPS is also not "ambitious" for a modern distributed system; a single well-optimized Node.js or Go process can handle that easily.

antonly•5m ago

Hello, could you please put your over-sensationalized, overly-long, AI-generated comments somewhere else? Thank you.

Kindly, the HN Community.

bandrami•1m ago

> Your code is now out of sync with the Spec

Is there even a sync to be had? The same prompt to the same LLM at different times will yield different artifacts, even if you were to save and re-use the seed.

amarble•10m ago

As a counterpoint, I also tried writing something with Claude last weekend: a Google Docs clone[1]. I spent $170 on Anthropic API credits and got something that did mostly what I asked but was basically useless. It seems that for simple interfaces with an exact specification, like the recent compiler and web browser examples, it's possible to write bigger projects that "work" as a demo, though not in a way that makes them viable alternatives. For anything that requires taste and judgment, we've still got a long way to go. There are lots of great demos out there but few if any real examples of vibe-coded (or whatever you want to call it) software standing alone as an alternative to projects people wrote.

[1] https://www.marble.onl/posts/this_cost_170.html
