From specification to stress test: a weekend with Claude

https://www.juxt.pro/blog/from-specification-to-stress-test/

27•henrygarner•1h ago

Comments

henrygarner•1h ago

Over a weekend, between board games and time with my kids, Claude and I built a distributed system with Byzantine fault tolerance, strong consistency and crash recovery under arbitrary failures. I described the behaviour I wanted in Allium, worked through the bugs conversationally and didn't write a line of implementation code.

SPICLK2•42m ago

I don't see why you need to bring your kids into this, and, as a parent, I'm suspicious of any suggestion of being distracted by tech during time with the kids.

We have a strict no laptops and no phones rule when the kids are around (unless we're specifically doing something with them using those tools - looking at the weather forecast, or looking up some information).

"I can prompt AI while playing with the kids" is not a future I want.

altmanaltman•44m ago

This blog post is a sophisticated piece of content marketing for a company called JUXT and their proprietary tool, "Allium." While the technical achievement is plausible, the framing is heavily distorted to sell a product.

Here is the breakdown of the flaws and the "BS" in the narrative.

1. The "I Didn't Write Code" Lie

The author claims, "I didn't write a line of implementation code."

The Flaw: He wrote 3,000 lines of "Allium behavioural specification."

The BS: Writing 3,000 lines of a formal specification language is coding. It's just coding in a proprietary, high-level language instead of Kotlin.

The Ratio is Terrible: The post admits the output was ~5,500 lines of Kotlin. That means for every 1 line of spec, he got roughly 1.8 lines of code.

Why this matters: True "low-code" or "no-code" leverage is usually 1:10 or 1:100. If you have to write 3,000 lines of strict logic to get a 5,000-line program, you haven't saved much effort—you've just swapped languages.

2. The "Weekend Project" Myth

The post frames this as a casual project done "between board games and time with my kids."

The Flaw: This timeline ignores the massive "pre-computation" done by the human.

The BS: To write 3,000 lines of coherent, bug-free specifications for a Byzantine Fault Tolerant (BFT) system, you need to have the entire architecture fully resolved in your head before you start typing. The author is an expert (CTO level) who likely spent weeks or years thinking about these problems. The "48 hours" only counts the typing time, not the engineering time.

3. The "Byzantine Fault Tolerance" (BFT) Bait-and-Switch

The headline claims "Byzantine fault tolerance," which implies a system that continues to operate correctly even if nodes lie or act maliciously (extremely hard to build).

The Flaw: A "Resolved Question" block in the text admits: "The system's goal is Byzantine fault detection, not classical BFT consensus."

The BS: Real BFT (like PBFT or Raft with signatures) is mathematically rigorous and keeps the system running. "Fault Detection" just means "if the two copies don't match, stop." That is significantly easier to build. Calling it "BFT" in the intro is a massive overstatement of the system's resilience.

4. The "Maintenance Nightmare" (The Vendor Lock-in Trap)

The post glosses over how this system is maintained.

The Flaw: You now have 5,500 lines of Kotlin that no human wrote.

The BS: This is the "Model Driven Architecture" (MDA) trap from the early 2000s.

Scenario: You find a bug in the Kotlin code.

Option A: You fix the Kotlin. Result: Your code is now out of sync with the Spec. You can never regenerate from Spec again without losing your fix.

Option B: You fix the Spec. Result: You hope the AI generates the exact Kotlin fix you need without breaking 10 other things.

The Reality: You are now 100% dependent on the "Allium" tool and Claude. If you stop paying for Allium, you have a pile of unmaintainable machine-generated code.

5. The Performance "Turning Point"

The dramatic story about 10,000 Requests Per Second (RPS) has a hole in it.

The Flaw: The "bottleneck" wasn't the code; it was a Docker proxy setting (gvproxy).

The BS: This is a standard "gotcha" for anyone using Docker on Mac. Framing this as a triumph of AI debugging is a stretch: any senior engineer would check network topology when seeing high latency but low CPU usage. 10k RPS is also not "ambitious" for a modern distributed system; a single well-optimized Node.js or Go process can handle that easily.
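Point 3 turns on the difference between detecting a Byzantine fault and tolerating one. The sketch below is a hypothetical illustration, not code from the blog post or from Allium: under detection, a result is accepted only if every replica returned the same bytes, and any divergence halts the system. Classical BFT consensus such as PBFT instead needs at least 3f + 1 replicas to keep making progress while f of them misbehave. The names `detect_and_commit`, `DivergenceDetected` and `bft_replicas_needed` are made up for this example.

```python
from typing import Sequence


class DivergenceDetected(Exception):
    """Raised when replicas disagree: the system stops instead of masking the fault."""


def detect_and_commit(results: Sequence[bytes]) -> bytes:
    """Byzantine fault *detection* (hypothetical helper).

    Accept a result only if every replica returned identical bytes.
    Any disagreement halts the system; nothing is out-voted or repaired.
    """
    if not results:
        raise ValueError("need at least one replica result")
    first = results[0]
    for index, result in enumerate(results[1:], start=1):
        if result != first:
            raise DivergenceDetected(f"replica {index} diverged from replica 0")
    return first


def bft_replicas_needed(f: int) -> int:
    """Replicas that classical BFT consensus (e.g. PBFT) needs to keep
    making progress while tolerating f Byzantine replicas: n >= 3f + 1."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


if __name__ == "__main__":
    print(detect_and_commit([b"balance=10", b"balance=10"]))  # b'balance=10'
    try:
        detect_and_commit([b"balance=10", b"balance=99"])
    except DivergenceDetected as err:
        print("halted:", err)  # halted: replica 1 diverged from replica 0
    print(bft_replicas_needed(1))  # 4
```

Two replicas are enough to notice that one of them lied, but not to decide which one, so the only safe response is to stop. Masking the fault and carrying on is what the extra replicas in 3f + 1 pay for.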

antonly•20m ago

Hello, could you please put your over-sensationalized, overly-long, AI-generated comments somewhere else? Thank you.

Kindly, the HN Community.

altmanaltman•7m ago

Could you please stop being the messiah of the HN community and talk to the subject matter instead of making personal snide remarks? Thank you.

bandrami•16m ago

> Your code is now out of sync with the Spec

Is there even a sync to be had? The same prompt to the same LLM at different times will yield different artifacts, even if you were to save and re-use the seed.

amarble•25m ago

As a counterpoint, I also tried writing something with Claude last weekend: a Google Docs clone[1]. I spent $170 on Anthropic API credits, and got something that did mostly what I asked but was basically useless. It seems that for simple interfaces with an exact specification, like the recent compiler and web browser examples, it's possible to write bigger projects that "work" as a demo, although not in a way that makes them viable alternatives. For anything that requires taste and judgment, we've still got a long way to go. There are lots of great demos out there but few if any real examples of vibe-coded (or whatever you want to call it) software standing alone as an alternative to projects people wrote.

[1] https://www.marble.onl/posts/this_cost_170.html

NitpickLawyer•4m ago

> The result is OK. It has all the features I asked for, and includes document sharing, collaborative editing in real time, support for fonts and line spacing, etc. etc. I could not have paid a developer $170 and got this. The problem, of course, is that, while abstractly impressive, this is completely useless

Well, what would you expect from a few hours of running in a loop with these constraints?

> This project exists to build a document editor from the ground up. Violating these constraints defeats the entire purpose.

> FORBIDDEN dependencies (do NOT install or use these):

> Rich text editor frameworks: ProseMirror, Slate, Quill, TipTap, Draft.js, CKEditor, TinyMCE, Lexical, or any similar library

> CRDT/OT libraries: Yjs, Automerge, ShareDB, OT.js, or any similar library

> Full CSS frameworks: Bootstrap, Tailwind, Material UI (small utility libs for specific needs are OK)

> ORMs: Prisma, TypeORM, Sequelize (use raw SQL or a thin query builder)

I can't help but wonder what you thought you would achieve, and how getting "mostly what you asked for" is still disappointing to you.

> there is no taste being applied.

There are 0 lines in AGENT_PROMPT.md about "taste". You have instructed something/someone on how to build more than what to build.

Your goals are (from a quick skim):

- The goal of this project is ultimately to generate a working alternative to Google Docs with the same functionality.

- You are an autonomous software engineer building AltDocs, a from-scratch alternative to Google Docs.

I see a FEATURES.md file, but it's not clear whether it came from you or was expanded by the model. It seems pretty slim.

All in all, I don't get the "disappointment". It seems, from your blog post, that the "model" did most of the things you asked for. The disappointment might come from what you asked for, more than from the "model" being bad... To paraphrase a line from a sitcom: "Damn, Andrew, I can't control the weather!" :)

asksomeoneelse•11m ago

Is there a link to the specification and the resulting generated code? I skimmed through the article and the author's GitHub profile, but couldn't find anything related.

Seems like a serious oversight if this is your selling point.
