Verified Spec-Driven Development (VSDD)

https://gist.github.com/dollspace-gay/d8d3bc3ecf4188df049d7a4726bb2a00

59•todsacerdoti•2h ago

Comments

politician•1h ago

This is a decent approach. My concern with TDD is that writing tests necessarily implies designing an API upon which those tests operate. Here, the agent is instructed to "not write code, write tests", and yet, in doing so it defines an API. This will cause the AI to hallucinate the API. Layering in yet more tests on top of this will cause that API to deform in strange ways that pass tests but that the adversary will not be able to cope with because it runs too late in the VSDD process.

I've seen this exact process play out in my own work. The AI generates code and tests that pass with high code coverage and honors invariants set by spec. I look at the code and find a rats nest / ball of mud that will cost 10x more tokens to enhance should I ever need to add a feature.

So, I think you're on to something, but I think the process might be discounting extensibility and resilience under change.

NeutralForest•1h ago

> My concern with TDD is that writing tests necessarily implies designing an API upon which those tests operate

It really forces you to do outside-in testing; I usually describe the kind of API I want when chatting with the agents. For example, the CLI options, the routes that might be useful for an API, etc.

> I look at the code and find a rats nest / ball of mud that will cost 10x more tokens to enhance should I ever need to add a feature.

Agreed, I don't know if there are good forcing functions to avoid complexity. The providers have a huge incentive to have you waste your tokens (for example when it re-outputs the complete file when you ask for a tiny change).

bonesss•1h ago

“Test Driven Design” is another way to frame it.

DaylitMagic•1h ago

I see what you're saying. Modularity and interfaces are really important between the different aspects of what's being developed. And it is worth putting time into the question "if another will use this, what would they potentially use it for, and why?". It doesn't mean that it needs to be built now - but considering that and ensuring that the planned code executes against that is a good strategy.

sjbr•1h ago

Nice. It can work with something like https://github.com/github/spec-kit ?

beders•11m ago

My employer is trying to convince us to embrace spec-kit. But we are a Clojure shop: we iterate fast and produce results. We don't sit around and write specs and then hope working code plops out.

_pdp_•1h ago

Everything in this post stems from the assumption that you already know what you're doing, which is probably true for things you've built before. But I hope we can agree that you can't spec out something you have no clue how to build, let alone write the tests before you've even explored the boundaries of the problem space. That's completely unreasonable.

My second point is that this approach is fundamentally wrong for AI-first development. If the cost of writing code is approaching zero, there's no point investing resources to perfect a system in one shot. What matters more is how fast you can explore the edges. You can now spin up five agents to implement five different versions of the thing you're building and simply pick the best one.

In our shop, we have hundreds of agents working on various problems at any given time. Most of the code gets discarded. What we accept to merge are the good parts.

politician•1h ago

"Most of the code gets discarded." If you don't mind sharing, what's your signal-to-token ratio?

kvdveer•35m ago

How do you propose we measure signal? Lines of code is renowned for being a very bad measure of anything, and I really can't come up with anything better.

DaylitMagic•1h ago

If you don't mind the question with regard to your second point, couldn't what you've done in your shop be also used here? There's no reason why 'try to develop it five different ways and pick the best parts out of each' is incompatible with the 'VSDD' concept; seems like it could be included?

tikhonj•1h ago

> you can't spec out something you have no clue how to build

Ideally—and at least somewhat in practice—a specification language is as much a tool for design as it is for correctness. Writing the specification lets you explore the design space of your problem quickly with feedback from the specification language itself, even before you get to implementing anything. A high-level spec lets you pin down which properties of the system actually matter, automatically finds an inconsistencies and forces you to resolve them explicitly. (This is especially important for using AI because an AI model will silently resolve inconsistencies in ways that don't always make sense but are also easy to miss!)

Then, when you do start implementing the system and inevitably find issues you missed, the specification language gives you a clear place to update your design to match your understanding. You get a concrete artifact that captures your understanding of the problem and the solution, and you can use that to keep the overall complexity of the system from getting beyond practical human comprehension.

A key insight is that formal specification absolutely does not have to be a totally up-front tool. If anything, it's a tool that makes iterating on the design of the system easier.

Traditionally, formal specification have been hard to use as design tools partly because of incidental complexity in the spec systems themselves, but mostly because of the overhead needed to not only implement the spec but also maintain a connection between the spec and the implementation. The tools that have been practical outside of specific niches are the ones that solve this connection problem. Type systems are a lightweight sort of formal verification, and the reason they took off more than other approaches is that typechecking automatically maintains the connection between the types and the rest of the code.

LLMs help smooth out the learning curve for using specification languages, and make it much easier to generate and check that implementations match the spec. There are still a lot of rough edges to work out but, to me, this absolutely seems to be the most promising direction for AI-supported system design and development in the future.

virgilp•41m ago

Nothing of what you write here matches my experience with AI.

Specification is worth writing (and spending a lot more time on than implementation) because it's the part that you can still control, fully read, understand etc. Once it gets into the code, reviewing it will be a lot harder, and if you insist on reviewing everything it'll slow things down to your speed.

> If the cost of writing code is approaching zero, there's no point investing resources to perfect a system in one shot.

THe AI won't get the perfect system in one shot, far from it! And especially not from sloppy initial requirements that leave a lot of edge (or not-so-edge) cases unadressed. But if you have a good requirement to start with, you have a chance to correct the AI, keep it on track; you have something to go back to and ask other AI, "is this implementation conforming to the spec or did it miss things?"

> five different versions of the thing you're building and simply pick the best one.

Problem is, what if the best one is still not good enough? Then what? You do 50? They might all be bad. You need a way to iterate to convergence

giancarlostoro•41m ago

Thats why I have AI do a write up about the system I want to build, I then review it all. If it looks good I use it as my prompt.

DaylitMagic•1h ago

Some random (hopefully additive and helpful) thoughts:

Many companies have older code bases / databases that can be somewhat well defined (and somewhat not). If things have been slowly iterating over 35 years, there's a lot of undocumented edge behavior that may occur; it may be beneficial to have a step before Edge Case Catalog where there's some kind of prompting to catalogue how the inputs and outputs work, and then find the different inputs and outputs - and then confirm that with Input A and Output A that it works as expected. (Legacy systems often have weird orchestration that nobody remembers.)

(Sub-note: This is somewhat part of the provable properties catalog; while this step could be placed there, it would require a re-run of edge case catalog build potentially, which isn't a bad thing.)

A small note that I personally think is a good idea is better code commenting than has been outlined here - the spec itself should be woven into the code with potentially slightly over-commenting for each aspect, code spec gets lost. The code itself should serve as context, especially in the TDD stage.

I think it's implicit but may be worth overtly stating that for the Code Quality check in Phase 3 that it also checks on a zero-trust basis, and doesn't include things like hardcoded keys.

I'm not sure what Chainlink is (sorry!) but I like the ideas outlined around the decomposition - but it misses stringing everything together end-to-end in the way outlined here (it asks to create each part, but never actually weaves the whole together).

Something not covered - is sequencing work and decomposition of work. A spec can create multiple dependencies within itself, requiring things to be worked on in a specific order.

galoisscobi•1h ago

I think this word salad doesn’t have enough buzzwords. Throw in a few more acronyms too.

desireco42•1h ago

Claude or something different... there is life beyond Claude I assure you and it is quite good and colourful.

mitchbob•53m ago

Upvoted for the Sarcasmotron.

esafak•39m ago

I'm pretty sure this is AI slop not worth my time. What would be interesting is if the author shared her practical experience in implementing it. Let's see some of those specs. What tricky bugs did it catch? The author's latest repo hasn't even been passing CI, so what does that say? https://github.com/dollspace-gay/Tesseract-Vault/commits/mai...

relativeadv•34m ago

yes, all of the typical signs and symptoms appear to be there. Lists upon sublists of verbose overengineered plausibly thoughtful writing.

If you can't be bothered to write it, why should i be bothered to read it?

vielite1310•38m ago

I would like to be enlightened myself if RPI,BMAD or any spec-driven approaches actually worked for any mid/large scale projects, without wasting millions of tokens of course :)

SirensOfTitan•31m ago

LLM-assisted development feels a lot like trend-driven development. When dealing with technique and heterogenous prompts and goals, it’s easy to gain somewhat of a gambler’s fallacy with respect to a particular technique.

Spec-driven development feels pretty questionable to me. I’m sure it works fine for feature work that is predictable or has been done before, but then I wonder why you’d waste your time with it.

Prior to LLMs, the whole vibe was to iterate rapidly toward a working thing so you can see what works and what doesn’t. Why would we abandon that strategy as an industry when the cost of writing code is ostensibly getting cheaper?

If I’m using LLMs at all, I’m using them to do a breadth search of prior art or ideas, then I’m doing what I might call a prototype onion: successive clean room attempts at a novel problem, accumulating what I learn at each attempt in each successive prompt. I usually then take the prototype and write the final version myself so I’m properly internalizing the idea.

Ultimately a lot of this prompt work feels like procrastination. It is not about understanding where these tools is useful and where they are not but trying to have them consume every aspect of the work.

getnormality•18m ago

Or maybe people who like talking much more than they like code are now very excited about the possibility that talking has eaten software development.

This is exactly backwards. For many tasks, formal languages are better, more real, more beautiful than English. No matter how many millions of tokens you have, you will never talk the formulas of Fermat, Euler, and Gauss into irrelevance. And the same is true of good code.

Of course, a lot of code is ugly and utilitarian too, and maybe talking will make it less painful to write that stuff.

pron•28m ago

If you come up with a strategy that seems to "solve programming", then you know for certain there must be a flaw in it, and you need to identify where it is that corners must be cut and how.

Computer science is an introspective discipline because it studies the essential difficulty of problems regardless of the process taken to solve them, and programming itself (i.e. the problem of producing a correct, or correct-enough program) is such a problem that can be, and has been studied. The question of learning whether a program X satisfies some correctness property P is known as the model-checking problem, and we know that answering it with certainty is intractable. For example, some properties that are true for some program would take no less than 10 minutes to verify (regardless of how that verification is done), others will take no less than 10 hours, others no less than 10 months, others no less than 10 years and so on, and we don’t know ahead of time whether the proprty is true, and if it is, where on this spectrum it falls.

So suppose you decide some property must be proven with full certainty, the question becomes, how long do you wait before giving up waiting for the validation and what do you do when you give up? If you then decide that you’re okay with less than 100% confidence, what approach do you take and how much confidence do you actually have? The problem with that is that the answer to that question often requires a deep understanding of the implementation. I.e. if you have two programs, X and Y, that compute the same function, one less-than-perfect approach would give you 99% confidence with one of them, but only 10% confidence with another.

twoodfin•22m ago

More, “If your LLM comes up with a strategy to ‘solve programming’ …”

prisenco•20m ago

I will never discourage new approaches but I will personally wait five to ten years to see what works.

I've been through too many hype cycles.

Robdel12•15m ago

I’ve gotten the absolute best results from LLMs just acting like the software engineer I’ve aspired to be the past 15 years.

Normal dev things. Scope the ticket properly, break it down. Test well. Write the correct docs.

LLM specific things are going to be gone next week

beders•12m ago

> Define the contract before writing a single line of implementation. Specs are the source of truth.

There is only one source of truth and that is the source code. To define and change contracts written in an ambiguous language and then hope the right code will magically appear, is completely delusional.

Iteration is the only game in town that is fast and produces results.

Cognitive Debt: When Velocity Exceeds Comprehension

Obsidian Sync now has a headless client

Verified Spec-Driven Development (VSDD)

Addressing Antigravity Bans and Reinstating Access

Woxi: Wolfram Mathematica Reimplementation in Rust

The happiest I've ever been

New evidence that Cantor plagiarized Dedekind?

Show HN: Now I Get It – Translate scientific papers into interactive webpages

Ghosts'n Goblins – “Worse danger is ahead”

Werner Herzog Between Fact and Fiction

The whole thing was a scam

747s and Coding Agents

How Long Is the Coast of Britain? (1967)

Techno‑Feudal Elite Are Attempting to Build a Twenty‑First‑Century Fascist State

We Will Not Be Divided

Unsloth Dynamic 2.0 GGUFs

From Noise to Image – interactive guide to diffusion

The Eternal Promise: A History of Attempts to Eliminate Programmers

OpenAI fires an employee for prediction market insider trading

The Future of AI

The Life Cycle of Money

Stop Burning Your Context Window – How We Cut MCP Output by 98% in Claude Code

The United States and Israel have launched a major attack on Iran

Show HN: Tomoshibi – A writing app where your words fade by firelight

OpenAI agrees with Dept. of War to deploy models in their classified network

Don't use passkeys for encrypting user data

Don't trust AI agents

CSP for Pentesters: Understanding the Fundamentals

More Cows, More Wives

Seeing Like a Sedan

Cognitive Debt: When Velocity Exceeds Comprehension

Obsidian Sync now has a headless client

Verified Spec-Driven Development (VSDD)

Addressing Antigravity Bans and Reinstating Access

Woxi: Wolfram Mathematica Reimplementation in Rust

The happiest I've ever been

New evidence that Cantor plagiarized Dedekind?

Show HN: Now I Get It – Translate scientific papers into interactive webpages

Ghosts'n Goblins – “Worse danger is ahead”

Werner Herzog Between Fact and Fiction

The whole thing was a scam

747s and Coding Agents

How Long Is the Coast of Britain? (1967)

Techno‑Feudal Elite Are Attempting to Build a Twenty‑First‑Century Fascist State

We Will Not Be Divided

Unsloth Dynamic 2.0 GGUFs

From Noise to Image – interactive guide to diffusion

The Eternal Promise: A History of Attempts to Eliminate Programmers

OpenAI fires an employee for prediction market insider trading

The Future of AI

The Life Cycle of Money

Stop Burning Your Context Window – How We Cut MCP Output by 98% in Claude Code

The United States and Israel have launched a major attack on Iran

Show HN: Tomoshibi – A writing app where your words fade by firelight

OpenAI agrees with Dept. of War to deploy models in their classified network

Don't use passkeys for encrypting user data

Don't trust AI agents

CSP for Pentesters: Understanding the Fundamentals

More Cows, More Wives

Seeing Like a Sedan

Verified Spec-Driven Development (VSDD)

Comments