I'm not quite jibing with it yet. I still like working with Claude more as a peer than as a subordinate. I have a fairly small list of permissions in my config, and I intervene in processes quite often. I still define targets and criteria, but I don't yet want to just let it loose and come back to a semi-arbitrary change set to review. I think I actually prevent a ton of review work by being involved in the loop early and often.
Maybe where this would shine is when I find small, nagging, off-topic issues while working on a branch: I can pause to send GSD on a mission in another worktree to start resolving them. Then I can wash my hands of the irritating thing and keep focusing on my branch, and when I finish up I'll have some foundation to continue work on the issue I found. I have a bad habit of getting sidetracked by details. This would let me stay focused while still feeling like the right work is being done.
I'm not sure yet. I dig the loop model, and I can see that it works remarkably well. I might just need some time to warm up to the idea of granting this much autonomy. Maybe on very small goals (refactor this function in this manner to get this result, verified using these testing patterns), but not architectural or critical-path problems. I still feel like every model I've tried is simply bad at those, no matter how I tackle it.
Maybe my prompt game is weak.
Edit to add what I used it for:
1. Refactored an Effect pipeline to aggregate schema and row-based violations as program failures in the error channel rather than manually pushing them onto arrays. This improved performance substantially and made the program quite a bit clearer to reason about, and GSD did a great job of following direction and verifying the work was completed properly. I thought this was pretty cool. Not a terribly hard problem, more a minor adaptation to correct an earlier oversight (Claude struggles to follow the channel conventions in Effect), but very pleasing to be able to do it automatically and get properly useful tests out of the deal.
I use Effect.catchTags at the end of the program to exhaustively catch and handle known errors, and it works wonderfully. It lets me easily partition all kinds of violations without any clever data structures or complex branching in the business logic, so to speak.
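To make the shape concrete, here's a minimal sketch of the pattern. The violation class names, fields, and row check are my own illustration, not the actual pipeline: failing checks become typed failures in the error channel, and Effect.catchTags partitions them at the end.

    import { Data, Effect } from "effect"

    // Illustrative violation types; names and fields are assumptions.
    class SchemaViolation extends Data.TaggedError("SchemaViolation")<{
      readonly field: string
      readonly message: string
    }> {}

    class RowViolation extends Data.TaggedError("RowViolation")<{
      readonly row: number
      readonly message: string
    }> {}

    // A failing check goes into the error channel as a tagged failure
    // instead of being pushed onto a mutable violations array.
    const checkRow = (row: Record<string, unknown>, index: number) =>
      typeof row.id !== "number"
        ? Effect.fail(new SchemaViolation({ field: "id", message: "expected a number" }))
        : row.id < 0
          ? Effect.fail(new RowViolation({ row: index, message: "id must be non-negative" }))
          : Effect.succeed(row)

    // Exhaustively partition the known failures at the edge of the program.
    const handled = checkRow({ id: -1 }, 0).pipe(
      Effect.catchTags({
        SchemaViolation: (e) => Effect.succeed(`schema/${e.field}: ${e.message}`),
        RowViolation: (e) => Effect.succeed(`row ${e.row}: ${e.message}`)
      })
    )

    Effect.runSync(handled) // "row 0: id must be non-negative"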
2. Added some gnarly DuckDB error parsing to help determine the causes of failures in the DB in edge cases. It seemed to do a fine job: it added sane, not overly rigid error-parsing strategies, along with sensible tests to validate each possible case. Nothing Claude couldn't do normally, but I do think it did a slightly better job than it would have with a one-shot attempt. It refactored tests several times to actually verify the behaviours, which Claude tends to be awful at.
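For flavour, the kind of parsing I mean looks roughly like this. DuckDB does prefix its messages with a category ("Constraint Error: ...", "Conversion Error: ..."), but the failure type, the category set, and the fallback handling here are my own sketch, not GSD's output.

    // A tolerant classifier over DuckDB error message strings.
    type DbFailure =
      | { kind: "constraint"; detail: string }
      | { kind: "conversion"; detail: string }
      | { kind: "io"; detail: string }
      | { kind: "unknown"; detail: string }

    const parseDuckDbError = (err: unknown): DbFailure => {
      const message = err instanceof Error ? err.message : String(err)
      // Matches DuckDB's "<Category> Error: <detail>" convention.
      const match = message.match(/^([A-Za-z ]+) Error:\s*([\s\S]*)$/)
      if (match === null) return { kind: "unknown", detail: message }
      switch (match[1]) {
        case "Constraint": return { kind: "constraint", detail: match[2] }
        case "Conversion": return { kind: "conversion", detail: match[2] }
        case "IO":         return { kind: "io", detail: match[2] }
        // Fall back rather than throwing on unrecognised categories.
        default:           return { kind: "unknown", detail: message }
      }
    }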
3. Once I noticed it's good at making sure tests actually test things, I had it run through a test suite and ensure each test was verifying behaviours rather than implementation details or other fluff. It did fine. There were a couple of bad tests in there from some work I'd done with Claude in the morning.
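The distinction it was enforcing, roughly, is this (a sketch assuming vitest; the module path is hypothetical, reusing the parser from the previous sketch):

    import { expect, test } from "vitest"
    import { parseDuckDbError } from "./duckdb-errors" // hypothetical module

    // Behaviour-focused: assert on what callers actually observe.
    test("classifies constraint failures", () => {
      const failure = parseDuckDbError(new Error("Constraint Error: duplicate key"))
      expect(failure).toEqual({ kind: "constraint", detail: "duplicate key" })
    })

    // The implementation-coupled version would instead spy on some
    // internal helper and assert it was called, which keeps passing
    // even when the observable behaviour silently breaks.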
At this point I embarked on a journey I didn't feel comfortable using GSD for.