I'm not quite jibing with it yet. I still like working with Claude more as a peer than as a subordinate. I have a fairly small list of permissions in my config, and I intervene in processes quite often. I still define targets and criteria, but I don't yet want to just let it loose and come back to a semi-arbitrary change set to review. I think I actually prevent a ton of review work by being involved in the loop early and often.
Maybe where this would shine is when I find small, nagging, off-topic issues while working on a branch: I can pause to send GSD on a mission in another worktree to start resolving them. Then I can wash my hands of the irritating thing and keep focusing on my branch, and when I finish up I'll have some foundation to continue work on the issue I found. I have a bad habit of getting sidetracked by details. This would let me stay focused while still feeling like the right work is being done.
I'm not sure yet. I dig the loop model, and I can see that it works remarkably well. I might just need some time to warm up to the idea of granting this much autonomy. Maybe on very small goals (refactor this function in this manner to get this result, verified using these testing patterns), but not architectural or critical-path problems. I still feel like every model I've tried is simply bad at those, no matter how I tackle it.
Maybe my prompt game is weak.
Edit to add what I used it for:
1. Refactored an Effect pipeline to aggregate schema and row-based violations as program failures in the error channel rather than manually pushing them onto arrays. This improved performance substantially and made the program quite a bit clearer to reason about, and GSD did a great job of following direction and verifying the work was completed properly. I thought this was pretty cool. Not a terribly hard problem, more a minor adaptation to correct an earlier oversight (Claude struggles to follow the channel conventions in Effect), but very pleasing to be able to do it automatically and get properly useful tests out of the deal.
I use Effect.catchTags at the end of the program to exhaustively catch and handle known errors, and it works wonderfully. It lets me easily partition all kinds of violations without any clever data structures or complex branching in the business logic, so to speak.
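To make the shape concrete, here's a minimal sketch of the pattern. The violation class names, fields, and row check are my own illustration, not the actual pipeline: failing checks become typed failures in the error channel, and Effect.catchTags partitions them at the end.

    import { Data, Effect } from "effect"

    // Illustrative violation types; names and fields are assumptions.
    class SchemaViolation extends Data.TaggedError("SchemaViolation")<{
      readonly field: string
      readonly message: string
    }> {}

    class RowViolation extends Data.TaggedError("RowViolation")<{
      readonly row: number
      readonly message: string
    }> {}

    // A failing check goes into the error channel as a tagged failure
    // instead of being pushed onto a mutable violations array.
    const checkRow = (row: Record<string, unknown>, index: number) =>
      typeof row.id !== "number"
        ? Effect.fail(new SchemaViolation({ field: "id", message: "expected a number" }))
        : row.id < 0
          ? Effect.fail(new RowViolation({ row: index, message: "id must be non-negative" }))
          : Effect.succeed(row)

    // Exhaustively partition the known failures at the edge of the program.
    const handled = checkRow({ id: -1 }, 0).pipe(
      Effect.catchTags({
        SchemaViolation: (e) => Effect.succeed(`schema/${e.field}: ${e.message}`),
        RowViolation: (e) => Effect.succeed(`row ${e.row}: ${e.message}`)
      })
    )

    Effect.runSync(handled) // "row 0: id must be non-negative"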
2. Added some gnarly DuckDB error parsing to help determine the causes of failures in the DB in edge cases. It seemed to do a fine job: it added sane, not overly rigid error-parsing strategies, along with sensible tests to validate each possible case. Nothing Claude couldn't do normally, but I do think it did a slightly better job than it would have with a one-shot attempt. It refactored tests several times to actually verify the behaviours, which Claude tends to be awful at.
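For flavour, the kind of parsing I mean looks roughly like this. DuckDB does prefix its messages with a category ("Constraint Error: ...", "Conversion Error: ..."), but the failure type, the category set, and the fallback handling here are my own sketch, not GSD's output.

    // A tolerant classifier over DuckDB error message strings.
    type DbFailure =
      | { kind: "constraint"; detail: string }
      | { kind: "conversion"; detail: string }
      | { kind: "io"; detail: string }
      | { kind: "unknown"; detail: string }

    const parseDuckDbError = (err: unknown): DbFailure => {
      const message = err instanceof Error ? err.message : String(err)
      // Matches DuckDB's "<Category> Error: <detail>" convention.
      const match = message.match(/^([A-Za-z ]+) Error:\s*([\s\S]*)$/)
      if (match === null) return { kind: "unknown", detail: message }
      switch (match[1]) {
        case "Constraint": return { kind: "constraint", detail: match[2] }
        case "Conversion": return { kind: "conversion", detail: match[2] }
        case "IO":         return { kind: "io", detail: match[2] }
        // Fall back rather than throwing on unrecognised categories.
        default:           return { kind: "unknown", detail: message }
      }
    }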
3. Once I noticed it's good at making sure tests actually test things, I had it run through a test suite and ensure each test was verifying behaviours rather than implementation details or other fluff. It did fine. There were a couple of bad tests in there from some work I'd done with Claude in the morning.
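The distinction it was enforcing, roughly, is this (a sketch assuming vitest; the module path is hypothetical, reusing the parser from the previous sketch):

    import { expect, test } from "vitest"
    import { parseDuckDbError } from "./duckdb-errors" // hypothetical module

    // Behaviour-focused: assert on what callers actually observe.
    test("classifies constraint failures", () => {
      const failure = parseDuckDbError(new Error("Constraint Error: duplicate key"))
      expect(failure).toEqual({ kind: "constraint", detail: "duplicate key" })
    })

    // The implementation-coupled version would instead spy on some
    // internal helper and assert it was called, which keeps passing
    // even when the observable behaviour silently breaks.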
At this point I embarked on a journey I didn't feel comfortable using GSD for.