One thing we learned the hard way: shorter rules work better. We started with a 600-line comprehensive guide and the agent actually got worse. Every token in the skill competes for context window space with your actual conversation. Once we cut to under 200 lines per skill, consistency went up significantly.
The semantic vs pragmatic function split in this post is a good frame. I am not sure agents need that level of abstraction explained to them though — what they actually need is concrete examples. "Use pdfplumber not PyPDF2" beats "prefer minimal semantic functions" every time.
I will, often go back, after the fact, and ask for refactors and documentation.
It works. Probably a lot slower than using agents, but I test every step, and it is a lot faster than I would do it, unassisted.
I use a test harness, and step through the code, look at debug logs, and abuse the code, as much as possible.
Kind of a pain, but I find unit tests are a bit of a "false hope" kind of thing: https://littlegreenviper.com/testing-harness-vs-unit/
Good content tho!
benswerd•1h ago
This is my take on how to not write slop.
tabwidth•58m ago
peacebeard•48m ago
peacebeard•51m ago
systemsweird•48m ago
And you’re completely right, humans are still the ones in control here. It’s entirely possible to use AI without lowering your standards.