I had 30 broken Playwright tests and no way to tell which ones actually mattered. The problem wasn’t “fix the tests” — it was that there’s no coverage tool for test infrastructure trustworthiness. I had to build the ruler before I could measure anything.
So I wrote a file that defined a composite metric (four weighted components → one score), an improvement loop, and constraints. Pointed Claude at it. Went to bed. Woke up to 12 commits, 47 → 83.
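To make the "four weighted components → one score" idea concrete, here's a minimal sketch in the same bash + jq style as the actual scoring script. The component names, values, and weights are made up for illustration; they are not the real GOAL.md metric.

```shell
#!/usr/bin/env bash
# Hypothetical composite metric: four component scores (0-100) collapse
# into one scalar via a weighted sum. Names and weights are illustrative.
set -euo pipefail

# Per-component scores, as the scoring script might emit them.
cat > components.json <<'EOF'
{"flakiness": 60, "coverage": 40, "assertion_depth": 55, "ci_signal": 30}
EOF

# Weights sum to 1.0; the weighted sum is the single number the loop optimizes.
jq -r '
  (.flakiness * 0.4) + (.coverage * 0.3) +
  (.assertion_depth * 0.2) + (.ci_signal * 0.1)
  | round
' components.json
# prints 50
```

The point of collapsing to one scalar is that the agent's improvement loop needs a monotone target: "did this commit raise the number or not."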
The file became GOAL.md. The insight that surprised me: most software doesn’t have a natural scalar metric like val_bpb. You have to construct it. Documentation quality, API trustworthiness, test infrastructure confidence — these things have no pytest --cov equivalent. But once you build the ruler, the autoresearch loop works on them too.
The part I’m most uncertain about: the “dual score” pattern. When the agent is building its own measuring tools, it can game the metric by weakening the instrument. So the docs-quality example has two scores — one for the docs, one for the linter itself. The agent has to improve the telescope before it can use it. I think this is load-bearing but I’d love to hear if others have found different solutions to the same problem.
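The dual-score gate can be sketched in a few lines of bash + jq. This is my illustration of the pattern, not the repo's actual check; the file name, field names, and floor threshold are invented.

```shell
#!/usr/bin/env bash
# Hypothetical dual-score gate: the linter's own score must clear a floor
# before the docs score counts. Otherwise the headline number is capped at
# the instrument's score, so weakening the linter can't inflate the metric.
set -euo pipefail

cat > scores.json <<'EOF'
{"docs": 72, "linter": 45}
EOF

LINTER_FLOOR=60

docs=$(jq '.docs' scores.json)
linter=$(jq '.linter' scores.json)

if [ "$linter" -lt "$LINTER_FLOOR" ]; then
  # Weak instrument: report the instrument's score, not its reading.
  echo "score: $linter (linter below floor $LINTER_FLOOR; fix the telescope first)"
else
  echo "score: $docs"
fi
```

With these numbers the gate trips (45 < 60), so the only way the agent raises the headline score is by improving the linter first — which is the behavior the pattern is meant to force.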
Easiest way to try it: paste this into Claude Code, Cursor, or any coding agent and point it at one of your repos:
Read github.com/jmilinovich/goal-md — read the template and examples.
Then write me a GOAL.md for this repo and start working on it.
Happy to hear what breaks. The scoring script is bash + jq so it’s not exactly production-grade, and the examples are biased toward the kinds of projects I work on. More examples from different domains would make the pattern sharper.
jmilinovich•2h ago