The Short Leash AI Coding Method for Beating Fable

https://blog.okturtles.org/2026/07/short-leash-ai-method/

42•Riseed•3h ago

Comments

sscaryterry•3h ago

There really wasn't much substance to this article.

threethirtytwo•56m ago

It’s just parroting the current trope.

Last year it was, “AI is just a stochastic parrot.”

This year it’s, “AI can write the code, but a human still has to review it!” (Using AI, of course.)

Give it another year and the narrative will be: “Only AI is capable of reviewing code, and only AI can review the AI’s review. Humans just need to read the AI’s final opinion so they still have meaningful oversight.”

The goalposts keep moving. The certainty never does.

reinitctxoffset•11m ago

The regress ends somewhere, because (barring some pretty sharp changes to the way the law works basically everywhere) ultimately someone has to certify the outcomes as acceptable. This might be in the form of the market (though AI-adjacent stuff seems extremely prone to prolonged market failures), this might be regulatory in nature. This might be the executive management of the companies involved.

Personally I think that if you cranked the capability up high enough the first person you'd run into who absolutely demanded more than vibes and didn't care about your singularity thesis would be the representative of a reinsurance firm: mostly to do serious stuff without bending the law, you need insurance, and I am unaware of anyone writing serious policies (certainly not ones that make any economic sense) that underwrite the risk of AI autonomy outcomes financially.

When Swiss Re writes a policy that Anthropic Cinematic Universe or whatever iteration we're on won't fuck it up?

Now maybe we're talking. Until then you ask three practitioners and get nine answers, no one knows what they're talking about unless they're doing a really good job keeping it quiet (and that's probably what you'd do!).

bonsai_spool•1h ago

I'm curious whether Opus 4.8 or similar can attain Mythos level through good system prompting and steering. You would expect this to work if it's true that the strength of Mythos is its unwillingness to quit before it gets a desired outcome.

pllbnk•48m ago

I think that Anthropic is gaslighting us with their new model releases. Specifically, I think they have some good base model and are just fine-tuning it until they achieve the desired outcome, or the desired outcome is achieved accidentally as part of fine-tuning. My theory is based on the fact that, as a long-term (if you can call it that) Claude user, I keep noticing the same patterns in its output. It's not trivial but certainly possible to see when something has been written by Claude, because it has a different style than GPT.

However, they have quite a good harness in their backend, which is the actual model.

guessmyname•42m ago

As a Mythos user (I’m part of Project Glasswing), I would say that abliterated models [1] produce similar, if not identical, results. While good prompting and steering won’t give Claude Opus 4.8 the same capabilities as Mythos (preview 1), using abliterated models (if you have the computational power to run the larger ones) will get you close to the same goals as people who have access to Mythos (preview 1) [2].

[1] https://huggingface.co/search/full-text?q=abliterated&type=m...

[2] I specifically refer to “preview 1” because the newer versions (Fable 5 / Mythos 5) don’t appear to offer the same level of freedom as the very first version that I was able to use through Project Glasswing. This is one of the reasons why I continue running our massive security scans with “preview 1”, or at least I was running them until June 30, when the program’s policy changed.

jonplackett•44m ago

I thought this was how everyone who can actually code uses AI for anything that’s actually important.

Am I wrong? Are you guys just YOLOing everything these days?

gambiting•32m ago

>>You never use “YOLO” mode (aka “dangerously skip permissions”)

Do you mean this?

I'm curious how people are using Claude in any way other than bypass-permissions. I tried for so long to maintain a curated list of things Claude can use, but inevitably I would come back only to find it stuck because it had decided to pipe the output of one tool into another, and that's not explicitly allowed, so it stopped even though it was just grepping or whatever. I found it infuriating. In bypass-permissions it "just works", but then again I only use it to analyze existing code and suggest new changes (and even if it breaks something, that's what source control is for?).
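To make the piping problem above concrete, here is a minimal sketch of a checker that approves a pipeline when every stage starts with an allowlisted command. Everything in it is an assumption for illustration: the `ALLOWED` set and the `is_allowed` function are hypothetical and are not how Claude or any real agent evaluates permissions. It checks command names only, not flags or arguments, and it is deliberately conservative, so it also rejects harmless commands that merely quote an operator character.

```python
import shlex

# Hypothetical allowlist of read-only commands (an assumption, not a real agent config).
ALLOWED = {"grep", "rg", "cat", "ls", "head", "tail", "wc"}

# Characters that shlex groups into shell operator tokens when punctuation_chars=True.
OPERATOR_CHARS = set("();<>|&")


def is_allowed(command: str) -> bool:
    """Return True only if the command is a plain pipeline of allowlisted commands."""
    # Conservative: refuse anything that could expand variables or run a subcommand.
    if "`" in command or "$" in command:
        return False

    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    try:
        tokens = list(lexer)
    except ValueError:
        return False  # e.g. unbalanced quotes

    expecting_command = True  # True at the start of each pipeline stage
    for token in tokens:
        if token == "|":
            if expecting_command:
                return False  # empty stage such as "| grep x" or "a | | b"
            expecting_command = True
        elif token and set(token) <= OPERATOR_CHARS:
            return False  # redirects, "&&", ";", subshells and so on
        elif expecting_command:
            if token not in ALLOWED:
                return False
            expecting_command = False
    # A trailing "|" or an empty command leaves a stage without a command.
    return not expecting_command
```

With this sketch, `is_allowed("grep -rn TODO src | head -n 5")` is accepted, while `is_allowed("grep foo . | rm -rf .git")` and `is_allowed("cat a.txt > b.txt")` are refused.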

sebmellen•27m ago

I’ve found unexpected success in using ephemeral NixOS VMs for local development… once you authenticate your agent you can let it run wild without worrying about permissions.

taormina•20m ago

It does do this to frustrate you, save 30 tokens, and then waste a few thousand more when it didn't get all the context it needed by grep'ping. You have to be involved in the process though. It frequently wants to do things that are so incorrect, that even if it would be more convenient to just totally ignore it, it would be insane to actually ignore it. Do you trust it to not accidentally rm -rf the .git/ right after it helpfully force pushes to remote? I don't. Even if I don't expect it to do that, why would I ALLOW it to be able to?

avereveard•43m ago

Seems hella inefficient.

A better method starts with realizing that everything every program does is data transformation and/or movement.

Then you ask the LLM to subdivide the data into a tree along the domain model, classifying streaming vs. storing nodes.

Then, for each node, you discuss the best data structure with the AI.

Then you ask for an interface that fully encapsulates the structure, where every mutation only goes from a valid state to a valid state and nothing else is allowed to touch the state.

And that's mostly it: just connect all the interfaces until input goes to the monitor, to storage, to an API, or wherever the destination is.
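The "interface that fully encapsulates the structure" step above could look something like the following. This is only a sketch under assumptions: the shopping-cart domain and the names `CartState` and `Cart` are hypothetical, chosen for illustration, and Python enforces the privacy of `_state` by convention rather than by the language.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CartState:
    """Immutable snapshot. Invariant: every quantity is >= 1."""
    items: tuple[tuple[str, int], ...] = ()


class Cart:
    """The only code allowed to replace the state; every method keeps the invariant."""

    def __init__(self) -> None:
        self._state = CartState()

    @property
    def state(self) -> CartState:
        # Callers get a frozen snapshot, so they cannot mutate it in place.
        return self._state

    def add(self, sku: str, qty: int) -> None:
        if qty < 1:
            raise ValueError("qty must be at least 1")
        items = dict(self._state.items)
        items[sku] = items.get(sku, 0) + qty
        self._state = CartState(tuple(sorted(items.items())))

    def remove(self, sku: str, qty: int) -> None:
        items = dict(self._state.items)
        have = items.get(sku, 0)
        # Validate before touching anything, so a rejected call leaves the state as it was.
        if qty < 1 or qty > have:
            raise ValueError("cannot remove more than the cart holds")
        if qty == have:
            del items[sku]
        else:
            items[sku] = have - qty
        self._state = CartState(tuple(sorted(items.items())))
```

Because each mutation validates first and then swaps in a whole new snapshot, a failed call never leaves the cart half-updated, which is the "valid state to valid state" property the comment asks for.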

kristianc•36m ago

In my experience it, or something close to it, is the only way. AI needs good code to be beaten out of it.

kissgyorgy•40m ago

This is probably slower than writing the code yourself. Doesn't make sense to me. Using an agent without YOLO mode is not worth it.

I'd rather tightly control the output with skills I've written myself, prompts, plans, etc., and get an outcome as close as possible to what I would write myself.

faizshah•26m ago

Not really: if it takes you 15 minutes to write a 50-line function but it takes the AI 90 seconds, then you are already at a 10x speedup just for this task.

This (non-yolo mode AI coding) is actually how we used to code in the old days (2023).

hungryhobbit•34m ago

I <3 how everyone and their brother feels qualified to write advice to hundreds? thousands? of other developers about AI ... based on a couple months of experience as a personal user.

I mean, it's like writing a book about how to use React or Django or some other major software ... after you used it for one project for a month!

Authors: I know this is the Internet, and I know bloggers blog about whatever pops into their head ... but if you are going to act like an authority, how about you learn more than the average reader before you start telling them authoritatively what to do?

kristianc•29m ago

People are doing what they've always done with any other new technology, and sharing what, personally, works for them. People can take or leave the advice.

hungryhobbit•23m ago

Right but there's a marked difference between a "I just tried this new tech and here's what I think" vs. "I've used this tech for a few months and now I'm going to speak like I know everything about it".

I have no beef with people writing about new tech, but I do have beef with claiming that "____ is the correct way to do it" ... based on nothing except "I feel proud of the last three months I spent with Claude".

tracerbulletx•28m ago

There are a lot of people with long careers in the old way of doing things who are feeling incredibly threatened and defensive, and desperate to virtue signal about AI.

moezd•26m ago

LLMs are still next-token predictors. Just because you can give it vaguer instructions and it still finds the right steps to follow doesn't mean it's intelligent. It means you're speaking the same language as the harness they trained your model on.

And that has a limit. If you are stuck at PoC level or simple apps, you have no idea how limited the current models still are. There you really need to break tasks down, not just trust a token predictor to list steps that sound good. There has to be a human in the loop somewhere, because by the time you start skipping permissions, best case you hit the jackpot; more likely you get a suboptimal solution and wasted tokens; and what's genuinely still terrifying is when the model ignores instructions and does some stupid nonsense, ruining your day. It really is as sharp as a CNC machine. It's not not useful, but it could be dangerous, so maybe don't try to carve wood with a monster machine, or park your Ferrari in that crammed neighbourhood, if you don't know how to parallel park.

semiquaver•1m ago

Yeah, and you’re just a next-word-sayer.

steezeburger•24m ago

I find it hard to stay engaged doing this. I do get good results, but it's just hard to not get distracted when it's doing the work.

sothatsit•22m ago

This “short leash” seems like more of a crutch to me, and a sign of not giving the AI enough detail on the problem to begin with, or not reviewing and iterating on its output.

I much prefer having detailed discussions about a feature or idea, letting the AI off the leash to implement it, and then coming back to have a detailed review discussion. This seems to get a lot more out of better models that can have more nuanced discussions and write better code. The process of discussing designs and their implementations, questioning things that look weird to me, and actually reading the AI’s responses also helps me to find better solutions.

For example, one time I wanted to write a greedy solver for a problem, and Opus suggested using an existing MILP library to solve the problem exactly. I’d never even heard of MILP, but my final implementation ended up being better and simpler than what I’d have done alone.

densekernel•7m ago

I tend to agree.

If you have invested significantly in the planning phase and there is momentum in the architecture and conventions that already exist in the project, the implementation phase might not need as much oversight as is suggested here.

> You can discover that your initial idea was dumb and a better one exists

The planning and architecture phase is usually where I make these types of discovery at a high level.

> Your agent might go “off the rails” and start doing something you don’t want it to do

Candidly, these orthogonal, inadvertent edits aren't as bad as they once were, and for impactful changes there should be at least some test coverage, even if that test coverage is just "freezing" what was implemented.

As you mentioned the final review discussion is a good chance to verify beyond what review or adversarial review agents find.
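One way to read "freezing" what was implemented is a snapshot-style (characterization) test: record the current output once, then fail if a later edit changes it. The sketch below assumes that reading. `check_frozen` and `slugify` are hypothetical names made up for illustration, and the comparison only works for JSON-serializable values.

```python
import json
import tempfile
from pathlib import Path


def slugify(title: str) -> str:
    """Hypothetical function under test; stands in for whatever the agent wrote."""
    return "-".join(title.lower().split())


def check_frozen(name: str, actual, snapshot_dir: Path) -> bool:
    """Record `actual` on the first run; on later runs, report whether it still matches."""
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    path = snapshot_dir / f"{name}.json"
    encoded = json.dumps(actual, sort_keys=True, indent=2)
    if not path.exists():
        path.write_text(encoded, encoding="utf-8")  # first run: freeze the behavior
        return True
    return path.read_text(encoding="utf-8") == encoded


if __name__ == "__main__":
    # A temporary directory keeps the demo self-contained; a real project
    # would commit the snapshot files next to its tests instead.
    with tempfile.TemporaryDirectory() as tmp:
        snapshots = Path(tmp)
        print(check_frozen("slug", slugify("Short Leash Method"), snapshots))
        print(check_frozen("slug", slugify("Short Leash Method"), snapshots))
```

A test like this says nothing about whether the frozen behavior is correct. It only flags that an edit changed it, which is the modest guarantee the comment describes.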

WhitneyLand•13m ago

This post seems like some decent advice mixed in with a lot of overconfidence and unverifiable claims.

“expert developers whose skills have reached the point where they outclass any and all “frontier AI models” in their area of expertise”

Are any developers saying they outclass any and all frontier models? I’d say at best it’s mixed at this point. The best developers still do certain things better, but not even close to all things.

“The problem is that even code written and/or reviewed by Fable 5, will stink”

I’m skeptical. Example prompt and output please.

afro88•42s ago

Maybe I'm too optimistic, but given appropriate skills and references (not just for writing but also reviewing) and intelligent use of subagents for isolated reviews and checks, you can lengthen the leash a bit.

But you still need to properly review plans and PRs to keep a good mental model of the codebase. This effectively limits the number of tasks being done in parallel to maybe 2-3. Though you'll be mentally exhausted and probably start to make mistakes or take shortcuts in reviews yourself.
