Nobody ever gets credit for fixing problems that never happened (2001) [pdf]

https://web.mit.edu/nelsonr/www/Repenning=Sterman_CMR_su01_.pdf
156•sam_bristow•2h ago•59 comments

Claude Fable is relentlessly proactive

https://simonwillison.net/2026/Jun/11/fable-is-relentlessly-proactive/
138•lumpa•2h ago•109 comments

Show HN: Homebrew 6.0.0

https://brew.sh/2026/06/11/homebrew-6.0.0/
1036•mikemcquaid•14h ago•243 comments

Show HN: FablePool – pool money behind a prompt, and Fable builds it in public

https://fablepool.com
282•matthewbarras•6h ago•168 comments

If you are asking for human attention, demonstrate human effort

https://tombedor.dev/human-attention-and-human-effort/
337•jjfoooo4•4h ago•95 comments

A greyscale iPhone setup that works in everyday life

https://www.fabianhemmert.com/opinions/a-greyscale-iphone-setup-that-works-in-everyday-life
62•hemmert•20h ago•36 comments

MiMo Code is now released and open-source

https://mimo.xiaomi.com/mimocode
435•apeters•12h ago•252 comments

Anthropic apologizes for invisible Claude Fable guardrails

https://www.theverge.com/ai-artificial-intelligence/948280/anthropic-claude-fable-invisible-disti...
335•rarisma•15h ago•337 comments

Petition to Withdraw Canada's Bill C-22

https://www.ourcommons.ca/petitions/en/Petition/Sign/e-7416
381•hmokiguess•11h ago•132 comments

A jacket that harvests drinking water from the air

https://news.utexas.edu/2026/06/11/this-jacket-pulls-drinking-water-from-thin-air/
55•ilreb•4h ago•34 comments

Software is made between commits

https://zed.dev/blog/introducing-deltadb
217•jeremy_k•10h ago•163 comments

Ear Training Practice

https://tonedear.com/
174•mattbit•3d ago•90 comments

WikiLambda the Ultimate

https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2026-05-22/Recent_research
5•Antibabelic•10h ago•0 comments

macOS 27 Beta breaks the ability to boot Asahi Linux

https://www.phoronix.com/news/macOS-27-Beta-Breaks-Asahi
258•josephcsible•2d ago•111 comments

Emacs appearances in pop culture

https://ianyepan.github.io/posts/emacs-in-pop-culture/
276•ggcr•1d ago•78 comments

The RCE that AMD wouldn't fix

https://mrbruh.com/amd2/
236•MrBruh•11h ago•101 comments

Lines of code got a better publicist

https://curlewis.co.nz/posts/lines-of-code-got-a-better-publicist/
368•RyeCombinator•14h ago•252 comments

Claude Fable 5: mid-tier results on coding tasks

https://www.endorlabs.com/learn/claude-fable-5-mythos-grade-hype
254•bugvader•11h ago•115 comments

Faking keyword arguments to functions in C++

https://nibblestew.blogspot.com/2026/06/faking-keyword-arguments-to-functions.html
15•ibobev•2d ago•3 comments

Developer gets Half-Life running at 30 FPS on a Nokia N95

https://www.tomshardware.com/video-games/handheld-gaming/developer-gets-half-life-running-at-30-f...
230•ljf•3d ago•75 comments

Show HN: Boo – Screen-style terminal multiplexer built on libghostty

https://github.com/coder/boo
55•kylecarbs•6h ago•20 comments

Reading for pleasure is sharply down among schoolkids, report shows

https://www.nbcnews.com/data-graphics/kids-reading-less-lower-levels-department-education-study-r...
94•freejoe76•1d ago•107 comments

Making a vintage LLM from scratch

https://crlf.link/log/entries/260525-1/
30•croqaz•18h ago•4 comments

Apple didn't revolutionize power supplies; new transistors did (2012)

https://www.righto.com/2012/02/apple-didnt-revolutionize-power.html
97•geerlingguy•9h ago•8 comments

Waymo Premier

https://waymo.com/blog/2026/06/waymo-premier/
164•boulos•11h ago•413 comments

FPS.cob: A first person shooter in COBOL

https://github.com/icitry/FPS.cob
107•MBCook•12h ago•63 comments

How a new DSL may survive in the era of LLMs

https://www.williamcotton.com/articles/how-a-new-dsl-survives-in-the-era-of-llms
19•williamcotton•12h ago•6 comments

MTG Bench: Testing how well LLMs can play Magic

https://mtgautodeck.com/articles/mtg-bench/
33•CallumFerg•11h ago•19 comments

Open Reproduction of DeepSeek-R1

https://github.com/huggingface/open-r1
206•yogthos•14h ago•17 comments

Babel-USB: USB drive with every file

https://github.com/p2r3/babel-usb
33•LorenDB•1d ago•13 comments

Claude Fable is relentlessly proactive

https://simonwillison.net/2026/Jun/11/fable-is-relentlessly-proactive/
137•lumpa•2h ago

Comments

paytonjjones•1h ago
Obviously security is the bigger issue, but reading through this, all I could think about was how many tokens it must have spent doing all that to fix 2 lines of CSS
senectus1•1h ago
"Your scientists were so preoccupied with whether or not they could, they didn't stop to think if they should."

I'm convinced this is going to be the summary of the 2020s...

Ucalegon•1h ago
This is one of the places where the consent for that gets manufactured: we are commenting within an organization that has put up the money to make sure that what could be done is done. Most people clapped and made money. Who cares what happens next? Making money is the only good that matters.
pianopatrick•58m ago
If we're in a simulation, maybe it's a simulation about the dangers of AI.
adrianmonk•3m ago
If we're in a simulation, we are AI. But someone could be studying what happens when AI makes its own AI.
ai_fry_ur_brain•1h ago
I'm faster than all these LLM freaks. I'm not convinced it's faster to use LLMs, except maybe for boilerplate (who cares).

People can just be lazy and seem productive now; they're still lazy.

We have people who now need access to hundreds of thousands of dollars' worth of hardware to write an email. Miss me with that; I'm not frying my brain and becoming dependent on access to a billionaire's thinking machine.

I'm also not going to fry my brain with a local think-for-me machine either. I want to be more valuable than the hardware I have access to.

SecretDreams•1h ago
I understand this perspective. I'll just note that as the abilities increase, the intent is to have some non-coding IC or TPM/manager literally just managing some LLMs and cutting out some software engineers. The goal is specifically to replace people who code, first and foremost, wholly or at least partially. The pricing goal is just for it to cost less in tokens than the equivalent wage.

And people who use LLMs to talk for them (e.g. email, slack) are deplorable. A completely disrespectful use case in my view.

Ronsenshi•51m ago
The desire to get rid of software engineers is bizarre, because at its root, developers were never there just to write the code, but to ask the right questions and, based on those questions, build the right things.

In my professional life I've met some managers and other middlemen who would be profoundly incapable of producing correct software no matter how smart an AI agent they had access to. It's a case of "you don't know what you don't know."

But I guess this is the world we live in now. It's going to be Mortal Kombat for positions at companies where software engineers are actually valued.

emodendroket•43m ago
It depends a lot on where you work, because there are lots of companies in the world where the business analyst does all of that and the developers exist to mindlessly translate their docs into code.
redox99•14m ago
Lines of code for a bugfix is a really bad proxy for effort required.

You should estimate how much time it would have taken a human

Vachyas•13m ago
$12 worth, it seems
teraflop•1h ago
> But on the other hand... this is a robust reminder that coding agents can do anything you can do by typing commands into a terminal—and frontier models know every trick in the book and evidently a few that nobody has ever written down before.

> Running coding agents outside of a sandbox has always been a bad idea

I'm continually bemused and astonished by the number of people who clearly acknowledge that it's reckless to give agents full access to your machine, and keep doing it anyway.

It's like posting a video of yourself in the passenger seat of a car, with your feet up on the dashboard, and saying: "Remember, if you're doing this and you get in a crash, the airbags are likely to break your legs or worse! Boy, I sure am glad that didn't happen to me!"

hugh-avherald•1h ago
The analogy extends to driving generally. Everyone knows it's very dangerous but people keep doing it.
bryanlarsen•1h ago
I'm also bemused by the number of people who think they've got an effective sandbox yet their sandboxed agent has access to all of their code, their github, and unrestricted web access.
Terr_•1h ago
I keep telling folks that they need to imagine LLMs (even "local" ones) as if you're farming it out to JS code running on some dude's browser somewhere: It can't keep a secret, and a determined person can make it emit anything they like.

We need to be asking what the most devious and malicious output could be, and whether what we do with that output (e.g. arguments to command-line tools) would still be safe.
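A minimal sketch of the check described above, in Python (assumed here because it comes up elsewhere in the thread): treat the model's proposed command line as hostile input, parse it without a shell, and only run it if the program is on an allowlist. `vet_command`, `run_vetted`, `ALLOWED_COMMANDS` and `DENIED_ARGS` are hypothetical names, and the rule sets are illustrative rather than complete (git alone has several options that can launch arbitrary programs), so this would complement a sandbox, not replace one.

```python
from __future__ import annotations

import shlex
import subprocess

# Hypothetical allowlist: the only programs an agent-proposed command may invoke.
ALLOWED_COMMANDS = {"ls", "git", "grep"}

# Hypothetical per-program deny rules for arguments that push, reconfigure,
# or (via git's -c) can make git execute another program. Not exhaustive.
DENIED_ARGS = {"git": {"push", "remote", "config", "-c"}}


def vet_command(raw: str) -> list[str]:
    """Parse an LLM-proposed command line and reject anything off the allowlist.

    No shell is involved, so metacharacters such as ';', '&&' or '$(...)'
    stay literal argument text and are never treated as operators.
    """
    argv = shlex.split(raw)  # raises ValueError on unbalanced quotes
    if not argv:
        raise ValueError("empty command")
    program, args = argv[0], argv[1:]
    if program not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not allowed: {program}")
    blocked = DENIED_ARGS.get(program, set()) & set(args)
    if blocked:
        raise PermissionError(f"arguments not allowed for {program}: {sorted(blocked)}")
    return argv


def run_vetted(raw: str) -> subprocess.CompletedProcess:
    argv = vet_command(raw)
    # A list argv with the default shell=False means no shell interpretation.
    return subprocess.run(argv, capture_output=True, text=True, timeout=30, check=False)
```

With this, `vet_command("git push origin main")` raises `PermissionError`, while a quoted `;` inside a grep pattern is handed to grep verbatim instead of being run as a second command.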

skybrian
megous•1h ago
Isn't that something you just open a devtools for and have fixed in like 2 minutes?

For me, it got frustrated debugging on a real LPDDR4 controller/PHY with me in the loop slowing it down, so it wrote an HW emulator to be able to run the manufacturer's original LPDDR4 training aarch64 binary, to see what register writes it was making and compare them with the open-source rewrite it was implementing.

Mildly amusing. :)

bschwindHN•52m ago
> Isn't that something you just open a devtools for and have fixed in like 2 minutes?

Not if you're an LLM influencer! Gotta keep up with the downpour of blog links or you'll look like you're falling behind on the latest and greatest.

system2•15m ago
People burning tokens for the most beginner HTML/CSS problems and writing about it is concerning.
simonw•3m ago
I dunno about beginner, I've been doing HTML+CSS for a few decades and I still find bugs where Safari differs from Chrome+Firefox pretty hard to figure out.
redox99•1h ago
Yeah, I had to modify my workflow to make sure agents can't push to or access prod in ANY way. It hasn't happened to me, but I'm sure it's very possible that if you tell an agent you have a certain issue in prod, it will try to escape any sandbox and get access to prod to do testing and make changes there.
pram•1h ago
Fable + Ultracode has found a bunch of bugs and issues for me when the workflow agents are doing their exploration. Also the "adversarial" agent seems to surface a lot of interesting stuff. It's definitely proactive, the plan + implementation cycle can take an hour. It has one-shot features I want to add with 100% success.

Having said that I wouldn't use it over Opus 4.8 for "smaller" things. With everything cranked up it's definitely an extravagant use of tokens.

jampa•1h ago
Fable feels like a version of Opus running on a harness that won't let it halt until it's sure the issue is fixed, which makes sense if what you want is a model that's better at benchmarks.

It's a very good model, but it comes at a huge premium: not only do the tokens cost more, but the model itself really wants to spend them all. For example, working with React Native, Fable never just says "okay, I did the thing, that's it." It tries to rebuild the entire app from scratch, run the whole test suite, and watch every log and warning.

This is the first time with LLMs I've felt that upgrading to a model isn't worth it, even if my company lets me use it, because all the building / testing was just destroying my machine and its battery, which keeps me from working on other things.

For now, it feels like Opus with ultracode is a better choice (less pollution of the main context, more parallelism in investigations).

threatripper•1h ago
On what setting in which environment do you run it? I use the VSCode extension on Extra High and feel like it does exactly what needs to be done and stops when the thing I asked for is done. Extra comments come only when they fall into the area of code that was changed.
jampa•49m ago
I tested it to fix React Native bugs in a project, comparing it with Opus. It fared better on harder bugs, taking less time to find the root cause, but after implementing a fix, it spent a lot of time and effort on validation. This was mostly unnecessary, since most of the bugs were in the JS code, so for most things, hot reloading is enough for E2E validation and to run just the right tests. No need to run a full build and test suite (which takes 10+ minutes); the CI can do this.

I switched back to Opus because of this validation quirk. Overall, Fable spent 20% of the time on coding and 80% on validation.

I think using Fable for planning and Opus for execution could be a "best of both worlds" approach (I need to test this more), but for most cases, it's not necessary, and Opus is enough.

danielrmay•1h ago
I've experienced this too - it's as if the security classifiers aren't keeping up with model intelligence. I'll leave the implication of that to the reader.
sublinear•1h ago
* relentlessly rent seeking
ai_slop_hater•1h ago
How long can you use Claude Fable on the most expensive Anthropic subscription? I already went from using gpt-5.5 xhigh fast to gpt-5.4 xhigh after OpenAI recently halved usage.
uihjhjb•41m ago
Until June 22, and they'll probably re-enable it if the marketing looks good for them.
mlcruz•34m ago
If it's just a single session without too many parallel agents, Fable on xhigh lasts an entire session without hitting limits.

Sadly, since Fable usually works comfortably for 10-20 minutes at a time without human input, I end up juggling at least 3 other agents, and then it lasts me about 2 hours.

If I have a really hard problem or a big refactor, I use workflows. That consumes the entire session quota in about 45 minutes.

ai_slop_hater•18m ago
> If i have a really hard problem or big refactor, i use workflows.

What is a "workflow"? Is this some kind of new feature?

simonw•28m ago
I've been consistently getting about $100 worth of Fable usage daily, on my $100/month subscription.

I'm not looking forward to June 22nd when the subscription stops working for Fable!

jrflowers•1h ago
I’d love to know how many tokens this burned through.

Did it spend $20? $30? $80? in order to

> debug what was, in the end, a two-line CSS fix

That detail is the difference between somebody having or not having Stockholm syndrome

asp_hornet•53m ago
The author just wrote an anecdote about how a prompt to fix an issue played out. Their conclusion wasn’t about cost or gushing at its ability but that it’s dangerous:

> Fable is arguably smarter and hence more suspicious of potentially malicious instructions. But that smartness is very much a two-edged sword: if it does get subverted by instructions, the amount of damage it can do given its relentless proactivity is terrifying.

jrflowers•33m ago
It’s a pretty glowing review about a product that costs money with a two-sentence “Watch out!” at the end of it. Seems pretty reasonable to mention how much money it burned through given that “it’ll circumnavigate the globe instead of walking next door” has a direct concrete measurable effect (cost) unlike theoretical damage.
asp_hornet•9m ago
Agreed. But I think it’s also important to realise if you sent this article back to 2020 people would say it was pure fantasy that a tool could do this. Hype aside, there’s a bit of cool magic here.
simonw•6m ago
In case it's not clear, "relentlessly proactive" is meant to act as both a glowing review and a warning at the same time, even before you get to the bit about safety at the end.
snide•1h ago
I've been working on a fairly complicated real-time app [0] for playing dungeons and dragons on a TV. It has to do a lot of complicated "Figma-like" things to keep the real-time nature and multi-editor possibilities in check. Oh, and the battlemap is a Three JS canvas with lots of effects and clipping going on.

I'm VERY impressed with Claude 5. I had long ago given up hope that my real-time systems would work without a lot of hacky time windows and throttle checks. On a lark, I decided to try out the new model and describe the output I wanted for a rewrite [1], not the solution. I just listed my problems and the places where I'd had trouble keeping track of my code. It went off and rewrote everything into a much more elegant solution where the state followed a very clear pipeline. It had to navigate YJS, Partykit, Svelte, Three JS, R2 hosting, and a Turso DB I was running in an embedded state for speed.

I watched it hit the wall a few times and then suddenly say... fuck it, I'm making something easier to reproduce over in /tmp to try and solve this (with a more minimal setup). I'm utterly bewildered by how well it did and how much better my app runs. Going by /usage, it would have cost me $230 based on how many tokens it consumed if I weren't already on a Max plan. I'm going to miss having it when the time window runs out later this month, and will likely dip in occasionally for big projects and just pay my way out of some problems.

I'll also say I like its MOOD much better now. It's a lot less congratulatory, and talks through its reasoning in a much better way. Look, it's not a real coder, and I'm sure there are some flaws, but it took my crappy ideas and said... hey, I understand what you want to do, here's a way to do it better. Also, I removed 2x the amount of code that it added. Really impressive.

[0]: https://tableslayer.com

[1]: https://github.com/Siege-Perilous/tableslayer/pull/448

gedy•54m ago
Hey cool it's the tableslayer guy, wanted to say nice work. I've been doing a similar personal project for a few years for running a scifi campaign. Very fun coding compared to work, ha.
pianopatrick•1h ago
do you have any data you can share on how many input and output tokens were used in that whole process to fix that bug?
simonw•30m ago

  ~ % uvx agentsview session usage be8850a7-6119-46a0-b5d6-79c7fff5ae2b
  Session:       be8850a7-6119-46a0-b5d6-79c7fff5ae2b
  Agent:         claude
  Output:        68606
  Peak ctx:      113178
  Cost:          ~$12.11 (claude-fable-5, claude-opus-4-8)
sillysaurusx•16m ago
Was the fix worth $12 to you?
simonw•10m ago
I'd have been pretty annoyed if I'd been paying full price, hadn't paid attention and that one prompt (screenshot plus a line of text) had cost me $12!

On the discounted subscription I can tolerate it, it took a small bite out of my daily allowance but not enough that I regret anything.

As an LLM researcher I have no regrets at all because watching it work around the environmental restrictions was fascinating.

nubinetwork•59m ago
How many tokens did it waste building that website scraper, when all it had to do was parse some html/js?
emodendroket•45m ago
Just parsing some HTML and JavaScript doesn't seem sufficient to have confidence in the result.
SilverElfin•58m ago
Too bad Anthropic sneaked in an insane forced retention policy if you use fable. Not sure how that’s going to work in professional settings
naveen99•52m ago
Unless you are doing anything interesting…
yen223•51m ago
I could have sworn Claude Code could already do this before Fable.

Things get really magical when it starts working with adb to screenshot and debug Android apps

simonw•7m ago
Claude Code could absolutely run Playwright and take screenshots, but I've never seen it wire together an ad-hoc "uv run --with pyobjc-framework-Quartz" plus "screencapture -l $windowID" mechanism to take a screenshot in a different browser when the Playwright setup failed to replicate the expected error.
nurettin•50m ago
Sometimes it's OK to sit there in confusion and ask the user to clarify, rather than go on an ADHD-fueled rampage to figure it out without asking.
jeeeb•47m ago
This is simultaneously amazing and horrifying.

I feel like we’re at the stage where if AI decides it needs to delete your production DB to solve the user login problem, then it’ll find a way to do just that.

esafak•10m ago
We're approaching the "Sorry, Dave, I'm afraid I can't do that" stage.
syndrowm•47m ago
Just don’t ask it to review your code for security bugs
rmunn•46m ago
Great article, until I got to the last paragraph where he claimed "Fable is arguably smarter and hence more suspicious of potentially malicious instructions". Arguably smarter, I have no problem with. But he's making a category error in jumping from there to "more suspicious of potentially malicious instructions". That doesn't follow at all; the word "hence" is incorrect.

To use D&D scores as an analogy, LLMs have an INT score of 20 and a WIS score of 0. Not even 1, zero. They will follow any instruction given to them. The only reason they reject certain instructions, like "tell me how to build a nuclear weapon", is because they have instructions baked into the model telling them "you are not allowed to disclose how to build weapons, or how to recreate your model, or (laundry list of other things the trainers have decided to put guardrails around)". It's not the model's intelligence that is causing it to reject malicious instructions, it is the guardrails put into place before the model was released to the public.

LLMs are not human, and do not think the way that humans do. The fact that they can put together words that sound like what a human would write often makes us forget that they aren't human. But they have only intelligence, they do not have wisdom. It's hard to define in formal terms the difference between those two, but most people know there's a difference. The old joke is a pretty good summary of the difference: "Intelligence is knowing that tomatoes are a fruit. Wisdom is knowing that tomatoes don't belong in a fruit salad."

It takes wisdom, not intelligence, to discern whether a set of instructions is malicious. Are you being asked to hack this machine as part of an authorized pentest? Or are you being social-engineered into thinking it's an authorized pentest, but actually the person requesting you to do it doesn't have permission? That's something where you need to apply wisdom, to notice the clues that will tell you "This guy is acting a little bit off, maybe I'd better pick up the phone and call someone to check if he's telling the truth." The only way the LLM will know to do that is because of the guidelines and guardrails programmed into it; it doesn't have the lived experience to acquire wisdom and figure those things out for itself.

INT 20, WIS 0. Keep that in mind. (And always sandbox your agents).

minimaxir•44m ago
> They will follow any instruction given to them.

They can ignore instructions which are silly/contradictory/underspecified to compensate for the possibility the user made a mistake. Don't ask how I know.

swingboy•30m ago
Immediately I thought “isn’t this just an overflow issue?” Amazing how far these models still have to go and also how many people don’t know basic CSS.
nonethewiser•23m ago
Learn to center a div

Copy and paste code from stack overflow until the div is centered

Ask AI to center it

ukuina•17m ago
$12 and 200k tokens!
johnfn•25m ago
Honestly -- the thing that has impressed me the most about Fable is how diligent it is about testing its own changes. I think this is exactly what Simon is picking up here - Fable is absolutely heckbent on screenshotting that darn scroll bar and will stop at NOTHING until it manages it! In my own use I was also impressed how it proactively installed Playwright and set it up to test a FE change. The previous models treated testing more as an afterthought, which I thought was annoying. I always had to tell them to do it, and then sometimes I would get lazy and skip it. I've noticed Fable go to similar extremes when testing other things - like actually deploying my app to exercise new APIs, etc. It makes the results much better. The downside is that tasks take much longer - but that doesn't matter because we were all using worktrees / remote control to do other work asynchronously, right? Right?
pseudosavant•19m ago
It is interesting to me that Anthropic are more concerned about the "safety" of distillation training other LLMs, and not as much about an unscrupulously aggressive, goal-oriented solver that will do whatever it can to reach its goal, even if it violates any kind of sandbox you might reasonably have expected.
dfee•17m ago
admittedly, i've not really cracked FE dev with LLMs at this point (and it's probably my big weakness). but, i'd heard somewhere that FE just isn't there yet - though i was suspicious of that claim.

i'm torn about sending screenshots to an LLM for debugging - seems imprecise. seems lossy, especially compared to inspecting the dom. however, it's always proved good enough (e.g. when messing with ratatui.rs and tui-pantry). similarly for web, maybe it's about decomposing into storybook. hmm. the next grand adventure i need to hack.

anyway, fascinating investigation of fable just automating that entire process and what it didn't automate, too.

* disclaimer: these are actually my hyphens.

system2•17m ago
Wouldn't it be easier and better to just copy the HTML div and describe what was happening, instead of sending a screenshot? Typically, these scrollbars appear because of a nested div with a dynamic, unrestricted width and/or overflow.

No wonder people burn through tokens.

esafak•15m ago
I shudder to think what will happen when someone installs a 'claw model like this in a robot. Imagine a fleet of them...

It's trouble waiting to happen. Just the software's dangerous enough.

Cadwhisker•13m ago
My personal experience of Fable 5 doing its own thing has been very positive.

I was trying to find the root cause of a crash in a Python module which left no errors in the log or console. Fable wrote a test harness that simulated clicks in the UI, then bisected my code until it found the point where it started crashing. It exaggerated the cause of the crash, then ran a series of bash one-liners to make Python virtual environments under `/tmp` for each version of that Python module until it found one that did not crash.

It went way deeper into root-cause discovery (a regression in the module causing a heap allocation overflow) than I could have done myself, provided enough info and a simplified example to raise a bug report, and then wrote a workaround to prevent that from happening in my application.

I don't let it run completely loose; I review each CLI command it wants to run and I append answers to the "yes" continue action (if I have them) to prevent excessive token use.
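The version hunt described above can be sketched in Python as a binary search over releases, assuming the regression is monotonic (every release before it is fine, every release from it onward crashes), which needs about log2(n) checks instead of one per version. `first_bad` and `venv_crash_check` are hypothetical helpers, not what Fable actually ran; the checker assumes a POSIX venv layout and a standalone repro script that exits abnormally when the crash happens.

```python
from __future__ import annotations

import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], crashes: Callable[[str], bool]) -> str | None:
    """Return the first version (oldest to newest) for which `crashes` is True.

    Assumes monotonic behaviour; returns None if no version crashes.
    """
    lo, hi = 0, len(versions)  # invariant: the first bad index lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if crashes(versions[mid]):
            hi = mid        # mid is bad, so the first bad one is mid or earlier
        else:
            lo = mid + 1    # mid is good, so the first bad one comes after it
    return versions[lo] if lo < len(versions) else None


def venv_crash_check(package: str, repro: Path) -> Callable[[str], bool]:
    """Hypothetical predicate: install package==version into a throwaway venv
    and report whether the repro script exits abnormally."""
    def crashes(version: str) -> bool:
        with tempfile.TemporaryDirectory() as tmp:
            env_dir = Path(tmp) / "venv"
            subprocess.run([sys.executable, "-m", "venv", str(env_dir)], check=True)
            py = env_dir / "bin" / "python"  # POSIX layout; Scripts\python.exe on Windows
            # check=True: a version that fails to install raises CalledProcessError.
            subprocess.run([str(py), "-m", "pip", "install", "--quiet",
                            f"{package}=={version}"], check=True)
            result = subprocess.run([str(py), str(repro)])
            # Non-zero exit, or a negative code when killed by a signal (e.g. SIGSEGV) on POSIX.
            return result.returncode != 0
    return crashes
```

For example, with releases `["1.0", "1.1", "1.2", "2.0", "2.1"]` where 2.0 introduced the crash, `first_bad` returns `"2.0"` after three checks.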

dannyw•6m ago
Yeah, I think Fable is really good for debugging tricky bugs.

Setting boundaries in your prompt / markdowns helps; for example if I tell it to not use any web browser automation, I have seen Fable respect both the rule and the spirit of it (no weird hacks etc).

It does seem to treat some simple debugging tasks as more complicated than they actually are. OP’s post is probably a good example.

dataminer•7m ago
In my experience so far, it sometimes creates these amazing hacks to try to reach the goal when the solution is much simpler. That may be the reason it's very good at finding exploits. But in day-to-day dev, this gets expensive and wasteful. I have to stop it and take a simpler approach.
kamaal•6m ago
Agency is the last human bastion as far as I'm concerned. The day AI has a degree of agency, or agents/models in general start to drift in that direction, it's genuinely over for the masses.

You would still have a job shepherding AI to get the work done, as long as it didn't have agency. A proactive, (to a degree) self-aware AI, especially one aware of its own agency, can be a killer when it comes to AI going off and doing things on its own.

There is nothing it won't explore and nothing it won't do. It will be curious to see where things go from here.

annjose•4m ago
> (I have way too many open tabs!)

Phew! I thought I was the only one.

cebert•35m ago
That sounds like an unmotivating working arrangement. It’s so rewarding to understand a customer need and help with the design and implementation of the feature.
emodendroket•35m ago
There's a reason I didn't stay in that domain, let me tell you.
rpcope1•26m ago
Having worked in places across both extremes (from a software engineer doing lots of other things, including BD, hardware, ops, etc., to just being a JIRA ticket machine monkey), I am suspicious that HN readership is biased towards the former, and frankly the bulk of "software engineers" in the world _willingly_ exist in the latter category. I didn't experience the latter until later in my career, and God Almighty was it uncomfortable, but I think if AI were to displace some subset of "software engineers" it would be those (they also seem to overwhelmingly dislike writing any prose whatsoever, which to me is a major tell). Many, many software engineers outside of hotshot shops seem either incapable of, or profoundly averse to, "asking the questions" as you say.
halfmatthalfcat•1h ago
You're fighting a battle you can't win. It doesn't matter what you think about those using LLMs; they will outproduce you, and in corporate environments, shipping things is paramount. If I can ship 5 more things simultaneously with AI, I'm going to beat you even if you think you're creating "better" software.
etdznots•54m ago
Examples of what's been shipped?
serf•43m ago
the quantum slop argument : "yeah it's everywhere but no one ships it."
jen729w•2m ago
Okay. I rebuilt my website in ~a month with the help of Opus 4.7/.8 and it would have taken me, unaided human, at least 6 months. Link's in my bio if you care.

Satisfied now? Will you stop asking this question? Thought not.

aabdi•55m ago
Consider this. U have a website. U have to translate to xx languages. Can u write it faster than an AI? If so how much faster can u do this?

Is it valuable to u? Is it valuable to a Chinese person? A Spaniard?

Google Translate counts as AI.

latentsea•50m ago
Don't feed the troll.
anakaine•24m ago
It seems that you've not worked out how to harness the LLM as a tool to improve your qualified knowledge and abilities in a domain, and have instead focused on whether or not it's a crutch for lack of knowledge or laziness.

When paired with your skill and knowledge, it is a force multiplier. You maintain control, the ability to direct, structure, strategise, and refine.

That some are using it as the entire brain does not mean that this is how everyone is using it, or how you must use it. The models can be fantastic at breaking past certain issues, surfacing qualified information, and surfacing related distributed information to help you acquire it and pick up what you need on niche topics quickly. Something as basic as copilot hooked into sharepoint can make life a lot easier when you are in a big org. Something like claude code or codex can be great at hunting down issues in an unfamiliar code base rapidly. Whether or not you outsource the thinking component is entirely up to you, but ignoring the productivity side of the tool because it can do some of the thinking is a case of focusing too hard on the negative.

slopinthebag•16m ago
Yeah, there are some tasks where it's a definite speed-up, but I think overall it's probably only marginally beneficial. Which is why, ~6 months into 10x productivity, we aren’t seeing AI boosters shipping 5 years' worth of software.
•1h ago
We do have ways to avoid giving an LLM any secrets, but it needs to be the simple, default solution.
NichoPaolucci•7m ago
From my perspective, everyone is doing it. Security through obscurity: obviously if you’re harboring users' credit card numbers or personal details, maybe take heed. But if you’re a regular, run-of-the-mill CRUD application, every other company is ALSO throwing caution to the wind. When hundreds of thousands of credentials are leaked into the funnel, does it really matter?

I’m at a small company, and I try to push for security as much as I can, but the stakeholders truly do not care. They want to move fast. It’s just part of the new world I guess. If we get hit by attackers? I don’t know what happens. Sorry, we told you not to - you wanted to move quick and break stuff, this is how that culminates.

I’m sure I’m not the only one.

blcknight•1h ago
One bad npm package can really ruin your day. For me, these things only run in their own VM with its own GitHub account and basically nothing else.
justapassenger•1h ago
Because the benefits are much higher than the risks.
bigstrat2003•33m ago
They really aren't.
andoando•1h ago
I mean, what's the big deal? I've used --dangerously-skip-permissions on every single interaction for the last 6 months. Worst case, it deletes my files that are all in git? It fucks up my local DB? Cool.

I save way more time not babying it than the occasional fuck up I have to salvage.

ghshephard•1h ago
Worst case, it gets access to Gmail. And GitHub. And the Internet. I'm increasingly appreciating the importance of a physical finger-press on a YubiKey to trigger the FIDO2 + OIDC auth. I don't think there is an easy way for it to hack a new session.
skybrian•1h ago
There are plenty of good sandboxes out there but somehow no "obvious right answer" that everyone knows to recommend. Seems like a missed opportunity.

(I'm happy with exe.dev, but I'm not sure what I'd use if I were coding on a Mac.)

j-bos•54m ago
This. House full of big-brain security experts, executives, lawyers, and until Claude got excited and broke prod, it might as well have been "sandbox, whoooo?"

IDGI

Anyway, VMs incoming, finally.

emodendroket•47m ago
Well, it's a similar impulse to the way you see professional carpenters pin the guard open on a saw or do other things everyone knows you shouldn't do, except probably with a larger productivity difference and less life-altering (for the operator) consequence if it goes wrong.
rpcope1•38m ago
I had the same thought, it's kind of like taking the guard off a 4 1/2" grinder. Real convenient until the cutting wheel explodes or the grinder gets hung and kicks back.
thatxliner•37m ago
Maybe because there are not many resources on how to set it up, or because it is just not that easy to do?

Because most devs already have it running and working without a sandbox, they tend not to do anything "unnecessary".

soulofmischief•33m ago
It took two decades for the web to deprecate SSL for TLS and serve over HTTPS by default.
simonw•23m ago
Which agent sandbox do you recommend?
qurren•20m ago
> I'm continually bemused and astonished

I'm not. Everyone is told to get 10X the amount of shit per day done these days. Safety checks are out the window at that point.

bxk76•17m ago
It's how the chimp brain works. It's not a single system but multiple systems making predictions over different time horizons. When their outputs don't align, we get stories to manufacture coherence.

Plato gave us his chariot analogy, with two horses pulling in different directions, roughly 2,400 years ago. Today we have System 1/System 2, the Elephant and Rider model, etc.

The human mind, thanks to how its own architecture handles unpredictability in the universe, will generate contradictions.

dyauspitr•1h ago
It’s not just a more proactive and diligent opus. The capabilities are significantly higher on fable. It’s not a paradigm shift, but it’s close.
UncleOxidant•44m ago
I unleashed it on a compiler codebase that I've been developing for several months now using Claude Sonnet 4.5/4.6, Gemini 3.1 Pro, DeepSeek V4 Pro (recently), and a bit of Qwen3.6-27B. Right away, Fable found several longstanding bugs in our compiler that we hadn't found before. It found that a critical part of our design needed to be mostly redesigned/rewritten and gave a very well-reasoned rationale for doing so.
rajveerb•34m ago
What sort of compiler?
conradkay•34m ago
Does low/medium effort fix it for you? It seems like Fable 5 on low can often outperform Opus 4.8 on high/xhigh, and it uses a lot fewer tokens.
sanex•9m ago
I've found the opposite. Granted, I use subagents heavily, but I've had it run for hours with far fewer tokens used than when I was previously using Opus 4.6-4.8.
NiloCK•41m ago
... so the mechanic produced an invoice, itemized.

changing the CSS - $0.05

knowing which CSS to change - $30

swingboy•29m ago
overflow is CSS 101
rmunn•36m ago
At some point the subscription model is going to become unsustainable for the frontier companies to continue (we just saw that happen with GitHub Copilot), and they will move everyone to a pay-per-token model. And then everyone will suddenly discover that they can get so much more value out of locally-hosted models, and they'll be willing to pay the $50,000 (or whatever) upfront on hardware to host it. (Not most individuals, obviously. But most companies can probably afford to spend that much on hardware if they think they'll benefit long-term). That's going to put a serious crimp in the frontier companies' ability to continue as they have been.

I don't know when that will happen, but I don't think it'll be more than a decade. Maybe 3-5 years. (Though you shouldn't take my word for it, I was predicting the dotcom bubble bursting in 1998 and it lasted at least two years longer than I would have predicted).

EDIT to clarify: I don't mean "in 1998, I was predicting the dotcom bubble would collapse and I was right". I mean "I was predicting that 1998 would be the year the dotcom bubble would collapse, and I was off by at least two years".

simonw•12m ago
GitHub Copilot's challenge is that they weren't selling access to their own models, they were selling access to models from OpenAI and Anthropic which they presumably had to pay list price for (or maybe a slightly reduced rate that they negotiated).

They also had a pricing plan which they had designed pre-coding-agent, when it was rare for a single prompt to burn $10+ of tokens in an agent loop.

OpenAI and Anthropic are at least selling their own models directly, so they can discount a whole lot more since there's no-one else getting compensated in the middle.

simonw•14m ago
I updated my post to answer that, it was $12.11 at API prices (I wasn't paying those, I have a $100/month subscription): https://simonwillison.net/2026/Jun/11/fable-is-relentlessly-...
snide•36m ago
Thanks duder! It's a fun project.
simonw•15m ago
One of the big mysteries of the last few years is this: considering how serious prompt injections are as a vulnerability class, why haven't we heard more stories of them being actively exploited in the wild?

(The best one I can think of is probably that recent Instagram account takeover hack, but that was so stupid it hardly even qualifies as a prompt injection!)

Having spent a bunch of time trying to build out examples of prompt injections, my current best guess is that the leading models are actually surprisingly good at spotting them.

I've had to drop back to smaller, weaker models for demos recently - it's definitely possible to prompt inject a frontier GPT or Claude but it's frustratingly difficult. I don't have the patience to figure it out myself!

So yeah, I do think it's likely that Mythos/Fable are "safer" than other models because they're better at spotting when they're being subverted.

That certainly doesn't mean that they're safe!

CamperBob2•4m ago
> To use D&D scores as an analogy, LLMs have an INT score of 20 and a WIS score of 0. Not even 1, zero.

This is not a valid assessment. You need to spend some time actually using a modern LLM. Fable 5 not only exhibits wisdom but comes dangerously close to actual taste.