
OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
576•klaussilveira•10h ago•167 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
889•xnx•16h ago•540 comments

How we made geo joins 400× faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
91•matheusalmeida•1d ago•20 comments

What Is Ruliology?

https://writings.stephenwolfram.com/2026/01/what-is-ruliology/
18•helloplanets•4d ago•10 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
21•videotopia•4d ago•0 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
197•isitcontent•11h ago•24 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
199•dmpetrov•11h ago•91 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
307•vecti•13h ago•136 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
352•aktau•17h ago•175 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
350•ostacke•17h ago•91 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
453•todsacerdoti•19h ago•228 comments

Delimited Continuations vs. Lwt for Threads

https://mirageos.org/blog/delimcc-vs-lwt
20•romes•4d ago•2 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
79•quibono•4d ago•18 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
52•kmm•4d ago•3 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
253•eljojo•13h ago•153 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
388•lstoll•17h ago•263 comments

Was Benoit Mandelbrot a hedgehog or a fox?

https://arxiv.org/abs/2602.01122
5•bikenaga•3d ago•1 comment

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
231•i5heu•14h ago•175 comments

Zlob.h 100% POSIX and glibc compatible globbing lib that is faster and better

https://github.com/dmtrKovalenko/zlob
12•neogoose•3h ago•7 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
68•phreda4•10h ago•12 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
24•gmays•6h ago•6 comments

Why I Joined OpenAI

https://www.brendangregg.com/blog/2026-02-07/why-i-joined-openai.html
116•SerCe•7h ago•94 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
135•vmatsiiako•16h ago•59 comments

Introducing the Developer Knowledge API and MCP Server

https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/
43•gfortaine•8h ago•13 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
268•surprisetalk•3d ago•36 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
168•limoce•3d ago•87 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
1039•cdrnsf•20h ago•431 comments

FORTH? Really!?

https://rescrv.net/w/2026/02/06/associative
60•rescrv•18h ago•22 comments

Show HN: ARM64 Android Dev Kit

https://github.com/denuoweb/ARM64-ADK
14•denuoweb•1d ago•2 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
88•antves•1d ago•63 comments

Superpowers: How I'm using coding agents in October 2025

https://blog.fsck.com/2025/10/09/superpowers/
435•Ch00k•3mo ago

Comments

gjm11•3mo ago
Has anyone ever seen an instance in which the automated "How" removal actually improves an article title on HN rather than just making it wrong?

(There probably are some. Most likely I notice the bad ones more than the good ones. But it does seem like I notice a lot of bad ones, and never any good ones.)

[EDITED to add:] For context, the actual article title begins "Superpowers: How I'm using ..." and it has been auto-rewritten to "Superpowers: I'm using ...", which completely changes what "Superpowers" is understood as applying to. (The actual intention: superpowers for LLM coding agents. The meaning after the change: LLM coding agents as superpowers for humans.)

add-sub-mul-div•3mo ago
I agree. I'm sure I've seen instances where it's worked, but the problem is that when it messes up, it's much more annoying than any benefit it brings when it does work. Some of us don't want to be reminded that tech is full of hubris, overconfidence, poor judgment, and failure about what can/should be abstracted and automated.
dvfjsdhgfv•3mo ago
Yeah, to the point that I can recall several examples where the title stuck out as dumb on HN and only started to make sense when I visited the original page, but not a single case where I could say the automated removal really did a good job.
bryanrasmussen•3mo ago
I've had it happen to me a few times where it was reasonable, sometimes where it was debatable, and when it was just wrong I edited it to add the "How" back in.
jvanderbot•3mo ago
This is so interesting but it reads like satire. I'm sure folks who love persuading and teaching and marshalling groups are going to do very well in SWEng.

According to this, we'll all be reading the feelings journals of our LLM children and scolding them for cheating on our carefully crafted exams instead of, you know, making things. We'll read psychology books, apparently.

I like reading and tinkering directly. If this is real, the field is going to leave that behind.

sunir•3mo ago
We certainly will; they can't replace humans in most language tasks without having a human-like emotional model. I have a whole therapy set of agents to debug neurotic long-lived agents with memory.
jvanderbot•3mo ago
Ok, call me crazy, but I don't actually think there's any technical reason that a theoretical code generation robot needs emotions that are as fickle and difficult to manage as humans.

It's just that we designed this iteration of technology foundationally on people's fickle and emotional reddit posts among other things.

It's a designed-in limitation, and kind of a happy accident it's capable of writing code at all. And clearly carries forward a lot of baggage...

sunir•3mo ago
Maybe. I use QWAN ("quality without a name") frequently when working with the coding agents. That requires an LLM equivalent of interoception, to recognize when the model's understanding is scrambled or "aligned with itself", which is what QWAN is.
ambicapter•3mo ago
If you can find enough training data that does human-like things without having human-like qualities, we are all ears.
jvanderbot•3mo ago
It can be simultaneously the best we have, and well short of the best we want. It can be a remarkable achievement and fall short of the perceived goals.

That's fine.

Perhaps we can RL away some of this or perhaps there's something else we need. Idk, but this is the problem when engineers are the customer, designer, and target audience.

sunir•3mo ago
Quality Spock pun.
dingnuts•3mo ago
What on God's green Earth could the CEO of a no-name B2B SaaS need long-running agents for?

either your business isn't successful, so you're coding when you shouldn't be, or cosplaying coding with Claude, or you're lying, or you're telling us about your expensive and unproductive hobby.

How much do you spend on AI? What's your annual profit?

edit: oh, cosplaying as a CEO. I see. Nice WPEngine landing page, Mr. AppBind.com CEO. Better have Claude fix your website! I guess that agent needs therapy...

themafia•3mo ago
I like writing software.

I hate managing people.

What are we doing?

lerp-io•3mo ago
take #73895 on how to fix ur prompt to make ur slop better.
anuramat•3mo ago
is better slop a bad thing somehow?
dvfjsdhgfv•3mo ago
Well, slop is slop, we can discuss the details but the basic thing is invariant.
anuramat•3mo ago
why reiterate the invariant?
tonyedgecombe•3mo ago
You can only regurgitate the meal so many times.
apwell23•3mo ago
yeah, none of them can actually prove or even explain in words why their own golden prompting technique is superior. it's all vibes. so annoying, i want to slap these ppl lol.
lerp-io•3mo ago
for real lmao
amelius•3mo ago
It's not a superpower if everybody has that same power.
cantor_S_drug•3mo ago
Everyone is better off with mobile phones. We can solve more diverse problems faster. Similarly, we can combine our diverse superpowers (as they show in kids' cartoons).
Avicebron•3mo ago
I often feel these types of blogposts would be more helpful if they demonstrated someone using the tools to build something non-trivial.

Is Claude really "learning new skills" when you feed it a book, or does it present it like that because your prompting encourages that sort of response behavior? I feel like these posts have to demo Claude with the new skills and Claude without.

Maybe I'm a curmudgeon, but most of these types of blogs feel like marketing pieces; the important bit is that so much is left unsaid and not shown that it comes off like a kid trying to hype up their own work without the benefit of nuance or depth.

khaledh•3mo ago
Agreed. The methodology needed here is something like an A/B test, with quantifiable metrics that demonstrate the effectiveness of the tool. And to do it not just once, but many times under different scenarios so that it demonstrates statistical significance.

The most challenging part when working with coding agents is that they seem to do well initially on a small code base with low complexity. Once the codebase gets bigger with lots of non-trivial connections and patterns, they almost always experience tunnel vision when asked to do anything non-trivial, leading to increased tech debt.
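
A minimal sketch of what that kind of comparison could look like, assuming each task attempt is scored pass/fail and both arms run on the same task set; the numbers, arm sizes, and the two_proportion_z helper are all hypothetical, not from the post:

    # Hypothetical A/B check: same tasks, "with skills" vs. baseline, scored
    # pass/fail, then a simple two-proportion z-test on the pass rates.
    from math import sqrt, erfc

    def two_proportion_z(pass_a: int, n_a: int, pass_b: int, n_b: int):
        """Two-sided z-test for the difference between two pass rates."""
        p_a, p_b = pass_a / n_a, pass_b / n_b
        pooled = (pass_a + pass_b) / (n_a + n_b)
        se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (p_a - p_b) / se
        p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value, normal approximation
        return p_a, p_b, z, p_value

    # Made-up numbers: 40 tasks per arm
    p_a, p_b, z, p = two_proportion_z(pass_a=31, n_a=40, pass_b=22, n_b=40)
    print(f"with skills: {p_a:.0%}, baseline: {p_b:.0%}, z={z:.2f}, p={p:.3f}")

Repeating that across scenarios, with enough tasks per arm, is what "statistical significance" would actually require; a single anecdote can't get there.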

mwigdahl•3mo ago
The problem is that you're talking about a multistep process where each step beyond the first depends on the particular path the agent starts down, along with human input that's going to vary at each step.

I made a crude first stab at an approach that at least uses similar steps and structure to compare the effectiveness of AI agents. My approach was used on a small toy problem, but one that was complex enough that the agents couldn't one-shot it and required error correction.

It was enough to show significant differences, but scaling this to larger projects and multiple runs would be pretty difficult.

https://mattwigdahl.substack.com/p/claude-code-vs-codex-cli-...

potatolicious•3mo ago
What you're getting at is the heart of the problem with the LLM hype train though, isn't it?

"We should have rigorous evaluations of whether or not [thing] works." seems like an incredibly obvious thought.

But in the realm of LLM-enabled use cases they're also expensive. You'd need to recruit dozens, perhaps even hundreds of developers to do this, with extensive observation and rating of the results.

So rather than actually try to measure the efficacy, we just get blog posts with cherry-picked examples of "LLM does something cool". Everything is just anecdata.

This is also the biggest barrier to actual LLM adoption for many, many applications. The gap between "it does something REALLY IMPRESSIVE 40% of the time and shits the bed otherwise" and "production system" is a yawning chasm.

marcosdumay•3mo ago
It's the heart of the problem with all software engineering research. That's why we have so little reliable knowledge.

It applies to using LLMs too. I guess the one big difference here is that LLMs are pushed by few enough companies, with abundant enough money, that it would be trivial for them to run a test like this. So the fact that they aren't doing that also says a lot.

oblio•3mo ago
> What you're getting at is the heart of the problem with the LLM hype train though, isn't it?

> "We should have rigorous evaluations of whether or not [thing] works." seems like an incredibly obvious thought.

Heh, I'd rephrase the first part to:

> What you're getting at is the heart of the problem with software development though, isn't it?

simonw•3mo ago
The UK government ran a study with thousands of developers quite recently: https://www.gov.uk/government/publications/ai-coding-assista...
b_e_n_t_o_n•3mo ago
Woah, finally something with actual metrics instead of vibes!

> Trial participants saved an average of 56 minutes a working day when using AICAs

That feels accurate to me, but again I'm just going on vibes :P

redhale•3mo ago
I don't necessarily think the conclusions are wrong, but this relies entirely on self-reported survey results to measure productivity gains. That's too easy to poke holes in, and I think studies like this are unlikely to convince real skeptics in the near term.
simonw•3mo ago
At this point it's becoming clear from threads similar to this one that quite a lot of the skeptics are actively working not to be convinced by anything.
redhale•3mo ago
Do you have a study to back that up? /s

I agree. I think there are too many resources, examples, and live streams out there for someone to credibly claim at this point that these tools have no value and are all hype. I think the nuance is in how and where you apply it, what your expectations and tolerances are, and what your working style is. They are bad at many things, but there is tremendous value to be discovered. The loudest people on both sides of this debate are typically wrong in similar ways imo.

subjectivationx•3mo ago
I am not a software engineer, but I am using my own vibe-coded video FX software, my own vibe-coded audio synth, my own vibe-coded art generator for art. These aren't software products, though. No one else is ever going to use them. The output is what matters to me.

Even I can see that committing LLM-generated code at your software job is completely insane. The only way to get a productivity increase is to not bother understanding what the program is doing. If you need to understand what is going on, then why not just type it in yourself?

My productivity increase is immeasurable because I wouldn't be able to write this video player I made. I have absolutely no idea how it works. That is exactly why I am not a software engineer. Professionals claiming a productivity boost have to be doing something along the lines of not understanding what the program is doing, in proportion to the claimed productivity increase. I don't see how you can have it both ways unless someone is just that slow of a typist.
troupo•3mo ago
Before you get into the expensive part, how do you get past "non-deterministic black box with unknown layers in between imposed by vendors"?
potatolicious•3mo ago
You can measure probabilistic systems that you can't examine! I don't want to throw the baby out with the bathwater here - before LLMs became the all-encompassing elephant in the room we did this routinely.

You absolutely can quantify the results of a chaotic black box, in the same way you can quantify the bias of a loaded die without examining its molecular structure.
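
A rough sketch of the loaded-die point, assuming a black-box harness where run_task_once stands in for one end-to-end agent run (that function and the 40% figure are hypothetical, echoing the number upthread):

    # Estimate a black-box pass rate without looking inside the model:
    # run the same task repeatedly and put a confidence interval on the result.
    import random
    from math import sqrt

    def run_task_once() -> bool:
        """Stand-in for one opaque agent run on a fixed task."""
        return random.random() < 0.4  # hypothetical success probability

    def estimate_pass_rate(trials: int = 200):
        passes = sum(run_task_once() for _ in range(trials))
        p = passes / trials
        margin = 1.96 * sqrt(p * (1 - p) / trials)  # ~95% normal-approximation interval
        return p, margin

    p, margin = estimate_pass_rate()
    print(f"estimated pass rate: {p:.0%} +/- {margin:.0%}")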

claytongulick•3mo ago
> The methodology needed here is something like an A/B test, with quantifiable metrics that demonstrate the effectiveness of the tool. And to do it not just once, but many times under different scenarios so that it demonstrates statistical significance.

If that's what we need to do, don't we already have the answer to the question?

coolKid721•3mo ago
Yeah, I was reading this to see if there was something he'd actually show that would be useful, what pain point he's solving, but it's just slop.
simonw•3mo ago
Here's one from today: https://mitchellh.com/writing/non-trivial-vibing
j_bum•3mo ago
This was a fun read.

I’ve similarly been using spec.md and running to-do.md files that capture detailed descriptions of the problems and their scoped history. I mark each of my to-do’s with informational tags: [BUG], [FEAT], etc.

I point the LLM to the exact to-do (or section of to-do’s) with the spec.md in memory and let it work.

This has been working very well for me.
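
For anyone who wants a picture of the scheme before clicking through to the gist linked below, a tiny hypothetical illustration (the tags and entries are invented, not copied from those files):

    # to-do.md (hypothetical excerpt)

    ## Active
    - [ ] [BUG] Export fails for projects with >1k assets; see spec.md "Export pipeline" for scope
    - [ ] [FEAT] CSV import with column mapping (parser + mapping UI only, no validation yet)

    ## Done
    - [x] [BUG] Off-by-one in pagination; fixed and covered by a test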

lcnPylGDnU4H9OF•3mo ago
Do you mind linking to example spec/to-do files?
SteveJS•3mo ago
Here is a (3-month-old) repo where I did something like that, and all the tasks are checked into the linear git history: https://github.com/KnowSeams/KnowSeams
j_bum•3mo ago
Sure thing. Here is an example set of the agent/spec/to-do files for a hobby project I'm actively working on.

https://gist.github.com/JacobBumgarner/d29b660cb81a227885acc...

lcnPylGDnU4H9OF•3mo ago
Thanks!
j_bum•3mo ago
No problem! I’d love to hear any approach you’ve taken as well.
nightski•3mo ago
Even though the author refers to it as "non-trivial", and I can see why that conclusion is made, I would argue it is in fact trivial. There's very little domain-specific knowledge needed; this is purely a technical exercise integrating with existing libraries for which there is ample documentation online. In addition, it is a relatively isolated feature in the app.

On top of that, it doesn't sound enjoyable. Anti slop sessions? Seriously?

Lastly, the largest problem I have with LLMs is that they are seemingly incapable of stopping to ask clarifying questions. This is because they do not have a true model of what is going on. Instead they truly are next token generators. A software engineer would never just slop out an entire feature based on the first discussion with a stakeholder and then expect the stakeholder to continuously refine their statement until the right thing is slopped out. That's just not how it works and it makes very little sense.

kannanvijayan•3mo ago
I've wondered about exposing this "asking clarifying questions" as a tool the AI could use. I'm not building AI tooling so I haven't done this, but what if you added an MCP endpoint whose description was "treat this endpoint as an oracle that will answer questions and clarify intent where necessary" (paraphrased), and had that tool just wire back to a user prompt?

If asking clarifying questions is plausible output text for LLMs, this may work effectively.
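
A rough sketch of how that could be wired up, assuming the official mcp Python SDK's FastMCP helper and a Unix-like system (the tool name, description, and the /dev/tty trick are illustrative choices, not an established pattern):

    # MCP server exposing an "oracle" the agent can call when intent is unclear.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("clarifying-oracle")

    @mcp.tool()
    def clarify(question: str) -> str:
        """Treat this tool as an oracle: call it whenever the user's intent is
        ambiguous and you need a decision before proceeding."""
        # stdin/stdout carry the MCP protocol itself, so this sketch talks to the
        # human via the controlling terminal instead (Unix-only shortcut).
        with open("/dev/tty", "r+") as tty:
            tty.write(f"\n[agent asks] {question}\n> ")
            tty.flush()
            return tty.readline().strip()

    if __name__ == "__main__":
        mcp.run()  # serves the tool over stdio by default

Whether the model actually reaches for the tool at the right moments is the open question; the description is doing all the work.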

simonw•3mo ago
I think the asking clarifying questions thing is solved already. Tell a coding agent to "ask clarifying questions" and watch what it does!
nightski•3mo ago
Obviously if you instruct the autocomplete engine to fill in questions it will. That's not the point. The LLM has no model of the problem it is trying to solve, nor does it attempt to understand the problem better. It is merely regurgitating. This can be extremely useful. But it is very limiting when it comes to using it as an agent to write code.
wrs•3mo ago
You can work with the LLM to write down a model for the code (aka a design document) that it can then repeatedly ingest into the context before writing new code. That's what "plan mode" is for. The technique of maintaining a design document and a plan/progress document that get updated after each change seems to make a big difference in keeping the LLM on track. (Which makes sense… exactly the same thing works for human team members too.)
habinero•3mo ago
Every time I hear someone say something like this, I think of the pigeons in the Skinner box who developed quirky superstitious behavior when pellets were dispensed at random.
troupo•3mo ago
> that it can then repeatedly ingest into the context

1. Context isn't infinite

2. Both Claude and OpenAI get increasingly dumb after 30-50% of the context has been filled

wrs•3mo ago
Not sure how that's relevant... I haven't seen many design documents of infinite size.
troupo•3mo ago
"Infinite" is a handy shortcut for "large enough".

Even the "million token context window" becomes useless once it's filled to 30-50% and the model starts "forgetting" useful things like existing components, utility functions, AGENTS.md instructions etc.

Even a junior programmer can search and remember instructions and parts of the codebase. All current AI tools have to be reminded to recreate the world from scratch every time, and promptly forget random parts of it.

subjectivationx•3mo ago
I think at some point we will stop pretending we have real AI. We have a breakthrough in natural language processing, but LLMs are much closer to Microsoft Word than to something as fantastical as "AGI". We don't blame Microsoft Word for not having a model of what is being typed in. It would be great if Microsoft Word could model the world and just do all the work for us, but that is a science-fiction fantasy.

To me, LLMs in practice are largely massively compute-inefficient search engines plus really good language disambiguation. Useful, but we have actually made no progress at all towards "real" AI. This is especially obvious if you ditch "AI" and call it artificial understanding. We have nothing.
danielbln•3mo ago
I've added "amcq means ask me clarifying questions" to my global Claude.md so I can spam "amcq" at various points in time, to great effect.
simonw•3mo ago
The hardest problem in computer science in 2025 is presenting an example of AI-assisted programming that somebody won't call "trivial".
nightski•3mo ago
If all I did was call it trivial that would be a fair critique. But it was followed up with a lot more justification than that.
simonw•3mo ago
Here's the PR. It touched 21 files. https://github.com/ghostty-org/ghostty/pull/9116/files

If that's your idea of trivial then you and I have very different standards in terms of what's a trivial change and what isn't.

groby_b•3mo ago
It's trivial in the sense that a lot of the work isn't high cognitive load. But... that's exactly the point of LLMs. It takes the noise away so you can focus on high-impact outcomes.

Yes, the core of that pull request is an hour or two of thinking; the rest is ancillary noise. The LLM took away the need for the noise.

If your definition of trivial is signal/noise ratio, then, sure, relatively little signal in a lot of noise. If your definition of "trivial" hinges on total complexity over time, then this beats the pants off manual writing.

I'd assume OP did the classic senior-engineer schtick of "I can understand the core idea quickly, therefore it can't be hard". Whereas Mitchell did the heavy lifting of actually shipping the "not hard" idea: still understanding the core idea quickly, and then not getting bogged down in unnecessary details.

That's the beauty of LLMs: it turns the dream of "I could write that in a weekend" into actual reality, where before it was always empty bluster.

antonvs•3mo ago
> A software engineer would never just slop out an entire feature based on the first discussion with a stakeholder and then expect the stakeholder to continuously refine their statement until the right thing is slopped out. That's just not how it works and it makes very little sense.

Didn’t you just describe Agile?

Retric•3mo ago
Who hurt you?

Sorry, couldn't resist. Agile's point was getting feedback during the process rather than after something is complete enough to be shipped, thus minimizing risk and avoiding wasted effort.

Instead people are splitting up major projects into tiny shippable features and calling that agile while missing the point.

rkomorn•3mo ago
I've never seen a working scrum/agile/sprint/whatever product/project management system and I'm convinced it's because I've just never seen an actual implementation of one.

"Splitting up major projects into tiny shippable features and calling that agile" feels like a much more accurate description of what I've experienced.

I wish I'd gotten to see the real thing(s) so I could at least have an informed opinion.

Retric•3mo ago
Yea, I think scrum etc is largely a failure in practice.

The manager for the only team that I think actually checked all the agile boxes had a UI background, so she thought in terms of mock-ups, backend, and polishing as different tasks and was constantly getting client feedback between each stage. That specific approach isn't universal; the feedback as part of the process definitely should be, though.

What was a little surreal is that the pace felt slow day to day, but we were getting a lot done and it looked extremely polished while being essentially bug-free at the end. An experienced team avoiding heavy processes, technical debt, and wasted effort goes a long way.

Balinares•3mo ago
I've seen the real thing and it's pretty much splitting major projects into tiny shippable bits. Picking which bits and making it so they steadily add up to the desired outcomes is the hard part.
habinero•3mo ago
People misunderstand the system, I think. It's not holy writ, you take the parts of it that work for your team and ditch the rest. Iterate as you go.

The failure modes I've personally seen are an organization that isn't interested in cooperating, or the person running the show being more interested in process than people. But I'd say those teams would struggle no matter what.

rkomorn•3mo ago
I put a lot of the responsibility for the PMing failures I've seen on the engineering side not caring to invest anything at all into the relationship.

Ultimately, I think it's up to the engineering side to do its best to leverage the process for better results, and I've seen very little of that (and it's of course always been the PM side's fault).

And you're right: use what works for you. I just haven't seen anything that felt like it actually worked. Maybe one problem is people iterating so fast/often they don't actually know why it's not working.

antonvs•3mo ago
Agile’s point was to get feedback based on actual demoable functionality, and iterate on that. If you ignore the “slop” pejorative, in the context of LLMs, what I quoted seems to fit the intent of Agile.
Retric•3mo ago
There’s generally a big gap between the minimum you can demo and an actual feature.
antonvs•3mo ago
If you want to use an LLM to generate a minimal demoable increment, you can. The comment I replied to mentioned "feature", but that's a choice based on how you direct the LLM. On the other hand, LLM capabilities may change the optimal workflow somewhat.

Either way, the ability to produce "working software" (as the manifesto puts it) in "frequent" iterations (often just seconds with an LLM!) and iterate on feedback is core to Agile.

qsort•3mo ago
> Important: there is a lot of human coding, too.

I'm not highlighting this to gloat or to prove a point. If anything, in the past I have underestimated how big LLMs were going to be. Anyone so inclined can take the chance to point and laugh at how stupid and wrong that was. Done? Great.

I don't think I've been intentionally avoiding coding assistants, and as a matter of fact I have been using Claude Code since the literal day it first previewed, and yet it doesn't feel, not even one bit, like you can take your hands off the wheel. Many are acting as if writing any code manually means "you're holding it wrong", which I feel is just not true.

simonw•3mo ago
Yeah, my current opinion on this is that AI tools make development harder work. You can get big productivity boosts out of them but you have to be working at the top of your game - I often find I'm mentally exhausted after just a couple of hours.
jstummbillig•3mo ago
Considering the last 2 years, has it become harder or easier?
simonw•3mo ago
Definitely harder.

A year ago I was using GitHub Copilot autocomplete in VS Code and occasionally asking ChatGPT or Claude to help write me a short function or two.

Today I have Claude Code and Codex CLI and Codex Web running, often in parallel, hunting down and resolving bugs and proposing system designs and collaborating with me on detailed specs and then turning those specs into working code with passing tests.

The cognitive overhead today is far higher than it was a year ago.

dingdingdang•3mo ago
Also better and faster though!! It's close to a Daft Punk type situation.
sawmurai•3mo ago
I have a similar experience. It feels like riding your bike in a higher gear: you can go faster, but it will take more effort and you need the potential (stronger legs) to make use of it.
specproc•3mo ago
It's more like shifting from a normal to an electric bike.

You can go further and faster, but you can get to a point where you're out of juice miles from home, and getting back is a chuffing nightmare.

Also, you discover that you're putting on weight and not getting that same buzz you got on your old pushbike.

truetraveller•3mo ago
Hey, that's a great analogy, 10/10! It explains in a few words what an entire article might take to explain.
dotinvoke•3mo ago
My experience with AI tools is the opposite. The biggest energy thieves for me are configuration issues, library quirks, or trivial mistakes that are hard to spot. With AI I can often just bulldoze past those things and spend more time on tangible results.

When using it for code or architecture or design, I’m always watching for signs that it is going off the rails. Then I usually write code myself for a while, to keep the structure and key details of whatever I’m doing correct.

troupo•3mo ago
For me, LLMs always, without fail get important details wrong.

- incessantly duplicating already existing functionality: utility functions, UI components etc.

- skipping required parameters like passing current user/actor to DB-related functions

- completely ignoring large and small chunks of existing UI and UI-related functionality like layouts or existing styles

- using ad-hoc DB queries or even iterating over full datasets in memory instead of setting up proper DB queries

And so on and so forth.

YYMV of course depending on language and project

simonw•3mo ago
Sounds to me like you'd benefit from providing detailed instructions to LLMs about how they should avoid duplicating functionality (which means documenting the functionality they should be aware of), what kind of parameters are always required, setting up "proper DB queries" etc.

... which is exactly the kind of thing this new skills mechanism is designed to solve.

troupo•3mo ago
> Sounds to me like you'd benefit from providing detailed instructions to LLMs about how they should avoid duplicating functionality

That they routinely ignore.

> which means documenting the functionality they should be aware of

Which means spending inordinate amounts of time writing down every single function, component, CSS rule, and style, which could otherwise be easily discovered by just searching. Or by looking at adjacent files.

> which is exactly the kind of thing this new skills mechanism is designed to solve.

I tried it yesterday. It immediately duplicated functionality, ignored existing styles and components, and created ad-hoc queries. It did feel like there were fewer times when it did that, but it's hard to quantify.

james_marks•3mo ago
100%. It’s like managing an employee that always turns their work in 30 seconds later; you never get a break.

I also have to remember all of the new code that’s coming together, and keep it from re-inventing other parts of the codebase, etc.

More productive, but hard work.

Fuzzwah•3mo ago
Copilot is the perfect name.
truetraveller•3mo ago
Woah, that's huge coming from you. This comment itself is worth an article. Do it. Call it "AI tools make development harder work".

P.S. I always thought you were one of those irrational AI bros. Later, I found that you were super reasonable. That's the way it should be. And thank you!

oblio•3mo ago
LLMs are autonomous driving level 2.
Pannoniae•3mo ago
In fact, I've been writing more code myself since these tools have existed. Maybe I'm not a real developer, but in the past I might have tried to either find a library online or find something on the internet to copy-paste and adapt; nowadays I give it a shot myself with Claude.

For context, I mainly do game development so I'm viewing it through that lens - but I find it easier to debug something bad than to write it from scratch. It's more intensive than doing it yourself but probably more productive too.

scuff3d•3mo ago
> Many are acting as if writing any code manually means "you're holding it wrong", which I feel it's just not true.

It's funny because not far below this comment there is someone doing literally this.

spankibalt•3mo ago
> "Maybe I'm a curmudgeon but most of these types of blogs feel like marketing pieces with the important bit is that so much is left unsaid and not shown, that it comes off like a kid trying to hype up their own work without the benefit of nuance or depth."

C'mon, such self-congratulatory "Look at My Potency: How I'm using Nicknack.exe" fluffies always were and always will be a staple of the IT industry.

lcnPylGDnU4H9OF•3mo ago
Still, the best such pieces are detailed and explanatory.
causal•3mo ago
Using LLMs for coding complex projects at scale over a long time is really challenging! This is partly because defining requirements alone is much more challenging than most people want to believe. LLMs accelerate any move in the wrong direction.
dexwiz•3mo ago
My analogy is LLMs are a gas pedal. Makes you go fast, but you still have to know when to turn.
sreekanth850•3mo ago
True
sreekanth850•3mo ago
One should know the end-to-end design and architecture, and should stop the LLM when it adds complex, fancy things.
SteveJS•3mo ago
Having the LLM write the spec/workunit from a conversation works well. Exploring a problem space with a (good) coding agent is fantastic.

However, for complex projects, IMO one must read what was written by the LLM … every actual word.

When it 'got away' from me, in each case I had left something in the LLM-written markdown that I should have removed.

99% “I can ask for that later” and 1% “that’s a good idea i hadn’t considered” might be the right ratio when reading an llm generated plan/spec/workunit.

Breaking work into single context passes (50-60k tokens in Sonnet 4.5) has typically had fantastic results for me.

My side project uses Lean 4, and a carelessly left-in 'validate' rather than 'verify' led down a hilariously complicated path equivalent to matching an output against a known string.

I recovered, but it wasn't obvious to me that this was happening. I, however, would not be able to write Lean proofs myself, so diagnosing the problem and fixing it is a small price to pay to be able to mechanically verify that part of my software is correct.

danielmarkbruce•3mo ago
Why not just use Claude Code and come to your own conclusion?
jackblemming•3mo ago
Seems cute, but ultimately not very valuable without benchmarks or some kind of evaluation. For all I know, this could make Claude worse.
jelling•3mo ago
Same. We've all fooled ourselves into believing that an LLM / stochastic process was finally solved based on a good result. But the sample size is always too low to be meaningful.
anuramat•3mo ago
even if it works as described, I'm assuming it's extremely model-dependent (e.g. book prerequisites), so you'd have to re-run this for every model you use; this is basically a poor man's fine-tuning;

maybe explicit support from providers would make it feasible?

tobbe2064•3mo ago
What's the cost of running with agents like this?
dbbk•3mo ago
Claude Max is fixed cost
tobbe2064•3mo ago
Is it doable with just pro?
redhale•3mo ago
My guess is: absolutely not, at least not for more than a few minutes. Subagents chew through tokens at a very high rate, and this system makes heavy use of subagents.
fnicfnac•3mo ago
"20X the usage of pro" still sounds like quotas where the hammer could fall as it becomes less of an experiment for a limited number of power users..

The cost of self-hosting some reasonable-size models for development groups of various sizes is what I would want to know before investing in the skills for a high-usage style that might mostly be bankrolled by investors for now.

jmull•3mo ago
> <EXTREMELY_IMPORTANT>…*RIGHT NOW, go read…

I don’t like the looks of that. If I used this, how soon before those instructions would be in conflict with my actual priorities?

Not everything can be the first law.

apwell23•3mo ago
don't LLMs tell you not to give them instructions like that these days?
therealdrag0•3mo ago
Seems like maintaining a bashrc file. Sometimes you have to go tweak it.
simonw•3mo ago
I can't recommend this post strongly enough. The way Jesse is using these tools is wildly more ambitious than most other people.

Spend some time digging around in his https://github.com/obra/Superpowers repo.

I wrote some notes on this last night: https://simonwillison.net/2025/Oct/10/superpowers/

csar•3mo ago
I'm curious how you think this compares to the Research -> Plan -> Implement method and prompts from the "Advanced Context Engineering from Agents" video when it comes to actual coding performance on large codebases. I think picking up skills is useful for broadening agents' abilities, but I'm not sure that's the right thing for actual development.

The packaged collection is very cool and so is the idea of automatically adding new abilities, but I’m not fully convinced that this concept of skills is that much better than having custom commands+sub-agents. I’ll have to play around with it these next few days and compare.

ehsanu1•3mo ago
Using a Research -> Plan -> Implement flow is orthogonal, though I notice parts of those do exist as skills too. But you sometimes need to do other things too, e.g. debugging in the course of implementing, or specific techniques to improve brainstorming/researching.

Some of these skills are probably better as programmed workflows that the LLM is forced to go through, to improve reliability/consistency; that's what I've found in my own agents, rather than using English to guide the LLM and trusting it to follow the prescribed set of steps. Some mix of LLMs (choosing skills, executing the fuzzy parts of them) and just plain code (orchestration of skills) seems like the best bet to me and is what I'm pursuing.
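
A sketch of that split, with everything hypothetical, including the call_llm stub standing in for whatever model API is in use: plain code owns the step order so nothing can be skipped, and the LLM only handles the fuzzy parts of each step.

    def call_llm(prompt: str) -> str:
        """Placeholder for a real model call (Anthropic, OpenAI, etc.)."""
        raise NotImplementedError

    def brainstorm(task: str) -> str:
        return call_llm(f"List candidate approaches for: {task}")

    def plan(task: str, approaches: str) -> str:
        return call_llm(
            f"Pick one approach and write a step-by-step plan.\nTask: {task}\nOptions:\n{approaches}"
        )

    def implement(plan_text: str) -> str:
        return call_llm(f"Implement this plan, one step at a time:\n{plan_text}")

    def run_workflow(task: str) -> str:
        # Orchestration is ordinary code: deterministic, testable, and the model
        # cannot decide to skip the brainstorm or planning steps.
        approaches = brainstorm(task)
        plan_text = plan(task, approaches)
        return implement(plan_text)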

drivebyhooting•3mo ago
Orthogonal means there should not be any overlap.
0x696C6961•3mo ago
Yeah, but you knew what he meant.
troupo•3mo ago
This looks like usage rules in Elixir, but for agent behaviors, and currently specifically for Claude: https://hexdocs.pm/usage_rules/readme.html
smrtinsert•3mo ago
Curious what you think of subagents: don't they still consume a massive amount of tokens compared to simply running in the main context? I'm skeptical of any process that starts massively delegating to subagents. I'm on Pro and don't think it's worth upgrading to $200 a month just to not pollute the main context.
redhale•3mo ago
In my opinion, subagents (or more generally, "agents as tools" as a pattern) are an order-of-magnitude level feature. Soon every CLI agent will have them as a first-class feature (you can get them via custom scripting right now with any CLI agent, albeit less ergonomically).

The ability to isolate context-noisy subtasks (like agentically searching through a large codebase by grepping through dozens of irrelevant files to find the one you actually need) unlocks much longer-running loops, and therefore much more complex tasks.

And you don't need a system this complicated to take advantage of it. Literally just a simple "codebase-searcher" agent (and Claude can vibe the agent definition for you) is enough to see the benefit first-hand. Once you see it, if you're like me, you will see opportunities for subagents everywhere.
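
For the curious, a Claude Code subagent definition is essentially a markdown file with YAML frontmatter, e.g. something like .claude/agents/codebase-searcher.md. A hypothetical "codebase-searcher" along the lines described might look roughly like this (field names per my reading of the docs, so check current Claude Code behavior before copying):

    ---
    name: codebase-searcher
    description: Use this agent to find where a symbol, feature, or config lives in the codebase. It returns only relevant file paths and short snippets, keeping noisy search work out of the main context.
    tools: Read, Grep, Glob
    ---
    You are a focused code-search assistant. Given a question about the codebase,
    search for it, read only what you need, and reply with the few most relevant
    file paths plus one line on why each matters. Never paste whole files.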

simonw•3mo ago
This is a great answer IMO.
simonw•3mo ago
I think they're worthwhile only as a token context management tool - to complete side quests without using up tokens in your main agent loop.

Using them in a way that doesn't waste tokens is something I haven't fully figured out yet!

gner75•3mo ago
I'm not sure I get this. If anything, they'll consume fewer tokens, because their context will possibly contain a subset of the original single-agent prompt, and they only need to see a subset of the original single-agent history.

What am I missing?

simonw•3mo ago
Take a look at my example here - having a bunch of sub-agents perform a task consumed 50,000+ tokens each across 5 subtasks, because each one had to consume duplicate information. https://simonwillison.net/2025/Oct/11/sub-agents/
gner75•3mo ago
But that's down to the way Claude Code has implemented it? If I coded this myself I could engineer it so that the subagents don't have overlapping context with the orchestrator.

Also, memory itself can be a tool the subagent calls to retrieve only the stuff it needs.

spprashant•3mo ago
I am not ashamed to admit this whole agentic coding movement has moved beyond me.

Not only do I have to know everything about the code, data, and domain, but now I need to understand this whole AI system, which is a meta-skill of its own.

I fear I may never be able to catch up till someone comes along and simplifies it for pleb consumption.

gdulli•3mo ago
It's also possible to put in enough hours of real coding to get to the point where coding really isn't that hard anymore, at least not hard enough to justify switching from those stable/solid fundamental skills to a constantly revolving ecosystem of ephemeral tools, models, model versions, best practices, lessons from trial and error, etc. Then you could bypass all of this distraction.

Admittedly that stance is easiest to take if you were old enough, experienced enough already by the time this era hit.

paweladamczuk•3mo ago
"There exist developers whose performance cannot be boosted by an LLM" is a really strong statement.
gdulli•3mo ago
The point is that it takes significant time and attention to keep up with the treadmill of constantly learning the new tool/model/framework of the month, so there's significant opportunity cost. I have continued putting 100% of my attention on the direct problems I'm solving.

I don't see the coding as the hard or critical part of my work, so I don't put effort into accelerating or delegating that part.

_se•3mo ago
Not really. It's on the people asserting the positive (that LLMs do improve productivity for sufficiently experienced engineers) to prove it. In the absence of proof, the null hypothesis is the default.
evanmoran•3mo ago
To give you a process that might help:

I've found you have to use Claude Code to do something small. And as you do it, iterate on the CLAUDE.md input prompt to refine what it does by default. When it doesn't do things your way, change the prompt to see if you can fix how it works. The agent is then equivalent to calling ChatGPT / Sonnet 1000 times an hour, so these refinements (skills in the post are a meta approach) are all about tuning the workflow to be more accurate for your project and fit your mental model. As you tune the md file you'll start to feel what is possible and understand agent capabilities much better.

So, short story: you have to try it. Long story: it's the iteration of the meta-prompt approach that teaches you what's possible.
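
To make "change the prompt to see if you can fix how it works" concrete: the refinements usually end up as short, blunt rules appended to CLAUDE.md after you catch a default behavior you dislike. A hypothetical excerpt (rules invented for illustration):

    ## Project rules
    - Before writing a new helper, search src/lib/ for an existing one and reuse it.
    - Never run database migrations yourself; print the command and ask me first.
    - Keep each change to one module; ask before touching shared types.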

philbo•3mo ago
I think this and other recent posts here hugely overcomplicate matters. I notice none of them provides an A/B test for each item of complexity they introduce; there's just a handwavy "this has proved to work over time".

I've found that a single CLAUDE.md does really well at guiding it how I want it to behave. For me that's making it take small steps and stop to ask me questions frequently, so it's more like we're pairing than I'm sending it off solo to work on a task. I'm sure that's not to everyone's taste but it works for me (and I say this as someone who was an agent-sceptic until quite recently).

Fwiw my ~/.claude/CLAUDE.md is 2.2K / 49 lines.

d4rkp4ttern•3mo ago
Indeed Anthropic’s best practices suggest keeping the CLAUDE.md relatively small.
lcnPylGDnU4H9OF•3mo ago
I haven't really done much of it but my plan is just to practice. This seems like a powerful thing to start with.
cruffle_duffle•3mo ago
I've personally decided that Cursor agent mode is good enough. A single foreground instance of Cursor doing its thing is plenty enough to babysit. Based upon that experience, I am highly, highly skeptical that people are actually creating things of value with these multi-agent-running-in-the-background setups. Way too much babysitting, and honestly, writing docs and specs for them is more work than just writing parts of the code myself and letting the LLM do the tedious bits, like finishing what I started.

No matter what you are told, there is no silver bullet. Precisely defining the problem is always the hard part. And the best way to precisely define a problem and its solution is code.

I’ll let other people fight swarms of bots building… well who knows what. Maybe someday it will deliver useful stuff, but I’m highly skeptical.

hoechst•3mo ago
Much of it is just "put this magic string before your prompt to make the LLM 10x better" voodoo, similar to the SEO voodoo common in the 2000s.

just remember that it works the same for everyone: you input text, magic happens, text comes out.

if you can properly explain a software engineering problem in plain language, you're an expert in using LLMs. everything on top of that is people experimenting or trying to build the next big thing.

benhurmarcel•3mo ago
> till someone comes along and simplifies it for pleb consumption

Just give it a few months. If some techniques really work, it'll get streamlined.

daemontus•3mo ago
Maybe this is a naive question, but how are "skills" different from just adding a bunch of examples of good/bad behavior into the prompt? As far as I can tell, each skill file is a bunch of good/bad examples of something. Is the difference that the model chooses when to load a certain skill into context?
nrjames•3mo ago
I think it just gives you the ability to easily do that with slash command, like using "/brainstorm database schema" or something instead of needing to define what "brainstorm" means each time you want to do it.
hackernewds•3mo ago
what you are suggesting is 1-shot, 2-shot, 5-shot etc. prompting, which is so effective that it's how benchmarks were presented for a while
simonw•3mo ago
I think that's one of the key things: skills don't take up any of the model context until the model actively seeks out and uses them.

Jesse on Bluesky: https://bsky.app/profile/s.ly/post/3m2srmkergc2p

> The core of it is VERY token light. It pulls in one doc of fewer than 2k tokens. As it needs bits of the process, it runs a shell script to search for them. The long end to end chat for the planning and implementation process for that todo list app was 100k tokens.

> It uses subagents to manage token-heavy stuff, including all the actual implementation.
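
For readers who haven't looked at the repo: a skill is essentially a directory containing a SKILL.md whose short frontmatter is all the model sees until it decides to load the body. A stripped-down hypothetical example (field names and wording are illustrative; check the repo and Anthropic's docs for the exact format):

    ---
    name: tdd-red-green
    description: Use when implementing a feature or bugfix, to drive the change with a failing test first.
    ---
    1. Write one failing test that captures the desired behavior (RED).
    2. Run it and confirm it fails for the expected reason.
    3. Write the minimum code to make it pass (GREEN), then run the full suite.
    4. Refactor only while everything stays green.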

tcdent•3mo ago
This style of prompting, where you set up a dire scenario in order to try to evoke some "emotional" response from the agent, is already dated. At some point, putting words like IMPORTANT in all uppercase had some measurable impact, but at the present time, models just follow instructions.

Save yourself the experience of having to write and maintain prompts like this.

bcoates•3mo ago
Also the persuasion paper he links isn't at all about what he's talking about.

That paper is about using persuasion prompts to overcome trained in "safety" refusals, not to improve prompt conformance.

danshapiro•3mo ago
Co-author of the paper here. We don't know exactly why modern LLMs don't want to call you a jerk, or for that matter why persuasive techniques convince them otherwise. It's not a hard line like many of the guardrails. That said, I talked to Jesse about this, and I strongly suspect the same techniques will work for prompt conformance when the topic is something other than name-calling.
diamond559•3mo ago
It's because they are programmed to be agreeable and friendly so that you'll keep using them.
make3•3mo ago
isn't that just instruction fine-tuning and RLHF inducing style and deference? why is that surprising?
kasey_junk•3mo ago
What's irritating is that the LLMs haven't learned this about themselves yet. If you ask an LLM to improve its instructions, those sorts of improvements are what it will suggest.

It is the thing I find most irritating about working with llms and agents. They seem forever a generation behind in capabilities that are self referential.

danielbln•3mo ago
LLMs will also happily put time estimates on work packages that are based on pre-LLM turnaround times.

"Phase 2 will take about one week"

No, Claude, it won't, because you and I will bang this thing out in a few hours.

mceachen•3mo ago
"Refrain from including estimated task completion times." has been in my ~/.claude/CLAUDE.md for a while. It helps.
no-name-here•3mo ago
Do such instructions take up a tiny bit more attention/context from LLMs, and consequently is it better to leave them off and just ignore such output?
mceachen•3mo ago
I have to balance this with what I know about my reptile brain. It's distracting to me when Claude declares that I'm "absolutely right!" or making a "brilliant insight," so it's worth it to me to spend the couple of context tokens and tell it to avoid these clichés.

(The latest Claude has a `/context` command that’s great at measuring this stuff btw)

conorcleary•3mo ago
Comments like yours on posts like these by humans like us will create a philosophical lens out of the ether that future LLMs will harvest for free and then paywall.
jstummbillig•3mo ago
How are skills different from tools? Looks like another layer of abstraction. What for?
cynicalsecurity•3mo ago
Superpower: AI slop.
echelon•3mo ago
I'm sure the horse whip manufacturers had similar things to say about steam powered horses. We just don't think about them much anymore.

The whole world is changing around us and nothing is secure. I would not gamble that the market for our engineering careers is safe with so much disruption happening.

Tools like Lovable are going to put lots of pressure on technical web designers.

Business processes may conform to the new shape and channels for information delivery, causing more consolidation and less duplication.

Or perhaps the barrier to entry for new engineers, in a worldwide marketplace, lowers dramatically. We have accessible new tools to teach, new tools to translate, new tools to coordinate...

And that's just the bear case where nothing improves from what we have today.

yoyohello13•3mo ago
Nice try Jensen.
4b11b4•3mo ago
I'm not sure exactly what I just read...

Is this just someone who has tingly feelings about Claude reiterating stuff back to them? cuz that's what an LLM does/can do

intended•3mo ago
This isn't science or engineering.

This is voodoo.

It likely works, but knowing that YAGNI is a thing means that at some level you are invoking a cultural touchstone for a very specific group of humans.

Edit -

I dug into the superpowers and skills for a bit. Definitely learned from it.

There's stuff that doesn't make sense to me on a conceptual basis. For example, in the skill to preserve productive tensions, there's a part that goes:

> The trade-off is real and won't disappear with clever engineering

There’s no dimension for “valid” or prediction for tradeoff.

I can guess that if the preceding context already outlines the tradeoffs clearly, or somehow encodes that there is no clever solution that threads the needle, then this section can work.

Just imagining what dimensions must be encoding some of this suggests that it’s … it won’t work for situations where the example wasn’t already encoded in the training. (Not sure how to phrase it)

clusterhacks•3mo ago
> This isn't science or engineering.
> This is voodoo.

I was struggling to find the exact reason this type of article bugs me so much, and I think "voodoo" is precisely the correct phrase to sum up my feelings.

I don't mean that as a judgement on the utility of LLMs or that reading about what different users have tried out to increase that utility isn't valuable. But if someone asked me how to most effectively get started with coding agents, my instinct is to answer (a) carefully and (b) probably every approach works somewhat.

theptip•3mo ago
> some of the ones I've played with come from telling Claude "Here's my copy of programming book. Please read the book and pull out reusable skills that weren't obvious to you before you started reading

This is actually a really cool idea. I think a lot of the good scaffolding right now is things like "use TDD", but if you link citations to the book, then it can perhaps extract more relevant wisdom and context (just like I would by reading the book), rather than using the generic averaged interpretation of TDD derived from the internet.

I do like the idea of giving your Claude a reading list and some spare tokens on the weekend when you're not working, and having it explore new ideas and techniques to bring back to your common CLAUDE.md.

zahlman•3mo ago
> It also bakes in the brainstorm -> plan -> implement workflow I've already written about. The biggest change is that you no longer need to run a command or paste in a prompt. If Claude thinks you're trying to start a project or task, it should default into talking through a plan with you before it starts down the path of implementation.

... So, we're refactoring the process of prompting?

> As Claude and I build new skills, one of the things I ask it to do is to "test" the skills on a set of subagents to ensure that the skills were comprehensible, complete, and that the subagents would comply with them. (Claude now thinks of this as TDD for skills and uses its RED/GREEN TDD skill as part of the skill creation skill.)

> The first time we played this game, Claude told me that the subagents had gotten a perfect score. After a bit of prodding, I discovered that Claude was quizzing the subagents like they were on a gameshow. This was less than useful. I asked to switch to realistic scenarios that put pressure on the agents, to better simulate what they might actually do.

... and debugging it?

... How many other basic techniques of SWEng will be rediscovered for the English programming language?

hoechst•3mo ago
documents like https://github.com/obra/superpowers/blob/main/skills/testing... are very confusing to read as a human. "skills" in this project generally don't seem to follow a set format and just look like what you would get when prompting an LLM to "write a markdown doc that step by step describes how to do X" (which is what actually happened, according to the blog post).

idk, but if you already assume that the LLM knows what TDD is (it probably ingested ~100 whole books about it), why are we feeding a short (and imo confusing) version of that back to it before the actual prompt?

i feel like a lot of projects like this that are supposed to give LLMs "superpowers" or whatever by prompt engineering are operating on the wrong assumption that LLMs are self-learning and can be made 10x smarter just by adding a bit of magic text that the LLM itself produced before the actual prompt.

ofc context matters, and if i have a repetitive task, i write down my constraints and requirements and paste that in before every prompt that fits this task. but that's just part of the specific context of what i'm trying to do. it's not giving the LLM superpowers, it's just providing context.

i've read a few posts like this now, but what i am always missing is actual examples of how it produces objectively better results compared to just prompting without the whole "you have skill X" thing.

Footprint0521•3mo ago
I fully agree. I’ve been running codex with GPT Pro (5o-codex-high) for a few weeks now, and it really just boils down to context.

I've found the most helpful things for me are just voice via Whisper to LLMs, managing token usage effectively and restarting chats when necessary, and giving it quantified ways to check when its work is done (say, AI unit tests with APIs, or Playwright tests). Also, every file I own is markdown haha.

And obviously having different AI chats for specialized tasks (the way the math works on these models makes this have much better results!)

All of this has allowed me to still be in the PM role like he said, but without burning down a needless forest on having it reevaluate things in its training set lol. But why would we go back to vendor lock in with Claude? Not to mention how much more powerful 5o-codex-high is, it’s not even close

The good thing about what he said is getting AI to work with AI; I have found this to be incredibly useful in prompting and segmenting out roles.

redhale•3mo ago
Everything is just context, of course. Every time I see a blog post on "the nine types of agentic memory" or some such I have a similar reaction.

I would say that systems like this are about getting the agent to choose precisely the right context snippet for the exact subtask it's doing at a given point within a larger workflow. Obviously you could also do that manually, but that doesn't scale to running many agents in parallel, or running autonomously for longer durations.
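A toy sketch of that snippet-selection idea (the file layout and scoring below are hypothetical, not how the superpowers project actually works):

```python
# Toy sketch, not the real mechanism: given a directory of skill files,
# pick the one whose text best overlaps the current subtask and inject
# only that snippet into the agent's context.
from pathlib import Path

def load_skills(skill_dir: str) -> dict[str, str]:
    """Map each skill name to its markdown body."""
    return {p.stem: p.read_text() for p in Path(skill_dir).glob("*.md")}

def pick_skill(task: str, skills: dict[str, str]) -> str | None:
    """Crude keyword-overlap ranking; real systems let the model choose."""
    task_words = set(task.lower().split())
    scored = [(len(task_words & set(body.lower().split())), name)
              for name, body in skills.items()]
    score, best = max(scored, default=(0, None))
    return best if score > 0 else None

skills = load_skills("skills/")  # hypothetical skills directory
print(pick_skill("write a failing test before implementing the fix", skills))
```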

gdown•3mo ago
Especially with some of the more generic skills like https://github.com/obra/superpowers-skills/blob/main/skills/... and https://github.com/obra/superpowers-skills/blob/main/skills/...: it seems like they're general enough that they'd be better off in the main prompt. I'd be interested to see when Claude actually decides to pull them in.
tinodb•3mo ago
Also, the format seems quite badly written. I.e., those “quick references” are actually examples, several generic sentences are repeated multiple times in different wording across sections, etc.
d_sem•3mo ago
This article left me wishing it was "How I'm using coding agents to do <x> task better"

I've been exploring AI for two years now. It has certainly graduated from toy to basic utility. However, I increasingly run into its limitations and find reverting to pre-LLM ways of working more robust, faster, and more mentally sustainable.

Does someone have concrete examples of integrating LLM in a workflow that pushes state-of-the-art development practices & value creation further?

jvanderbot•3mo ago
My impression is we're still in the tinkering phase. The metrics are coming.
aydyn•3mo ago
What metrics? We could never objectively measure productivity except in the macroeconomic sense, so what makes you think we'll be able to now?
jvanderbot•3mo ago
There are best practices of a kind, and well known org structures that work for building software, at least as much as anything can. We'll have some best practice experience with LLM agents soon enough.
simonw•3mo ago
Mitchell's post from this morning: https://mitchellh.com/writing/non-trivial-vibing
3eb7988a1663•3mo ago
I am only on the first page and saw this blurb and was immediately annoyed.

  @/Users/jesse/.claude/plugins/cache/Superpowers/...
The XDG spec has been out for decades now. Why are new applications still polluting my HOME? Also seems weird that real data would be put under a cache/ location, but whatever.
simonw•3mo ago
It's in the cache location because it's a copy of a plugin that was installed from a GitHub repository, so that's not the original point of truth for that file.
wbradley•3mo ago
I think the point is that ~/.claude should be dispersed among ~/.config/claude, ~/.local/state/claude, etc

I agree with this, it’s frustrating that in 2025 apps are still polluting my home dir.

kibwen•3mo ago
It's one thing to wish that apps would put their data anywhere except dumping it in your home dir, but this is exactly why I hate the XDG spec. I want all data for a program--be it the configuration or the cache or the binary itself--to be in a single directory such that 1) "uninstalling" the program, completely and in isolation, is nothing more than just deleting that single directory, and 2) any program not doing arbitrary file I/O can entirely function while having access to only its installation directory, and nothing else on the filesystem.
0x6c6f6c•3mo ago
This approach couples everything together, though, in such a way that there's no standard way to wipe the cache without also wiping your app, configuration, etc.

XDG may not be perfect, but wiping related data for apps that follow it is straightforward. There are a few directories to delete instead of one, but at least they're consistently structured.

lcnPylGDnU4H9OF•3mo ago
The "How to create skills" link is broken. This is the new location: https://github.com/obra/superpowers/blob/personal-superpower...
yoyohello13•3mo ago
The post reads like someone throwing bones and reading their fortune. That part where Claude did its own journaling was so cringe it was hilarious. The tone of the journal entry was exactly like the blog author's, which suggests to me Claude is reflecting back what the author wants to hear. I feel like Jesse is consumed in a tornado of LLM sycophancy.
saaaaaam•3mo ago
Claude has never once said “oh shit” or “holy crap” to me. I must be doing something horribly wrong.
titanomachy•3mo ago
You need to read more books on influencing people. /s
preommr•3mo ago
> It made sense to me that the persuasion principles I learned in Robert Cialdini's Influence would work when applied to LLMs. And I was pleased that they did.

No, no. Stop.

What is this? What're we doing here?

This goes past developing with AI into something completely different.

Just because AI coding is a radical shift doesn't mean everything has changed. There needs to be some semblance of structure and design. Instead what we're getting is straight up voodoo nonsense.

imiric•3mo ago
> Instead what we're getting is straight up voodoo nonsense.

It always has been. Starting with the term "AI" itself.

Articles like these read the same way to me as any OpenAI announcement from the past 5 years. A bunch of technical mumbo jumbo laced with hyperbole, grand promises of how the technology is changing the world, and similar platitudes. I've learned to filter most of it out.

Occasionally I'll stumble upon an actually useful and practical tidbit of information which I can apply in my own workflow, which does involve LLMs, but most of the time it's just noise.

w10-1•3mo ago
> what we're getting is straight up voodoo nonsense

Maybe not in this case.

For the AI to create a solution, it has to come up with a vector for your intention and goals. It makes some sense for an AI trained on human persuasion materials (basically, everything has a rhetorical aspect) to also track human persuasion features for intentions.

However, results will vary. Just as people trying to deploy rhetorical techniques (and ridiculous power stances) often come off as foolish, I believe trying to hack your intention vector with all-caps and super-superlatives won't always work as intended (pun intended).

Still, if you find yourself not getting what you want, and you check your prompt and find some persuasion feature missing (e.g., authority), I think it's worth trying to add something on point.

Fargren•3mo ago
> It makes some sense for an AI trained on human persuasion

Why?

> However, results will vary.

Like in voodoo?

I'm sorry to be dismissive, but your comment is entirely dismissing the point it's replying to, without any explanation as to why it's wrong. "You are holding it wrong" is not a cogent (or respectful) response to "we need to understand how our tools work to do engineering".

imiric•3mo ago
And here I am in October 2025 still using "AI" tools via a chat UI in Emacs, like a caveman. I've written some code to help me with managing context and such, but the tools are there when I need them, and otherwise stay out of my way.

I have no interest in trying to understand the thought process of people who write and work like this. They're more interested in chasing the latest overhyped trends produced by tech companies and influencers, than actually producing quality software that solves real-world problems. It's some weird product of the tech and social media echo chambers they perpetually live in, which I find difficult to describe.

But apparently I have to learn about "skills" and "superpowers" now... Give me a break.

JaggerFoo•3mo ago
I don't see any code. Where are the examples of use on real code?
meander_water•3mo ago
The problem with stuff like this is that it's hard to evaluate. You don't even know when the agent is using a skill, or if the skill even made a difference. Using tools lets you at least instrument tool calls, and control what gets executed.
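A minimal sketch of what that instrumentation could look like (the wrapper and tool below are made up for illustration, not any agent framework's API):

```python
# Minimal sketch: wrap each tool exposed to the agent so every call is
# logged as JSON. Skills give you no equivalent hook; tool calls do.
import functools, json, time

def traced(tool):
    @functools.wraps(tool)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = tool(*args, **kwargs)
        print(json.dumps({
            "tool": tool.__name__,
            "args": list(args),
            "kwargs": kwargs,
            "seconds": round(time.time() - start, 3),
        }))
        return result
    return wrapper

@traced
def run_tests(path: str) -> str:
    # stand-in for a real test-runner tool the agent could call
    return f"ran tests under {path}"

run_tests("tests/")
```
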
redhale•3mo ago
I agree, I think traceability will be extremely important in evolving and improving a system like this. Since scripting is involved in searching for and managing skills, I feel like there is probably a way to achieve some kind of use tracing, but I'm not quite sure. Seems like this, if implemented, could also be fed back into the system for self improvement.
herval•3mo ago
Fascinating write-up. I loved this bit of debugging:

> The first time we played this game, Claude told me that the subagents had gotten a perfect score. After a bit of prodding, I discovered that Claude was quizzing the subagents like they were on a gameshow. This was less than useful. I asked to switch to realistic scenarios that put pressure on the agents, to better simulate what they might actually do.

Also his Claude says shit a lot

Aloisius•3mo ago
What's up with people (or I suppose AI) including copyright licenses in AI-generated code?

At least it's an MIT license, but since AI output isn't copyrightable, I'm unsure what the point is; people can legally ignore the license.

hugh-avherald•3mo ago
^ (not legal advice -- far from it)
Aloisius•3mo ago
If there is some reason why one wouldn't be able to ignore the copyright license of something not protected by copyright, I'd love to hear it.

The copyright office has been quite clear (rightly so imo) that AI output is not protected by copyright without substantial human creative expression in the final product and purely prompt-created works simply don't qualify.

Indeed, I expect people muddling their codebases with AI output are going to find themselves in an interesting position of having to prove how much code humans actually wrote to enforce copyright claims if their code ever gets leaked.

kragen•3mo ago
That's just the copyright office of one country out of a couple hundred, the courts can overrule them, and legislation can change. However, I agree that currently in the US (or on code written in the US) copyright probably doesn't inhere in AI-written code.
Aloisius•3mo ago
The US constitution limits copyright to protection for authors and inventors. I'm skeptical that a simple law could extend protection to machine generated works without being ruled unconstitutional nor does there appear to be any significant government or public support for such a thing.

And yes, the US is just one country, but it does have a bit of an outsized software development industry. I also haven't heard of any other countries lining up to give machine-generated works copyright protection.

kragen•3mo ago
The US already extends copyright to the output from compilers, on the flimsy basis that it is a "literary work", and enacted a sui generis 20-year "mask works" right for chip layouts, which are generally output from EDA tools. It's hard to predict what politics will do, except in the very general sense that policies that have no constituency will not be enacted.
benrutter•3mo ago
I'm so curious around what people's median experience is of AI coding tools.

I've tried agents every now and then, recently for something very simple: add an option to request CSV format in a data API.

The results were, well, not good... I ended up undoing literally all the changes, because writing from scratch was a lot easier than trying to refactor the total mess it had made of what I'd have thought was a trivial feature.

I haven't done loads of prompt engineering etc.; in all honesty it seems like a lot of work when I haven't yet seen promise in the tool.

I see articles like this, and I always wonder, am I the outlier or is the writer? My experience of agentic AI is so hugely different to what some people are finding.

x0x0•3mo ago
I'm the same, with the same question if it's me.

I've had success with eg spitting out templated html; sometimes with css; sometimes with writing tests where I'm very specific about what I want (set up these structures, test this condition), etc. It's mediocre (good start, very far from production) with writing screens in react native. It does slightly better on rails, but far from production ready.

After that, it kinda works, but my effort level to turn the output into working code is higher than just writing it myself.

aydyn•3mo ago
Think of it this way: what's the likelihood that what you are asking for would be found in some public GitHub repo? If it's high, then you are good to go.
a123b456c•3mo ago
I think you're pointing in the right direction, but I would rephrase as,

what's the likelihood that the solution exists in a public GitHub repo in a way that the machine can recognize as relevant to your prompt?

If many versions of the solution exist, due to the problem's common occurrence, and if you can evaluate the LLM's output, then you're good to go.

cosmodust•3mo ago
It's very use-case specific. I find them really good at simple repetitive tasks as long as you guide them at a low level, although you do need to keep a close eye on them, as they can easily spoil your existing work.
vijucat•3mo ago
They're great at creating test cases out of code and/or log file excerpts. They're good at run-of-the-mill tasks whose answer one can reasonably expect to find on Stack Overflow. I'm using GPT-4.1 and Claude 3.7 Sonnet Thinking with VS Code + GitHub Copilot.
sothatsit•3mo ago
Agent performance depends massively on the work you do.

For example, I have found Claude Code and Codex to be tremendously helpful for my web development work. But my results for writing Zig are much worse. The gap in usefulness of agents between tasks is very big.

The skill ceiling for using agents is also surprisingly high. Planning before coding, learning agent capabilities, environment setup, and context engineering can make a pretty massive difference to results. This can all be a big time sink though, and I'm not sure if it's really worth it if agents don't already work decently well for the work you do.

But with the performance gaps between domains, and the skill curve, I can definitely understand why there is such a divide between people claiming agents are ridiculously overhyped, and people who claim coding is fundamentally changing.

mattmanser•3mo ago
I feel there's a third reason.

When I see a pro-AI person insisting that they are fully automated, I often scour their recent comments to find code or git repos they have shared. You find something every now and again.

My thinking is that I want to use this stuff, but I don't find agentic AI at all effective. I must be doing something wrong! So I should learn from the real-world success of others.

A regular pattern is they say they're using vibe coding for complex problems. You check, and they're trivial features.

One egregious example was a basic randomizer to pick a string from a predetermined set, and save that value into an existing table to re-use later.

To me that's a trivial feature, a 15-30 minute task in a codebase I'm familiar with.

For this extremely AI-bullish developer it was described as a major feature. The prompts were timestamped, and it took them half a day using coding agents.

They were sharing their .claude folder. It had 50 odd md files in it. I sampled a bunch of them and most of them boiled down to:

'You are an expert [dev/QA/architect/PM/tester]. Ultrathink. Be good'.

Worse, I looked at their LinkedIn, and on paper they looked experienced. Seeing their code, they were not.

There's a subset of the "fully automated" coders who are just bad. They are incapable of judging how bad AI code is. But vocally, and often aggressively, advocate for it.

Some are good, but I just can't replicate their success. And they're clearly also still hand-writing a lot of the code.

sothatsit•3mo ago
Yeah, I definitely see this as well. These are the people with seven MCP servers, 5000-line AGENTS.md files, their own "memory systems" for the agents, and who try to hit their rate-limits on all their agents every 5 hours (regardless of whether or not they are actually getting useful work done). Having tried some of this stuff when I was trying to learn about agents, it almost always made their performance worse...

In web development, where I get the most out of agents, I am still only using them for implementing basic things. I will write anything even moderately complex, as agents often make the wrong assumptions somewhere. And then there's also manual work required to review and tidy up agent output. But there's just so much grunt work in web development from adding to a DB schema, writing a migration, adding the data to your model, exposing it in an API endpoint, and finally showing it on a page. None of that is complicated, so agents are pretty good at it.

theshrike79•3mo ago
Yea, these are the NFT/Crypto bros of the AI world. They don't really understand anything.

The best of them are rediscovering basic software project management and posting about it on every social media site and their Substack like they discovered something brand new :)

"Turns out if you plan first, then iterate on the plan and split the plan into manageable chunks, development is a lot smoother!!!11 (subscribe to my AI podcast)"

No shit, Sherlock. I wish they'd read a book once or twice.

sfn42•3mo ago
As someone who has been fairly negative towards AI until recently, the problem is how you use it.

If you just tell it some vague feature to make, it's gonna do whatever it's gonna do and maybe it will be good, maybe it won't. It probably won't. The more specific you are the better it will do.

Instead of trying to 100x or 1000x your effort, try to just 2x or 3x it. Give it small specific tasks and check the work thoroughly, use it as an extension of yourself rather than a separate "agent".

I can tell it to write a function and it'll do pretty well. I can ask it to fix things if it doesn't do it the way I want. This is all easy. Maybe I can even get it to write a whole class at once or maybe I can get it to write a class in a few iterations.

The key here is I'm in control: I'm doing the design, I'm making the decisions. I can ask it how I should approach a problem and often it'll have great suggestions. I can ask it to improve a function I've written and it'll do pretty well. Sometimes really well.

The point is I'm using it as a tool; I'm not using it to do my job for me. I use it to help me think; I don't use it to think for me. I don't let it run away from me and edit a whole bunch of files etc. I keep it on a tight leash.

I'm sold now. I am, indisputably, a better software developer with LLMs in my toolbelt. They help me write better code, faster, while learning things faster and more easily. It's really good. Reliability isn't a problem when I keep a close eye on it. It's only a problem if you try to get it to do a whole big task on its own.

Anamon•3mo ago
That sounds fine, but I wouldn't expect you'd still be at 2× to 3× then. Maybe closer to 1.2× to 1.3× (although studies seem to show it's more often actually 0.9×). Coding is already barely 10% of the work, the agents won't help much with the other parts, and now you have a whole load of additional things to check and potentially fix, which you didn't have to before because they're obvious to a human mind.
danielbarla•3mo ago
I think a lot of it comes down to the domain, language and frameworks, your expectations, as well as prompt engineering. Having said that, I have had a number of excellent experiences in the past few weeks:

- Case 1 was troubleshooting what turned out to be a complex and messy dependency injection issue. I got pulled in to unblock a team member, who was struggling with the issue. My efforts were a dead-end, but Claude (Code) managed to spot a very odd configuration issue. The codebase is a large, legacy one.

- Case 2 was the same codebase; I again got pulled in to unblock a teammate, investigating why some integration tests passed when run individually, but not when run as a group. Clearly there was a pretty obvious smoking gun, and I managed to isolate the issue after about 15-30 minutes of debugging. I had set Claude on the goose chase as well, and as I closed the call with my teammate, I noticed it had found the exact same two lines that were causing the issue.

Clearly, it occasionally does insane stuff, or lies its little pants off. The number of times it has "got me" is fairly low, however, and its usefulness to me is extreme. In the cases above, it out-did a teammate who has at least 10 years of experience, equalled me (with over 25 years now) in one case, and outdid me in the other. I have a similar wonderment to your situation, but the opposite: "how are people NOT finding value in this?"

ekidd•3mo ago
> I'm so curious around what people's median experience is of AI coding tools.

My experience is that coding agents work best either for absolute beginners or for lead engineers who have experience building and training teams. Getting good results out of coding agents is a lot like getting good results out of interns: You need to explain clearly what you want, ask them to explain what they plan to do, give feedback on the plan, and then very carefully review the results. You need to write up your preferred coding style, you need a document that explains "how to work on this project", you need to establish rigorous automated quality checks, etc. Using a coding agent heavily is a lot like being promoted to "technical lead", with all the tradeoffs that entails.

Here's a recent discussion of a good blog post on the subject: https://news.ycombinator.com/item?id=45503867

I have gotten some very nice results out of Sonnet 4.5 this past week. But it required using my "technical management" skills very heavily. And it required lots of extremely careful code review. Clear documentation, robust QA, and code review are the main bottlenecks.

I mean, the time I spent writing AGENTS.md wasn't wasted. I'm writing down a lot of stuff I used to teach in pairing sessions.

lazarus01•3mo ago
AI coding works amazingly well

But only on micro tasks, with explicit instructions, inside a very well documented architecture.

Give AI freedom of expression and they will never find first principles in their training data. You will receive code that is not performant, and when analyzing the output, the AI will try to convince you that it is. If the task goes beyond your domain, you may believe the wrong principles are OK.

kreyenborgi•3mo ago
Anyone else get the feeling like CLAUDE.md fiddling is the new dotemacs fiddling?
zkmon•3mo ago
<Homer Simpson mode>Oh yeah? If prompting is such a damn cool hard thing, why can't I ask my AI slave to do all this prompting mumbo jumbo for me?</Homer Simpson mode>
tobbe2064•3mo ago
Is it possible to set up this kind of workflow with the plugin that comes bundled with VS Code, given that you have an enterprise GitHub Copilot account that includes Claude?
redhale•3mo ago
Subagents are a critical feature that GH Copilot still lacks. They allow your main agent to use another agent as a tool, meaning the main agent's context doesn't get nearly as polluted. Good read on the benefits of this pattern: https://jxnl.co/writing/2025/08/29/context-engineering-slash...
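A rough sketch of the pattern (the classes and calls here are hypothetical, not GH Copilot's or Claude Code's actual API):

```python
# Rough sketch: the main agent hands a subtask to a fresh agent with its own
# empty context; only the short summary comes back, so the main context
# doesn't get polluted with the subagent's exploration.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    history: list[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        self.history.append(task)
        # placeholder for a real model call; returns a one-line summary
        return f"[{self.name}] summary of: {task}"

def delegate(main: Agent, task: str) -> str:
    sub = Agent(name="subagent")   # isolated context
    summary = sub.run(task)        # subagent does the noisy exploration
    main.history.append(summary)   # only the summary lands in main context
    return summary

main = Agent(name="main")
print(delegate(main, "find where the config is parsed and report the file paths"))
```
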
d4rkp4ttern•3mo ago
A big issue working with code agents is what I call context-recall: restoring context when working on a new feature or fix that builds on recent work.

Meaning, the previous work may have involved multiple CLI sessions, with summaries dumped to various markdown files: documentation files, plan files, issue files, PR descriptions, etc. Then, when starting new work with a code agent, you have to hunt down all of this scattered context from the various md files and session logs to fill in the background for the code agent about what was recently done.

I see many workflows that help with working on a fresh feature or fix, but nothing that addresses context-recall. But maybe the OP workflow or others do that, I haven’t dug too deep into them.

d4rkp4ttern•3mo ago
(Just realized the OP blog actually does address exactly this)
dwb•3mo ago
Honestly, if the LLM/agent can't do what I want with a simple, shortish prompt that I understand, augmented by some well-chosen tool calls, I'm not interested. These incantations may or may not work, but I just don't want them. Reams of vague twiddling of an unknowable black box. I want the amount of mystery kept at an absolute minimum when I'm programming.
iamjfu•3mo ago
I am interested by this link: https://blog.fsck.com/blog/2025/superpowers/superpowers-demo...

``` Claude Code v2.0.13 Sonnet 4.5 (with 1M token context) Claude Max /Users/jesse/tmp/new-tool/.worktrees/todo-cli ```

How does this person have access to Sonnet 4.5 with a 1M token context? I don't see this referenced anywhere when I search or when I ask Claude about it.

d4rkp4ttern•3mo ago
It's a limited-release beta feature, not available to all. You can try to activate it by doing /model sonnet[1m], and it accepts it, but at the next API call it may fail and say “this beta model is not available with your subscription”.

I haven’t gotten access yet.

One of the nice things about Codex (GPT-5) is the supposed 400k token context (although performance starts to deteriorate when you get to 80% context usage).

piperswe•3mo ago
OpenRouter shows Sonnet 4.5 as having a 1M context limit: https://openrouter.ai/anthropic/claude-sonnet-4.5
AlexCoventry•3mo ago
I think this is cool, but some performance benchmarks would really help to sell it.
throw-10-13•3mo ago
“Here is a collection of arcane incantations and humiliating prostrations I use to get my AI homunculus to serve me.”

Having to beg and emotionally manipulate an agent into doing what you want goes so far beyond black-box that I find it difficult to believe these people actually get useful work done using these tools.

I generally consider myself pro-ai in the workplace, but this nonsense is starting to change my mind.

StapleHorse•3mo ago
A little bit off topic: I love how AI is advancing so fast that the usual title "How I'm using XX in 20NN" is not specific enough; now we need the month.
novoreorx•3mo ago
To me, this kind of stuff is like bloated boilerplates such as "full-stack e-commerce SaaS NextJS boilerplate." I never use them because I want more control and less unpredictability. They seem to save you some time, but you will pay a lot more for it later when you encounter deep bugs or need to refactor. For this reason, I won't use prompt templates for agentic coding tools either. There have been enough suggestions to just write your own AGENTS.md and not overcomplicate the prompts.