
GNU Health

https://www.gnuhealth.org/about-us.html
147•smartmic•2h ago•36 comments

The <output> Tag

https://denodell.com/blog/html-best-kept-secret-output-tag
559•todsacerdoti•9h ago•132 comments

Microsoft Amplifier

https://github.com/microsoft/amplifier
99•JDEW•2h ago•76 comments

Show HN: Gnokestation Is an Ultra Lightweight Web Desktop Environment

https://gnokestation.netlify.app
10•edmundsparrow•46m ago•4 comments

Vibing a non-trivial Ghostty feature

https://mitchellh.com/writing/non-trivial-vibing
93•skevy•3h ago•34 comments

Testing two 18 TB white label SATA hard drives from datablocks.dev

https://ounapuu.ee/posts/2025/10/06/datablocks-white-label-drives/
29•thomasjb•5d ago•13 comments

AMD and Sony's PS6 chipset aims to rethink the current graphics pipeline

https://arstechnica.com/gaming/2025/10/amd-and-sony-tease-new-chip-architecture-ahead-of-playstat...
242•zdw•13h ago•264 comments

The World Trade Center under construction through photos, 1966-1979

https://rarehistoricalphotos.com/twin-towers-construction-photographs/
121•kinderjaje•4d ago•49 comments

Superpowers: How I'm using coding agents in October 2025

https://blog.fsck.com/2025/10/09/superpowers/
156•Ch00k•10h ago•96 comments

Windows Subsystem for FreeBSD

https://github.com/BalajeS/WSL-For-FreeBSD
152•rguiscard•10h ago•41 comments

Crypto-Current (2021)

https://zerophilosophy.substack.com/p/crypto-current
5•keepamovin•5d ago•3 comments

How to Check for Overlapping Intervals

https://zayenz.se/blog/post/how-to-check-for-overlapping-intervals/
29•birdculture•2h ago•7 comments

A Quiet Change to RSA

https://www.johndcook.com/blog/2025/10/06/a-quiet-change-to-rsa/
58•ibobev•4d ago•18 comments

I built physical album cards with NFC tags to teach my son music discovery

https://fulghum.io/album-cards
502•jordanf•21h ago•175 comments

Wilson's Algorithm

https://cruzgodar.com/applets/wilsons-algorithm/
11•FromTheArchives•4h ago•1 comment

Building a JavaScript Runtime from Scratch using C

https://devlogs.xyz/blog/building-a-javaScript-runtime
25•redbell•3d ago•15 comments

A Library for Fish Sounds

https://nautil.us/a-library-for-fish-sounds-1239697/
23•pistolpete5•4d ago•4 comments

(Re)Introducing the Pebble Appstore

https://ericmigi.com/blog/re-introducing-the-pebble-appstore/
239•duck•20h ago•43 comments

How hard do you have to hit a chicken to cook it? (2020)

https://james-simon.github.io/blog/chicken-cooking/
150•jxmorris12•16h ago•89 comments

Daniel Kahneman opted for assisted suicide in Switzerland

https://www.bluewin.ch/en/entertainment/nobel-prize-winner-opts-for-suicide-in-switzerland-261946...
405•kvam•10h ago•357 comments

Tangled, a Git collaboration platform built on atproto

https://blog.tangled.org/intro
276•mjbellantoni•21h ago•71 comments

Programming in the Sun: A Year with the Daylight Computer

https://wickstrom.tech/2025-10-10-programming-in-the-sun-a-year-with-the-daylight-computer.html
141•ghuntley•18h ago•47 comments

Let's Take Esoteric Programming Languages Seriously

https://feelingof.com/episodes/078/
63•strombolini•3d ago•13 comments

Does our “need for speed” make our wi-fi suck?

https://orb.net/blog/does-speed-make-wifi-suck
236•jamies•23h ago•278 comments

Show HN: I invented a new generative model and got accepted to ICLR

https://discrete-distribution-networks.github.io/
610•diyer22•1d ago•82 comments

AV2 video codec delivers 30% lower bitrate than AV1, final spec due in late 2025

https://videocardz.com/newz/av2-video-codec-delivers-30-lower-bitrate-than-av1-final-spec-due-in-...
233•ksec•9h ago•140 comments

Synthetic aperture radar autofocus and calibration

https://hforsten.com/synthetic-aperture-radar-autofocus-and-calibration.html
160•nbernard•3d ago•9 comments

Learn Turbo Pascal – a video series originally released on VHS

https://www.youtube.com/watch?v=UOtonwG3DXM
91•AlexeyBrin•6h ago•32 comments

Firefox is the best mobile browser

https://kelvinjps.com/blog/firefox-best-mobile-browser/
170•kelvinjps10•4h ago•94 comments

Show HN: A Digital Twin of my coffee roaster that runs in the browser

https://autoroaster.com/
120•jvkoch•5d ago•35 comments

Superpowers: How I'm using coding agents in October 2025

https://blog.fsck.com/2025/10/09/superpowers/
156•Ch00k•10h ago

Comments

gjm11•5h ago
Has anyone ever seen an instance in which the automated "How" removal actually improves an article title on HN rather than just making it wrong?

(There probably are some. Most likely I notice the bad ones more than the good ones. But it does seem like I notice a lot of bad ones, and never any good ones.)

[EDITED to add:] For context, the actual article title begins "Superpowers: How I'm using ..." and it has been auto-rewritten to "Superpowers: I'm using ...", which completely changes what "Superpowers" is understood as applying to. (The actual intention: superpowers for LLM coding agents. The meaning after the change: LLM coding agents as superpowers for humans.)

add-sub-mul-div•4h ago
I agree. I'm sure I've seen instances where it's worked, but the problem is that when it messes up, it's much more annoying than any benefit it brings when it does work. Some of us don't want to be reminded that tech is full of hubris, overconfidence, poor judgment, and failure about what can/should be abstracted and automated.
dvfjsdhgfv•3h ago
Yeah, to the point that I can recall several examples where the title stuck out as dumb on HN and only started to make sense when I visited the original page, but not a single case where I could say the automated removal really did a good job.
bryanrasmussen•3h ago
It's happened to me a few times where it was reasonable, sometimes where it was debatable; if it was just wrong, I edit the title to add the How back in.
jvanderbot•4h ago
This is so interesting but it reads like satire. I'm sure folks who love persuading and teaching and marshalling groups are going to do very well in SWEng.

According to this, we'll all be reading the feelings journals of our LLM children and scolding them for cheating on our carefully crafted exams instead of, you know, making things. We'll read psychology books, apparently.

I like reading and tinkering directly. If this is real, the field is going to leave that behind.

sunir•4h ago
We certainly will; they can't replace humans in most language tasks without having a human-like emotional model. I have a whole set of therapy agents to debug neurotic, long-lived agents with memory.
jvanderbot•4h ago
Ok, call me crazy, but I don't actually think there's any technical reason that a theoretical code generation robot needs emotions that are as fickle and difficult to manage as humans.

It's just that we designed this iteration of the technology foundationally on people's fickle and emotional Reddit posts, among other things.

It's a designed-in limitation, and kind of a happy accident it's capable of writing code at all. And clearly carries forward a lot of baggage...

sunir•3h ago
Maybe. I use QWAN frequently when working with the coding agents. That requires an LLM equivalent of interoception to recognize when the model's understanding is scrambled or "aligned with itself", which is what QWAN is.
ambicapter•3h ago
If you can find enough training data that does human-like things without having human-like qualities, we are all ears.
jvanderbot•1h ago
It can be simultaneously the best we have, and well short of the best we want. It can be a remarkable achievement and fall short of the perceived goals.

That's fine.

Perhaps we can RL away some of this or perhaps there's something else we need. Idk, but this is the problem when engineers are the customer, designer, and target audience.

dingnuts•3h ago
what on God's green Earth does the CEO of a no-name B2B SaaS need long-running agents for?

either your business isn't successful, so you're coding when you shouldn't be, or cosplaying coding with Claude, or you're lying, or you're telling us about your expensive and unproductive hobby.

How much do you spend on AI? What's your annual profit?

edit: oh cosplaying as a CEO. I see. Nice WPEngine landing page Mr AppBind.com CEO. Better have Claude fix your website! I guess that agent needs therapy...

lerp-io•4h ago
take #73895 on how to fix ur prompt to make ur slop better.
anuramat•3h ago
is better slop a bad thing somehow?
dvfjsdhgfv•3h ago
Well, slop is slop, we can discuss the details but the basic thing is invariant.
anuramat•52m ago
why reiterate the invariant?
apwell23•2h ago
yeah, none of them can actually prove or even explain in words why their own golden prompting technique is superior. it's all vibes. so annoying, i want to slap these ppl lol
lerp-io•45m ago
for real lmao
amelius•4h ago
It's not a superpower if everybody has that same power.
cantor_S_drug•3h ago
Everyone is better off with mobile phones. We can solve more diverse problems faster. Similarly, we can combine our diverse superpowers (as they show in kids' cartoons).
Avicebron•4h ago
I often feel these types of blogposts would be more helpful if they demonstrated someone using the tools to build something non-trivial.

Is Claude really "learning new skills" when you feed it a book, or does it present it like that because your prompting encourages that sort of response behavior? I feel like it has to be demoed: Claude with the new skills and Claude without.

Maybe I'm a curmudgeon, but most of these types of blogs feel like marketing pieces where the important bit is that so much is left unsaid and unshown that it comes off like a kid trying to hype up their own work without the benefit of nuance or depth.

khaledh•4h ago
Agreed. The methodology needed here is something like an A/B test, with quantifiable metrics that demonstrate the effectiveness of the tool. And to do it not just once, but many times under different scenarios so that it demonstrates statistical significance.

The most challenging part when working with coding agents is that they seem to do well initially on a small code base with low complexity. Once the codebase gets bigger with lots of non-trivial connections and patterns, they almost always experience tunnel vision when asked to do anything non-trivial, leading to increased tech debt.
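For concreteness, here is a minimal sketch of the kind of harness khaledh describes, not anything from the post. `run_agent` is a hypothetical hook you would supply (run one prompt variant on one scenario and report whether its output passed your checks); Fisher's exact test supplies the significance check.

```python
# Sketch of an A/B evaluation over two agent/prompt variants.
# run_agent(variant, scenario) -> bool is a hypothetical hook you implement.
from scipy.stats import fisher_exact

def evaluate(run_agent, scenarios, trials=20):
    """Run variants A and B over the same scenarios and compare pass rates."""
    results = {"A": [0, 0], "B": [0, 0]}  # [passes, failures] per variant
    for scenario in scenarios:
        for variant in ("A", "B"):
            for _ in range(trials):
                if run_agent(variant, scenario):  # True if the run passed
                    results[variant][0] += 1
                else:
                    results[variant][1] += 1
    # Fisher's exact test on the 2x2 pass/fail contingency table.
    _, p_value = fisher_exact([results["A"], results["B"]])
    return results, p_value
```

Running many trials per scenario is exactly what makes this expensive, which is the objection raised further down the thread.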

mwigdahl•3h ago
The problem is that you're talking about a multistep process where each step beyond the first depends on the particular path the agent starts down, along with human input that's going to vary at each step.

I made a crude first stab at an approach that at least uses similar steps and structure to compare the effectiveness of AI agents. My approach was used on a small toy problem, but one complex enough that the agents couldn't one-shot it and required error correction.

It was enough to show significant differences, but scaling this to larger projects and multiple runs would be pretty difficult.

https://mattwigdahl.substack.com/p/claude-code-vs-codex-cli-...

potatolicious•1h ago
What you're getting at is the heart of the problem with the LLM hype train though, isn't it?

"We should have rigorous evaluations of whether or not [thing] works." seems like an incredibly obvious thought.

But in the realm of LLM-enabled use cases they're also expensive. You'd need to recruit dozens, perhaps even hundreds of developers to do this, with extensive observation and rating of the results.

So rather than actually try to measure the efficacy, we just get blog posts with cherry-picked examples of "LLM does something cool". Everything is just anecdata.

This is also the biggest barrier to actual LLM adoption for many, many applications. The gap between "it does something REALLY IMPRESSIVE 40% of the time and shits the bed otherwise" and "production system" is a yawning chasm.

marcosdumay•1h ago
It's the heart of the problem with all software engineering research. That's why we have so little reliable knowledge.

It applies to using LLMs too. I guess the one big difference here is that LLMs are pushed by a few companies with money abundant enough that running a test like this would be trivial for them. So the fact that they aren't doing it also says a lot.

oblio•1h ago
> What you're getting at is the heart of the problem with the LLM hype train though, isn't it?

> "We should have rigorous evaluations of whether or not [thing] works." seems like an incredibly obvious thought.

Heh, I'd rephrase the first part to:

> What you're getting at is the heart of the problem with software development though, isn't it?

claytongulick•39m ago
> The methodology needed here is something like an A/B test, with quantifiable metrics that demonstrate the effectiveness of the tool. And to do it not just once, but many times under different scenarios so that it demonstrates statistical significance.

If that's what we need to do, don't we already have the answer to the question?

coolKid721•3h ago
Yeah, I was reading this to see if there was something he'd actually show that would be useful, what pain point he's solving, but it's just slop.
simonw•3h ago
Here's one from today: https://mitchellh.com/writing/non-trivial-vibing
j_bum•2h ago
This was a fun read.

I've similarly been using spec.md and running to-do.md files that capture detailed descriptions of the problems and their scoped history. I mark each of my to-dos with informational tags: [BUG], [FEAT], etc.

I point the LLM to the exact to-do (or section of to-do’s) with the spec.md in memory and let it work.

This has been working very well for me.
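As an illustration of that workflow (not j_bum's actual tooling), here is a small sketch that pulls one tagged to-do out of to-do.md and pairs it with spec.md; the line format and numbering scheme are invented:

```python
# Sketch: build a prompt from spec.md plus exactly one tagged to-do item.
# File names match the comment above; the "- [TAG] #id text" format is assumed.
import re
from pathlib import Path

def build_prompt(todo_id: str) -> str:
    spec = Path("spec.md").read_text()
    todos = Path("to-do.md").read_text()
    # Match lines like: "- [BUG] #12 Fix off-by-one in pagination"
    pattern = re.compile(r"^- \[(?:BUG|FEAT)\] #(\d+) (.+)$", re.MULTILINE)
    items = {m.group(1): m.group(0) for m in pattern.finditer(todos)}
    return (
        f"Here is the project spec:\n\n{spec}\n\n"
        f"Work on exactly this to-do item and nothing else:\n{items[todo_id]}"
    )
```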

lcnPylGDnU4H9OF•2h ago
Do you mind linking to example spec/to-do files?
SteveJS•43m ago
Here is a (3-month-old) repo where I did something like that, and all the tasks are checked into the linear git history — https://github.com/KnowSeams/KnowSeams
nightski•1h ago
Even though the author refers to it as "non-trivial", and I can see why that conclusion is made, I would argue it is in fact trivial. There's very little domain-specific knowledge needed; this is purely a technical exercise integrating with existing libraries for which there is ample documentation online. In addition, it is a relatively isolated feature in the app.

On top of that, it doesn't sound enjoyable. Anti slop sessions? Seriously?

Lastly, the largest problem I have with LLMs is that they are seemingly incapable of stopping to ask clarifying questions. This is because they do not have a true model of what is going on. Instead they truly are next token generators. A software engineer would never just slop out an entire feature based on the first discussion with a stakeholder and then expect the stakeholder to continuously refine their statement until the right thing is slopped out. That's just not how it works and it makes very little sense.

kannanvijayan•1h ago
I've wondered about exposing this "asking clarifying questions" as a tool the AI could use. I'm not building AI tooling so I haven't done this - but what if you added an MCP endpoint whose description was "treat this endpoint as an oracle that will answer questions and clarify intent where necessary" (paraphrased), and had that tool just wire back to a user prompt?

If asking clarifying questions is plausible output text for LLMs, this may work effectively.
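A minimal sketch of that idea using the official MCP Python SDK (FastMCP). The tool name and wiring are hypothetical, and reading from /dev/tty is one way to reach the human when the server runs over stdio (since stdin/stdout carry the protocol itself):

```python
# Sketch of an "oracle" tool: the agent calls it to ask a clarifying
# question, and the answer is typed in by the human at the terminal.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("clarification-oracle")

@mcp.tool()
def ask_clarifying_question(question: str) -> str:
    """Treat this endpoint as an oracle that will answer questions and
    clarify intent where necessary."""
    # stdin/stdout carry the MCP protocol, so talk to the human via the
    # controlling terminal instead (Unix-only).
    with open("/dev/tty", "r+") as tty:
        tty.write(f"\n[agent asks] {question}\n> ")
        tty.flush()
        return tty.readline().strip()

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```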

simonw•1h ago
I think the asking clarifying questions thing is solved already. Tell a coding agent to "ask clarifying questions" and watch what it does!
nightski•51m ago
Obviously if you instruct the autocomplete engine to fill in questions it will. That's not the point. The LLM has no model of the problem it is trying to solve, nor does it attempt to understand the problem better. It is merely regurgitating. This can be extremely useful. But it is very limiting when it comes to using as an agent to write code.
simonw•1h ago
The hardest problem in computer science in 2025 is presenting an example of AI-assisted programming that somebody won't call "trivial".
nightski•48m ago
If all I did was call it trivial that would be a fair critique. But it was followed up with a lot more justification than that.
antonvs•1h ago
> A software engineer would never just slop out an entire feature based on the first discussion with a stakeholder and then expect the stakeholder to continuously refine their statement until the right thing is slopped out. That's just not how it works and it makes very little sense.

Didn’t you just describe Agile?

qsort•1h ago
> Important: there is a lot of human coding, too.

I'm not highlighting this to gloat or to prove a point. If anything in the past I have underestimated how big LLMs were going to be. Anyone so inclined can take the chance to point and laugh at how stupid and wrong that was. Done? Great.

I don't think I've been intentionally avoiding coding assistants, and as a matter of fact I have been using Claude Code since the literal day it first previewed, and yet it doesn't feel, not even one bit, like you can take your hands off the wheel. Many are acting as if writing any code manually means "you're holding it wrong", which I feel is just not true.

simonw•1h ago
Yeah, my current opinion on this is that AI tools make development harder work. You can get big productivity boosts out of them but you have to be working at the top of your game - I often find I'm mentally exhausted after just a couple of hours.
jstummbillig•1h ago
Considering the last 2 years, has it become harder or easier?
sawmurai•13m ago
I have a similar experience. It feels like riding your bike in a higher gear: you can go faster, but it takes more effort and you need the potential (stronger legs) to make use of it.
dotinvoke•7m ago
My experience with AI tools is the opposite. The biggest energy thieves for me are configuration issues, library quirks, or trivial mistakes that are hard to spot. With AI I can often just bulldoze past those things and spend more time on tangible results.

When using it for code or architecture or design, I’m always watching for signs that it is going off the rails. Then I usually write code myself for a while, to keep the structure and key details of whatever I’m doing correct.

oblio•1h ago
LLMs are autonomous driving level 2.
spankibalt•2h ago
> "Maybe I'm a curmudgeon but most of these types of blogs feel like marketing pieces with the important bit is that so much is left unsaid and not shown, that it comes off like a kid trying to hype up their own work without the benefit of nuance or depth."

C'mon, such self-congratulatory "Look at My Potency: How I'm using Nicknack.exe" fluffies always were and always will be a staple of the IT industry.

lcnPylGDnU4H9OF•2h ago
Still, the best such pieces are detailed and explanatory.
causal•1h ago
Using LLMs for coding complex projects at scale over a long time is really challenging! This is partly because defining requirements alone is much more challenging than most people want to believe. LLMs accelerate any move in the wrong direction.
dexwiz•1h ago
My analogy is LLMs are a gas pedal. Makes you go fast, but you still have to know when to turn.
sreekanth850•1h ago
True
sreekanth850•1h ago
One should know the end-to-end design and architecture, and should stop the LLM when it adds complex, fancy things.
SteveJS•52m ago
Having the LLM write the spec/work unit from a conversation works well. Exploring a problem space with a (good) coding agent is fantastic.

However, for complex projects, IMO one must read what was written by the LLM … every actual word.

When it 'got away' from me, in each case I had left something in the LLM-written markdown that I should have removed.

99% "I can ask for that later" and 1% "that's a good idea I hadn't considered" might be the right ratio when reading an LLM-generated plan/spec/work unit.

Breaking work into single context passes … 50-60k tokens in Sonnet 4.5 has typically had fantastic results for me.

My side project is using Lean 4, and a carelessly left-in 'validate' rather than 'verify' led down a hilariously complicated path equivalent to matching an output against a known string.

I recovered, but it wasn't obvious to me that it was happening. However, I would not be able to write Lean proofs myself, so diagnosing the problem and fixing it is a small price to pay to be able to mechanically verify that part of my software is correct.
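The 50-60k-token budget mentioned above can be checked mechanically before a work unit is handed off. A sketch using the Anthropic SDK's token-counting endpoint; the model alias is an assumption, so check the current docs:

```python
# Sketch: verify a work unit fits a single-context-pass budget (~60k tokens).
import anthropic

BUDGET = 60_000
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def fits_one_pass(work_unit_markdown: str) -> bool:
    count = client.messages.count_tokens(
        model="claude-sonnet-4-5",  # assumed model alias
        messages=[{"role": "user", "content": work_unit_markdown}],
    )
    return count.input_tokens <= BUDGET
```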

jackblemming•4h ago
Seems cute, but ultimately not very valuable without benchmarks or some kind of evaluation. For all I know, this could make Claude worse.
jelling•1h ago
Same. We've all fooled ourselves into believing that an LLM / stochastic process was finally solved based on a good result. But the sample size is always too low to be meaningful.
anuramat•54m ago
even if it works as described, I'm assuming it's extremely model-dependent (e.g. book prerequisites), so you'd have to re-run this for every model you use; this is basically a poor man's finetuning;

maybe explicit support from providers would make it feasible?

tobbe2064•3h ago
What's the cost of running with agents like this?
dbbk•3h ago
Claude Max is fixed cost
jmull•3h ago
> <EXTREMELY_IMPORTANT>…*RIGHT NOW, go read…

I don’t like the looks of that. If I used this, how soon before those instructions would be in conflict with my actual priorities?

Not everything can be the first law.

apwell23•2h ago
don't LLMs tell you not to give them instructions like that these days?
therealdrag0•2h ago
Seems like maintaining a bashrc file. Sometimes you have to go tweak it.
simonw•3h ago
I can't recommend this post strongly enough. The way Jesse is using these tools is wildly more ambitious than what most other people are doing.

Spend some time digging around in his https://github.com/obra/Superpowers repo.

I wrote some notes on this last night: https://simonwillison.net/2025/Oct/10/superpowers/

csar•3h ago
I'm curious how you think this compares to the Research -> Plan -> Implement method and prompts from the "Advanced Context Engineering for Agents" video when it comes to actual coding performance on large codebases. I think picking up skills is useful for broadening agents' abilities, but I'm not sure that's the right thing for actual development.

The packaged collection is very cool and so is the idea of automatically adding new abilities, but I’m not fully convinced that this concept of skills is that much better than having custom commands+sub-agents. I’ll have to play around with it these next few days and compare.

apwell23•2h ago
simon is the biggest ai hype man :). everything is always strong, wild, superb, amazing. his blog posts, like the one he linked here, never explain why it is wild, strong, super, or any of the other superlatives he uses.

also, what's with the overuse of these words in the ai space? every team is 'cracked', everything is 'wild'. real car salesman vibes.

simonw•2h ago
Here's a counter-example for you from the other day: https://simonwillison.net/2025/Oct/8/claude-datasette-plugin...

> This isn’t necessarily surprising, but it’s worth noting anyway. Claude Sonnet 4.5 is capable of building a full Datasette plugin now.

I do worry a bit about how often I use positive adjectives. If something isn't notable I won't write about it though. In this particular case Jesse's prompting / skills stuff really does deserve the superlatives IMO.

apwell23•2h ago
well, explain why the OP is "wild" and what makes you recommend it "strongly".

what have you built with it to come to those conclusions? is that too much to ask?

simonw•1h ago
I recommend it strongly because the "skills" mechanism it describes is a new and very promising technique, and this is the best article I've seen that explains that.

It's "wild" because, among many other experiments, Jesse has experimented with giving Claude a "feelings journal" and prompting it using Graphviz DOT diagrams.

For my previous writing and work on this you can consult my blog - here's the AI-assisted programming tag: https://simonwillison.net/tags/ai-assisted-programming/

apwell23•22m ago
ok, so you have not used it personally or built anything with it, yet you feel wild about it.

great.

troupo•1h ago
This looks like usage rules in Elixir, but for agent behaviors, and currently specifically for Claude: https://hexdocs.pm/usage_rules/readme.html
spprashant•3h ago
I am not ashamed to admit this whole agentic coding movement has moved beyond me.

Not only do I have to know everything about the code, data, and domain, but now I need to understand this whole AI system, which is a meta-skill of its own.

I fear I may never be able to catch up till someone comes along and simplifies it for pleb consumption.

gdulli•3h ago
It's also possible to put in enough hours of real coding to get to the point where coding really isn't that hard anymore, at least not hard enough to justify switching from those stable/solid fundamental skills to a constantly revolving ecosystem of ephemeral tools, models, model versions, best practices, lessons from trial and error, etc. Then you could bypass all of this distraction.

Admittedly that stance is easiest to take if you were old enough, experienced enough already by the time this era hit.

paweladamczuk•9m ago
"There exist developers whose performance cannot be boosted by an LLM" is a really strong statement.
evanmoran•2h ago
To give you a process that might help:

I've found you have to use Claude Code to do something small, and as you do, iterate on the CLAUDE.md input prompt to refine what it does by default. When it doesn't do things your way, change the prompt to see if you can fix how it works. The agent is then equivalent to calling ChatGPT / Sonnet 1000 times an hour, so these refinements (the skills in the post are a meta-approach) are all about tuning the workflow to be more accurate for your project and to fit your mental model. As you tune the md file you'll start to feel what is possible and understand agent capabilities much better.

Short story: you have to try it. Long story: it's the iteration on the meta-prompt approach that teaches you what's possible.

philbo•1h ago
I think this and other recent posts here hugely overcomplicate matters. I notice none of them provides an A/B test for each item of complexity they introduce; there's just a handwavy "this has proved to work over time".

I've found that a single CLAUDE.md does really well at guiding it how I want it to behave. For me that's making it take small steps and stop to ask me questions frequently, so it's more like we're pairing than I'm sending it off solo to work on a task. I'm sure that's not to everyone's taste but it works for me (and I say this as someone who was an agent-sceptic until quite recently).

Fwiw my ~/.claude/CLAUDE.md is 2.2K / 49 lines.

lcnPylGDnU4H9OF•54m ago
I haven't really done much of it but my plan is just to practice. This seems like a powerful thing to start with.
cruffle_duffle•35m ago
I've personally decided that Cursor's agent mode is good enough. A single foreground instance of Cursor doing its thing is plenty to babysit. Based on that experience, I am highly, highly skeptical that people are actually creating things of value with these multi-agent-running-in-the-background setups. Way too much babysitting, and honestly, writing docs and specs for them is more work than just writing parts of the code myself and letting the LLM do the tedious bits like finishing what I started.

No matter what you are told, there is no silver bullet. Precisely defining the problem is always the hard part. And the best way to precisely define a problem and its solution is code.

I’ll let other people fight swarms of bots building… well who knows what. Maybe someday it will deliver useful stuff, but I’m highly skeptical.

hoechst•32m ago
Much of it is just "paste this magic string before your prompt to make the LLM 10x better" voodoo, similar to the SEO voodoo common in the 2000s.

just remember that it works the same for everyone: you input text, magic happens, text comes out.

if you can properly explain a software engineering problem in plain language, you're an expert in using LLMs. everything on top of that is people experimenting or trying to build the next big thing.

daemontus•3h ago
Maybe this is a naive question, but how are "skills" different from just adding a bunch of examples of good/bad behavior into the prompt? As far as I can tell, each skill file is a bunch of good/bad examples of something. Is the difference that the model chooses when to load a certain skill into context?
nrjames•3h ago
I think it just gives you the ability to easily do that with a slash command, like using "/brainstorm database schema" or something, instead of needing to define what "brainstorm" means each time you want to do it.
hackernewds•3h ago
what you are suggesting is 1-shot, 2-shot, 5-shot, etc. prompting, which is so effective that it's how benchmarks were presented for a while
simonw•2h ago
I think that's one of the key things: skills don't take up any of the model context until the model actively seeks out and uses them.

Jesse on Bluesky: https://bsky.app/profile/s.ly/post/3m2srmkergc2p

> The core of it is VERY token light. It pulls in one doc of fewer than 2k tokens. As it needs bits of the process, it runs a shell script to search for them. The long end to end chat for the planning and implementation process for that todo list app was 100k tokens.

> It uses subagents to manage token-heavy stuff, including all the actual implementation.
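To make the quoted mechanism concrete, here is a toy sketch of that progressive-disclosure pattern, with an assumed skills/<name>/SKILL.md layout (invented for illustration). Only the one-line index sits in the base prompt; a skill's full text costs tokens only when the model asks for it:

```python
# Sketch: a token-light skill index plus on-demand loading of full skills.
from pathlib import Path

SKILLS_DIR = Path("skills")  # assumed layout: skills/<name>/SKILL.md

def skill_index() -> str:
    """One line per skill -- this is all that sits in the base context."""
    lines = []
    for f in sorted(SKILLS_DIR.glob("*/SKILL.md")):
        summary = f.read_text().splitlines()[0]  # e.g. a title/summary line
        lines.append(f"{f.parent.name}: {summary}")
    return "\n".join(lines)

def load_skill(name: str) -> str:
    """Called only when the model decides it needs this skill."""
    return (SKILLS_DIR / name / "SKILL.md").read_text()
```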

tcdent•3h ago
This style of prompting, where you set up a dire scenario in order to try to evoke some "emotional" response from the agent, is already dated. At some point, putting words like IMPORTANT in all uppercase had some measurable impact, but at the present time, models just follow instructions.

Save yourself the experience of having to write and maintain prompts like this.

bcoates•56m ago
Also the persuasion paper he links isn't at all about what he's talking about.

That paper is about using persuasion prompts to overcome trained in "safety" refusals, not to improve prompt conformance.

jstummbillig•3h ago
How are skills different from tools? Looks like another layer of abstraction. What for?
cynicalsecurity•3h ago
Superpower: AI slop.
echelon•2h ago
I'm sure the horse whip manufacturers had similar things to say about steam powered horses. We just don't think about them much anymore.

The whole world is changing around us and nothing is secure. I would not gamble that the market for our engineering careers is safe with so much disruption happening.

Tools like Lovable are going to put lots of pressure on technical web designers.

Business processes may conform to the new shape and channels for information delivery, causing more consolidation and less duplication.

Or perhaps the barrier to entry for new engineers, in a worldwide marketplace, lowers dramatically. We have accessible new tools to teach, new tools to translate, new tools to coordinate...

And that's just the bear case where nothing improves from what we have today.

yoyohello13•43m ago
Nice try Jensen.
4b11b4•2h ago
I'm not sure exactly what I just read...

Is this just someone who has tingly feelings about Claude reiterating stuff back to them? cuz that's what an LLM does/can do

intended•2h ago
This isn't science, or engineering.

This is voodoo.

It likely works, but knowing that YAGNI is a thing means that at some level you are invoking a cultural touchstone for a very specific group of humans.

Edit -

I dug into the superpowers and skills for a bit. Definitely learned from it.

There’s stuff that doesn’t make sense to me on a conceptual basis. For example in the skill to preserve productive tensions. There’s a part that goes :

> The trade-off is real and won't disappear with clever engineering

There’s no dimension for “valid” or prediction for tradeoff.

I can guess that if the preceding context already outlines tradeoffs clearly, or somehow encodes that there is no clever solution that threads the needle - then this section can work.

Just imagining what dimensions must be encoding some of this suggests that it’s … it won’t work for situations where the example wasn’t already encoded in the training. (Not sure how to phrase it)

clusterhacks•1h ago
> This isn't science, or engineering.
> This is voodoo.

I was struggling to find the exact reason this type of article bugs me so much, and I think "voodoo" is precisely the correct phrase to sum up my feelings.

I don't mean that as a judgement on the utility of LLMs or that reading about what different users have tried out to increase that utility isn't valuable. But if someone asked me how to most effectively get started with coding agents, my instinct is to answer (a) carefully and (b) probably every approach works somewhat.

theptip•1h ago
> some of the ones I've played with come from telling Claude "Here's my copy of programming book. Please read the book and pull out reusable skills that weren't obvious to you before you started reading

This is actually a really cool idea. I think a lot of the good scaffolding right now is things like "use TDD", but if you link citations to the book, then it can perhaps extract more relevant wisdom and context (just like I would by reading the book), rather than using the generic averaged interpretation of TDD derived from the internet.

I do like the idea of giving your Claude a reading list and some spare tokens on the weekend when you're not working, and having it explore new ideas and techniques to bring back to your common CLAUDE.md.
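A sketch of what that reading-list idea could look like against the Anthropic API; the prompt wording and model alias are invented, and a real book would likely need chunking to fit the context window:

```python
# Sketch: ask the model to extract reusable skills from a book's text.
import anthropic
from pathlib import Path

client = anthropic.Anthropic()

def extract_skills(book_path: str) -> str:
    book = Path(book_path).read_text()  # a large book may need chunking
    response = client.messages.create(
        model="claude-sonnet-4-5",  # assumed model alias
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": (
                "Read this book and pull out reusable skills that weren't "
                "obvious before reading, as short markdown skill files:\n\n"
                + book
            ),
        }],
    )
    return response.content[0].text
```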

zahlman•1h ago
> It also bakes in the brainstorm -> plan -> implement workflow I've already written about. The biggest change is that you no longer need to run a command or paste in a prompt. If Claude thinks you're trying to start a project or task, it should default into talking through a plan with you before it starts down the path of implementation.

... So, we're refactoring the process of prompting?

> As Claude and I build new skills, one of the things I ask it to do is to "test" the skills on a set of subagents to ensure that the skills were comprehensible, complete, and that the subagents would comply with them. (Claude now thinks of this as TDD for skills and uses its RED/GREEN TDD skill as part of the skill creation skill.)

> The first time we played this game, Claude told me that the subagents had gotten a perfect score. After a bit of prodding, I discovered that Claude was quizzing the subagents like they were on a gameshow. This was less than useful. I asked to switch to realistic scenarios that put pressure on the agents, to better simulate what they might actually do.

... and debugging it?

... How many other basic techniques of SWEng will be rediscovered for the English programming language?

hoechst•1h ago
documents like https://github.com/obra/superpowers/blob/main/skills/testing... are very confusing to read as a human. "skills" in this project generally don't seem to follow a set format and just look like what you would get when prompting an LLM to "write a markdown doc that step by step describes how to do X" (which is what actually happened, according to the blog post).

idk, but if you already assume that the LLM knows what TDD is (it probably ingested ~100 whole books about it), why are we feeding a short (and imo confusing) version of that back to it before the actual prompt?

i feel like a lot of projects like this that are supposed to give LLMs "superpowers" or whatever by prompt engineering are operating on the wrong assumption that LLMs are self-learning and can be made 10x smarter just by adding a bit of magic text that the LLM itself produced before the actual prompt.

ofc context matters, and if i have a repetitive task, i write down my constraints and requirements and paste that in before every prompt that fits the task. but that's just part of the specific context of what i'm trying to do. it's not giving the LLM superpowers, it's just providing context.

i've read a few posts like this now, but what i am always missing is actual examples of how it produces objectively better results compared to just prompting without the whole "you have skill X" thing.

d_sem•1h ago
This article left me wishing it was "How I'm using coding agents to do <x> task better"

I've been exploring AI for two years now. It has certainly upgraded itself from toy to basic utility. However, I increasingly run into its limitations and find reverting to pre-LLM ways of working more robust, faster, and more mentally sustainable.

Does someone have concrete examples of integrating LLMs into a workflow that pushes state-of-the-art development practices and value creation further?

3eb7988a1663•1h ago
I am only on the first page and saw this blurb and was immediately annoyed.

  @/Users/jesse/.claude/plugins/cache/Superpowers/...
The XDG spec has been out for decades now. Why are new applications still polluting my HOME? Also seems weird that real data would be put under a cache/ location, but whatever.
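For reference, the XDG-compliant alternative the commenter is pointing at looks like this (a sketch, not anything from the post):

```python
# Sketch: resolve an app's cache directory per the XDG Base Directory spec.
import os
from pathlib import Path

def xdg_cache_dir(app: str) -> Path:
    base = os.environ.get("XDG_CACHE_HOME", "")
    root = Path(base) if base else Path.home() / ".cache"
    return root / app  # e.g. ~/.cache/superpowers rather than under $HOME
```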
lcnPylGDnU4H9OF•51m ago
The "How to create skills" link is broken. This is the new location: https://github.com/obra/superpowers/blob/personal-superpower...
yoyohello13•34m ago
The post reads like someone throwing bones and reading their fortune. The part where Claude did its own journaling was so cringe it was hilarious. The tone of the journal entry was exactly like the blog author's, which suggests to me Claude is reflecting back what the author wants to hear. I feel like Jesse is consumed in a tornado of LLM sycophancy.
saaaaaam•25m ago
Claude has never once said “oh shit” or “holy crap” to me. I must be doing something horribly wrong.