> "If I could short MCP, I would"
I mean, MCP is hard to work with. But there's a very large set of things out there that we want a hardened interface to -- if not MCP, it will be something very like it. In particular, MCP was probably over-complicated at the design phase to deal with the realities of streaming text/tokens back and forth live. That is, it chose not to abstract those realities in exchange for some nice features, and we got a lot of implementation complexity early.

To quote the Systems Bible, any working complex system is only the result of the growth of a working simple system. MCP seems to me to be right on the edge of what you'd call a "working simple system" -- but to the extent it's all torn down for something simpler, that thing will inevitably evolve to allow API specifications, API calls, and streaming interaction modes.
Anyway, I'm "neutral" on MCP, which is to say I don't love it. But I don't have a better system in mind, and crucially, because these models still need fine-tuning to deal properly with agent setups, I think it's likely here to stay.
Doesn't seem like the implementation could be much simpler. Just JSON-RPC and API stuff. For example, the MCP hello-world with Python and FastMCP is practically one-to-one with an HTTP/web-flavored hello-world in Flask.
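Roughly what that comparison looks like (a minimal sketch assuming FastMCP's decorator API; the tool name and body are made up):

```python
# MCP hello-world: one tool, served over stdio by default.
from fastmcp import FastMCP

mcp = FastMCP("hello")

@mcp.tool()
def greet(name: str) -> str:
    """Say hello to someone."""
    return f"Hello, {name}!"

if __name__ == "__main__":
    mcp.run()

# The Flask hello-world it's being compared to:
#
#     from flask import Flask
#     app = Flask(__name__)
#
#     @app.route("/")
#     def hello():
#         return "Hello, world!"
```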
I haven't looked in a few months, so my information might be a bit out of date, but at the time -- if you wanted to use a Python server from the modelcontextprotocol GitHub, fine. If you wanted to, say, build a proxy server in Rust or Go, you were looking at a set of half-implemented servers targeting two-versions-old MCP specs, while clients like Claude obscured even which endpoints they used for discovery.
It's an immature spec, moderately complicated, and moving really quickly, with only a few major 'subscribers' on the server side; I found it challenging to work with.
Even with these constraints, the core MCP design is actually pretty good. First, use the stdio transport, and now your language only needs to speak JSON [1]. Then forget about building proxies and routers and web stuff, and offload that to mcpjungle [2] or similar to front your stdio work.
If that still doesn't work, I would probably wrap the foreign language in subprocesses and retreat to Python's FastMCP (or whatever the well-supported, fast-moving option is in another language). Ugly, but practical if you really must use a language with no good MCP support. If none of that works, I guess you're on the hook to support a changing MCP spec with a custom implementation in that language... but isn't there maybe an argument now that MCP is complex because someone insisted on it being complex?
[1]: https://modelcontextprotocol.io/specification/2025-06-18/bas... [2]: https://github.com/mcpjungle/MCPJungle
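For the curious, "only needs to speak JSON" means roughly this. A hand-rolled stdio server sketch (newline-delimited JSON-RPC 2.0; the handshake, notifications, and error handling are heavily simplified, and the tool is made up):

```python
import json
import sys

def reply(msg_id, result):
    # MCP's stdio transport is newline-delimited JSON-RPC 2.0 messages.
    sys.stdout.write(json.dumps({"jsonrpc": "2.0", "id": msg_id, "result": result}) + "\n")
    sys.stdout.flush()

for line in sys.stdin:
    msg = json.loads(line)
    method, msg_id = msg.get("method"), msg.get("id")
    if method == "initialize":
        reply(msg_id, {"protocolVersion": "2025-06-18",
                       "capabilities": {"tools": {}},
                       "serverInfo": {"name": "hello", "version": "0.1"}})
    elif method == "tools/list":
        reply(msg_id, {"tools": [{"name": "greet",
                                  "description": "Say hello",
                                  "inputSchema": {"type": "object", "properties": {}}}]})
    elif method == "tools/call":
        reply(msg_id, {"content": [{"type": "text", "text": "Hello!"}]})
    # Messages without an id are notifications (e.g. notifications/initialized)
    # and get no reply.
```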
For reference, I think writing an MCP proxy layer in (lang of choice) is significantly harder than writing something to respond to GET / over HTTP: in the complexity of what clients need out of a server (web clients are hardened to deal with all kinds of bad behavior), in the amount of stuff you actually need to write, and in the lack of documentation.
If you don't already have an API, sure, MCP is a possible choice for that API. But if you have an API, there's less and less reason to bother implementing an MCP server as the models get smarter, versus just giving them access to your API docs.
And earlier, Simon Willison argued[1] that Skills are an even bigger deal than MCP.
But I don't see as much hype for Skills as there was for MCP -- it seems people are stuck in MCP "inertia" and have no time to shift to Skills.
Skills feel like a non-feature to me. It feels more valuable to connect a user to the actual tool and let them familiarize themselves with it (so they don't need the LLM to find it in the future) rather than having the tool embedded in the LLM platform. I will carve out a very big exception for accessibility here: I love my home device being an egg timer -- it's a wonderful egg timer (when it doesn't randomly play music) -- and I could buy an egg timer, but having a hands-free one is actually quite valuable to me while cooking. So I believe there is real value in making these features accessible through the LLM over media where the feature would normally be difficult to use.
"Known to work" -- if it has a hardcoded code, it will work 100% of the time - that's the point of Skills. If it's just markdown then yes, some sort of probability will be there and it will keep on improving.
Like with MCP, you can provide a deterministic, known-good piece of code to carry out the operation once the LLM decides to use it.
But a skill can evolve from pure Markdown, via inlining some shell commands, up to a large application. And if you let it, the LLM can also inspect the tool and modify it if that will help you.
All the Skills I use now have evolved bit by bit as I've run into new use-cases and told Claude Code to update the script the skill references, or the SKILL.md itself. I can evolve the tooling while I'm using it.
A fun thing: Claude Code will sometimes fail to find the skill the "proper" way, and will then sometimes hunt down the SKILL.md file with its tools and read it directly, showing that it's perfectly capable of doing all the steps itself.
You could probably "fake" skills pretty well with instructions in CLAUDE.md to use a suitable command to extract the preamble of files in a given directory, and tell it to use that to decide when to read the rest.
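Something like this in CLAUDE.md might do it (an untested sketch; the path and wording are made up):

```
Before starting a task, run `head -n 10 skills/*/SKILL.md` to see each
skill's frontmatter (name and description). If the task matches one of
the descriptions, read that skill's full SKILL.md and follow it.
```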
It's the fact that it's such a thin layer that is exciting - it means we need increasingly less special logic other than relying on just basic instructions to the model itself.
Working around their many limitations has been the nature of the game since the original GPT-3.
It’s documentation vs researching how to do something.
Similar to what humans do.
It is, after all, a collection of instructions and code that any other LLM can read, understand, and then execute (via a tool call / MCP call).
1. Open-Skills: https://github.com/BandarLabs/open-skills
Just tell your non-Claude agent to read your skills directory, and extract the preambles.
Aren't skills really just a collection of tagged MCP prompts, config resources, and tools, except with more lock-in, since only Claude can use them? About that "agent virtual environment" that runs the scripts: how is it customized, and can it just be a container? Aren't you going to need to ship/bundle dependencies for the tools/libraries those skills require/reference? And at that point, why are we avoiding MCP-style docker/npx/uvx again?
Other things that jump out are that skills are supposed to be "composable", yet afaik it's still the case that skills may not explicitly reference other skills. Huge limiting factors IMHO compared to MCP servers that can just use boring inheritance and composition with, you know, programming languages, or composition/grouping with namespacing and such at the server layer. It's unclear how we're going to extend skills, require skills, use remote skills, "deploy" reusable skills etc etc, and answering all these questions gets us most of the way back to MCP!
That said, skills do seem like a potentially useful alternate "view" on the same data/code that MCP covers. If it really catches on, maybe we'll see skill-to-MCP converters for serious users who want to be able to do the normal stuff (scaling out, testing in isolation, working without being completely attached to the Claude engine forever). Until there's interoperability, I personally can't see getting interested, though.
Tell your agent of choice to read the preamble of all the documents in the skills directory, and tell it that when it has a task that matches one of the preambles, it should read the rest of the relevant file for full instructions.
There are far fewer dependencies for skills than for MCP. Even a model that knows nothing about tool use beyond how to run a shell command, and has no support for anything else can figure out skills.
I don't know what you mean regarding explicitly referencing other skills - Claude at least is smart enough that if you reference a skill that isn't even properly registered, it will often start using grep and find to hunt for it to figure out what you meant. I've seen this happen regularly while developing a plugin and having errors in my setup.
This is wrong, and an example of magical thinking. AI obviously does not mean you can ship/use software without addressing dependencies. See for example https://github.com/anthropics/skills/blob/main/slack-gif-cre... or, worse, the many other skills that just punt on this and assume CLI tools and libraries are already available.
The trivial evidence of this, is that if you have an MCP server available, the skill can simply explain to the agent how to use the MCP server, and so even the absolute worst case for skills is parity.
Even if we expect the LLMs to fully resolve the task, they'll heavily rely on I/O and print statements sprinkled across the execution trace to get the job done.
A sandbox is not mandatory here. You can execute skills on your host machine too (with some fiddling), but it's good practice, and probably for the best, to get into the habit of executing code in an isolated environment for security purposes.
They are a bigger deal in a sense because they remove the need for all the scaffolding MCPs require.
E.g. I needed Claude to work on transcripts from my Fathom account, so I just had it write a CLI script to download them, and then I had it write a SKILL.md, and didn't have to care about wrapping it up into an MCP.
At a client, I needed a way to test their APIs, so I just told Claude Code to pull out the client code from one of their projects and turn it into a CLI, and then write a SKILL.md. And again, no need to care about wrapping it up into an MCP.
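For anyone who hasn't written one, a SKILL.md of this sort is tiny. Something like this (a made-up example; the frontmatter fields follow the pattern in Anthropic's skills repo, but the script name and flags are invented):

```markdown
---
name: fathom-transcripts
description: Download and work with Fathom call transcripts. Use when the
  user asks about meetings, recordings, or transcripts.
---

# Fathom transcripts

Run `python scripts/fetch_transcripts.py --since YYYY-MM-DD` to download
transcripts into ./transcripts/, one Markdown file per call. Then work on
the local files directly.
```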
But this seems a lot less remarkable, and there's a lot less room to build big complicated projects and tooling around it, and so, sure, people will talk about it less.
MCP is completely different; I don't understand why people keep comparing the two. A skill cannot connect to your Slack server.
Skills are more similar to sub-agents, the main difference being context inheritance. Sub-agents let you set a different system prompt for each one, which is super useful.
Most of my skills connect to APIs.
This is what I expected the post to be about before clicking.
> Granted to use a skill the agent needs to have general purpose access to a computer, but this is the bitter lesson in action. Giving an agent general purpose tools and trusting it to have the ability to use them to accomplish a task might very well be the winning strategy over making specialized tools for every task.
The academic community has been using the term "skill" for years, to refer to classes of tasks at which LLMs exhibit competence.
Now Anthropic has usurped the term to refer to these inference-guiding .md files.
I'm not looking forward to having to pick through a Google hit list for "LLM skills", figuring out which publications are about skills in the traditional sense and which are about the Anthropic feature. Semantic overload sucks.
How do we deal with this? Start using "competencies" (or similar) in academic papers? Or just resign ourselves to suffering the ambiguity?
Or maybe the Anthropic feature will fall flat and nobody will talk about it at all. That would frankly be the best outcome.
An LLM with a shell integration can do anything you need it to.
I don't agree with this. Natural language is so ambiguous. At least for software development, the hard work is still coming up with clearly defined solutions. There is a reason why math has its own domain-specific language.
As a former tech comms guy I will say:
Natural language can be bent into arbitrary precision. Write something, then enter a read-rewrite-reread loop as the devil's advocate (this is key) until it stops being ambiguous or having multiple conceivable interpretations.
Yes, with English this process can be a pain in the butt, until you get the hang of it.
There's a good reason we use jargon in professions, or more constrained and less ambiguous languages for maths/coding.
It was a pain to set up, but you can score the context for completeness, and then, if the score is under 98% or something, "ask" clarifying questions of the requesting agent, person, or system.
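In pseudo-structure, that loop might look like this (a hypothetical sketch; `score_completeness` and `ask_clarifying` stand in for whatever model calls you wire up):

```python
THRESHOLD = 0.98  # arbitrary cutoff, per the comment above

def refine_request(spec: str, score_completeness, ask_clarifying) -> str:
    """Loop until the request is judged specific enough to act on.

    score_completeness(spec) -> float in [0, 1], e.g. from a grading prompt
    ask_clarifying(spec)     -> updated spec after questioning the requester
    """
    while score_completeness(spec) < THRESHOLD:
        spec = ask_clarifying(spec)
    return spec
```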
We liked it quite a bit, but it led to some funny things. We use Reminders to keep our home to-do lists, hers and mine in one list with two sections. I wanted to take this existing flow we had and make it work with a Custom GPT. It's practically impossible because Reminders:
* doesn't have a good API through EventKit
* requires a pop-up permission grant in the UI
So in the end, I did end up making somewhat of an MCP server for it: running it on an old MacBook Pro, with Amphetamine keeping it awake in closed-lid display-sleep mode, hooked up to my Tailnet and exposed via a Cloudflare tunnel, so that we could use ChatGPT to interact with the thing. Yes, you can see how insane that whole setup is. But there's quite a lot of value in having your AI agent just be the one thing.
Is there any other difference on the end-user side?
But I reckon that every time humans have been able to improve their information processing in any way, the world has changed. Even if all we get is an LLM that is right more often than it is wrong, the world will change again.