Would also be worth having special tokens for this kind of navigation.
Like most things, assume the "20/100/200" dollar deals that are great now are going to go down the enshittification route very rapidly.
Even if the "limits" on them stay generous, the product will start shifting to prioritize things the user doesn't want.
Tool recommendations are my immediate and near-term fear: paid placement for dev tools at both the model level and the harness level seems inevitable.
---
The right route is open models and open harnesses, ideally on local hardware.
I don’t assume this at all. In fact, the opposite has been happening in my experience: I try multiple providers at the same time and the $20/month plans have only been getting better with the model improvements and changes. The current ChatGPT $20/month plan goes a very long way even when I set it to “Extra High” whereas just 6 months ago I felt like the $20/month plans from major providers were an exercise in bouncing off rate limits for anything non-trivial.
Inference costs are only going to go down from here and models will only improve. I’ve been reading these warnings about the coming demise of AI plans for 1-2 years now, but the opposite keeps happening.
This time also crosses over with the frontier labs raising ever larger rounds. If Anthropic IPOs (which I honestly doubt), then we may get a better sense of actual prices in the market, as it's unlikely the markets will continue letting them spend more and more money each year without a return.
Ultimately the market is going to force them to open up and let people flex their subs.
I’ll probably get downvoted for this, but am I the only one who thinks it’s kind of wild how much anger is generated by these companies offering discounted plans for use with their tools?
At this point, there would be less anger and outrage on HN if they all just charged us the same high per-token rate and offered no discounts or flat rate plans.
When I was reading the Opus 4.6 launch post, they mentioned the same thing and their TerminalBench score was based on using Terminus 2 and not CC.
0. https://mariozechner.at/posts/2025-11-30-pi-coding-agent/
read_toc tool:
...
{
"name": "mcp",
"qualified_name": "mcp",
"type": "constant",
"docstring": null,
"content_point": "src\\mcps\\code_help\\server.py::17::18::python::mcp",
"is_nested": false
},
{
"name": "handler",
"qualified_name": "handler",
"type": "constant",
"docstring": null,
"content_point": "src\\mcps\\code_help\\server.py::18::19::python::handler",
"is_nested": false
},
....
update_content tool:
{
"content": "...",
"content_point": "src\\mcps\\code_help\\server.py::18::19::python::handler",
"project_root": ....
}
With search-replace you could work on separate parts of a file independently with the LLM. Not to mention that with each edit all lines below are shifted, so you now need to provide the LLM with the whole content.
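To make the shifting-lines point concrete, here is a minimal sketch of addressing an edit by symbol name and re-resolving its line span on every call, so a follow-up edit never relies on stale line numbers. This is not the update_content tool's actual implementation; `find_symbol_span` and `update_symbol` are hypothetical names, and the sketch only handles top-level Python functions, classes, and plain assignments.

```python
import ast


def find_symbol_span(source: str, name: str) -> tuple[int, int]:
    """Return the 1-based inclusive (start, end) lines of a top-level symbol.

    Hypothetical helper: the span is recomputed from the current source,
    so it stays correct after earlier edits have shifted lines.
    """
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names = [node.name]
        elif isinstance(node, ast.Assign):
            names = [t.id for t in node.targets if isinstance(t, ast.Name)]
        else:
            continue
        if name in names:
            return node.lineno, node.end_lineno
    raise KeyError(name)


def update_symbol(source: str, name: str, new_text: str) -> str:
    """Replace the whole definition of `name` with `new_text`."""
    start, end = find_symbol_span(source, name)
    lines = source.splitlines()
    lines[start - 1:end] = new_text.splitlines()
    return "\n".join(lines) + "\n"


source = "mcp = make_server()\nhandler = make_handler(mcp)\n"
# The first edit grows `mcp` from one line to three, pushing `handler` down.
edited = update_symbol(source, "mcp", "mcp = make_server(\n    debug=True,\n)")
# The follow-up edit still finds `handler` because it is looked up by name.
edited = update_symbol(edited, "handler", "handler = None")
```

The trade-off is that the tool needs a parser for every language it supports, whereas search-replace and line numbers work on any text.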
Have you tested followup edits on the same files?
You probably don't want to use the line number though unless you need to disambiguate
But your write tool implementation can take care of that
I don’t believe it’s exceptionally unique or new that companies will revoke access if you are using an unpublished API that the apps use. I don’t see anything wrong with it myself. If you want, pay for normal token use on the published APIs. There is no expectation that you can use APIs for an application, even if you are a paid user, that are not published explicitly for usage.
It's truly disgusting.
So then it's better to start obeying ROBOTS.txt as a ladder pull through a "nicely behaved" image advantage.
The alternative is to say that bugs shouldn’t be fixed because it’s a ladder pull or something. But that’s crazy. What’s the point of complaining if not to get people to fix things?
It’s because they want to study you.
They want the data!
Underscores the importance of sovereign models you can run on the edge, finetune yourself, and run offline. At State of Utopia, we're working on it!
* Subscriptions are oversubscribed. They know how much an “average” Claude Code user actually consumes to perform common tasks and price accordingly. This is how almost all subscription products work.
* There is some speculation that there is cooperative optimization between the harness and backend (cache related etc).
* Subscriptions are subsidized to build market share; to some extent the harnesses are “loss leader” halo products which drive the sales of tokens, which are much more profitable.
> Why bother, you ask? Opus may be a great model, but Claude Code to this day leaks raw JSONL from sub-agent outputs, wasting hundreds of thousands of tokens. I get to say, “fuck it, subagents output structured data now”.
This is why I find the banning of Claude subscriptions in other harnesses so heinous. The harness they're forcing onto everyone has tons of big issues, including wasting massive numbers of tokens. Very much in line with intentionally refusing to adhere to standards in the most IE6 way possible.
>re "only" the harness changed
In our experience, AIs are like amnesiacs who can barely remember what they did three minutes ago (their last autonomous actions might still be in their context if you're lucky), with no chance of remembering what they did three days ago. As such, the "harness" determines their entire memory and is the single most important determinant of their outcome.
The best harness is a single self-contained, well-commented, obvious, and tiny code file followed by a plain explanation of what it does and what it's supposed to do, the change request, how you want it done (you have to say it with so much force and confidence that the AI is afraid of getting yelled at if it does anything else) and a large amount of text devoted to asking the AI not to break what is already working. Followed by a request to write a test that passes. Followed by asking for its judgment about whether or not it broke what was already working. All in one tiny crisp prompt.
With such a harness, it's able to not break the code one time in twenty. If you use reverse psychology and ask it to do the opposite of what you want, it rises to fifty-fifty odds you'll get what you're trying to do.
Don't believe me? You can watch the livestream (see my previous comments).
Baby steps toward Utopia.
Agents waste a lot of tokens on editing, sandboxes, passing info back and forth from tool calls and subagents.
Love the pragmatic mix of content based addressing + line numbers. Beautiful.
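For anyone wondering what that mix might look like in practice, here is a rough sketch under my own assumptions: every line shown to the model carries its number plus a short content hash, and an edit is only accepted if the hash at that number still matches. The tag format, the 4-character SHA-1 prefix, and the names `short_hash`, `tag_lines`, and `anchor_is_fresh` are all hypothetical, not taken from the article.

```python
import hashlib


def short_hash(line: str, width: int = 4) -> str:
    """Short content hash for one line (assumed scheme: SHA-1 hex prefix)."""
    return hashlib.sha1(line.encode("utf-8")).hexdigest()[:width]


def tag_lines(text: str) -> list[str]:
    """Prefix each line with 'number:hash|' before showing it to the model."""
    return [
        f"{i}:{short_hash(line)}|{line}"
        for i, line in enumerate(text.splitlines(), 1)
    ]


def anchor_is_fresh(text: str, line_no: int, expected_hash: str) -> bool:
    """True if line `line_no` exists and its content still hashes to `expected_hash`."""
    lines = text.splitlines()
    return 1 <= line_no <= len(lines) and short_hash(lines[line_no - 1]) == expected_hash


text = "x = 1\ny = 2"
tagged = tag_lines(text)
anchor = short_hash("y = 2")  # what the model would echo back for line 2
```

The line number says where to look and the hash catches the case where the file changed underneath the model, at the cost of a few extra tokens per line.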
With CC you can do a /cost to see how much your session cost in dollar terms, that's a good benchmark IMO for plugins, .md files for agents, and so on. Minimize the LLM cost in the way you'd minimize typical resource usage on a computer like cpu, ram, storage etc.
> Often the model isn’t flaky at understanding the task. It’s flaky at expressing itself. You’re blaming the pilot for the landing gear.
> The model is the moat. The harness is the bridge. Burning bridges just means fewer people bother to cross. Treating harnesses as solved, or even inconsequential, is very short-sighted.
> The gap between “cool demo” and “reliable tool” isn’t model magic. It’s careful, rather boring, empirical engineering at the tool boundary.
Seeing how bad the results are when you're casually approaching something makes it very evident that it's a topic that can be optimized.
How about Kimi tho? How can I play with it?
energy123•1h ago
It's less token heavy than the proposed hash approach, and I don't think frontier LLMs hallucinate line numbers if each line in the context is prefixed with them.
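A minimal sketch of the plain line-number variant, assuming the model returns a batch of (start, end, replacement) edits written against the numbering it was shown. Applying them bottom-up keeps the earlier numbers valid even when an edit changes the line count. `number_lines` and `apply_edits` are hypothetical names, not any harness's real API.

```python
def number_lines(text: str) -> str:
    """Prefix every line with its 1-based number, as shown to the model."""
    return "\n".join(f"{i}: {line}" for i, line in enumerate(text.splitlines(), 1))


def apply_edits(text: str, edits: list[tuple[int, int, str]]) -> str:
    """Apply (start, end, replacement) edits, 1-based inclusive, original numbering.

    Edits are applied from the bottom of the file upward so that a
    replacement which adds or removes lines never shifts a pending edit.
    """
    lines = text.splitlines()
    prev_start = len(lines) + 1
    for start, end, replacement in sorted(edits, key=lambda e: e[0], reverse=True):
        if not (1 <= start <= end < prev_start):
            raise ValueError(f"out-of-range or overlapping edit: {start}-{end}")
        lines[start - 1:end] = replacement.splitlines()
        prev_start = start
    return "\n".join(lines)


original = "a\nb\nc\nd"
shown_to_model = number_lines(original)
result = apply_edits(original, [(1, 1, "A1\nA2"), (3, 4, "C")])
```

This only covers one batch; after the batch is applied the file has to be renumbered before the model edits it again.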