I own a stake in a small brewery in Canada, and this feature just saved me setting up some infrastructure to "productionize" an agent we created to assist with ordering, invoicing, and government document creation.
I get paid in beer and vibes for projects like these, so the more I can ship these projects in the same place I prototype them the better.
(Also, don't worry all, I still have SF income to buy food for my family.)
Quick question: how do you manage these side projects that kinda need to be production-ready but aren't your actual SF job lol?
Some of these people think they're my actual customers/clients, but like, I do it for fun and to help them out.
However, Vivaldi on Linux renders it correctly, so I assume it probably only works right in Chrome and fully Chromium-compatible browsers.
Unfortunately, there are still a lot of websites that only work right in a single browser, usually Chrome, so I always have to keep at least 2 or 3 different browsers around.
But beyond that, AWS is a very complex platform. Agents simplify SaaS: the agent itself manages the API calls, maybe the database queries, more of the logic. As software moves into the agent, you need less cloud capability and a better agent harness/hosting. Essentially, this makes the AWS platform obsolete; most of its services make much less sense.
Originally I thought they would stick to being mainly a model provider, but with all the recent releases it seems they do want to provide more "services."
Wonder what part of the market 3rd party apps will build a moat around?
1. We pay for SaaS so we don't have to manage it. If you vibe-code or use these AI things, then you're managing it yourself.
2. Most SaaS is like $20-$100/month/person. For a software engineer, that's maybe <1h of pay.
3. Most SaaS requires some sort of human in the loop to check for quality (at least by sampling). No user would want to do that.
Number 2 is the biggest reason. It's $20 a month.... I'm not gonna replace that with anything.
Writing this message already costs more than $20 of my time.
I predict that the market will get bigger, because people are more prone to automate the long-tail/last-mile stuff once they're actually able to.
I can see that, assuming models don't make some giant leap forward.
> 2. Most SaaS is like $20-$100/month/person. For a software engineer, that's maybe <1h of pay.
|Segment                   |Median Price                              |
|--------------------------|------------------------------------------|
|Mid-market                |~$175/user/month                          |
|Enterprise (<100 seats)   |~$470/seat/month implied (~$47K ACV)      |
|Enterprise (100-500 seats)|~$312–$1,560/seat/month range (~$156K ACV)|
Enterprise contracts almost always include a platform fee on top of per-seat costs (67% of contracts), plus professional services that add 12–18% of first-year revenue. So for a lot of companies, it's worth using AI to create a replacement.
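Back-of-envelope, those numbers stack up fast. A quick illustrative calculation: the seat price and services percentage come from the table above, but the flat platform-fee amount is an assumption I made up.

```python
# Rough first-year cost for a hypothetical 100-seat enterprise contract,
# using the ~$312/seat/month lower bound from the table above.
seats = 100
seat_price_monthly = 312        # $/seat/month, table lower bound
platform_fee_annual = 20_000    # hypothetical flat platform fee ($), made up
services_rate = 0.15            # midpoint of the 12-18% services range

subscription = seats * seat_price_monthly * 12                   # $374,400
services = (subscription + platform_fee_annual) * services_rate  # $59,160
total_first_year = subscription + platform_fee_annual + services

print(f"Subscription:     ${subscription:,.0f}")
print(f"First-year total: ${total_first_year:,.0f}")             # ~$453,560
```

Even at the low end, that's ~$450K in year one, which buys a lot of engineer-hours for building a replacement.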
I'll add the nuance that those might be big companies with slack capacity, or at least firms already at a point on their effort/performance curve where marginal effort invested in their core business isn't worth enough (a point that would actually be weird to reach without being a big company). Even with AI, and even as processes become more efficient, effort is at a premium; depending on your firm's situation, a man-hour spent on your core business might be a better use of effort and time than spending it on non-core services.
There's a lot of money to be made in small business automation right now.
To score a big IPO they need to be a platform, not just a token pipeline. Everything they’re doing signals they’re moving in this direction.
But you should correct that: Claude is very happy to let you use whatever you want for a harness ... as long as you're on a pay-as-you-go plan. So it's not blocked, it's just not allowed on the $20-per-month plan.
First, harnesses can give access to company-internal tools (like the ticket queue). You could do this with MCP, but it's much harder, slower, and MCP kind of resists this pattern (if you want a bot to solve a ticket, why not start with an entire overview of the ticket in the first request to your model? That can't easily be done with MCP).
Second, harnesses can direct the whole process. A trivial example: you can improve performance in a very simple way by asking "are you sure?" and showing the model what it intends to do BEFORE doing it. That improves performance by 10%, right there. Give a model the chance to look at what it's doing and change its mind before committing. Then ask a human the same question, with a nice yes/no button. Try that with MCP.
Of course you quickly find a million places to change the process, and then you can go and meta-change the process. Like asking an AI what steps should be followed first, then doing those steps, most of which are AI invocations with parts of the ticket (say, examine the customer database, extract what's relevant to this problem, ...). Limiting context is very powerful, and not just because it gets you cheaper requests. Get an AI to build the relevant context for a particular step before actually doing that step ...
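As a concrete sketch of that confirm-before-commit idea, here's a minimal harness loop. Everything here (`llm_complete`, `execute`) is a placeholder, not any real SDK:

```python
# Minimal sketch of a confirm-before-commit harness loop.

def llm_complete(prompt: str) -> str:
    """Placeholder for whatever model client you use."""
    raise NotImplementedError

def execute(action: str) -> None:
    """Placeholder for the actual side effect (ticket update, API call, ...)."""
    raise NotImplementedError

def run_step(ticket: str) -> None:
    # Give the model the whole ticket up front, not drip-fed via tool calls.
    action = llm_complete(f"Full ticket:\n{ticket}\n\nPropose one next action.")

    # "Are you sure?" - show the model its own plan before committing.
    review = llm_complete(
        f"You are about to do:\n{action}\n\nReply CONFIRM, or REVISE: <new action>."
    )
    if review.startswith("REVISE:"):
        action = review.removeprefix("REVISE:").strip()

    # Then ask the human the same question, with a yes/no gate.
    if input(f"Execute this?\n{action}\n[y/N] ").strip().lower() == "y":
        execute(action)
```

The point is that the harness, not the model, owns the control flow: it decides when to ask, what context each call sees, and when a human gets the final say.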
Put it into the "are you sure" loop and you'll see the model just keep oscillating for eternity. If you ask the model to question its output, it takes that as an instruction that it must change something, even when the output is correct.
FWIW, IMO being locked into a single model provider is a deal breaker.
This solution will distract a lot of folks and doom-lock them into Anthropic. That'll probably be fine for small offices, but it is suicidal to get hooked into Anthropic's way of doing things for anything complex. IME, you want to be able to compare different models, and you end up managing them to your style. It's a bit like cooking, where you may have a greater affinity for certain flavors. You make selection tradeoffs on when to use a frontier model for design & planning vs. something self-hosted for simpler operational tasks.
Which projects are standing out in this space right now?
It works on top of k8s, so you can deploy and run in your own compute cluster. Right now it's focused only on coding tasks but I'm currently working on abstractions so you can similarly orchestrate large runs of any agentic workflow.
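For a flavor of what "runs on top of k8s" can mean in practice, here's a hedged sketch of submitting one agent run as a one-shot Job via the official Kubernetes Python client. The image name, namespace, and args are placeholders I made up, not this project's actual interface:

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside the cluster

# Hypothetical: each agent run becomes a one-shot Job in your own cluster.
job = client.V1Job(
    metadata=client.V1ObjectMeta(name="agent-run-123"),
    spec=client.V1JobSpec(
        backoff_limit=0,
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="agent",
                        image="example.com/agent-harness:latest",  # placeholder
                        args=["--task", "fix-issue-123"],          # placeholder
                    )
                ],
            )
        ),
    ),
)
client.BatchV1Api().create_namespaced_job(namespace="agents", body=job)
```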
I do have some features coming up that will improve the ability to converse with the agent as it's running. I'll make a note to add in a plan setting so you can have that run and converse before it gets going.
When the models have an off day, the workflows you've grown to depend on fail. When you're completely dependent on Anthropic for not only execution but also troubleshooting, you're doomed. You lose a whole day troubleshooting model-performance variability when you should have just logged off and waited. These are very cognitively disruptive days.
Build in multi-model support, so your agents can modify routing if an observer detects variability.
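A minimal sketch of what that could look like; the provider names and the failure-rate threshold are illustrative, not any real library:

```python
import random  # only for the fallback choice below

# Hypothetical observer-driven routing: flag a provider having an "off day"
# and route around it until it recovers.
PROVIDERS = ["anthropic", "openai", "local-qwen"]
degraded: set[str] = set()

def observe(provider: str, failures: int, total: int) -> None:
    # Toy variability signal: recent failure rate above 30% marks it degraded.
    if total > 0 and failures / total > 0.3:
        degraded.add(provider)
    else:
        degraded.discard(provider)

def pick_provider() -> str:
    healthy = [p for p in PROVIDERS if p not in degraded]
    # If everything looks degraded, maybe today is the day to log off.
    return healthy[0] if healthy else random.choice(PROVIDERS)
```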
Call me stupid, but this does not sound like they want software developers to be around in a year or two.
Until then, every agent framework is completely reinvented every week due to new patterns and new models: evals, ReAct, DSPy, RLM, memory patterns, claws, dynamic context, sandbox strategies. It seems like locking into a framework is a losing proposition for anyone trying to stay competitive. See also: LangChain trying to be the Next.js/Vercel of agents while everyone recommends building your own.
That said, Anthropic pulls a lot of weight by owning the models themselves, and an easier-to-use solution will probably get some adoption from those who are better served by going from nothing to something agentic, despite the lock-in and the constant churn of model tech.
That, plus everyone is using 5 different vector DBs and reranking models from different vendors than the answer models, etc.
A framework is simply way too rigid for a non-deterministic technology.
We may see libraries that provide tools for managing agents, but then again, there's nothing that tmux can't do already.
I agree a framework is something that sounds outdated.
I also believe an orchestrator is needed. Something that abstracts you from a specific provider. Like hardware, drivers and operating systems.
Right now, my thoughts are on that line: Who will build that operating system? Who will have it in the cloud?
It needs to be robust enough to operate for large organizations, be open source, and sit on top of any provider.
Right now we are seeing BSD vs GNU/Linux vs DOS kind of battles.
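To make the hardware/driver analogy concrete, here's a minimal sketch of such a "driver layer" in Python. This is a made-up interface, not an existing project:

```python
from abc import ABC, abstractmethod

class ModelProvider(ABC):
    """The 'driver': every vendor implements the same narrow interface."""

    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class AnthropicDriver(ModelProvider):
    def complete(self, prompt: str) -> str:
        raise NotImplementedError  # wrap the vendor's SDK here

class LocalDriver(ModelProvider):
    def complete(self, prompt: str) -> str:
        raise NotImplementedError  # wrap a self-hosted runtime here

def run_task(task: str, provider: ModelProvider) -> str:
    # The 'operating system' only ever sees the interface,
    # so swapping providers is a constructor change, not a rewrite.
    return provider.complete(task)
```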
We've got Claude Managed Agents, Claude Agent SDK, Claude API, Claude Code, Claude Platform, Claude Cowork, Claude Enterprise, and plain old 'Claude'. And honourable mention to Claude Haiku/Sonnet/Opus 4.{whatever} as yet another thing with the same prefix. I feel like it's about once a week I see a new announcement here on HN about some new agentic Claude whatever-it-is.
I have pretty much retreated in the face of this to 'just the API + `pi` + Claude Opus 4.{most recent minor release}', as a surface area I can understand.
Edit: confirmed, loads with a public DNS provider that has no blocklists.
The best performance I've gotten is by mixing agents from different companies. Unless there is a "winner take all" agent (I seriously doubt it, based on the dynamics and cost of collecting high quality RL data), I think the best orchestration systems are going to involve mixing agents.
Here, it's not about the planner, it's about the workers. Some agents are just better at certain things than others.
For instance, Opus 4.6 on max does not hold a candle to GPT 5.4 xhigh in terms of bug finding. It's just not even a comparison, iykyk.
Almost analogous to how diversity of thought can improve the robustness of the outcomes in real world teams. The same thing seems to be true in mixture-of-agent-distributions space.
Having Opus write a spec, then sending it to Gemini to revise, back to Opus to fix, then to me to read and approve...
Sending it to a local model like Qwen3.5 to build, then off to Opus to review ...
This was such an amazing flow, until Anthropic decided to change their minds.
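Sketched out, that flow is just a pipeline of stages, each pinned to whichever agent is best at it. The `complete` dispatcher and the stage assignments below are illustrative:

```python
# Illustrative mixed-provider pipeline: spec -> revise -> fix -> human gate
# -> build -> review.

def complete(provider: str, prompt: str) -> str:
    """Placeholder: route the prompt to the named vendor API or local model."""
    raise NotImplementedError

def pipeline(task: str) -> str:
    spec = complete("opus", f"Write a spec for: {task}")
    spec = complete("gemini", f"Revise this spec:\n{spec}")
    spec = complete("opus", f"Fix the spec per these revisions:\n{spec}")

    # The human gate from the original flow: read and approve.
    if input(f"Approve this spec?\n{spec}\n[y/N] ").strip().lower() != "y":
        raise SystemExit("spec rejected")

    code = complete("qwen", f"Implement this spec:\n{spec}")  # local model builds
    return complete("opus", f"Review this implementation:\n{code}")
```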
The Gemini CLI is horrible though.
For Anthropic to have the best version of this software, they'd have to simultaneously ... well, have the best version of the software, but also beat every other AI company at all subtasks (like technical writing, diagramming, bug finding; they'd need to have the unequivocal "best model" in all categories).
Surely their version is not going to allow you to e.g. invoke Codex or what have you as part of their stack.
(Not sure if it would be Sumerian, Esperanto, or something more artificial. As long as it is esoteric enough for one company to hoard all the expertise in it.)
I also remember Chinese being discussed as a potential orchestrating language, but I don't remember the sources, so it's 100% anecdotal.
So that'll go on until they form a cartel and become the Wizard of Oz.
Opus is designed to be a lazy, corner-cutting model. Reviews are just one place where this shows. In my orchestration loop, Opus discards many findings by GPT 5.4 xhigh, justifying this as pragmatism. Opus YAGNIs everything; GPT wants you to consider seismic events in your todo-list app. Sadly, there's nothing in between.
In other words, it is designed for companies to build on top of the Anthropic platform. For example, if you are a SaaS and you want a way to run agents programmatically for your customers, they basically offer a solution. It is not for personal use, although you can certainly use it that way if you are prepared to pay the API price.
The downside is obviously this is locked to Anthropic models.
The other downside is that the authentication story at the moment is underwhelming, hacky, and dare I say, insecure. I have a few reservations.
We already have this platform, and I am putting together an open-source example of how to create your own version of this.
Anthropic models are great, but there are plenty of open-source models too, and frankly agents do not need to run like Claude Code in order to be successful at whatever they need to do. In my experience, the agent architecture depends entirely on the problem domain.