To be clear, I'm not saying that it's a good thing, but it does seem to be going in this direction.
Under the hood, what was happening is that older models needed these reminders, while 4.7 no longer does. When we showed the reminders to 4.7, it tended to over-fixate on them. The fix was to stop adding the cyber reminders.
More here: https://x.com/ClaudeDevs/status/2045238786339299431
It's going to be a very expensive game, and the masses will be left with subpar local versions. It would be as if we reversed the democratization of compilers and coding tooling that happened in the 90s and 00s, and the polished, more capable tools were again all proprietary.
You could call it a rug pull, but they may just be doing the math and realize this is where pricing needs to shift to before going public.
Oh well
OpenAI was built as you say. Google had a corporate motto of "Don't be evil" which they removed so they could, um, do evil stuff without cognitive dissonance, I guess.
This is the other kind of enshittification, where the businesses turn into power accumulators.
Plenty of OSS models have been released lately, with GLM and Kimi arguably the most interesting for the near-SOTA case ("give these companies a run for their money"). Of course, actually running them locally for anything other than very slow Q&A is hard.
Though, from my limited testing, the new model is far more token-hungry overall.
https://artificialanalysis.ai/?intelligence-efficiency=intel...
Looking at their cost breakdown, while input cost rose by $800, output cost dropped by $1400. Granted, whether the output savings offset the input increase will be very use-case dependent, and I imagine the delta is a lot closer at lower effort levels.
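A quick back-of-the-envelope makes the trade-off concrete. All prices and token volumes below are hypothetical, not Anthropic's actual numbers; the point is just that the break-even depends entirely on your input:output token mix:

```python
# Back-of-the-envelope: does a cheaper output rate offset a pricier input rate?
# All prices ($/M tokens) and token volumes (M tokens) are made up for illustration.

def bill(input_m: float, output_m: float, in_price: float, out_price: float) -> float:
    """Total cost in dollars for a given token mix and price schedule."""
    return input_m * in_price + output_m * out_price

mix = (1000, 200)  # a 5:1 input-heavy workload, in millions of tokens

old = bill(*mix, in_price=3.0, out_price=15.0)  # hypothetical old rates
new = bill(*mix, in_price=3.8, out_price=8.0)   # hypothetical new rates

print(f"old: ${old:,.0f}  new: ${new:,.0f}")
# -> old: $6,000  new: $5,400 : the output discount wins at this mix.
# At a 20:1 mix (1000, 50): old = $3,750, new = $4,200 : the input hike dominates.
```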
I’ve noticed 4.7 cycling a lot more on basic tasks, though it also seems a bit better at holding long-running context.
In my opinion, we've reached some ceiling where more tokens lead only to incremental improvements. A conspiracy seems unlikely given all providers are still competing for customers, and a 50% token increase drives infra costs up dramatically too.
If I can have Claude write up the plan, and the other models actually execute it, I'd get the best of both worlds.
(Amusingly, I think Codex tolerates being invoked by Claude (de facto tolerated ToS violation), but not the other way around.)
If tech companies convince Congress that AI is an existential issue (in defense or even just productivity), then these companies will get subsidies forever.
And shafting your customers too hard is bad for business, so I expect only moderate shafting. (Kind of surprised at what I've been seeing lately.)
If the models don't get to a higher level of 'intelligence' and still struggle with certain basic tasks at the SOTA while also getting more expensive, then the pitch is misleading and unlikely to happen.
What I've been doing is running a dual-model setup — use the cheaper/faster model for the heavy lifting where quality variance doesn't matter much, and only route to the expensive one when the output is customer-facing and quality is non-negotiable. Cuts costs significantly without the user noticing any difference.
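A minimal sketch of that routing logic, with made-up model names and a deliberately crude "customer-facing" heuristic; this is the shape of the idea, not anyone's actual API:

```python
# Dual-model router: cheap model for internal heavy lifting,
# expensive model only when the output is customer-facing.
# Model identifiers and task fields are hypothetical.

CHEAP_MODEL = "fast-cheap-model"
EXPENSIVE_MODEL = "flagship-model"

def pick_model(task: dict) -> str:
    """Route to the flagship only when quality is non-negotiable."""
    if task.get("customer_facing") or task.get("quality") == "non_negotiable":
        return EXPENSIVE_MODEL
    return CHEAP_MODEL

tasks = [
    {"name": "summarize internal logs", "customer_facing": False},
    {"name": "draft reply to a customer complaint", "customer_facing": True},
]
for t in tasks:
    print(f'{t["name"]} -> {pick_model(t)}')
```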
The real risk is that pricing like this pushes smaller builders toward open models or Chinese labs like Qwen, which I suspect isn't what Anthropic wants long term.
A smaller builder might reconsider (re)acquiring relevant skills and applying them. We don't suddenly lose the ability to program (or hire someone to do it) just because an inference provider is available.
There are 2 things to consider:
* Time to market.
* Building a house on someone else's land.
You're balancing the 2, hoping that you win the time to market (making the second point obsolete from a cost perspective) or that you have money to pivot to DIY.

This is going to be blunt, but this business model is fundamentally unsustainable, and "founders" don't get to complain that their prospecting costs went up. These businesses are setting themselves up to get Sherlocked.
The only realistic exit for these kinds of businesses is to score a couple gold nuggets, sell them to the highest bidder, and leave.
We'll be keeping an eye on open models (of which we already make good use). I think that's the way forward. Actually, it would be great if everybody put more focus on open models; perhaps we can come up with something like the "linux/postgres/git/http/etc" of LLMs: something we can all benefit from without it being monopolized by a single billionaire company. Wouldn't it be nice if we didn't need to pay for tokens? Paying for infra (servers, electricity) is already expensive enough.
One of two main reasons why I'm wary of LLMs. The other is fear of skill atrophy. These two problems compound. Skill atrophy is less bad if the replacement for the previous skill does not depend on a potentially less-than-friendly party.
We have built multi-cloud disaster recovery into our infrastructure. Something I would not have done yet had we not had LLMs.
I am learning at an incredible rate with LLMs.
What an interesting paradox-like situation.
And not even just understanding, but verifying that they’ve implemented the optimal solution.
But I’m so much more detached from the code; I don’t feel that ‘deep neural connection’ that comes from actually spending days locked in a refactor or debugging a really complex issue.
I don’t know how I feel about it.
But if you don't and there's no PR process (side projects), the motivation to form that connection is quite low.
Sure, you don't know the code by heart, but people debugging code translated to assembly already do that.
The big difference is being able to unleash scripts that invalidate an enormous number of hypotheses very fast and that can analyze the data.
Doing that by hand used to take hours, so it was a last-resort approach. Now it's very cheap, so validating many hypotheses is way cheaper!
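As a sketch of what cheap hypothesis elimination can look like in practice (the log file name, patterns, and hypotheses are all invented for illustration):

```python
# Rule out several failure hypotheses in one cheap pass over a log.
# Log file name, regex patterns, and hypotheses are hypothetical.
import re

def check(lines: list[str], pattern: str, hypothesis: str) -> None:
    """Count matching lines and report whether the hypothesis survives."""
    hits = sum(1 for line in lines if re.search(pattern, line))
    verdict = "still plausible" if hits else "ruled out"
    print(f"{hypothesis}: {verdict} ({hits} matching lines)")

with open("app.log") as f:  # hypothetical log file
    lines = f.read().splitlines()

check(lines, r"timed? ?out", "network timeouts cause the failures")
check(lines, r"OOMKilled",   "workers are being OOM-killed")
check(lines, r"deadlock",    "a lock-ordering bug is involved")
```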
I feel like my "debugging ability", in terms of value delivered, has gone way up. As a skill, it's changing in ways I can't quite pin down, but the value I'm delivering in debugging sessions has clearly gone way up.
Could you do it again without the help of an LLM?
If no, then can you really claim to have learned anything?
And yes: if LLMs disappear, then we'd need to hire a lot of people to maintain the infrastructure.
Which naturally is a part of the risk modeling.
It’s quite possible to be deep into solving a problem with an LLM guiding you where you’re reading and learning from what it says. This is not really that different from googling random blogs and learning from Stack Overflow.
Assuming everyone just sits there dribbling whilst Claude is in YOLO mode isn’t always correct.
You very much decide how you employ LLMs.
Nobody is holding a gun to your head to use them, in a certain way.
So if you use them in a way that increases your inherent risk, then you are doing it incredibly wrong.
I don't believe it. Having something else do the work for you is not learning, no matter how much you tell yourself it is.
Open your eyes, and you might become a believer.
I've worked with people who will look at code they don't understand, say "llm says this", and express zero intention of learning something.
It's like, why even review that PR in the first place if you don't even know what you're working with?
But it requires that one does not do something stupid.
E.g., for recurring tasks: keep the task specification in the source code and just ask Claude to execute it.
The same with all documentation, etc.
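For example, a recurring-task spec can live as a comment at the top of the file it governs, so the prompt is always just "execute the spec". Everything below (the file name, spec wording, and the `claude -p` invocation) is an illustrative assumption, not a prescribed setup:

```python
# tools/gen_docs.py : hypothetical example of an in-repo task spec.
#
# TASK SPEC (recurring; ask the agent to execute this verbatim):
#   1. Regenerate docs/api.md from the docstrings in this module.
#   2. Do not modify any file other than docs/api.md.
#   3. Run the doc linter and fix any warnings before finishing.
#
# Typical invocation (hypothetical; adapt to your agent CLI):
#   claude -p "Execute the TASK SPEC at the top of tools/gen_docs.py"

def generate_api_docs() -> None:
    """Placeholder for the behavior the spec documents."""
    ...
```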
My manager doesn't even want us to use Copilot locally. Now we're supposed to only use the GitHub Copilot cloud agent: one shot from prompt to PR. With people like that selling vendor lock-in for them, companies like GitHub, OpenAI, Anthropic, etc. don't even need sales and marketing departments!
That's an incentive difficult to reconcile with the user's benefit.
To keep this business running they do need to invest to make the best model, period.
It happens to be exactly what Anthropic's strategy is. That and great tooling.
The latest Claude still fails the car wash test.
After just ~4 prompts I blew past my daily limit. Another ~7 prompts and I blew past my weekly limit.
The entire HTML/CSS/JS was less than 300 lines of code.
I was shocked how fast it exhausted my usage limits.
With an enterprise subscription, the bill gets bigger, but it's not like a VP can easily send a memo to all staff that a migration is coming.
Individuals may end their subscriptions, which would ease the data-center usage and turn profits up.
The whole magic of (pre-nerfed) 4.6 was how it magically seemed to understand what I wanted, regardless of how well I articulated it.
Now Anthropic is framing the need to explicitly define instructions as a "feature"?!
My workflow is to give the agent pretty fine-grained instructions, and I'm always fighting agents that insist on doing too much. Opus 4.5 is the best out of all agents I've tried at following the guidance to do only-what-is-needed-and-no-more.
Opus 4.6 takes longer, overthinks things and changes too much; the high-powered GPTs are similarly flawed. Other models such as Sonnet aren't nearly as good at discerning my intentions from less-than-perfectly-crafted prompts as Opus.
Eventually, I quit experimenting and just started using Opus 4.5 exclusively knowing this would all be different in a few months anyway. Opus cost more, but the value was there.
But now I see that 4.7 is going to replace both 4.5 and 4.6 in VS Code Copilot, and with a 7.5x modifier. Based on the description, this is going to be a price hike for slower performance, and if the 4.5-to-4.6 change is any guide, more overthinking targeted at long-running tasks rather than fine-grained work. For me, that seems like a step backwards.
I hit my 5-hour limit within 2 hours yesterday. Initially I tried the batched mode for a refactor, but cancelled after seeing it take 30% of the limit within 5 minutes. A serial approach consumed less (took ~50 minutes, xhigh effort, ~60% of the remaining allocation IIRC), but still very clearly burned through the limit much faster than 4.6.
It feels like every exchange takes ~5% of the 5-hour limit now, when it used to be maybe ~1-2%. For reference, I'm on the Max 5x plan.
For now I can tolerate it since I still have plenty of headroom in my limits (used ~5% of my weekly, I don't use claude heavily every day so this is OK), but I hope they either offer more clarity on this or improve the situation. The effort setting is still a bit too opaque to really help.
To me this seems more like it's trained to be concise by default, which I guess can be countered with preference instructions if required.
What's interesting to me is that they're using a new tokeniser. Does that mean they trained a new model from scratch? Or took an existing model and further trained it with a swapped-out tokeniser?
The looped-model research/speculation is also quite interesting - if done right, there are significant speed-ups and resource savings.
I'm surprised that it's 45%. Might go down (?) with longer context answers but still surprising. It can be more than 2x for small prompts.