AWS Trainium3 Deep Dive – A Potential Challenger Approaching

https://newsletter.semianalysis.com/p/aws-trainium3-deep-dive-a-potential

71•Symmetry•2mo ago

Comments

klysm•1mo ago

This won't materialize into a legitimate threat in the NVIDIA/TPU landscape without enormous software investment. That's why NVIDIA won in the first place. This requires executives to see past the hardware and make riskier investments, and we will see whether this actually materializes under AWS management or not.

stogot•1mo ago

This is addressed in the article.

> In fact, they are conducting a massive, multi-phase shift in software strategy. Phase 1 is releasing and open sourcing a new native PyTorch backend. They will also be open sourcing the compiler for their kernel language called “NKI” (Neuron Kernel Interface) and their kernel and communication libraries matmul and ML ops (analogous to NCCL, cuBLAS, cuDNN, Aten Ops). Phase 2 consists of open sourcing their XLA graph compiler and JAX software stack.

> By open sourcing most of their software stack, AWS will help broaden adoption and kick-start an open developer ecosystem. We believe the CUDA Moat isn’t constructed by the Nvidia engineers that built the castle, but by the millions of external developers that dig the moat around that castle by contributing to the CUDA ecosystem. AWS has internalized this and is pursuing the exact same strategy.

coredog64•1mo ago

I wish AWS all the best, but I will say that their developer-facing software doesn't have the best track record. Munger-esque "incentive defines the outcome" and all that, but I don't think they're well positioned to collect actionable insight from open GitHub repos.

almostgotcaught•1mo ago

This isn't an "enormous software investment"; this is table stakes, and table stakes lose out heads-up against Nvidia. See AMD.

ivape•1mo ago

In terms of their seriousness, word on the street is they are moving from custom chips they could be getting from Marvell over to some company I've never heard of. So, they are making decisions that appear serious in this direction:

> With Alchip, Amazon is working on "more economical design, foundry and backend support" for its upcoming chip programs, according to Acree.

https://www.morningstar.com/news/marketwatch/20251208112/mar...

mrlongroots•1mo ago

Hyperscalers do not need to achieve parity with Nvidia. There's a (let's say) 50% headroom in terms of profit margins, and plenty of headroom in terms of the complexity custom chip efforts need to implement: they don't need the complexity or generality of Nvidia's chips. If a simple architecture allows them to do inference at 50% of the TCO and 1/5th the complexity, and reduce their Nvidia bill by 70%, that's a solid win. I'm being fast and loose with numbers and Trainium clearly seems to have ambitions beyond inference, but given the hundreds of billions each cloud vendor is investing in the AI buildout, a couple billion on IP that you will own afterwards is a no-brainer. Nvidia has good products and a solid head start, but they're not unassailable or anything.

bri3d•1mo ago

IMO the value of COTS software stack compatibility is becoming overstated: academics, small research groups, hobbyists, and some enterprises will rely on commodity software stacks working well out of the box, but large pure/"frontier"-AI inference-and-training companies are already hand optimizing things anyway and a lot of less dedicated enterprise customers are happy to use provided engines (like Bedrock) and operate at only the higher level.

I do think AWS need to improve their software to capture more downmarket traction, but my understanding is that even Trainium2 with virtually no public support was financially successful for Anthropic as well as for scaling AWS Bedrock workloads.

Ease of optimization at the architecture level is what matters at the bleeding edge; a pure-AI organization will have teams of optimization and compiler engineers who will be mining for tricks to optimize the hardware.

trueismywork•1mo ago

And data hosting rules

willahmad•1mo ago

Don't underestimate AWS.

AWS can make it seamless, so you can run open source models on their hardware.

See their ARM-based instances: you rarely notice you are running on ARM when using Lambda, k8s, Fargate, and others.

epolanski•1mo ago

I feel your posts miss the bigger picture: it's a marathon, not a sprint. If you get a much lower TCO than by buying Nvidia hardware at their insane margins, you get more output at lower cost.

Amazon has all the resources needed to write their own backends for several ML software stacks, or even drop-in API replacements.

Eventually economics win: where margins are high, competition appears; in time margins get thinner and competition starts disappearing again. It's a cycle.

dpoloncsak•1mo ago

Isn't this exactly what was said about Google and their TPUs before it transitioned from the NVIDIA landscape to the NVIDIA/TPU landscape?

Turns out multi-billion dollar software companies can deal with the enormous software investment.

jauntywundrkind•1mo ago

> they will go with three different scale-up switch solutions over the lifecycle of Trainium3, starting with a 160 lane, 20 port PCIe switch for fast time to market due to the limited availability today of high lane & port count PCIe switches, later switching to 320 lane PCIe switches and ultimately a larger UALink to pivot towards best performance.

It doesn't have a lot of ports and certainly not enough NTB to be useful as a switch, but man, it's wild to me that an AMD Epyc CPU has 128 lanes of PCIe and that switch chips are struggling to match even a basic server's worth of net bandwidth.

artur44•1mo ago

The hardware story is interesting, but I’m curious how much of the real-world adoption will depend on the maturity of the compiler stack. Trainium2 already showed that good silicon isn’t enough if the software layer lags behind.

If AWS really delivers on open-sourcing more of the toolchain, that could be a much bigger signal for adoption than raw specs alone.

thecopy•1mo ago

I have seen links to SemiAnalysis before; I'm just scared of the length of this content. Is anyone reading these start to finish? Why?

esafak•1mo ago

I think they're for investors.

epolanski•1mo ago

They are.

mlmonkey•1mo ago

Ask Gemini to summarize it? Or maybe NotebookLM to turn it into a 10-minute podcast? :-)

hobo_mark•1mo ago

I don't read them, but I listen to them on my commute (with a SaaS I made).

ijidak•1mo ago

What is the SaaS? I've been looking for something like this.

mNovak•1mo ago

I do, just for fun. It's become sort of a hobby, learning more depth/detail behind the current AI arms race. It certainly cuts through the shallow takes that get thrown around constantly.

cmiles8•1mo ago

Chips without an ecosystem and software (CUDA) do not a serious challenger make. That's where Amazon has struggled, and continues to struggle.

t1234s•1mo ago

What does this mean for a company like CoreWeave?

Analemma_•1mo ago

CoreWeave already had to issue more convertible debt earlier this week after a big dip in their share price. It seems like the market suspects the end is near.

villgax•1mo ago

As evidenced by recent HN coverage, SemiAnalysis is just becoming another shi*posting publication. Not one person in the industry considers them reliable/technically sound.
