Bests
Cerebras WSE-3 is a brilliant packaging play: one wafer-scale chip (~46,000 mm², ~900k cores) with ~44 GB of SRAM spread across the plane, so compute and memory sit side by side with enormous bandwidth. The catch is density — SRAM is a 6T cell, so even a whole wafer only holds ~44 GB. An 80B model doesn't fit on-wafer, so weights stream in from external MemoryX (off-wafer DRAM). It's fast, but it's a ~23 kW, multi-million-dollar system, and large models are still memory-streamed.
Sophon is a single ~750 mm² die. Instead of spreading SRAM across a wafer, we stack DRAM on top of the logic — 64 monolithic 3D tiers of 2D-TMD compute-in-memory and capacitor-less gain-cell DRAM. The gain cell is denser than SRAM per layer, and we stack 32 memory tiers of it, so we get 330 GB on one normal-size die — enough that an 80B model is fully resident, no streaming, no off-chip memory at all. ~1 kW, not 23 kW.
So the real difference is SRAM-in-2D vs DRAM-in-3D: Cerebras maxes out planar SRAM area; we trade to denser DRAM and stack it vertically, which is what buys GB-scale on-die capacity.
Honest caveat: Cerebras ships real silicon today and is genuinely fast — they proved wafer-scale integration works. We're pre-silicon, betting on a harder materials path (2D-TMD monolithic 3D). The upside, if it yields, is capacity-per-watt and per-dollar that planar SRAM can't reach.
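The capacity argument above can be checked with rough arithmetic. The sketch below uses the 44 GB and 330 GB figures quoted in the comment. The bytes-per-parameter options are assumptions, since the thread never states a weight precision, and only weights are counted (no KV cache or activations).

```python
# Back-of-envelope check: do the weights of an 80B-parameter model fit on-chip?
# Capacities are the figures quoted in the comment. The precisions are
# assumptions for illustration; the thread does not state one.

GB = 1e9  # decimal gigabytes


def weight_footprint_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone (no KV cache, no activations)."""
    return params * bytes_per_param / GB


def fits(params: float, bytes_per_param: float, capacity_gb: float) -> bool:
    """True if the weights alone fit in the given capacity."""
    return weight_footprint_gb(params, bytes_per_param) <= capacity_gb


PARAMS = 80e9
CAPACITY_GB = {"WSE-3 on-wafer SRAM": 44, "Sophon stacked DRAM": 330}
PRECISIONS = {"FP16": 2.0, "8-bit": 1.0, "4-bit": 0.5}  # assumed options

for prec, nbytes in PRECISIONS.items():
    need = weight_footprint_gb(PARAMS, nbytes)
    for name, cap in CAPACITY_GB.items():
        verdict = "fits" if fits(PARAMS, nbytes, cap) else "does not fit"
        print(f"{prec:6s} {need:5.0f} GB -> {name}: {verdict}")
```

Under these assumptions, 16-bit and 8-bit weights (160 GB and 80 GB) exceed 44 GB but fit within 330 GB. At 4 bits the weights alone come to 40 GB, which is just under the wafer's SRAM but leaves little room for anything else.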
Have you done any hardware tests of this plan? Is this still considered quality advice?
Second q, why start with 28nm? Is the idea that you want to stick with TSMC and be able to shrink? If this does in fact work well, I can imagine wanting to shoot for a smaller process node pretty quickly. Is there some sort of tech / design gap you'll need to figure out as you go?
We have fabricated 2T0C DRAM arrays in a monolithic 3D structure. That was a must-do.
Why 28nm? Because it's cheap, widely available, and already gives us enough performance to beat Nvidia Vera Rubin. We have a roadmap for scaling it down: https://www.phantafield.com/whitepaper#6-scaling-roadmap
Edit: I can see a bunch of hints, most definitely. Still a good comment though.
This actually helps a lot, thanks.
> Instead of spreading SRAM across a wafer, we stack DRAM on top of the logic
Is this done with current manufacturing technologies? Does it require a special process?
> no streaming, no off-chip memory at all. ~1 kW, not 23 kW
Is this for an individual compute unit? Compared to Cerebras, what's the ratio of power used vs compute output?
codingpanic•1h ago
minkowsky•1h ago
First, separate three things people lump together. Apple already does memory on package (M-series unified memory = LPDDR5X dies next to the SoC). The near-term industry path is bonded stacking (AMD 3D V-cache, HBM4's logic base die). What we're doing is monolithic — growing the memory on top of finished logic. Three reasons that distinction matters:
1. Bonding only helps at the margin. A hybrid-bond interface is µm-scale, so it still carries a relatively large interconnect capacitance. At memory bandwidth, the I/O drivers crossing it dissipate most of the power and overheat; you move the memory closer without escaping the I/O energy. Monolithic inter-tier vias are nanometer-scale (we model ~1% of the interconnect energy of a bonded interface), and that's the only thing that actually moves the needle.
2. 2D-TMDs are the only functional CMOS you can build in the BEOL. Monolithic 3D means fabricating the upper tiers after the logic, at ≤450 °C, or you cook everything underneath. Silicon needs ~1000 °C; low-temp oxide semiconductors (IGZO) are n-type only, so no real CMOS. 2D-TMDs give both n- and p-type at BEOL temperature. Nothing else does.
3. ~6 orders of magnitude lower off-current (~1 fA/µm) finally makes a capacitor-free cell work. Conventional 1T1C DRAM needs a big storage capacitor — the deep-trench / high-aspect-ratio etch you can't do in the BEOL anyway. A 2T0C gain cell holds charge on a transistor gate with no capacitor; in silicon it leaked away in microseconds, so it was never usable. With 2D-TMD leakage you get ~1.8 s retention — refresh at ~1 Hz and drop the capacitor, and the trench, entirely.
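The refresh claim in point 3 reduces to simple arithmetic. This is a minimal sketch using the ~1.8 s retention and ~1 Hz refresh quoted above. The 64 ms refresh window used as the conventional-DRAM baseline is an assumption for comparison, not a figure from the thread.

```python
# Refresh arithmetic for the 2T0C gain cell described in point 3.
# RETENTION_S and REFRESH_HZ are the figures quoted in the comment.
# CONVENTIONAL_WINDOW_S is an assumed typical refresh window for
# conventional 1T1C DRAM; the thread does not state one.

RETENTION_S = 1.8              # quoted retention time
REFRESH_HZ = 1.0               # quoted refresh rate
CONVENTIONAL_WINDOW_S = 0.064  # assumption: typical 1T1C refresh window


def retention_margin(retention_s: float, refresh_hz: float) -> float:
    """Retention time divided by the refresh period; above 1 means the
    cell is rewritten before its charge is expected to decay."""
    return retention_s * refresh_hz


def refresh_reduction(baseline_window_s: float, refresh_hz: float) -> float:
    """How many times less often each cell is refreshed vs. the baseline."""
    return (1.0 / baseline_window_s) / refresh_hz


print(f"retention margin: {retention_margin(RETENTION_S, REFRESH_HZ):.2f}x")
print(
    "refreshes per cell vs. baseline: "
    f"{refresh_reduction(CONVENTIONAL_WINDOW_S, REFRESH_HZ):.1f}x fewer"
)
```

With these inputs the cell is refreshed with a 1.8x margin on retention, and roughly 15 to 16 times less often than the assumed 64 ms baseline.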
Rohansi•1h ago