Ask HN: Is It Possible?, Gemini's Long Context Moe Architecture (Hypothesized)

2•deazy•8mo ago

Gemini's Long Context MoE Architecture (Hypothesized):

Sharing how I think (hypothesis) Gemini models achieve their 1-10 Million long context window. With details to clues to support the same.

Ensemble of Expert (EoE) or Mesh of Expert (MeoE) with common/shared long (1-10M) context window

Gemini's 1M+ token MoE likely uses "instances" (active expert sets/TPU shards) sharing a common distributed context; individual active expert groups then use relevant "parts" of this vast context for generation. This allows concurrent, independent requests via distinct system "partitions."

The context is sharded and managed across numerous interconnected TPUs within a pod.

For any given input, only a sparse set of specialized "expert" subnetworks (a "dynamic pathway") within the total model are activated, based on complexity and context required.

The overall MoE model can handle multiple, concurrent user requests simultaneously.

Each request, with its specific input and context, will trigger its own distinct and isolated pathway of active experts.

Shared context that can act as independent shards of (mini) contexts.

The massively distributed Mixture of Experts (MoE) architecture, across TPUs in a single pod, have its the long context sharded and managed via parallelism, and with ability to handle concurrent requests by part of that context window and independent expert pathways across a large TPU pod, also it can use the entire context window for a single request if required.

Evidence points to this: Google's pioneering MoE research (Shazeer, GShard, Switch), advanced TPUs (v4/v5p/Ironwood) with massive HBM & high-bandwidth 3D Torus/OCS Inter-Chip Interconnect (ICI) enabling essential distribution (MoE experts, sequence parallelism like Ring Attention), and TPU pod VRAM capacities aligning with 10M token context needs. Google's Pathways & system optimizations further support this distributed, concurrent model.

og x thread: https://x.com/ditpoo/status/1923966380854157434

Comments

deazy•8mo ago

Basically this is what I think it is,

Shared context that can act as independent shards of (mini) contexts, i.e Sub-global attention blocks or "sub-context experts" that can operate somewhat independently and then scale up or compose into a larger global attention as a paradigm for handling extremely long contexts.

Trying to see if this can be tested in some way at small scale, its worth a try if it can work, but requires some engineering to make it possible.

Show HN: Seedance 2.0 AI video generator for creators and ecommerce

Wally: A fun, reliable voice assistant in the shape of a penguin

Rewriting Pycparser with the Help of an LLM

Lobsters Vibecoding Challenge

E-Commerce vs. Social Commerce

Avoiding Modern C++ – Anton Mikhailov [video]

Show HN: AegisMind–AI system with 12 brain regions modeled on human neuroscience

Zig – Package Management Workflow Enhancements

AI-powered text correction for macOS

AppSecMaster – Learn Application Security with hands on challenges

Fibonacci Number Certificates

AI Overviews are killing the web search, and there's nothing we can do about it

City skylines need an upgrade in the face of climate stress

1979: The Model World of Robert Symes [video]

Satellites Have a Lot of Room

1980s Farm Crisis

Show HN: FSID - Identifier for files and directories (like ISBN for Books)

Show HN: Holy Grail: Open-Source Autonomous Development Agent

Show HN: Minecraft Creeper meets 90s Tamagotchi

Show HN: Termiteam – Control center for multiple AI agent terminals

The only U.S. particle collider shuts down

Ask HN: Why do purchased B2B email lists still have such poor deliverability?

Show HN: Remotion directory (videos and prompts)

Portable C Compiler

Show HN: Kokki – A "Dual-Core" System Prompt to Reduce LLM Hallucinations

Software Engineering Transformation 2026

Microsoft purges Win11 printer drivers, devices on borrowed time

Lunch with the FT: Tarek Mansour

Old Mexico and her lost provinces (1883)

'AI' is a dick move, redux