Ask HN: Is It Possible? Gemini's Long Context MoE Architecture (Hypothesized)

2•deazy•2h ago
Gemini's Long Context MoE Architecture (Hypothesized):

Sharing my hypothesis for how Gemini models achieve their 1-10 million token context window, along with the clues that support it.

Ensemble of Experts (EoE) or Mesh of Experts (MeoE) with a common/shared long (1-10M token) context window

Gemini's 1M+ token MoE likely uses "instances" (active expert sets/TPU shards) sharing a common distributed context; individual active expert groups then use relevant "parts" of this vast context for generation. This allows concurrent, independent requests via distinct system "partitions."

The context is sharded and managed across numerous interconnected TPUs within a pod.

For any given input, only a sparse set of specialized "expert" subnetworks (a "dynamic pathway") within the total model are activated, based on complexity and context required.
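To make the "sparse set of experts" idea concrete, here is a minimal sketch of top-k gating in the style of the published MoE papers, in plain Python. It is not Gemini's router, which is not public. The names `route_top_k` and `moe_layer`, the toy experts, and every size below are made up for illustration.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(token, gate_weights, k=2):
    """Return [(expert_index, weight)] for the k highest-scoring experts."""
    # One gate logit per expert: dot product of the token with that expert's gate vector.
    logits = [sum(t * w for t, w in zip(token, row)) for row in gate_weights]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize over the selected experts only, so their weights sum to 1.
    weights = softmax([logits[i] for i in top])
    return list(zip(top, weights))


def moe_layer(token, gate_weights, experts, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    out = [0.0] * len(token)
    for idx, w in route_top_k(token, gate_weights, k):
        y = experts[idx](token)
        out = [o + w * v for o, v in zip(out, y)]
    return out


# Toy setup (all values hypothetical): 8 experts, 4-dimensional tokens.
rng = random.Random(0)
DIM, NUM_EXPERTS = 4, 8
gate_weights = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
# Each toy "expert" just scales its input by a different constant.
experts = [lambda x, s=i + 1: [s * v for v in x] for i in range(NUM_EXPERTS)]

# Two independent requests; each gets its own pathway through the experts.
request_a = [1.0, 0.0, -1.0, 0.5]
request_b = [-0.5, 2.0, 0.3, 0.0]
routes_a = route_top_k(request_a, gate_weights)
routes_b = route_top_k(request_b, gate_weights)
output_a = moe_layer(request_a, gate_weights, experts)
```

Only 2 of the 8 toy experts run per token, and nothing in one request's routing depends on the other request, which is the property the hypothesis leans on for concurrency.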

The overall MoE model can handle multiple, concurrent user requests simultaneously.

Each request, with its specific input and context, will trigger its own distinct and isolated pathway of active experts.

Shared context that can act as independent shards of (mini) contexts.

The massively distributed Mixture of Experts (MoE) architecture, spread across the TPUs of a single pod, has its long context sharded and managed via parallelism. It can serve concurrent requests, each using part of that context window and its own independent expert pathway across the pod, and it can also devote the entire context window to a single request if required.

Evidence pointing to this: Google's pioneering MoE research (Shazeer et al., GShard, Switch Transformer), advanced TPUs (v4/v5p/Ironwood) with massive HBM and a high-bandwidth 3D torus/OCS Inter-Chip Interconnect (ICI) that enables the essential distribution (MoE experts, sequence parallelism such as Ring Attention), and TPU pod HBM capacities that align with 10M token context needs. Google's Pathways and system optimizations further support this distributed, concurrent model.
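A rough back-of-envelope check of the memory claim, as a sketch only. Gemini's layer count, KV head count and head size are not public, so `LAYERS`, `KV_HEADS` and `HEAD_DIM` below are invented placeholders, and `HBM_PER_CHIP_BYTES` is an assumed per-chip figure (roughly what is publicly quoted for TPU v5p). Swap in your own numbers.

```python
import math


def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Size of a plain (uncompressed) KV cache for one sequence."""
    # Factor of 2: one key vector and one value vector per token, layer and KV head.
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_value


# Hypothetical model shape, NOT Gemini's real configuration.
LAYERS, KV_HEADS, HEAD_DIM = 64, 8, 128
TOKENS = 10_000_000

# Assumed HBM per chip in bytes (about 95 GB).
HBM_PER_CHIP_BYTES = 95 * 10**9

cache = kv_cache_bytes(TOKENS, LAYERS, KV_HEADS, HEAD_DIM)  # 16-bit values
chips = math.ceil(cache / HBM_PER_CHIP_BYTES)

print(f"KV cache: {cache / 1e12:.2f} TB")      # KV cache: 2.62 TB
print(f"Chips for the cache alone: {chips}")   # Chips for the cache alone: 28
```

Under these made-up dimensions a 10M token cache is a few terabytes. That is far more than one chip holds but a small slice of a pod with thousands of chips, even before counting weights and activations, which this estimate ignores.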

og x thread: https://x.com/ditpoo/status/1923966380854157434

Comments

deazy•2h ago
Basically this is what I think it is,

Shared context that can act as independent shards of (mini) contexts, i.e., sub-global attention blocks or "sub-context experts" that can operate somewhat independently and then scale up or compose into a larger global attention, as a paradigm for handling extremely long contexts.

I'm trying to see if this can be tested in some way at small scale. It's worth a try if it can work, but it requires some engineering to make it possible.
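One small-scale test that needs no TPUs: compute attention separately on each context shard, keep a few running statistics, and check that the merged result matches attention over the whole context. This is the standard log-sum-exp merge used by blockwise schemes such as Ring Attention, not anything confirmed about Gemini. The helper names (`attend_block`, `merge`, `sharded_attention`) and the toy sizes are invented for this sketch.

```python
import math
import random


def attend_block(q, keys, values):
    """Attention of one query over one context shard.

    Returns (m, s, acc): the max score, the sum of exp(score - m), and the
    exp-weighted sum of values. These partials are all a later merge needs.
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    ws = [math.exp(s - m) for s in scores]
    acc = [sum(w * v[d] for w, v in zip(ws, values)) for d in range(len(values[0]))]
    return m, sum(ws), acc


def merge(p1, p2):
    """Combine the partial results of two shards into one."""
    m1, s1, a1 = p1
    m2, s2, a2 = p2
    m = max(m1, m2)
    # Rescale both partials to the shared max before adding them.
    c1, c2 = math.exp(m1 - m), math.exp(m2 - m)
    return m, c1 * s1 + c2 * s2, [c1 * x + c2 * y for x, y in zip(a1, a2)]


def finalize(partial):
    _, s, acc = partial
    return [x / s for x in acc]


def sharded_attention(q, keys, values, shard_size):
    """Process shards one at a time, then normalize the merged result."""
    partial = None
    for start in range(0, len(keys), shard_size):
        block = attend_block(q, keys[start:start + shard_size],
                             values[start:start + shard_size])
        partial = block if partial is None else merge(partial, block)
    return finalize(partial)


# Toy data (hypothetical sizes): 64 context tokens split into 4 shards of 16.
rng = random.Random(0)
DIM, CONTEXT = 8, 64
q = [rng.gauss(0, 1) for _ in range(DIM)]
keys = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(CONTEXT)]
values = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(CONTEXT)]

full = finalize(attend_block(q, keys, values))             # global attention
sharded = sharded_attention(q, keys, values, shard_size=16)
mini = finalize(attend_block(q, keys[:16], values[:16]))   # one shard on its own
max_diff = max(abs(a - b) for a, b in zip(full, sharded))
```

`max_diff` should come out at floating-point noise level, so the shards really do compose into exact global attention. `mini` shows the other half of the idea: a shard can also be used by itself as a smaller, independent context.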
