I've been exploring a conceptual architecture for long-context models built on transformers and attention. It's conceptual, but grounded in existing research and in architecture implementations on specialized hardware like GPUs and TPUs.

Can we scale up independent shards of (mini) contexts, i.e. sub-global attention blocks or "sub-context experts" that operate somewhat independently and are then composed into a larger global attention, as a paradigm for handling extremely long contexts? The context would be shared, distributed, and sharded across chips, with each chip holding an independent shard of (mini) context.
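To make the "independent shards with global composition" idea concrete, here is a minimal NumPy sketch. It assumes the simplest exact variant: each shard attends over its own slice of keys and values and returns softmax statistics, which are then merged (the same log-sum-exp trick that blockwise schemes such as Ring Attention rely on). All function names are hypothetical and chosen for illustration; this is not a claim about how any production model works.

```python
import numpy as np

def full_attention(q, k, v):
    """Reference: ordinary softmax attention over the whole context."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return (weights / weights.sum(axis=-1, keepdims=True)) @ v

def shard_partial(q, k_shard, v_shard):
    """One shard's work: unnormalized output plus its softmax statistics."""
    scores = q @ k_shard.T / np.sqrt(q.shape[-1])
    m = scores.max(axis=-1, keepdims=True)      # per-query max within this shard
    p = np.exp(scores - m)
    return p @ v_shard, p.sum(axis=-1, keepdims=True), m

def compose(partials):
    """Global composition: rescale every shard to a shared max, then normalize."""
    m_global = np.max([m for _, _, m in partials], axis=0)
    num = sum(o * np.exp(m - m_global) for o, _, m in partials)
    den = sum(l * np.exp(m - m_global) for _, l, m in partials)
    return num / den

def sharded_attention(q, k, v, num_shards):
    """Split keys/values into shards (hypothetically one per chip) and merge."""
    k_shards = np.array_split(k, num_shards)
    v_shards = np.array_split(v, num_shards)
    return compose([shard_partial(q, ks, vs) for ks, vs in zip(k_shards, v_shards)])
```

In this exact form every query still visits every shard, so the total work matches full attention; the shards only spread memory and compute across chips. Getting below quadratic would need each query to consult only some shards, or a compressed view of them, which makes the result an approximation.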

This could possibly (speculating here) make attention-based context sub-quadratic. It's possible (again, speculating) that Google has used something like this to achieve such long context windows.
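A back-of-the-envelope count shows when the sub-quadratic claim could hold. The sketch below assumes one hypothetical scheme, not anything confirmed about a real system: full attention inside each shard, plus every token attending to a single summary vector per shard for the global step. It counts query-key score pairs only and assumes the context length divides evenly by the shard count.

```python
def full_pairs(n):
    """Query-key pairs scored by ordinary attention over n tokens."""
    return n * n

def sharded_pairs(n, num_shards):
    """Pairs scored under the assumed local-plus-summary scheme."""
    shard_len = n // num_shards                  # assumes n divides evenly
    local = num_shards * shard_len * shard_len   # full attention inside each shard
    global_step = n * num_shards                 # each token vs one summary per shard
    return local + global_step

n = 1_000_000
print(full_pairs(n))            # 1000000000000
print(sharded_pairs(n, 10))     # 100010000000
print(sharded_pairs(n, 1000))   # 2000000000
```

Under these assumptions the cost is n²/S + n·S for S shards. With a fixed number of shards that is still quadratic, just with a smaller constant; it only becomes sub-quadratic (about 2·n^1.5) if the shard count grows with the context, roughly as the square root of n. The open question is whether one summary per shard preserves enough information.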

Several things point in this direction: Google's pioneering MoE research (Shazeer, GShard, Switch); advanced TPUs (v4/v5p/Ironwood) with massive HBM and a high-bandwidth 3D torus/OCS Inter-Chip Interconnect (ICI), which enables the distribution this would require (MoE experts, sequence parallelism like Ring Attention); and TPU pod memory capacities that line up with what a 10M-token context would need. Google's Pathways and related system optimizations further support the possibility of such a distributed, concurrent model.

Share your thoughts on whether this is possible or feasible, or why it might not work.

Ask HN: Can sharded contexts scale up to long-context with global composition?