On my RX 7800 XT gaming GPU, using less than 3 GB of VRAM for the buffer, I have built an architecture that can process a 10-million-token context.
This is not a joke. You can run it in a Google Colab notebook, on a free T4, and prove it to yourself right now:
The Proteus Playground https://colab.research.google.com/github/Zen-Sherbert/Proteus-Attention/blob/main/TinyPlayground.ipynb
It runs flawlessly on both CUDA and ROCm. It works. With the proof of concept out of the way, here are the three core ideas that got me here.
1. DNA - Tokens have value.
My journey started with a simple idea: tokens mean something. They have value. So why don't we use it?
I built a system called DNA, where each attention "gate" learns a "taste" for certain tokens and pulls them in like gravity. The crazy part? On a raw, untrained model, I found that 334 out of 500 tokens were already being caught by this system. It's a natural, emergent behavior.
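Here's the shape of the idea as a minimal PyTorch sketch. This is not the Proteus code; the gate count, taste parameterization, and threshold are all made up for illustration. Each gate holds a learned "taste" vector, and a token is caught the moment its embedding aligns with any taste past a threshold:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DNAGate(nn.Module):
    """Toy DNA-style gate: tokens that align with a learned taste get caught."""
    def __init__(self, dim: int, num_gates: int = 8, threshold: float = 0.1):
        super().__init__()
        # One learned "taste" vector per gate (hypothetical parameterization).
        self.tastes = nn.Parameter(torch.randn(num_gates, dim) / dim ** 0.5)
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, dim). Affinity = cosine similarity, token vs. taste.
        affinity = F.normalize(x, dim=-1) @ F.normalize(self.tastes, dim=-1).T
        # A token is "caught" if any gate's affinity clears the threshold.
        return (affinity > self.threshold).any(dim=-1)

gate = DNAGate(dim=64)         # completely untrained
tokens = torch.randn(500, 64)  # 500 random token embeddings
print(int(gate(tokens).sum()), "of 500 tokens caught")
```

Even with random weights, a large fraction of tokens clears the bar, which is the same flavor of emergent capture described above.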
2. The Alpha Slider - "Why can't I just change my model?"
I hated that I couldn't just switch my model from dense to sparse to linear whenever I wanted. So I built a custom Triton kernel to do exactly that.
The result is a single knob called alpha (toy sketch below):
Dense, high-fidelity? alpha = 0.0.
Balanced sub-quadratic? alpha = 0.3.
Screaming-fast linear time? alpha = 1.0 and the attention mechanism goes brrrrrr.
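Here's a toy PyTorch version of what the knob means. The real thing is a fused Triton kernel that I'm not reproducing here; the elu+1 feature map on the linear path is my stand-in choice, and this naive blend still pays the dense cost at intermediate alpha, where the fused kernel is what actually buys the sub-quadratic compute:

```python
import torch
import torch.nn.functional as F

def attention_with_alpha(q, k, v, alpha: float = 0.3):
    # q, k, v: (seq_len, dim)
    # Dense path: standard O(n^2) softmax attention.
    dense = F.softmax(q @ k.T / q.shape[-1] ** 0.5, dim=-1) @ v
    # Linear path: positive feature map (elu + 1) gives O(n) attention.
    qf, kf = F.elu(q) + 1, F.elu(k) + 1
    linear = (qf @ (kf.T @ v)) / (qf @ kf.sum(0, keepdim=True).T)
    # alpha = 0.0 -> pure dense, alpha = 1.0 -> pure linear.
    return (1 - alpha) * dense + alpha * linear

q, k, v = (torch.randn(128, 64) for _ in range(3))
out = attention_with_alpha(q, k, v, alpha=0.3)  # (128, 64)
```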
3. Chunking & RoPE - "So I got rid of it."
My new systems got me far, but the VRAM bottleneck was still a headache. So I got rid of it.
The idea is simple: chunking. Break a massive context into small pieces, shunt them to system RAM, and use a tiny VRAM buffer for only the most important tokens.
DNA tells us what's important. As a Hail Mary, I added RoPE so every token also remembers where it came from. This combination creates contextual teleportation: the model builds a "highlight reel" and reasons over it as if critical facts, separated by thousands of pages, were sitting right next to each other. It's your own little wormhole across data space. A sketch of the gather step is below.
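A back-of-the-napkin sketch of that gather step, reusing the DNA taste scoring from the first sketch (build_highlight_reel, buffer_size, and the scoring function are illustrative names and choices, not the repo's API): chunks stay in system RAM, every token gets scored, and only the winners are copied into the small VRAM buffer along with their original positions.

```python
import torch
import torch.nn.functional as F

def build_highlight_reel(chunks, taste, buffer_size=1024, device="cuda"):
    # chunks: list of (chunk_len, dim) CPU tensors holding the full context.
    # taste:  (dim,) learned DNA taste vector used to score importance.
    scores, positions = [], []
    offset = 0
    for chunk in chunks:
        # Score every token in system RAM -- no VRAM touched yet.
        scores.append(F.cosine_similarity(chunk, taste, dim=-1))
        positions.append(torch.arange(offset, offset + len(chunk)))
        offset += len(chunk)
    scores, positions = torch.cat(scores), torch.cat(positions)
    # Keep only the tokens that fit in the tiny VRAM buffer.
    top = scores.topk(min(buffer_size, len(scores))).indices
    reel = torch.cat(chunks)[top].to(device)  # highlight reel -> VRAM
    pos = positions[top].to(device)           # ORIGINAL positions, for RoPE
    return reel, pos
```

The trick is feeding pos (not 0..buffer_size-1) into RoPE: tokens sit next to each other in the buffer, but each one still carries the position it teleported in from.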
TL;DR: I built an extreme-context system that costs less than Minecraft to run. Would love feedback, as I'm still exploring how far it can go.
GitHub: https://github.com/Zen-Sherbert/Proteus-Attention/tree/main
Zen_Sherbert•4h ago
This whole thing started with me trying to implement sparsity and getting it totally wrong. The DNA idea came to me in the middle of the night during my shift as an asset protection officer. The rest of it was just fumbling from one discovery to the next, mostly ignoring the "right" way to do things.
I'm an 8-year veteran, a father of three, and I just finished my bachelor's. I am not an AI researcher. If I can build this, you can do something infinitely better.
Please, try the Colab. Break it. Play with it. I implore you to tell me how it breaks. I'm excited to see what the community thinks.