frontpage.

The Devil Inside GitHub

https://blog.melashri.net/micro/github-devil/
1•elashri•23s ago•0 comments

Show HN: Distill – Migrate LLM agents from expensive to cheap models

https://github.com/ricardomoratomateos/distill
1•ricardomorato•24s ago•0 comments

Show HN: Sigma Runtime – Maintaining 100% Fact Integrity over 120 LLM Cycles

https://github.com/sigmastratum/documentation/tree/main/sigma-runtime/SR-053
1•teugent•45s ago•0 comments

Make a local open-source AI chatbot with access to Fedora documentation

https://fedoramagazine.org/how-to-make-a-local-open-source-ai-chatbot-who-has-access-to-fedora-do...
1•jadedtuna•2m ago•0 comments

Introduce the Vouch/Denouncement Contribution Model by Mitchellh

https://github.com/ghostty-org/ghostty/pull/10559
1•samtrack2019•2m ago•0 comments

Software Factories and the Agentic Moment

https://factory.strongdm.ai/
1•mellosouls•2m ago•1 comment

The Neuroscience Behind Nutrition for Developers and Founders

https://comuniq.xyz/post?t=797
1•01-_-•2m ago•0 comments

Bang bang he murdered math {the musical } (2024)

https://taylor.town/bang-bang
1•surprisetalk•2m ago•0 comments

A Night Without the Nerds – Claude Opus 4.6, Field-Tested

https://konfuzio.com/en/a-night-without-the-nerds-claude-opus-4-6-in-the-field-test/
1•konfuzio•5m ago•0 comments

Could ionospheric disturbances influence earthquakes?

https://www.kyoto-u.ac.jp/en/research-news/2026-02-06-0
1•geox•6m ago•0 comments

SpaceX's next astronaut launch for NASA is officially on for Feb. 11 as FAA clea

https://www.space.com/space-exploration/launches-spacecraft/spacexs-next-astronaut-launch-for-nas...
1•bookmtn•8m ago•0 comments

Show HN: One-click AI employee with its own cloud desktop

https://cloudbot-ai.com
1•fainir•10m ago•0 comments

Show HN: Poddley – Search podcasts by who's speaking

https://poddley.com
1•onesandofgrain•11m ago•0 comments

Same Surface, Different Weight

https://www.robpanico.com/articles/display/?entry_short=same-surface-different-weight
1•retrocog•13m ago•0 comments

The Rise of Spec Driven Development

https://www.dbreunig.com/2026/02/06/the-rise-of-spec-driven-development.html
2•Brajeshwar•17m ago•0 comments

The first good Raspberry Pi Laptop

https://www.jeffgeerling.com/blog/2026/the-first-good-raspberry-pi-laptop/
3•Brajeshwar•18m ago•0 comments

Seas to Rise Around the World – But Not in Greenland

https://e360.yale.edu/digest/greenland-sea-levels-fall
2•Brajeshwar•18m ago•0 comments

Will Future Generations Think We're Gross?

https://chillphysicsenjoyer.substack.com/p/will-future-generations-think-were
1•crescit_eundo•21m ago•0 comments

State Department will delete Xitter posts from before Trump returned to office

https://www.npr.org/2026/02/07/nx-s1-5704785/state-department-trump-posts-x
2•righthand•24m ago•1 comment

Show HN: Verifiable server roundtrip demo for a decision interruption system

https://github.com/veeduzyl-hue/decision-assistant-roundtrip-demo
1•veeduzyl•25m ago•0 comments

Impl Rust – Avro IDL Tool in Rust via Antlr

https://www.youtube.com/watch?v=vmKvw73V394
1•todsacerdoti•25m ago•0 comments

Stories from 25 Years of Software Development

https://susam.net/twenty-five-years-of-computing.html
3•vinhnx•26m ago•0 comments

minikeyvalue

https://github.com/commaai/minikeyvalue/tree/prod
3•tosh•31m ago•0 comments

Neomacs: GPU-accelerated Emacs with inline video, WebKit, and terminal via wgpu

https://github.com/eval-exec/neomacs
1•evalexec•35m ago•0 comments

Show HN: Moli P2P – An ephemeral, serverless image gallery (Rust and WebRTC)

https://moli-green.is/
2•ShinyaKoyano•39m ago•1 comment

How I grew my X presence

https://www.reddit.com/r/GrowthHacking/s/UEc8pAl61b
2•m00dy•41m ago•0 comments

What's the cost of the most expensive Super Bowl ad slot?

https://ballparkguess.com/?id=5b98b1d3-5887-47b9-8a92-43be2ced674b
1•bkls•42m ago•0 comments

What if you just did a startup instead?

https://alexaraki.substack.com/p/what-if-you-just-did-a-startup
5•okaywriting•48m ago•0 comments

Hacking up your own shell completion (2020)

https://www.feltrac.co/environment/2020/01/18/build-your-own-shell-completion.html
2•todsacerdoti•51m ago•0 comments

Show HN: Gorse 0.5 – Open-source recommender system with visual workflow editor

https://github.com/gorse-io/gorse
1•zhenghaoz•52m ago•0 comments

Why stop at 1M tokens when you can have 10M?

2•Zen_Sherbert•3mo ago
To start us off, I'm going to make a ridiculous claim.

On my 7800XT gaming GPU, using less than 3GB of VRAM for the buffer, I have built an architecture that can process a 10 million token context.

This is not a joke. You can run it in a Google Colab notebook, on a free T4, and prove it to yourself right now:

The Proteus Playground https://colab.research.google.com/github/Zen-Sherbert/Proteus-Attention/blob/main/TinyPlayground.ipynb

It runs flawlessly on both CUDA and ROCm. It works. With the proof-of-concept out of the way, here are the three core ideas that got me here.

1. DNA - Tokens have value.

My journey started with a simple idea: tokens mean something. They have value. So why don't we use it?

I built a system called DNA, where each attention "gate" learns a "taste" for certain tokens and pulls them in like gravity. The crazy part? On a raw, untrained model, I found that 334 out of 500 tokens were already being caught by this system. It's a natural, emergent behavior.
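For a concrete picture, here is a minimal sketch of what such a gate might look like in plain PyTorch. The class name TasteGate, the gate count, and the threshold are illustrative assumptions, not code from the Proteus repo. Each gate owns a learned "taste" vector, and a token counts as "caught" when its embedding aligns with any gate above the threshold:

    import torch
    import torch.nn as nn

    class TasteGate(nn.Module):
        # Illustrative sketch: each gate learns a "taste" vector and
        # pulls in tokens whose embeddings align with it.
        def __init__(self, dim: int, n_gates: int = 8, threshold: float = 0.1):
            super().__init__()
            self.taste = nn.Parameter(torch.randn(n_gates, dim) / dim**0.5)
            self.threshold = threshold

        def forward(self, x: torch.Tensor):
            # x: (batch, seq, dim) token embeddings
            scores = torch.einsum("bsd,gd->bsg", x, self.taste)  # affinity of every token to every gate
            caught = scores.max(dim=-1).values > self.threshold  # "caught" if any gate wants the token
            return scores, caught

    # Even with random (untrained) weights, some fraction of tokens clears the
    # threshold, which is the kind of emergent capture described above.
    gate = TasteGate(dim=64)
    tokens = torch.randn(1, 500, 64)
    _, caught = gate(tokens)
    print(int(caught.sum()), "of 500 tokens caught")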

2. The Alpha Slider - "Why can't I just change my model?"

I hated that I couldn't just switch my model from dense, to sparse, to linear whenever I wanted. So, I built a custom Triton kernel to do exactly that.

The result is a single knob called alpha:

Dense, high-fidelity? alpha = 0.0.

Balanced sub-quadratic? alpha = 0.3.

Screaming-fast linear time? alpha = 1.0 and the attention mechanism goes brrrrrr. (A rough sketch of this dial follows below.)
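To make the knob concrete, here is a rough sketch in plain PyTorch of one way such a dial could work, blending a dense softmax path with a kernelized linear-attention path. This is an assumption about the mechanism, not the repo's Triton kernel, which presumably fuses everything into one pass rather than computing both paths and mixing them:

    import torch
    import torch.nn.functional as F

    def blended_attention(q, k, v, alpha: float = 0.0):
        # Illustrative "alpha knob" (not the repo's Triton kernel):
        #   alpha = 0.0 -> plain softmax (dense) attention
        #   alpha = 1.0 -> a kernelized linear-attention approximation
        #   in between  -> a convex blend of the two outputs
        scale = q.shape[-1] ** -0.5

        # Dense path: standard softmax attention, O(n^2) in sequence length.
        dense = F.softmax(q @ k.transpose(-2, -1) * scale, dim=-1) @ v

        # Linear path: feature-map trick (elu + 1), O(n) in sequence length.
        qf, kf = F.elu(q) + 1, F.elu(k) + 1
        kv = kf.transpose(-2, -1) @ v                                   # (dim, dim) summary
        norm = qf @ kf.sum(dim=-2, keepdim=True).transpose(-2, -1) + 1e-6
        linear = (qf @ kv) / norm

        return (1 - alpha) * dense + alpha * linear

    q = k = v = torch.randn(2, 128, 64)   # (batch, seq, dim)
    out = blended_attention(q, k, v, alpha=0.3)

The real speed win only shows up if the dense path is skipped entirely as alpha approaches 1.0, which is presumably what a fused kernel would do; the blend above just shows the interface.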

3. Chunking & RoPE - "So I got rid of it."

My new systems got me far, but the VRAM bottleneck was still a headache. So I got rid of it.

The idea is simple: chunking. Break a massive context into small pieces, shunt them to system RAM, and use a tiny VRAM buffer for only the most important tokens.

DNA tells us what's important. As a Hail Mary, I added RoPE to preserve where each token came from. This combination creates contextual teleportation. It allows the model to create a perfect "highlight reel" and reason over it as if critical facts, separated by thousands of pages, were sitting right next to each other. It's your own little wormhole across data space.
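As a hedged sketch of the chunking idea (again, not the repo's code): the full context lives in system RAM, an importance score, e.g. from DNA-style gates, selects a small "highlight reel" of tokens into the VRAM buffer, and each selected token keeps its original position so RoPE still encodes where it came from. The function name and buffer size below are illustrative assumptions:

    import torch

    def build_highlight_buffer(hidden_cpu, scores_cpu, buffer_size=4096, device="cuda"):
        # Illustrative sketch: the full context stays in system RAM (hidden_cpu);
        # only the highest-scoring tokens are copied into a small VRAM buffer,
        # together with their ORIGINAL positions so RoPE can still say where
        # each token came from in the full document.
        top = torch.topk(scores_cpu, k=min(buffer_size, scores_cpu.numel()))
        positions = top.indices.sort().values          # keep document order
        buffer = hidden_cpu[positions].to(device)      # small, fixed VRAM footprint
        return buffer, positions.to(device)

    # Toy usage (device="cpu" so it runs anywhere); in practice the hidden
    # states would be produced chunk by chunk rather than held all at once.
    hidden_cpu = torch.randn(50_000, 64)               # stand-in for a long context
    scores_cpu = hidden_cpu.norm(dim=-1)               # stand-in importance score
    buf, pos = build_highlight_buffer(hidden_cpu, scores_cpu, 2048, device="cpu")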

TL;DR: I built an extreme context system that costs less than Minecraft to run. Would love feedback, as I'm still exploring how far it can go.

Github: https://github.com/Zen-Sherbert/Proteus-Attention/tree/main

Comments

Zen_Sherbert•3mo ago
A little bit about the origin story for those who are interested:

This whole thing started with me trying to implement sparsity, and getting it totally wrong. The DNA idea came to me in the middle of the night during my shift as an asset protection officer. The rest of it was just fumbling from one discovery to the next, mostly ignoring the "right" way to do things.

I'm an 8-year veteran, a father of three, and I just finished my bachelor's. I am not an AI researcher. If I can build this, you can do something infinitely better.

Please, try the Colab. Break it. Play with it. I implore you to tell me how it breaks. I'm excited to see what the community thinks.

gus_massa•3mo ago
Clicky: https://colab.research.google.com/github/Zen-Sherbert/Proteu... https://github.com/Zen-Sherbert/Proteus-Attention/tree/main

> The idea is simple: chunking. Break a massive context into small pieces, shunt them to system RAM, and use a tiny VRAM buffer for only the most important tokens.

So, ... you are cherry picking some tokens to be added to the context?

Zen_Sherbert•3mo ago
In essence that's exactly the idea.

It's not what you think it is, though. It's choosing the right words in the right places under the right context.

You submit a 5 million token document of mixed data. It's a jumble of finances, cooking, and old stereo instructions for some reason.

You ask the model what ingredients are in a chicken caprese.

It won't have to read millions of tokens; it will understand the what, the where, and the why.

So chunking specifically isn't about understanding an entire context window of 5 million.

It's more about working with it in small pieces in relation to inference.

It is not a replacement, rather an alternative. An early one at that.

Thank you for taking the time to read; I appreciate the input, and the skepticism too.

If you have any more, please share it.

gus_massa•3mo ago
> It's not what you think it is, though. It's choosing the right words in the right places under the right context.

That's approximately what I thought. I want to be sure. Anyway, the details are very important.

> I appreciate the input and the skepticism too.

Let's say 20% skepticism and 80% "it sounds like a good idea." I don't use AI models much, so it's hard for me to evaluate. Let's hope someone else can give requested and unrequested feedback.