
The Anthropic Hive Mind

https://steve-yegge.medium.com/the-anthropic-hive-mind-d01f768f3d7b
1•gmays•1m ago•0 comments

The Sling: Humanity's Forgotten Power

https://www.slinging.org/
1•jsattler•7m ago•1 comments

Noam Chomsky, Jeffrey Epstein and the Politics of Betrayal

https://chrishedges.substack.com/p/noam-chomsky-jeffrey-epstein-and
2•chmaynard•8m ago•0 comments

Why securing AI model weights isn't enough

https://www.the-substrate.net/p/why-securing-ai-model-weights-isnt
1•erwald•8m ago•0 comments

Cursor Composer 1.5

https://cursor.com/blog/composer-1-5
4•leerob•9m ago•1 comments

Stop Telling Users Their DNS Is Wrong

https://jacob.gold/posts/stop-telling-users-their-dns-is-wrong/
2•jacobgold•9m ago•0 comments

Show HN: Voice Legacy: AI that interviews your parents before it's too late

https://www.voicelegacy.app
1•blacksausage•10m ago•1 comments

Have the patents for H.264 MPEG-4 AVC expired yet?

https://meta.wikimedia.org/wiki/Have_the_patents_for_H.264_MPEG-4_AVC_expired_yet%3F
1•Velocifyer•12m ago•0 comments

The Great Displacement: AI and the Next Fifty Years

https://drive.google.com/file/d/1nm0wOCJJvGS0WI737XHJ7VVDkjQrbgXf/view
1•sideway•12m ago•0 comments

GitButler CLI Is Good

https://matduggan.com/gitbutler-cli-is-really-good/
1•weaksauce•13m ago•0 comments

Modern Keystroke Visualizer for Linux

https://github.com/linuxmobile/keystroke
1•lwhsiao•13m ago•0 comments

Ask HN: Is Auth0 Down Again?

2•jansan•13m ago•0 comments

Strengthening Windows trust and security through User Transparency and Consent

https://blogs.windows.com/windowsexperience/2026/02/09/strengthening-windows-trust-and-security-t...
1•pentagrama•16m ago•0 comments

Paragraphic – Parametric graphic design app made in Godot

https://paragraphic.design/
1•count_zero•17m ago•0 comments

Ask HN: I experienced an Attack on Telegram and simcards gone!!!

2•khoobid_shoma•19m ago•0 comments

Ask HN: Any good open source projects written by AI agents?

2•grillorafael•19m ago•0 comments

European Processor Initiative

https://www.european-processor-initiative.eu/project/epi/
2•Gravityloss•20m ago•0 comments

Show HN: Linkpreview.io – Debug and preview social share cards

https://linkpreview.io
1•ravikmd•21m ago•0 comments

The power of anime: using anime for education and outreach in STEM

https://www.frontiersin.org/journals/education/articles/10.3389/feduc.2025.1707055/full
2•PaulHoule•23m ago•0 comments

German patent classified as state secret

https://zenodo.org/records/18551454
2•nAOpx•23m ago•1 comments

Show HN: MumbleFlow – $5 local voice-to-text (whisper.cpp, Rust, no cloud)

https://mumble.helix-co.com
1•mumbleflow•23m ago•0 comments

Hims cancels plans to sell compounded GLP-1 pill after FDA backlash

https://www.biopharmadive.com/news/hims-stop-launch-compounded-wegovy-fda-novo/811662/
2•randycupertino•23m ago•2 comments

Regime-Declared Mathematics as Survivor Sets

https://zboralski.github.io/br/maths/index.html
1•kugutsumen•24m ago•0 comments

National data, local stories: ICE detention in 2026

https://exclav.es/2026/02/08/exploring-ice-detention-facilities/
2•ai_critic•24m ago•1 comments

Is AI the Paperclip?

https://www.newcartographies.com/p/is-ai-the-paperclip
3•headalgorithm•27m ago•0 comments

AGI/Singularity: 9,300 Predictions Analyzed

https://research.aimultiple.com/artificial-general-intelligence-singularity-timing/
1•hakkikonu•29m ago•0 comments

America Has a Tungsten Problem

https://www.noleary.com/blog/posts/1
19•noleary•32m ago•3 comments

Show HN: A last-minute romantic gift app with private links

https://fiveminutelove.com/
1•shivamjjha•34m ago•0 comments

Show HN: I killed my Calendly link after people booking randomly

3•Mrakermo•35m ago•1 comments

Show HN: Claude Cowork for Startup Market Analysis

https://brainwave.vc/prompt
1•louison11•39m ago•0 comments

DirectStorage LLM Weight Streaming: 4x faster loading, MoE expert streaming

https://github.com/kibbyd/llm_upper/blob/main/PROJECT_RECORD.md
1•kibbyd1985•1h ago

Comments

kibbyd1985•1h ago
Author here. This project started with a simple question: can you run a 70B MoE model on 8GB VRAM by streaming weights from NVMe SSD to GPU using DirectStorage?

The short answer: the streaming works, but public MoE models don't cooperate.

The long version:

*What works well:* DirectStorage uses DMA to transfer weights from NVMe SSD to GPU via D3D12 staging buffers, skipping the OS page cache that standard I/O relies on. I built a C++ DLL (MSVC) that handles the DirectStorage + D3D12 + CUDA interop, with Go bindings loaded via syscall (no CGO), integrated into Ollama's Backend.Load(). Staging is double-buffered, with D3D12 fences imported as CUDA external semaphores for synchronization. On codestral (12.6 GB, 57 layers), it loads 4.1x faster than stock Ollama, and the advantage grows with model size because standard I/O depends on the OS page cache staying warm.
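The double-buffered staging pattern is easier to see stripped of the GPU plumbing. A minimal Python simulation (not the project's code; thread-pool futures stand in for DirectStorage requests and fence waits, and `upload` stands in for the staging-buffer-to-VRAM copy) of reading chunk i+1 while chunk i uploads:

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4  # bytes per staging buffer; tiny for illustration

def read_chunk(blob, i):
    """Stand-in for a DirectStorage read into a staging buffer."""
    return blob[i * CHUNK:(i + 1) * CHUNK]

def upload(chunk, device):
    """Stand-in for the D3D12-to-CUDA copy; here we just append."""
    device.extend(chunk)

def stream(blob):
    """Double-buffered loop: while chunk i uploads, chunk i+1 is reading."""
    device = bytearray()
    n = (len(blob) + CHUNK - 1) // CHUNK
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(read_chunk, blob, 0)          # fill buffer A
        for i in range(n):
            chunk = pending.result()                      # wait on "fence" for buffer i
            if i + 1 < n:
                pending = io.submit(read_chunk, blob, i + 1)  # kick off buffer B
            upload(chunk, device)                         # overlaps with the next read
    return bytes(device)
```

In the real pipeline the `result()` call corresponds to waiting on the D3D12 fence (imported as a CUDA external semaphore) before the GPU may consume the buffer.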

Note: the weights still need VRAM and RAM — DirectStorage changes the transfer path, not where the weights end up. The win is that DMA doesn't depend on the OS cache being warm.

*The MoE work:* Built full expert streaming — CUDA VMM for sparse-resident pools, lazy physical allocation, on-demand SSD→GPU streaming during Forward(), one-token-lag exact routing (use token t's expert indices to prefetch for t+1), LRU eviction. Ran qwen3:30b (128 experts/layer, 8 active) on 40GB RAM + 8GB VRAM. Pipeline sustains ~1.9 GB/s.
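The LRU-resident pool plus one-token-lag prefetch reduces to a small cache policy. A hedged sketch (class and parameter names are mine, not the repo's; a miss is where the real system would stream an expert from SSD into the VMM pool):

```python
from collections import OrderedDict

class ExpertCache:
    """LRU-resident expert pool; a miss models an on-demand SSD-to-GPU stream."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = OrderedDict()  # expert id -> loaded flag, LRU order
        self.faults = 0

    def touch(self, expert):
        if expert in self.resident:
            self.resident.move_to_end(expert)   # hit: refresh LRU position
            return
        self.faults += 1                        # miss: stream from SSD here
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)   # evict least-recently-used
        self.resident[expert] = True

def generate_token(cache, active_experts, next_hint):
    """Run token t's experts, then use t's routing as the prefetch hint
    for t+1 (the one-token-lag exact-routing trick)."""
    for e in active_experts:
        cache.touch(e)
    for e in next_hint:
        cache.touch(e)
```

The design choice this illustrates: routing indices from token t are already exact by the time t's FFN runs, so prefetching for t+1 costs no speculation, only one token of lag.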

*Where it breaks:* Both models tested (gpt-oss:20b, qwen3:30b) are temporally dense. Over ~50 tokens, every expert gets touched. Reducing cache capacity by 25% causes >1000 faults/token. The temporal working set equals the full model.

The hardest bugs were: (1) Windows DLL search order differences between EXE and DLL contexts causing E_NOTIMPL, (2) D3D12 picking Intel iGPU while CUDA was on NVIDIA dGPU (LUID matching fixed it), (3) D3D12 fence completion not establishing memory visibility for CUDA — had to import the fence as a CUDA external semaphore.

The evaluation harness (max_resident_per_layer, faulted_experts_per_token) is probably the most useful piece — it can immediately tell you if a new MoE model is temporally sparse enough for small-VRAM inference. If anyone knows of MoE models trained with temporal locality objectives, I'd love to test them.
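Both harness metrics can be reproduced offline from a routing trace. A sketch under my own assumptions (function names echo the writeup's metric names, but this implementation is a guess, with one shared LRU pool rather than per-layer pools):

```python
from collections import OrderedDict

def faults_per_token(trace, capacity):
    """trace: list of per-token expert-id sets. Returns each token's fault
    count under an LRU-resident cache of the given capacity."""
    resident, out = OrderedDict(), []
    for experts in trace:
        faults = 0
        for e in experts:
            if e in resident:
                resident.move_to_end(e)
            else:
                faults += 1
                if len(resident) >= capacity:
                    resident.popitem(last=False)
                resident[e] = True
        out.append(faults)
    return out

def temporal_working_set(trace, window):
    """Largest number of distinct experts touched in any sliding window of
    `window` tokens; when this equals the expert count, the model is
    temporally dense and streaming cannot help."""
    return max(
        len(set().union(*trace[i:i + window]))
        for i in range(max(1, len(trace) - window + 1))
    )
```

Running `faults_per_token` on a trace at shrinking capacities is exactly the kind of probe that would surface the >1000 faults/token cliff described above.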

Repos:

- https://github.com/kibbyd/llm_upper (research & docs)
- https://github.com/kibbyd/llm_upper_ollama (Ollama fork)

Full writeup: https://github.com/kibbyd/llm_upper/blob/main/PROJECT_RECORD...