Tq-KV – Rust implementation of TurboQuant that works on GGUF models

2•onurgokyildiz•1h ago
TurboQuant came out at ICLR on March 25. We tried every available implementation on GGUF models. None of them produced usable output. Perplexity goes from 5.18 to 3,556. The model starts mixing languages mid-sentence, hallucinating citations, losing coherence entirely.

It's compound quantization error. GGUF models already have quantized weights. Quantize the KV cache on top of that, and the errors multiply through softmax. Nobody was handling this. So we wrote our own from scratch. 13.7K lines of Rust, 86 tests.

The first thing we tried was embarrassingly simple: keep sink tokens in FP16 (they anchor attention), skip quantizing the current token, and hard-reset cache between conversations. We called it the 3-Fix Framework mostly as a joke because each fix is so obvious. But together they take PPL from 3,556 to 6.07. The cache reset was the one that surprised us -- turns out quantization errors were silently accumulating across conversations, and the model would drift after a few turns.

The bigger win was compressing keys before rotary position embedding instead of after. Post-RoPE keys have position-dependent statistics that break the Gaussian assumption Lloyd-Max codebooks rely on. Pre-RoPE keys don't have this problem. PPL gap drops from +17% to +3.7%. We almost didn't try this because it meant restructuring the entire compression pipeline. Glad we did.

We also integrated KV Compaction (Zweiger 2026), which is orthogonal to bit compression -- TurboQuant reduces bits per token, Compaction reduces number of tokens. You select important keys using attention scores from all GQA-mapped query heads, fit biases to preserve the softmax partition function, then solve for synthetic values via ridge regression. Our first attempt used a single reference query and performed terribly. Switching to all query heads with mean-based scoring fixed it -- PPL went from 5.78 to 2.23 on the same config. Combined with TurboQuant: up to 25x effective compression.
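
To make the three fixes concrete, here is a minimal sketch of the storage policy they describe. This is not tq-kv's code: `KvCache`, `Entry`, and `quantize_4bit` are hypothetical names, plain `f32` stands in for FP16, and a uniform 4-bit quantizer stands in for the Lloyd-Max codebook.

```rust
/// One cached key vector: either left in full precision or 4-bit quantized.
/// (Hypothetical type; f32 stands in for FP16 here.)
#[allow(dead_code)]
enum Entry {
    Full(Vec<f32>),
    Quant(Vec<i8>, f32), // integer codes plus a per-vector scale
}

/// Uniform symmetric 4-bit quantizer; an assumed stand-in for a real codebook.
fn quantize_4bit(x: &[f32]) -> (Vec<i8>, f32) {
    let max = x.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if max > 0.0 { max / 7.0 } else { 1.0 };
    let codes = x
        .iter()
        .map(|v| (v / scale).round().clamp(-7.0, 7.0) as i8)
        .collect();
    (codes, scale)
}

struct KvCache {
    entries: Vec<Entry>,
    num_sinks: usize, // how many leading tokens stay unquantized
}

impl KvCache {
    fn new(num_sinks: usize) -> Self {
        KvCache { entries: Vec::new(), num_sinks }
    }

    /// Append the newest key in full precision (fix 2). The token that was
    /// "current" until now gets quantized, unless it is a sink token (fix 1).
    fn push(&mut self, key: Vec<f32>) {
        if let Some(prev) = self.entries.len().checked_sub(1) {
            if prev >= self.num_sinks {
                let quantized = match &self.entries[prev] {
                    Entry::Full(v) => Some(quantize_4bit(v)),
                    Entry::Quant(..) => None,
                };
                if let Some((codes, scale)) = quantized {
                    self.entries[prev] = Entry::Quant(codes, scale);
                }
            }
        }
        self.entries.push(Entry::Full(key));
    }

    /// Hard reset between conversations (fix 3): drop every cached entry.
    fn reset(&mut self) {
        self.entries.clear();
    }

    fn len(&self) -> usize {
        self.entries.len()
    }

    fn is_quantized(&self, pos: usize) -> bool {
        matches!(self.entries[pos], Entry::Quant(..))
    }
}
```

With `num_sinks = 1` and three pushed keys, position 0 stays in full precision as a sink, position 1 is quantized, and position 2 stays in full precision because it is the current token.
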
On the systems side, we replaced dense QJL with a structured Hadamard variant that's 115x faster and somehow also better quality (+4.5 dB SNR -- we still don't fully understand why). Fused attention computes scores directly from compressed indices using AVX2+FMA centroid lookup (6-8.9x speedup). And we built an append-only O(1) incremental cache, because the naive approach recompresses the entire cache on every token, which at 4K context means 935x overhead.

Numbers on Qwen 2.5 7B Q4_K_M at 4-bit: +3.7% PPL delta with Pre-RoPE (other impls give +17% at best with our 3-Fix, or 3,556 without). 9/9 needle-in-a-haystack. 7.5-14.2x VRAM reduction. With compaction, effective compression reaches 25x. Tested across Llama-3, Qwen2.5, Gemma, Mistral, and Phi-3.

There's also tq-engine on top of this -- model hub, HTTP API, calibration pipeline. Basically a Rust Ollama with KV compression built in.

On crates.io: https://crates.io/crates/tq-kv
gh: https://github.com/onur-gokyildiz-bhi/tq-kv
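
For readers wondering why a structured Hadamard projection can be so much cheaper than a dense one, here is a rough sketch. It assumes the variant is of the randomized-Hadamard family (a fixed ±1 diagonal followed by a fast Walsh-Hadamard transform, keeping only sign bits), which the post does not confirm. `fwht` and `sign_sketch` are hypothetical names, not tq-kv's API.

```rust
/// In-place fast Walsh-Hadamard transform (unnormalized).
/// Costs O(n log n) additions, versus O(n^2) multiply-adds for a dense matrix.
fn fwht(data: &mut [f32]) {
    let n = data.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    let mut h = 1;
    while h < n {
        // Butterfly step: combine elements that are `h` apart.
        for block in (0..n).step_by(2 * h) {
            for i in block..block + h {
                let (a, b) = (data[i], data[i + h]);
                data[i] = a + b;
                data[i + h] = a - b;
            }
        }
        h *= 2;
    }
}

/// Assumed structured projection: flip signs with a fixed ±1 diagonal,
/// apply the transform, and keep one sign bit per output coordinate.
fn sign_sketch(x: &[f32], signs: &[f32]) -> Vec<bool> {
    assert_eq!(x.len(), signs.len());
    let mut buf: Vec<f32> = x.iter().zip(signs).map(|(v, s)| v * s).collect();
    fwht(&mut buf);
    buf.iter().map(|v| *v >= 0.0).collect()
}
```

The transform never materializes a projection matrix, so the only stored state is the sign diagonal. In a real implementation that diagonal would be drawn once at random and reused for every key.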

Axios NPM Package Supply Chain Hack

https://www.bleepingcomputer.com/news/security/hackers-compromise-axios-npm-package-to-drop-cross...
1•SpyKiIIer•35s ago•0 comments

Almighty Lisp

https://almightylisp.com/
1•reikonomusha•50s ago•0 comments

Claude Code in Rust, Python, Go, Open source

https://github.com/SeifBenayed/claude-code-sdk
1•seifbenayed1992•6m ago•0 comments

New Patches Allow Building Linux IPv6-Only, Option to Deprecate "Legacy" IPv4

https://www.phoronix.com/news/Linux-IPv6-IPv4-Legacy-Knobs
2•Bender•6m ago•0 comments

A BASIC interpreter in Swift for Apple's birthday

https://github.com/jpurnell/ApplesoftBASIC
1•jpurnell•7m ago•1 comments

Google Paper Warns of Quantum Computing Risk for Bitcoin

https://www.wsj.com/livecoverage/stock-market-today-dow-sp-500-nasdaq-03-31-2026/card/google-pape...
1•bookofjoe•7m ago•0 comments

Why AI systems improve while drifting away from reality [pdf]

https://github.com/therealitydrift/reality-drift-library/blob/main/Reality%20Drift%20Project/07_O...
1•realitydrift•8m ago•1 comments

Review: Measuring AI Ability to Complete Long Software Tasks

https://emptysqua.re/blog/review-measuring-ai-ability-to-complete-long-software-tasks/
1•swq115•9m ago•0 comments

Is BGP Safe Yet? No. Test Your ISP

https://isbgpsafeyet.com/
3•janandonly•11m ago•0 comments

Refactoring Is Not Heroism – An Information-Theoretic Proof

https://github.com/HeinrichvH/articles/blob/main/building-with-ai/01-entropy-cycle.md
1•HeinrichAQS•11m ago•0 comments

TSMC plans 3-nanometre chip production launch in Japan in 2028

https://www.reuters.com/business/autos-transportation/tsmc-plans-3-nanometre-chip-production-laun...
2•voxadam•12m ago•0 comments

Show HN: OpenHarness Open-source terminal coding agent for any LLM

https://github.com/zhijiewong/openharness
2•wangzhijie•12m ago•0 comments

Office air quality may affect employees' cognition, productivity

https://hsph.harvard.edu/news/office-air-quality-may-affect-employees-cognition-productivity/
1•bilsbie•12m ago•0 comments

Nearly a decade building a VR studio: Why I left Unity, what I found in Unreal

https://retrorecordingsxr.com/devlog/nearly-a-decade-building-a-vr-studio-why-i-left-unity-and-wh...
1•marald•12m ago•0 comments

'Terrible pollution': the reality of the US gas sites rated 'grade A'

https://www.theguardian.com/environment/2026/apr/01/invisible-plumes-and-terrible-pollution-the-r...
1•mitchbob•13m ago•0 comments

RFK Jr. wants Americans to use peptides that were banned over safety risks

https://arstechnica.com/health/2026/03/rfk-jr-s-fda-expected-to-lift-restrictions-on-risky-unprov...
1•Bender•13m ago•0 comments

Coreutils: A Comprehensive Review (2023)

https://ratfactor.com/slackware/pkgblog/coreutils
1•swq115•14m ago•0 comments

NASA is leading the way to the Moon, but the military won't be far behind

https://arstechnica.com/space/2026/03/nasa-is-leading-the-way-to-the-moon-but-the-military-wont-b...
1•Bender•14m ago•0 comments

Isomorphic Layout Composer – Microservice architecture on the front-end

https://ilc.namecheap.technology/
1•h4ch1•16m ago•0 comments

Alcazar Security: Dead man's switch and digital legacy

https://alcazarsec.com/
1•alcazar•17m ago•0 comments

Matt Mullenweg Calls for WordPress 7.0 Delay to Introduce Database Table for RTC

https://www.therepository.email/matt-mullenweg-calls-for-wordpress-7-0-delay-to-introduce-databas...
1•rpgbr•18m ago•0 comments

AI Policy

https://dbushell.com/ai/
1•frizlab•19m ago•0 comments

Cookbooks for Aliens

https://www.quarter--mile.com/Cookbooks-for-Aliens
1•surprisetalk•19m ago•0 comments

Backstage Is Dead

https://newsletter.port.io/p/backstage-is-dead
1•gpi•20m ago•0 comments

Matt Miller: Tech sovereignty is 'welfare' for weak startups

https://sifted.eu/articles/matt-miller-evantic-sequoia-stormzy-interview
1•matthieu_bl•20m ago•0 comments

Mercor Is Compromised

https://twitter.com/mercor_ai/status/2039101905675403306
2•stikit•21m ago•0 comments

Applying Federal Digital Communications Privacy Law to Hosted AI Models

https://cyberlaw.stanford.edu/blog/2026/03/applying-federal-digital-communications-privacy-law-to...
1•iamnothere•22m ago•0 comments

Claude Code ranks 39th on terminal bench. The leaked source shows why

https://thoughts.jock.pl/p/claude-code-source-leak-what-to-learn-ai-agents-2026
2•joozio•22m ago•0 comments

CEO of largest public hospital says he's ready to replace radiologists with AI

https://radiologybusiness.com/topics/artificial-intelligence/ceo-americas-largest-public-hospital...
8•thunderbong•22m ago•5 comments

Claude Code leak reveals a persistent background agent 'KAIROS'

https://firethering.com/claude-code-source-code-leak/
1•steveharing1•22m ago•0 comments