WebSocket+Huffman vs. SSE+JSON for streaming LLM tokens

https://github.com/vidur2/token_entropy_encoder
1•vidur2•3h ago

Comments

vidur2•3h ago
I built a proof-of-concept that streams LLM tokens as Huffman-compressed binary over WebSocket instead of JSON text over SSE.

The Problem: Current LLM APIs (OpenAI, Anthropic, self-hosted) send decoded text wrapped in JSON. For every token, you get something like: `data: {"choices":[{"delta":{"content":"hello"}}]}`. This is verbose, wastes bandwidth, and forces the server to decode tokens to text (CPU cost).
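To make the per-token overhead concrete, here is a minimal sketch that builds one SSE frame in the shape quoted above and compares its size with the token text it carries. The function name `sseFrameForToken` is hypothetical and not part of the repo, and real APIs usually add more fields (id, model, timestamps), so this is likely a lower bound.

```typescript
// Sketch only: size of one SSE frame carrying a single decoded token as JSON.
// The payload shape follows the example quoted in the post; real responses carry more fields.
function sseFrameForToken(text: string): string {
  const payload = JSON.stringify({ choices: [{ delta: { content: text } }] });
  // An SSE event is a "data: " line terminated by a blank line.
  return "data: " + payload + "\n\n";
}

const helloFrame = sseFrameForToken("hello");
// For ASCII-only text, string length equals the UTF-8 byte count.
const helloFrameBytes = helloFrame.length;
const helloContentBytes = "hello".length;
console.log(helloFrameBytes + " bytes on the wire for " + helloContentBytes + " bytes of content");
```

Most of the frame is fixed JSON and SSE scaffolding that repeats for every token.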

The Solution: Stream raw token IDs as binary. The server sends Huffman-compressed token IDs over WebSocket, and the client decodes them locally using WASM. This offloads token decoding from server to client.
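The idea can be sketched in plain TypeScript. This is not the repo's Rust/WASM implementation: the frequency table, token IDs, and function names below are made up for illustration, and a real deployment would derive frequencies from the tokenizer or a corpus and ship the same table to both server and client.

```typescript
// Hypothetical sketch: Huffman-code a sequence of token IDs, then decode it again.
type HuffNode = {
  weight: number;
  tokenId?: number; // set on leaves only
  left?: HuffNode;
  right?: HuffNode;
};

// Build the tree by repeatedly merging the two lightest nodes.
function buildTree(freqs: Array<[number, number]>): HuffNode {
  if (freqs.length < 2) throw new Error("need at least two distinct token IDs");
  const nodes: HuffNode[] = freqs.map(([tokenId, weight]) => ({ tokenId, weight }));
  while (nodes.length > 1) {
    nodes.sort((a, b) => a.weight - b.weight);
    const left = nodes.shift() as HuffNode;
    const right = nodes.shift() as HuffNode;
    nodes.push({ weight: left.weight + right.weight, left, right });
  }
  return nodes[0] as HuffNode;
}

// Walk the tree and record the bit string ("0" = left, "1" = right) for each leaf.
function buildCodes(node: HuffNode, prefix: string, out: { [tokenId: number]: string }): void {
  if (node.tokenId !== undefined) {
    out[node.tokenId] = prefix;
    return;
  }
  if (node.left) buildCodes(node.left, prefix + "0", out);
  if (node.right) buildCodes(node.right, prefix + "1", out);
}

// Pack the concatenated codes into bytes, most significant bit first.
function huffmanEncode(
  tokenIds: number[],
  codes: { [tokenId: number]: string }
): { bytes: Uint8Array; bitLength: number } {
  let bits = "";
  for (const id of tokenIds) {
    const code = codes[id];
    if (code === undefined) throw new Error("token ID not in table: " + id);
    bits += code;
  }
  const bytes = new Uint8Array(Math.ceil(bits.length / 8));
  for (let i = 0; i < bits.length; i++) {
    if (bits.charAt(i) === "1") {
      const current = bytes[i >> 3] || 0;
      bytes[i >> 3] = current | (0x80 >> (i & 7));
    }
  }
  return { bytes, bitLength: bits.length };
}

// Read bits one at a time, walking the tree; emit a token ID at each leaf.
function huffmanDecode(bytes: Uint8Array, bitLength: number, root: HuffNode): number[] {
  const out: number[] = [];
  let node = root;
  for (let i = 0; i < bitLength; i++) {
    const byte = bytes[i >> 3] || 0;
    const bit = (byte >> (7 - (i & 7))) & 1;
    const next = bit === 0 ? node.left : node.right;
    if (!next) throw new Error("corrupt stream");
    node = next;
    if (node.tokenId !== undefined) {
      out.push(node.tokenId);
      node = root;
    }
  }
  return out;
}

// Illustrative [tokenId, frequency] pairs; not taken from any real tokenizer.
const demoFreqs: Array<[number, number]> = [[11, 50], [262, 30], [1820, 15], [50256, 5]];
const demoTree = buildTree(demoFreqs);
const demoCodes: { [tokenId: number]: string } = {};
buildCodes(demoTree, "", demoCodes);

const tokenStream = [11, 11, 262, 11, 1820, 262, 11, 50256];
const packed = huffmanEncode(tokenStream, demoCodes);
const roundTrip = huffmanDecode(packed.bytes, packed.bitLength, demoTree);
```

The bit length travels alongside the bytes so the decoder can ignore padding in the final byte. A production framing would also need to carry that length (or an end-of-stream symbol) inside each WebSocket message.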

Results from mock benchmarks:
- 30% faster for inline completions (the critical vibecoding use case)
- 25% faster for small completions (100 tokens)
- 12% faster overall average
- ~60% bandwidth savings (3 bytes/token vs 8 bytes/token)
- Client-side decoding means servers can handle more concurrent users
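The bandwidth figure follows directly from the two per-token sizes quoted above. A small sketch of the arithmetic, where the helper name `bandwidthSavings` is hypothetical:

```typescript
// Fraction of bytes saved when going from `baselineBytes` to `compressedBytes` per token.
function bandwidthSavings(compressedBytes: number, baselineBytes: number): number {
  if (baselineBytes <= 0) throw new Error("baseline must be positive");
  return 1 - compressedBytes / baselineBytes;
}

// 3 bytes/token vs 8 bytes/token, as reported in the mock benchmarks.
const reportedSavings = bandwidthSavings(3, 8);
console.log((reportedSavings * 100).toFixed(1) + "% saved");
```

With those inputs the saving works out to 62.5%, consistent with the rounded ~60% claim.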

Architecture:

LLM → Token IDs → Huffman encode → WebSocket (binary) → WASM decode → Text

vs.

LLM → Token IDs → Decode to text → JSON → SSE (HTTP) → Parse → Text

Tech Stack: Rust (WASM for encoder/decoder), TypeScript (test harness), Node.js (mock servers). Includes comprehensive benchmarks comparing both protocols on identical workloads.

Limitations:
- Requires modifying the LLM server to expose token IDs (standard APIs don't do this)
- Tokenizer is baked in at build time (`./build.sh <tokenizer_name>`), so you can't switch models dynamically
- Mock server only, no real LLM integration yet
- VS Code extension is non-functional (command registration issues)
- Best for self-hosted deployments where you control the stack

The VS Code extension code is included but doesn't work. Benchmarks and Node.js examples demonstrate the approach.

Why it matters:
- Protocol-level thinking for LLM APIs (not just server scaling)
- Shows that binary protocols plus client-side decoding can beat traditional HTTP/JSON, at least in mock benchmarks
- Opens discussion about whether LLM APIs should expose token IDs

Built this in ~3K LOC. Fully open source (MIT).

Try it: https://github.com/vidur2/token_entropy_encoder

Looking for feedback on the approach, potential issues, and whether this is worth pursuing further!

Big Sleep Tracker: Google Project Zero + Google DeepMind find security bugs

https://issuetracker.google.com/savedsearches/7155917
1•guessmyname•55s ago•0 comments

Suggestion Regarding References to the Prophet Muhammad (Peace Be Upon Him)

1•naseerwafa•1m ago•0 comments

Show HN: Career AutoPilot – AI guidance for navigating your career

https://www.careerautopilot.ai
1•bvikasgupta•1m ago•0 comments

Can a wealthy family change the course of a deadly brain disease?

https://www.science.org/content/article/can-wealthy-family-change-course-deadly-brain-disease
1•Snoozus•5m ago•0 comments

Show HN: Contd makes interactive CLIs usable for agents in an async way

https://github.com/werifu/contd
1•wefchen•6m ago•0 comments

Hitting the High Notes (2005)

https://www.joelonsoftware.com/2005/07/25/hitting-the-high-notes/
1•benatkin•11m ago•0 comments

Show HN: What zero-intervention E2E test generation looks like

https://www.youtube.com/watch?v=G6mtaC15ocw
1•nadeem1•12m ago•0 comments

Neolab and Emerging AI Lab Tracker

https://cleverhack.com/neolab-and-emerging-ai-lab-tracker
1•jxmorris12•14m ago•0 comments

"Clinejection" Turned an AI Bot into a Supply Chain Attack

https://snyk.io/blog/cline-supply-chain-attack-prompt-injection-github-actions/
1•vismit2000•17m ago•0 comments

Show HN: Managed S3 exports for billing data (no AWS setup required)

https://flexprice.io/
3•manishfp•20m ago•0 comments

Coruna: The Mysterious Journey of a Powerful iOS Exploit Kit

https://cloud.google.com/blog/topics/threat-intelligence/coruna-powerful-ios-exploit-kit
1•mitchbob•22m ago•0 comments

Vibe Security Radar – Tracking the security cost of vibe coding

https://vibe-radar-ten.vercel.app
1•guessmyname•26m ago•0 comments

Spark Runner: Easily Automate Front End Tests

https://github.com/simonarthur/spark-runner/
1•chromaton•29m ago•1 comments

I built this privacy-focused analytics tool

1•webanalyzerapp•29m ago•0 comments

"Game Development in Eight Bits" by Kevin Zurawel (2021) [video]

https://www.youtube.com/watch?v=TPbroUDHG0s
1•vinhnx•31m ago•0 comments

open_slate: A Powerful and Private 2-in-1 Tablet

https://www.indiegogo.com/en/projects/braxtechnologies/open_slate
1•owenpalmer•32m ago•0 comments

Converting Binary Floating-Point Numbers to Shortest Decimal Strings

https://onlinelibrary.wiley.com/doi/10.1002/spe.70056
1•matt_d•34m ago•0 comments

The era of Doctor AI is here

https://www.axios.com/2026/03/06/ai-doctor-health-information-consumers
2•0in•35m ago•0 comments

Show HN: Context-compact – Summarize agent context instead of truncating it

https://github.com/HalfEmptyDrum/Context-Compactor
6•EmptyDrum•35m ago•2 comments

Coding Agents in Feb 2026

https://calv.info/agents-feb-2026
1•vinhnx•44m ago•0 comments

Calif. lawsuit accuses Meta of sending nude video from AI glasses to workers

https://www.sfgate.com/tech/article/meta-ai-glasses-lawsuit-21960004.php
2•bryan0•44m ago•0 comments

Anthropic and The Pentagon

https://www.schneier.com/blog/archives/2026/03/anthropic-and-the-pentagon.html
1•herbertl•44m ago•0 comments

Show HN: Crypto data API where AI agents pay per request with USDC (x402)

https://crypto-enrich.up.railway.app
1•psamala•49m ago•0 comments

The first AI counter surveillance app

https://play.google.com/store/apps/details?id=app.sentryrf&hl=en_US
2•vidoluc•50m ago•1 comments

Loop Conference Channel [YouTube]

https://www.youtube.com/channel/UC_QIfHvN9auy2CoOdSfMWDw
1•vinhnx•51m ago•0 comments

The Mystery of Asjo.org

https://acid.vegas/blog/the-mystery-of-asjo-org/
1•gzread•53m ago•0 comments

How College Admissions Officers Spot Over-Coached Applications

https://www.forbes.com/sites/christopherrim/2026/02/27/how-college-admissions-officers-spot-over-...
2•paulpauper•54m ago•0 comments

Our Hospice System Subverts the Point of Hospice Care

https://www.nytimes.com/2026/03/02/opinion/hospice-care.html
2•paulpauper•54m ago•0 comments

SEIU Delenda Est

https://www.astralcodexten.com/p/seiu-delenda-est
3•paulpauper•56m ago•0 comments

Tell HN: Azure Data Factory pipeline execution delays in East US 2

1•dwoldrich•57m ago•0 comments