frontpage.

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
529•klaussilveira•9h ago•146 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
860•xnx•15h ago•519 comments

How we made geo joins 400× faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
72•matheusalmeida•1d ago•13 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
180•isitcontent•9h ago•21 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
182•dmpetrov•10h ago•79 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
294•vecti•11h ago•130 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
69•quibono•4d ago•13 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
343•aktau•16h ago•168 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
338•ostacke•15h ago•90 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
434•todsacerdoti•17h ago•226 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
237•eljojo•12h ago•147 comments

Delimited Continuations vs. Lwt for Threads

https://mirageos.org/blog/delimcc-vs-lwt
13•romes•4d ago•2 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
373•lstoll•16h ago•252 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
6•videotopia•3d ago•0 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
41•kmm•4d ago•3 comments

Show HN: ARM64 Android Dev Kit

https://github.com/denuoweb/ARM64-ADK
14•denuoweb•1d ago•2 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
220•i5heu•12h ago•162 comments

Why I Joined OpenAI

https://www.brendangregg.com/blog/2026-02-07/why-i-joined-openai.html
91•SerCe•5h ago•75 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
62•phreda4•9h ago•11 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
162•limoce•3d ago•82 comments

Introducing the Developer Knowledge API and MCP Server

https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/
38•gfortaine•7h ago•11 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
127•vmatsiiako•14h ago•53 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
18•gmays•4h ago•2 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
261•surprisetalk•3d ago•35 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
1029•cdrnsf•19h ago•428 comments

FORTH? Really!?

https://rescrv.net/w/2026/02/06/associative
55•rescrv•17h ago•18 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
83•antves•1d ago•60 comments

WebView performance significantly slower than PWA

https://issues.chromium.org/issues/40817676
18•denysonique•6h ago•2 comments

Zlob.h 100% POSIX and glibc compatible globbing lib that is faster and better

https://github.com/dmtrKovalenko/zlob
5•neogoose•2h ago•1 comments

I'm going to cure my girlfriend's brain tumor

https://andrewjrod.substack.com/p/im-going-to-cure-my-girlfriends-brain
109•ray__•6h ago•54 comments

Compression Dictionary Transport

https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/Compression_dictionary_transport
99•todsacerdoti•7mo ago

Comments

o11c•7mo ago
That `Link:` header broke my brain for a moment.
Y-bar•7mo ago

    Available-Dictionary: :    =:
It seems very odd to use a colon as the starting and ending delimiter when the header name already uses a colon. Wouldn’t a comma or semicolon work better?
judofyr•7mo ago
It’s encoded according to the spec that says binary data in headers should be enclosed in colons: https://www.rfc-editor.org/rfc/rfc8941.html#name-byte-sequen...
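
For comparison, a filled-in header looks something like this (the hash below is a made-up placeholder, not a real value; per RFC 8941 the colons delimit a Byte Sequence holding the base64 of the dictionary's SHA-256 hash):

    Available-Dictionary: :qBWYEbV9gRdCBBa4zBcNFp3hq2XcLrhVKe0uJDKtD1g=: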
Y-bar•7mo ago
Oh, thanks, it looked like a string such as a hash or base64-encoded data, not binary. I don’t think I have ever seen a use case for binary data like this in a header before.
divbzero•7mo ago
This seems like a lot of added complexity for limited gain. Are there cases where gzip and br at their highest compression levels aren’t good enough?
pmarreck•7mo ago
Every piece of information or file that is compressed sends a dictionary along with it. In the case of, say, many HTML or CSS files, this dictionary data is likely nearly completely redundant.

There's almost no added complexity since zstd already handles separate compression dictionaries quite well.
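
As a rough sketch of what that looks like in code, using the third-party `zstandard` Python package (file names are invented for illustration):

    import zstandard as zstd

    # Any shared blob can act as the dictionary, e.g. a previous version
    # of the asset or a purpose-built dictionary.
    dict_bytes = open("app.v1.js", "rb").read()
    new_bytes = open("app.v2.js", "rb").read()

    d = zstd.ZstdCompressionDict(dict_bytes)
    compressed = zstd.ZstdCompressor(level=19, dict_data=d).compress(new_bytes)

    # Only a client that already has the same dictionary can decompress this.
    assert zstd.ZstdDecompressor(dict_data=d).decompress(compressed) == new_bytes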

pornel•7mo ago
The standard compressed formats don't literally contain a dictionary. The decompressed data becomes its own dictionary while it's being decompressed. This makes the first occurrence of any pattern less efficiently compressed (but usually it's still compressed thanks to entropy coding), and then it becomes cheap to repeat.

Brotli has a default dictionary with bits of HTML and scripts. This is built into the decompressor and not sent with the files.

The decompression dictionaries aren't magic. They're basically a prefix for decompressed files, so that a first occurrence of some pattern can be referenced from the dictionary instead of built from scratch. This helps only with the first occurrences of data near the start of the file, and for all the later repetitions the dictionary becomes irrelevant.

The dictionary needs to be downloaded too, and you're not going to have dictionaries all the way down, so you pay the cost of decompressing the data without a dictionary whether it's a dictionary + dictionary-using-file, or just the full file itself.
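
A quick way to see the "prefix" behaviour with nothing but Python's standard library (the sample strings are made up; zstd and Brotli dictionaries work the same way conceptually):

    import zlib

    # Pretend this is boilerplate shared by many pages on a site.
    dictionary = b'<html><head><title>Example</title></head><body class="article">'
    page = dictionary + b"<p>Hello, world.</p></body></html>"

    c_plain = zlib.compressobj()
    c_dict = zlib.compressobj(zdict=dictionary)

    plain = c_plain.compress(page) + c_plain.flush()
    primed = c_dict.compress(page) + c_dict.flush()

    # The first occurrence of the boilerplate can be referenced out of the
    # dictionary instead of being emitted literally, so `primed` should come
    # out noticeably shorter than `plain`.
    print(len(plain), len(primed))

    d = zlib.decompressobj(zdict=dictionary)
    assert d.decompress(primed) == page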

yorwba•7mo ago
> The dictionary needs to be downloaded too

Which is why the idea is to use a previous version of the same file, which you already have cached from a prior visit to the site. You pay the cost of decompressing without a dictionary, but only on the first visit. Basically it's a way to restore the benefits of caching for files that change often, but only a little bit each time.
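
Concretely, the exchange looks roughly like this (paths, match pattern, and hash are illustrative, not taken from the article):

    # First visit: the response nominates itself as a dictionary for later requests
    GET /app.v1.js
    200 OK
    Content-Encoding: br
    Use-As-Dictionary: match="/app*.js"

    # Next visit: the browser still has app.v1.js cached and advertises its hash
    GET /app.v2.js
    Accept-Encoding: gzip, br, zstd, dcb, dcz
    Available-Dictionary: :<base64 SHA-256 of the cached file>:

    # The server replies with a much smaller dictionary-compressed response
    200 OK
    Content-Encoding: dcb

If the cached copy is gone, the browser simply omits Available-Dictionary and gets a normal brotli or gzip response.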

zvr•7mo ago
Of course, the Brotli default (built-in) dictionary is infamous for containing strings like "Holy Roman Emperor", "Confederate States", "Dominican Republic", etc., due to the way it was created. One can see the whole dictionary in https://gist.github.com/duskwuff/8a75e1b5e5a06d768336c8c7c37....

A dictionary created from the actual content to be compressed will end up looking very different.

pmarreck•7mo ago
> The dictionary needs to be downloaded too, and you're not going to have dictionaries all the way down

We already have a way to manage this: Standardizing and versioning dictionaries for various media types (also with a checksum), and then just caching them locally forever, since they should be immutable by design.

To prevent an overgrowth of dictionaries with small differences, we could require each one to be an RFC.

bsmth•7mo ago
If you're shipping a JS bundle that has small, frequent updates, for instance, this should be a good use case. There's a test site that accompanies the explainer and looks useful for estimates: https://use-as-dictionary.com/generate/
ks2048•7mo ago
Some examples here: https://github.com/WICG/compression-dictionary-transport/blo...

show a significant gain from using a dictionary over compressing without one.

It seems like instead of sites reducing bloat, they will just shift the bloat to your hard drive. Some of the examples mentioned a dictionary of 1 MB, which doesn't seem big but could add up if everyone is doing this.

sltkr•7mo ago
That demonstrates how useless this is. It only shaves off kilobytes on extremely bloated sites that waste megabytes of data.

For example, take the CNN example:

> The JavaScript was 98% smaller using the previous version as a dictionary for the new version than if the new version was downloaded with brotli alone. Specifically, the 278kb JavaScript was 90kb with brotli alone and 2kb when using brotli and the previous version as a dictionary.

Oh wow! 98% savings! That's amazing! Except in absolute terms the difference between 90 KB and 2 KB is only 88 KB. Meanwhile, cnn.com pulls in 63.7 MB of data just on the first page load. So in reality, that 88 KB saved was less than 0.14% of the total data, which is negligible.

yorwba•7mo ago
What makes you think this would stop working if applied to 63.7 MB of JavaScript instead of just one file?
wat10000•7mo ago
In some applications there’s no "good enough": even small gains help and can be significant when multiplied across a large system. It’s like the software version of American Airlines saving $40,000/year by removing one olive from their salads.
bhaney•7mo ago
Cloudflare and similar services seem well positioned to take advantage of this.

Analyze the most common responses of a website on their platform, build an efficient dictionary from that data, and then automatically inject a link to that site-specific dictionary so future responses are optimally compressed and save on bandwidth. All transparent to the customers and end users.
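
A sketch of that idea with the third-party `zstandard` package (file names and dictionary size are invented; real training wants many more samples than this):

    import zstandard as zstd

    # Bodies of the site's most frequently served responses.
    samples = [open(name, "rb").read()
               for name in ("home.html", "article.html", "search.json")]

    # Train one shared ~100 KB dictionary from those samples.
    shared = zstd.train_dictionary(100 * 1024, samples)

    # From then on, eligible responses are compressed against it, and clients
    # fetch the dictionary once and reuse it across pages.
    cctx = zstd.ZstdCompressor(dict_data=shared)
    compressed_body = cctx.compress(open("another_page.html", "rb").read())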

pornel•7mo ago
Per-URL dictionaries (where a URL is its own dictionary) are great, because they allow updating to a new version of a resource incrementally, and an old version of the same resource is the best template, and there's no extra cost when you already have it.

However, I'm sceptical about the usefulness of multi-page shared dictionaries (where you construct one for a site or group of pages). They're a gamble that can backfire.

The extra dictionary needs to be downloaded, so it starts as an extra overhead. It's not enough for it to just match something. It has to beat regular (per-page) compression to be better than nothing, and it must be useful enough to repay its own cost before it even starts being a net positive. This basically means everything in the dictionary must be useful to a user, and has to be used more than once, otherwise it's just an unnecessary upfront slowdown.

Standard (per-page) compression is already very good at removing simple repetitive patterns, and Brotli even comes with a default built-in dictionary of random HTML-like fragments. This further narrows down the usefulness of shared dictionaries, because generic page-like content alone isn't enough to give them an advantage. They need to contain more specific content to beat standard compression, but the more specific the dictionary is, the lower the chance of it fitting what the user browses.

creatonez•7mo ago
Excited to see access control mishaps where the training data includes random data from other users
mlhpdx•7mo ago
This seems very interesting for APIs where clients have chatty, long-lived connections. I’m thinking about the GitHub API, for example.
everfrustrated•7mo ago
No doubt someone will figure out how to abuse this into yet another cookie/tracking technology.
CottonMcKnight•7mo ago
If this interests you, I highly recommend watching this talk by Pat Meenan.

https://www.youtube.com/watch?v=Gt0H2DxdAPY

londons_explore•7mo ago
Seems like this would result in quite a lot of increased server load.

Previously servers would cache compressed versions of your static resources.

Whereas now they either have to compress on the fly or keep a massive cache: not only your most recent static JavaScript blob, but also all past blobs, plus versions compressed using different combinations of them as dictionaries.

This could easily 10x the resources needed for serving static HTML/CSS/JS.

magicalist•7mo ago
The past versions stored clientside are the dictionaries. Serverside, just keep the diffs against, say, the last five versions around if storage is an issue, or whatever gets you some high percentage of returning clients, then rebuild when pushing a new release.
toast0•7mo ago
Presumably you'd generate a standalone compressed form (or forms) as usual, and also compressed forms using several dictionaries.

Then the server is doing more work at request time, but it's not meaningfully more work --- just checking if the request path has a dictionary compressed form that matches the dictionary hash provided by the client.
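
Something like this, as a minimal sketch (the table layout and header handling here are invented for illustration):

    # Built at deploy time: (path, client's dictionary hash) -> precompressed artifact.
    PRECOMPRESSED = {
        ("/app.js", ":<hash of the v1 bundle>:"): "app.js.dcb",
    }

    def choose_variant(path: str, headers: dict) -> tuple[str, str]:
        """Return (file to serve, Content-Encoding) for one request."""
        key = (path, headers.get("Available-Dictionary", ""))
        if key in PRECOMPRESSED and "dcb" in headers.get("Accept-Encoding", ""):
            # Cheap lookup at request time; the expensive compression already
            # happened when the release was built.
            return PRECOMPRESSED[key], "dcb"
        return path.lstrip("/") + ".br", "br"  # fall back to plain brotli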

longhaul•7mo ago
Why can’t browsers/servers just store a standard English dictionary and communicate via indexes? Anything that isn’t in the dictionary can be sent raw. I’ve always had this thought but don’t see why it isn’t implemented. It might get a bit more involved with other languages, but the principle remains the same.

Thinking about it a bit more: we already do this at the character level with a Unicode table, so why can’t we look up words or maybe even common sentences?

wmf•7mo ago
Brotli has a built-in dictionary.
Svetlitski•7mo ago
Compression algorithms like Brotli already do this:

https://www.rfc-editor.org/rfc/rfc7932#page-28

pornel•7mo ago
Compression is limited by the pigeonhole principle. You can't get any compression for free.

There's every possible text somewhere in pi, but on average it costs the same or more to encode the location of the text as it would to send the text itself.

To get compression, you can only shift costs around, by making some things take fewer bits to represent, at the cost of making everything else take more bits to disambiguate (e.g. instead of all bytes taking 8 bits, you can make a specific byte take 1 bit, but all other bytes will need 9 bits).

To be able to reference words from an English dictionary, you will have to dedicate some sequences of bits to them in the compressed stream.

If you use your best and shortest sequences, you're wasting them on picking from an inflexible fixed dictionary, instead of representing data in some more sophisticated way that is more frequently useful (which decoders already do by building adaptive dictionaries on the fly and other dynamic techniques).

If you try to avoid hurting normal compression and assign less valuable longer sequences of bits to the dictionary words instead, these sequences will likely end up being longer than the words themselves.

tareqak•7mo ago
Interesting idea. I wonder if there would be a way to do steganography here. That is, changing the message by using a different dictionary but with the same delta / same set of compression rules.

Allowing for changing the message obviously means that things like malware become a possibility.

https://en.wikipedia.org/wiki/Steganography