People Keep Inventing Prolly Trees

https://www.dolthub.com/blog/2025-06-03-people-keep-inventing-prolly-trees/
58•lifty•2d ago

Comments

compressedgas•2d ago
This article does not mention Jumbostore (Kave Eshghi, Mark Lillibridge, Lawrence Wilcock, Guillaume Belrose, and Rycharde Hawkes), which in 2007 applied content-defined chunking recursively to the chunk list of a content-defined-chunked file. This is exactly what a Prolly Tree is.
lawlessone•3h ago
Amazing! All these people reinvented my SuperMegaTree!
aboodman•2h ago
I was aware of this kind of structure when I coined 'prolly tree'. It's the same thing bup was doing, which I referenced in our design docs:

https://github.com/attic-labs/noms/blob/master/doc/intro.md#...

The reason I thought a new name was warranted is that a prolly tree stores structured data (a sorted set of k/v pairs, like a b-tree), not blob data. And it has the same interface and utility as a b-tree.

Is it a huge difference? No. A pretty minor adaptation of an existing idea. But still different enough to warrant a different name IMO.

ChadNauseam•2h ago
Haha, this is funny. I've been obsessed with rolling-hash based chunking since I read about it in the dat paper. I didn't realize there was a tree version, but it is a natural extension.

I have a related cryptosystem that I came up with, but it is so obvious that I'm sure someone else invented it first. The idea is to back up a file like so: first, do a rolling-hash based chunking, then encrypt each chunk where the key is the hash of that chunk. Then, upload the chunks to the server, along with a file (encrypted by your personal key) that contains the information needed to decrypt each chunk and reassemble them. If multiple users used this strategy, any files they have in common would result in the same chunks being uploaded. This would let the server provider deduplicate those files (saving space), without giving the server provider the ability to read the files. (Unless they already know exactly which file they're looking for, and just want to test whether you're storing it.)
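
Deriving the key from the content like this is generally known as convergent encryption. A minimal sketch of the flow, assuming a 16-byte boundary window and roughly 64-byte average chunks (both arbitrary choices). The names `chunk`, `seal`, `backup`, and `restore` are hypothetical, and the XOR keystream is a toy stand-in for a real cipher, so this illustrates the deduplication property only and is not secure:

```python
import hashlib

WINDOW = 16   # bytes of trailing context that decide a boundary (assumed)
MASK = 0x3F   # boundary when the low 6 bits are zero, ~64-byte chunks (assumed)


def chunk(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries.

    For clarity this rehashes the last WINDOW bytes at every position; a real
    rolling hash updates in O(1) per byte instead.
    """
    chunks, start = [], 0
    for i in range(WINDOW, len(data) + 1):
        digest = hashlib.sha256(data[i - WINDOW:i]).digest()
        if digest[0] & MASK == 0:
            chunks.append(data[start:i])
            start = i
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """TOY cipher (NOT secure): XOR with SHA-256(key || offset) blocks.

    A real system would use an authenticated cipher from a vetted library.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def seal(piece: bytes) -> tuple[str, bytes, bytes]:
    """Encrypt one chunk under a key derived from the chunk itself."""
    key = hashlib.sha256(piece).digest()
    ciphertext = keystream_xor(key, piece)
    address = hashlib.sha256(ciphertext).hexdigest()
    return address, key, ciphertext


def backup(data: bytes, store: dict[str, bytes]) -> list[tuple[str, bytes]]:
    """Upload chunks to `store` (the server) and return the manifest.

    The manifest holds the per-chunk keys, so the user would encrypt it with
    their personal key before uploading it.
    """
    manifest = []
    for piece in chunk(data):
        address, key, ciphertext = seal(piece)
        store.setdefault(address, ciphertext)  # identical chunks stored once
        manifest.append((address, key))
    return manifest


def restore(manifest: list[tuple[str, bytes]], store: dict[str, bytes]) -> bytes:
    """Fetch, decrypt, and reassemble the chunks listed in the manifest."""
    return b"".join(keystream_xor(key, store[addr]) for addr, key in manifest)
```

Because the key and the ciphertext depend only on the chunk contents, a second user backing up the same file produces the same addresses, so the server stores nothing new. The same determinism is what lets anyone who already holds a file test whether it is stored.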

Tangent: why is it that downloading a large file is such a bad experience on the internet? If you lose internet halfway through, the connection is closed and you're just screwed. I don't think it should be a requirement, but it would be nice if there was some protocol understood by browsers and web servers that would be able to break up and reassemble a download request into a prolly tree, so I could pick up downloading where I left off, or only download what changed since the last time I downloaded something.

wakawaka28•2h ago
I think the cost of processing stuff that way would far exceed the cost of downloading the entire file again. You can already resume downloads from a byte offset if the server supports it, and that probably covers 99% of the cases where you would actually want to resume a download of a single file. Partial updates are rarely possible for large files anyway, as they are often compressed. If the host wants partial updates to make sense, they could serve over rsync.
nicoburns•1h ago
BitTorrent is the protocol you're looking for. Unfortunately not widely adopted for the use cases you are talking about.
theLiminator•1h ago
Sounds similar to IPFS.
Retr0id•1h ago
> If you lose internet halfway through, the connection is closed and you're just screwed. [...] it would be nice if there was some protocol understood by browsers and web servers

HTTP Range Requests solve this without any clever logic, if mutually supported.
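
A minimal sketch of resuming with a `Range` header, using only Python's standard library. The helper names `resume_request` and `resume_download` are hypothetical, and it assumes the remote file has not changed since the partial copy was written:

```python
import os
import urllib.request


def resume_request(url: str, partial_path: str) -> urllib.request.Request:
    """Build a GET that asks only for the bytes missing from partial_path."""
    have = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    request = urllib.request.Request(url)
    if have:
        # Open-ended range: everything from byte offset `have` to the end.
        request.add_header("Range", f"bytes={have}-")
    return request


def resume_download(url: str, partial_path: str) -> None:
    """Append the missing bytes, or start over if the server ignores Range."""
    request = resume_request(url, partial_path)
    # Note: if the partial file is already complete, a server may answer
    # 416 Range Not Satisfiable, which urlopen raises as an HTTPError.
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content means the range was honoured; a plain 200 means
        # the server is sending the whole file again from byte 0.
        mode = "ab" if response.status == 206 else "wb"
        with open(partial_path, mode) as out:
            while block := response.read(64 * 1024):
                out.write(block)
```

To guard against the file changing between attempts, a client can also send an `If-Range` header carrying the ETag from the first response, so the server falls back to a full 200 response when the content differs.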

RainyDayTmrw•7m ago
AES-GCM-SIV[1] does something similar to your per-chunk derived key, except that AES-GCM-SIV expects the key to be user-provided, and the IV is synthetic, hence Synthetic IV mode.

What's your threat model? This has "interesting"[3] properties. For example, given a file, the provider can figure out who has the file. Or, given a file, an arbitrary user can figure out if some other user already has the file. Users may even be able to "teleport" files to each other, like the infamous Dropbox Dropship[2].

I suspect there are several reasons no one has tried this: (1) Most providers want to store plaintext. Those few providers who don't want to store plaintext, whether for secrecy or deniability reasons, also don't want to store anything else correlatable, either. (2) Space is cheap. (3) Providers like being able to charge for space. Since providers sell space at a markup, they almost want you to use more space, not less.

[1]: https://en.wikipedia.org/wiki/AES-GCM-SIV

[2]: https://en.wikipedia.org/wiki/Dropship_(software)

[3]: "Interesting" is not a word you want associated with your cryptography usage, to say the least.

iamwil•1h ago
Anyone know if editing a prolly tree requires reconstructing the entire tree from the leaves again? All the examples I've ever seen in the wild reconstruct from the bottom up. Presumably, you can leave the untouched leaves intact and then reconstruct only the parent nodes whose hashes have changed due to the changed leaves. I ended up doing an implementation of this, and wondered if it's of any interest or value to others?
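
The property this relies on can be checked with a small experiment. The sketch below is not the incremental implementation described in the comment: it still rebuilds bottom-up, but it writes nodes into a shared content-addressed store so the nodes that actually change can be counted. The names `is_boundary`, `address`, and `build`, the 1-in-8 boundary rate, and the per-level salt are all assumptions made for illustration, not the node format of Noms or Dolt.

```python
import hashlib

FANOUT_BITS = 3  # a key ends a node with probability 1/8 (assumed)


def is_boundary(level: int, key: bytes) -> bool:
    """Decide from the key alone whether it closes a node at this level.

    The level is mixed in so that keys which closed a node one level down
    do not automatically close a node here as well.
    """
    digest = hashlib.sha256(level.to_bytes(2, "big") + key).digest()
    return digest[0] % (1 << FANOUT_BITS) == 0


def address(entries: list[tuple[bytes, bytes]]) -> bytes:
    """Content address of a node: hash of its length-prefixed entries."""
    h = hashlib.sha256()
    for key, payload in entries:
        h.update(len(key).to_bytes(4, "big") + key)
        h.update(len(payload).to_bytes(4, "big") + payload)
    return h.digest()


def build(items: dict[bytes, bytes], store: dict[bytes, list]) -> bytes:
    """Build the tree bottom-up and return the root address.

    `store` maps address -> node entries and can be shared across versions,
    so a node that is unchanged between versions is stored only once.
    """
    entries = sorted(items.items())
    level = 0
    while True:
        # Cut the sorted entries into nodes at hash-determined boundaries.
        nodes, current = [], []
        for entry in entries:
            current.append(entry)
            if is_boundary(level, entry[0]):
                nodes.append(current)
                current = []
        if current:
            nodes.append(current)
        # Each node becomes one (last key, child address) entry in its parent.
        parents = []
        for node in nodes:
            addr = address(node)
            store[addr] = node
            parents.append((node[-1][0], addr))
        if len(parents) <= 1:
            return parents[0][1] if parents else b""
        entries, level = parents, level + 1


if __name__ == "__main__":
    items = {f"key{i:04d}".encode(): str(i).encode() for i in range(1000)}
    store: dict[bytes, list] = {}
    build(items, store)
    before = len(store)
    items[b"key0500"] = b"changed"
    build(items, store)
    print(f"{before} nodes in v1, {len(store) - before} new nodes in v2")
```

Changing one value leaves every boundary where it was, so the only new nodes are the leaf holding that key and its ancestors up to the root. An incremental editor could therefore rewrite just that path. Inserting or deleting a key can additionally split or merge the neighbouring node at each level.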
