frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Same Surface, Different Weight

https://www.robpanico.com/articles/display/?entry_short=same-surface-different-weight
1•retrocog•22s ago•0 comments

The Rise of Spec Driven Development

https://www.dbreunig.com/2026/02/06/the-rise-of-spec-driven-development.html
1•Brajeshwar•4m ago•0 comments

The first good Raspberry Pi Laptop

https://www.jeffgeerling.com/blog/2026/the-first-good-raspberry-pi-laptop/
2•Brajeshwar•4m ago•0 comments

Seas to Rise Around the World – But Not in Greenland

https://e360.yale.edu/digest/greenland-sea-levels-fall
1•Brajeshwar•4m ago•0 comments

Will Future Generations Think We're Gross?

https://chillphysicsenjoyer.substack.com/p/will-future-generations-think-were
1•crescit_eundo•8m ago•0 comments

State Department will delete Xitter posts from before Trump returned to office

https://www.npr.org/2026/02/07/nx-s1-5704785/state-department-trump-posts-x
2•righthand•11m ago•0 comments

Show HN: Verifiable server roundtrip demo for a decision interruption system

https://github.com/veeduzyl-hue/decision-assistant-roundtrip-demo
1•veeduzyl•12m ago•0 comments

Impl Rust – Avro IDL Tool in Rust via Antlr

https://www.youtube.com/watch?v=vmKvw73V394
1•todsacerdoti•12m ago•0 comments

Stories from 25 Years of Software Development

https://susam.net/twenty-five-years-of-computing.html
2•vinhnx•13m ago•0 comments

minikeyvalue

https://github.com/commaai/minikeyvalue/tree/prod
3•tosh•17m ago•0 comments

Neomacs: GPU-accelerated Emacs with inline video, WebKit, and terminal via wgpu

https://github.com/eval-exec/neomacs
1•evalexec•22m ago•0 comments

Show HN: Moli P2P – An ephemeral, serverless image gallery (Rust and WebRTC)

https://moli-green.is/
2•ShinyaKoyano•26m ago•1 comments

How I grow my X presence?

https://www.reddit.com/r/GrowthHacking/s/UEc8pAl61b
2•m00dy•28m ago•0 comments

What's the cost of the most expensive Super Bowl ad slot?

https://ballparkguess.com/?id=5b98b1d3-5887-47b9-8a92-43be2ced674b
1•bkls•28m ago•0 comments

What if you just did a startup instead?

https://alexaraki.substack.com/p/what-if-you-just-did-a-startup
5•okaywriting•35m ago•0 comments

Hacking up your own shell completion (2020)

https://www.feltrac.co/environment/2020/01/18/build-your-own-shell-completion.html
2•todsacerdoti•38m ago•0 comments

Show HN: Gorse 0.5 – Open-source recommender system with visual workflow editor

https://github.com/gorse-io/gorse
1•zhenghaoz•38m ago•0 comments

GLM-OCR: Accurate × Fast × Comprehensive

https://github.com/zai-org/GLM-OCR
1•ms7892•39m ago•0 comments

Local Agent Bench: Test 11 small LLMs on tool-calling judgment, on CPU, no GPU

https://github.com/MikeVeerman/tool-calling-benchmark
1•MikeVeerman•40m ago•0 comments

Show HN: AboutMyProject – A public log for developer proof-of-work

https://aboutmyproject.com/
1•Raiplus•40m ago•0 comments

Expertise, AI and Work of Future [video]

https://www.youtube.com/watch?v=wsxWl9iT1XU
1•indiantinker•41m ago•0 comments

So Long to Cheap Books You Could Fit in Your Pocket

https://www.nytimes.com/2026/02/06/books/mass-market-paperback-books.html
3•pseudolus•41m ago•1 comments

PID Controller

https://en.wikipedia.org/wiki/Proportional%E2%80%93integral%E2%80%93derivative_controller
1•tosh•46m ago•0 comments

SpaceX Rocket Generates 100GW of Power, or 20% of US Electricity

https://twitter.com/AlecStapp/status/2019932764515234159
2•bkls•46m ago•0 comments

Kubernetes MCP Server

https://github.com/yindia/rootcause
1•yindia•47m ago•0 comments

I Built a Movie Recommendation Agent to Solve Movie Nights with My Wife

https://rokn.io/posts/building-movie-recommendation-agent
4•roknovosel•47m ago•0 comments

What were the first animals? The fierce sponge–jelly battle that just won't end

https://www.nature.com/articles/d41586-026-00238-z
2•beardyw•55m ago•0 comments

Sidestepping Evaluation Awareness and Anticipating Misalignment

https://alignment.openai.com/prod-evals/
1•taubek•55m ago•0 comments

OldMapsOnline

https://www.oldmapsonline.org/en
2•surprisetalk•58m ago•0 comments

What It's Like to Be a Worm

https://www.asimov.press/p/sentience
2•surprisetalk•58m ago•0 comments
Open in hackernews

Lessons from Hash Table Merging

https://gist.github.com/attractivechaos/d2efc77cc1db56bbd5fc597987e73338
85•attractivechaos•1mo ago

Comments

willvarfar•1mo ago
Kudos, neat digging and writeup that makes us think :)

If you merge linear probed tables by iterating in sorted hash order then you are matching the storage order and can congest particular parts of the table and cause the linear probing worse case behaviour.

By changing the iteration order, or salting the hash, you can avoid this.

Of course chained hash tables don't suffer from this particular problem.

My quick thought is that hash tables ought keep an internal salt hidden away. This seems good to avoid 'attacks' as well as speeding up merging etc. The only downside I can think of is that the creation of the table needs to fetch a random salt that might not be quick, although that can alleviated by allowing it to be set externally in the table creation so people who don't care can set it to 0 or whatever. What am I missing?

kzrdude•1mo ago
Having a per-table key for the hash function is what siphash authors propose and what many do to combat dos attacks right? For example Rust's default HashMap.

The keys are hidden/secret to the system external to the application.

oleggromov•1mo ago
There's a typo with 'ULL' string suffixes in the hexadecimal numbers in the first code example.
rurban•1mo ago
No, 0xd6e8feb86659fd93ULL is a valid unsigned long long number. With stdint.h you'd get portable suffix macros, which would help on non-Windows, but they do look worse.
oleggromov•4w ago
Oh wow, my apology - didn't know that and didn't notice the length of the hexademical number. TIL.
tialaramex•1mo ago
In Rust, don't do this, it's more work and it'll tend to be slower, often much slower.

HashMap implements Extend, so just h0.extend(h1) and you're done, the people who made your HashMap type are much better equipped to optimize this common operation.

In a new enough C++ in theory you might find the same functionality supported, but Quality of Implementation tends to be pretty frightful.

OskarS•1mo ago
> HashMap implements Extend, so just h0.extend(h1) and you're done, the people who made your HashMap type are much better equipped to optimize this common operation.

Are you sure? I'm not very used to reading Rust stdlib, but this seems to be the implementation of the default HashMap extend [1]. It just calls self.base.extend. self.base seems to be hashbrown::hash_map, and this is the source for it's extend [2]. In other words, does exactly the same thing, just iterates through hash map and inserts it.

Maybe I'm misreading something going through the online docs, or Rust does the "random seed" thing that abseil does, but just blinding assuming something doesn't happen "because Rust" is a bit silly.

[1]: https://doc.rust-lang.org/src/std/collections/hash/map.rs.ht...

[2]: https://docs.rs/hashbrown/latest/src/hashbrown/map.rs.html#4...

tialaramex•1mo ago
Yes, HashMap will by default be randomly seeded in Rust, but also the code you linked intelligently reserves capacity. If h0 is empty, it reserves enough space for all of h1, and if it isn't then it reserves enough extra space for half of h1, which turns out to be a good compromise.

Note that the worst case is we ate a single unneeded growth, while the best case is that we avoided N - 1 grows where N may be quite large.

attractivechaos•4w ago
First of all, as khuey pointed out, the current implementation accumulates values. extend() replaces values instead. It wouldn't achieve the same functionality.

I tried extend() anyway. It didn't work well. Based on your description, extend() implements a variation of preallocation (i.e. Solution II). However, because it doesn't always reserve enough space to hold the merged hash table, clustering still happens depending on N. I have updated the rust implementation (with the help of LLM as I am not a good rust programmer). You can try it yourself with "ht-merge-rust 1 -e -n14m" or point out if I made mistakes.

> HashMap will by default be randomly seeded in Rust

Yes, so it is with Abseil. The default rust hash functions, siphash in the standard library and foldhash in hashbrown, are ~3X as slow in comparison to simple hash functions on pure insertion load. When performance matters, we will use faster hash functions at least for small keys and will need a solution from my post.

> In a new enough C++ in theory you might find the same functionality supported, but Quality of Implementation tends to be pretty frightful.

This is not necessary. The rust libraries are a port of Abseil, a C++ library. Boost is as fast as Rust. Languages/libraries should learn from each other, not fight each other.

tialaramex•4w ago
> First of all, as khuey pointed out, the current implementation accumulates values. extend() replaces values instead. It wouldn't achieve the same functionality.

Ah! Yes, I apologise. I missed the + in += and I'm not used to a hash table which defaults initialization for unseen entries (as the C++ hash tables all tend to because its native container types behave that way) so I wasn't looking for it.

The SipHash will be noticeably slower, no question about it, and so if you need to and know what you're paying you can replace the hash, including with integer_hasher which gives you what you'd likely know from many C++ stdlib implementations - an identity function presented as a hash.

> This is not necessary. The rust libraries are a port of Abseil, a C++ library.

More specifically HashBrown is a port [edited: actually a re-implementation I think, design->Rust not C++->Rust] of Abseil's Swiss Tables, and these days Rust's HashMap (and HashSet of course) use HashBrown but that's not what I was getting at here

I was thinking about analogues of Extend (because as I wrote above, I didn't notice that you're accumulating not overwriting) and modern C++ has this kind of feature in Ranges::to however it doesn't quite have Extend and as I said QoI is poor, there are often trivial optimisations that Rust does but the C++ means the same but isn't optimised.

I am interested in a quite different benchmark for hash tables, rather than merging I'm interested in very small hash tables. Clearly for two items it will be faster to try them both, and clearly for a million items trying them all is awful, so I measure a VecMap type (same API as a hash table but actually just the growable array of unordered key->value pairs, searched linearly) against HashMap and other implementations of this API.

For N=25 VecMap is still competitive, but even at N=5 if we use a very fast hash (such as that identity function) instead of SipHash we can beat VecMap for most operations. I suspect this sort of benchmark would fare very differently on older hardware (faster memory relative to ALU operations) and the direction of travel is likely to stay the same for the foreseeable future. In 1975 if you have six key->value pairs you don't want a hash table because it's too slow but in 2025 you probably do.

khuey•1mo ago
From skimming the source code it looks like the merge operation here adds the values for duplicated keys rather than replacing the first value with the second value so using HashMaps's Extend impl won't work.
tialaramex•4w ago
Thanks, I missed that
jesse__•4w ago
> the people who made your HashMap type are much better equipped to optimize ...

Who's to say I'm not the one making the hashtable? There are plenty of real-world reasons the standard library hashtable may be either inaccessible or unsuitable.

Furthermore, the idea that "oh, honey, it's too hard, smart people did it for you" is insufferable and needs to die. When I'm the one making something, I have dramatically more information about the problem I'm trying to solve than the author of a hashtable library, and am therefore much better equipped to make design decision tradeoffs.

Please stop perpetuating the idea that 'just use a library' is unilaterally the best option. Sometimes, it's not.

tialaramex•4w ago
If you made your own type, you should implement Extend. It seems you agree that in this case you are best placed to do a good job.

And indeed if you have your own custom operation you want, it may well make sense for you to implement it on both your own types and stdlib types.

jesse__•4w ago
Great, we can agree on those :)
exDM69•1mo ago
Maybe this would be a suitable application for "Fibonacci hashing" [0][1], which is a trick to assign a hash table bucket from a hash value. Instead of just taking the modulo with the hash table size, it first multiplies the hash with a constant value 2^64/phi where phi is the golden ratio, and then takes the modulo.

There may be better constants than 2^64/phi, perhaps some large prime number with roughly equal number of one and zero bits could also work.

This will prevent bucket collisions on hash table resizing that may lead to "accidentally quadratic" behavior [2], while not requiring rehashing with a different salt.

I didn't do detailed analysis on whether it helps on hash table merging too, but I think it would.

[0] https://probablydance.com/2018/06/16/fibonacci-hashing-the-o... [1] https://news.ycombinator.com/item?id=43677122 [2] https://accidentallyquadratic.tumblr.com/post/153545455987/r...

SkiFire13•1mo ago
> This will prevent bucket collisions on hash table resizing

Fibonacci hashing is really adding another multiplicative hashing step followed by dropping the bottom bits using a shift operation instead of the top bits using an and operation. Since it still works by dropping bits, items that were near before the resize will still be near after the resize and it won't really change anything.

attractivechaos•1mo ago
Exactly. And khashl uses Fibonacci hashing. Without salting, it has the same problem.
SkiFire13•1mo ago
> I evaluated the following hash table libraries, all based on linear probing.

> Abseil

> Rust standard

> hashbrown

These hash tables are not based on plain linear probing, they use something that's essentially quadratic probing done in chunks. Not sure about the others but they might be doing something similar.

attractivechaos•1mo ago
These three and boost are all based on swiss tables. They are indeed more robust than plain linear probing. khashl is the only one here using basic linear probing. Without salting, its curve is through the roof, much worse than swiss tables.
AlotOfReading•4w ago
This easier to solve if you use a permutation instead of hash functions.

Let h0 be the larger table, and h1 the smaller. N = len(h0), M = len(h1). Pretend the elements of the tables are sequentially indexed. Element[0] is h0[0], Element[N] is h1[0], etc.

     h0' = Resize(h0, (N+M)*capacity_factor)

    for x in 0...(range):
      y = permute(x, 0, (N+M)*capacity_factor)
      if(y >= N) move_to(h0'[y], element[x])
One allocation and you move the minimum number of elements needed to eliminate primary clustering. Elements in h0 that aren't moved would presumably remain correctly indexed. You have to move the remaining elements of h1 as well, but that cluttered things.

Any randomish permutation works, from basic ones up to cryptographic. If your permutation only works on certain powers of two, iterate it until the result is in range.