Cache-Friendly B+Tree Nodes with Dynamic Fanout

https://jacobsherin.com/posts/2025-08-18-bplustree-struct-hack/
99•jasim•4mo ago

Comments

apelapan•4mo ago
Many mentions of things being too slow and other things being high performance. I'm not doubting the truthfulness, but it would have been really nice to see some hard numbers that show the magnitude of improvement in a scenario or two.
whizzter•4mo ago
Yeah, this is some weird "performance" optimizer that half-misuses terminology and complains that using an underlying container is bad for implementing their own basic container.

Well, yes: you're implementing a basic container, so naturally another basic container won't cut it.

aidenn0•4mo ago
Also, it's very hard to make a B+tree that is ever faster than other data structures in RAM.

Obviously, if you don't need (or only rarely need) in-order traversal (or related operations like successor), hash tables are very fast.

If you do need in-order traversal, then for small amounts of data sorted arrays are very fast, and for large amounts of data various types of prefix tries do very well.

apelapan•4mo ago
If you ever write a blog post about these things, I'd love to see some measurements! :-)
aidenn0•4mo ago
Yeah, I really should put something out there. It may be that my b+trees are suboptimal, and some sunlight on it would help.
apelapan•4mo ago
Measurements of actual implementations make for fruitful, constructive discussion. Doesn't need to be perfect to be useful and interesting. Hope to see it on the front page!
thesz•4mo ago
One would be better off implementing a cache-oblivious lookahead array [1] or even log-structured merge trees.

[1] https://www3.cs.stonybrook.edu/~bender/newpub/BenderFaFi07.p...

Both structures exploit the fact that most of the data does not change much and can be packed as tightly as one wishes. Even prefixes (and suffixes) can be factored out.

pluto_modadic•4mo ago
LSMs are great for write-heavy loads. Not sure about random reads, though; isn't that a B+tree's turf?
thesz•4mo ago
A B+tree is an optimization for slightly faster sequential reads, where the leaves are organized into a list.

LSMs allow for very compact data representation, because most levels of the tree do not change much through time, and this is what we are talking about here. This compact representation makes LSM trees faster at sequential reads too (less pointer chasing).

Also, you can construct LSM trees differently for different operations: a taller LSM tree with narrower levels allows for faster writes; a flatter LSM tree with fewer but wider levels allows for faster reads.

moab•4mo ago
Do you know of important real-world use-cases where cache-oblivious data structures are used? They are frequently mentioned on HN when relevant discussions like this one pop up, but I would love to hear about places where they are actually used in production.
thesz•4mo ago
You may not encounter a situation where a cache-oblivious structure is needed, but the knowledge that some algorithms are cache-oblivious can help.

For example, if you represent a set of values as an ordered array, you can perform many set operations with the (array) merge algorithm, which is a cache-oblivious one. You avoid pointer chasing, etc., but your structure becomes static instead of dynamic as with, say, red-black trees. This is already a win, because you can save memory and compute if at least one of your sets does not change much.

But if you apply the logarithmic method [1], you can have a dynamic data structure (built over a static one, the sorted array) and still perform most operations without pointer chasing and with cache-oblivious merging.

[1] https://www.scilit.com/publications/4dd8074c2d05ecc9ed96b5cf...
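A minimal C sketch of the merge-based set operation described above; `set_union_merge` is a hypothetical name. The point is that a single sequential pass over both arrays is what makes the classic merge cache-oblivious: it is efficient for any cache size without being tuned for one.

```c
#include <stddef.h>

/* Union of two sorted int arrays via one merge pass.
 * Scans both inputs strictly sequentially (no pointer chasing).
 * out must have room for na + nb entries; returns the count written. */
size_t set_union_merge(const int *a, size_t na,
                       const int *b, size_t nb, int *out)
{
    size_t i = 0, j = 0, n = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j])      out[n++] = a[i++];
        else if (b[j] < a[i]) out[n++] = b[j++];
        else { out[n++] = a[i]; i++; j++; }  /* emit a common element once */
    }
    while (i < na) out[n++] = a[i++];
    while (j < nb) out[n++] = b[j++];
    return n;
}
```

Intersection and difference are the same loop with different emit rules, which is why one static sorted representation supports the whole family of set operations.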

whizzter•4mo ago
This article doesn't seem to be about any disk-based structure but rather an in-memory one, and in an in-memory scenario with ordering requirements some users have reported B-trees as being quite competitive in mostly-read scenarios.

"Write-heavy" scenarios will probably be just fine with std::map (backed by an RB-tree), since the main downside of B-trees is write amplification, and that isn't an issue because memory doesn't really have any block granularity.

LSM trees in memory will probably not be that useful, as scanning becomes much more complicated (it's an interesting structure, though, if you have append-only workloads and want to implement a dictionary for size-coded projects on top of a list).

thesz•4mo ago
We are talking about cache friendliness here, including compact data representation.

The B-tree algorithm requires leaf nodes to be filled only up to some prespecified fill factor, usually 50% to 70%. This is not compact by any measure.

Both LSM trees and COLA allow for much more compact storage. They are also pretty cache-friendly in the "cache oblivious" sense.

whizzter•4mo ago
I read up more on the COLA paper, and funnily enough, inspired by LSM trees I implemented something like "basic-cola" a few months back (the thing I mentioned for size-coding): add to the end of the array, then hierarchically re-sort for each bit of the new size that turns 0, from the bottom up (divisible by 2, sort the last 2; divisible by 4, sort the last 4; and so on), and search with O((log2(n))^2) complexity.

B-tree fill factors are a parameter that can be tuned to avoid extra splits depending on your insertion patterns; an in-memory variant doesn't need to be tuned like a disk-based one.

Also, nothing prevents the "bottom" of an LSM variation from being a dense B-tree-like structure that just gets fewer updates.

The COLA paper also never mentions threading. An in-memory B-tree should scale better with size under multi-threading, while I don't see an easy way to avoid big locks with COLA; maybe that's why the COLA paper was from 2007 and we haven't seen much additional development on it? (LSM trees work beautifully, of course, but then you basically have the double-space issue, as with semi-space garbage collectors.)

In the end, what you choose depends on your application and patterns; COLA is a clever structure that probably shines in some scenarios.

My scenario was mostly heavy on consecutive insertions as well as lookups of potentially arbitrary compound keys, perfect for something COLA-like since those operations are cheap.
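The "basic-cola" scheme described above can be sketched in C. Everything here (`cola`, `cola_insert`, `cola_contains`, the fixed capacity, and re-sorting the suffix with `qsort` rather than merging) is illustrative, not the commenter's actual code:

```c
#include <stddef.h>
#include <stdlib.h>

/* One flat array whose tail is re-sorted in power-of-two chunks on
 * insert, so the array is always a concatenation of sorted runs whose
 * lengths are the set bits of n (largest run first). Search binary-
 * searches each run: O((log2 n)^2) total. */
#define COLA_CAP 1024

typedef struct { int a[COLA_CAP]; size_t n; } cola;

static int cmp_int(const void *x, const void *y) {
    int xi = *(const int *)x, yi = *(const int *)y;
    return (xi > yi) - (xi < yi);
}

void cola_insert(cola *c, int v) {
    c->a[c->n++] = v;
    /* For every low-order bit of the new size that turned 0,
     * the last 2^k elements now form one run: re-sort that suffix. */
    for (size_t run = 2; run <= c->n && c->n % run == 0; run *= 2)
        qsort(c->a + c->n - run, run, sizeof(int), cmp_int);
}

int cola_contains(const cola *c, int v) {
    /* Walk the sorted runs (lengths = set bits of n, high to low)
     * and binary-search each one. */
    size_t off = 0;
    for (size_t bit = (size_t)1 << 30; bit; bit >>= 1) {
        if (!(c->n & bit)) continue;
        size_t lo = off, hi = off + bit;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (c->a[mid] < v) lo = mid + 1; else hi = mid;
        }
        if (lo < off + bit && c->a[lo] == v) return 1;
        off += bit;
    }
    return 0;
}
```

A real COLA merges adjacent runs in linear time instead of calling `qsort`, which is what yields the amortized O(log n / B) insert cost in the paper; sorting keeps the sketch short at the price of an extra log factor.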

thesz•4mo ago

  > while I don't see an easy way to avoid big locks with COLA
Do not modify arrays in-place; create new arrays instead. This way you can have multiple readers, as the data is pretty much read-only, with no write locks and, again, (fast) cache-oblivious merging.

An extension of COLA called fractal tree indexes (on-disk storage) was commercialized by Tokutek: https://en.wikipedia.org/wiki/Fractal_tree_index
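A minimal C11 sketch of the publish-new-arrays approach suggested above; `Run`, `run_publish`, and `run_snapshot` are hypothetical names, and reclamation of superseded runs is deliberately elided (real code needs RCU, epochs, or reference counting):

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* An immutable sorted run, published through one atomic pointer.
 * Readers take a snapshot with a single atomic load and never need
 * a lock, because a published Run is never modified afterwards. */
typedef struct { size_t n; int *a; } Run;

static _Atomic(Run *) current_run;

Run *run_snapshot(void) {            /* reader side: one atomic load */
    return atomic_load(&current_run);
}

void run_publish(const int *data, size_t n) {  /* writer: build, then swap */
    Run *r = malloc(sizeof *r);
    r->n = n;
    r->a = malloc(n * sizeof *r->a);
    memcpy(r->a, data, n * sizeof *r->a);
    atomic_store(&current_run, r);   /* old run is leaked in this sketch */
}
```

The writer does all its work (e.g. a cache-oblivious merge of two old runs) on private memory, so readers only ever observe a fully built old version or a fully built new one.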

namibj•4mo ago
LSMs are also useful for coping with (i.e., cleaning up the messes/consolidating no-longer-relevant historic events into just their final state) streaming/windowed multi-temporality as happens from lifting iterative/fixpoint computations from batch semantics to streaming/incremental updates.

You have to consolidate when the time has come to reclaim space and to avoid needless repeat compute during accesses. Might as well use it to run full LSM tactics. Especially when keeping in mind that array mapped trees have very simple index arithmetic once you treat them as semantically literally identical to a sorted (SoA) array with a cache-benefitting address/index scrambler.

kazinator•4mo ago
> This pattern was officially standardized in C99,

No it wasn't; the C99 flexible array uses [] not [1] or [0].

When using the [1] hack, you cannot use the sizeof the structure to get the offset, because it includes the [1] array.

When using C99, you also cannot use sizeof to get the offset of the [0] element of the flexible array; sizeof is allowed to pretend that there is padding as if the flexible member were not there.

  > // The (N - 1) adjusts for the 1-element array in Payload struct
  > Payload *item = malloc(sizeof(Payload) + (N - 1) * sizeof(char))

If you are in C++ you need a cast; the void * return value of malloc cannot implicitly convert to Payload *.

  Payload *item = static_cast<Payload *>(malloc(...));
Or of course a C cast if the code has to compile as C or C++:

  Payload *item = (Payload *) malloc(...);
Setting aside that issue for brevity, pretending we are in C, I would make the malloc expression:

  Payload *item = malloc(offsetof(Payload, elements) + N);
sizeof(char) is by definition 1, so we do not need to multiply N by it.

By taking the offset of the elements array, we don't need to subtract 1 from N to account for the [1] element being skipped by sizeof.

These kinds of little things take away complexity for something that must be carefully coded to avoid a memory safety issue. You really want the calculations around the memory to use the simplest possible formulas that are as easy as possible to reason about to convince yourself they are correct.

Also, when you do use sizeof in a malloc expression, the following pattern avoids repeating the type name for the size, and also lets a pair of parentheses be dropped since sizeof only requires parentheses when the operand is a type:

  Payload *item = malloc(sizeof *item);
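Putting the offsetof-based allocation together, under assumed names (`PayloadHack`, `PayloadFAM`, `payload_new` are not from the article), a sketch might look like:

```c
#include <stddef.h>
#include <stdlib.h>

/* The [1]-array struct hack vs. the C99 flexible array member.
 * offsetof() gives the start of the trailing array directly, so the
 * allocation formula needs no "- 1" fixup for the hack and does not
 * depend on how sizeof pads out either struct. */
typedef struct {
    size_t len;
    char elements[1];   /* pre-C99 struct hack */
} PayloadHack;

typedef struct {
    size_t len;
    char elements[];    /* C99 flexible array member */
} PayloadFAM;

PayloadFAM *payload_new(size_t n) {
    /* header bytes + n payload bytes; sizeof(char) == 1 by definition */
    PayloadFAM *p = malloc(offsetof(PayloadFAM, elements) + n);
    if (p) p->len = n;
    return p;
}
```

The same `offsetof(PayloadHack, elements) + n` expression works for the pre-C99 variant, which is the simplification argued for above: one formula, no reasoning about whether sizeof counts the trailing element.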
halayli•4mo ago
Strictly speaking, in the C++ object model, malloc allocates storage but doesn't create objects. Accessing that memory as if it contains an object (even a trivial one like int) without properly giving it object lifetime is technically UB. For trivial types, this is rarely enforced in practice, but the standard says to use placement new or std::start_lifetime_as (C++23) to properly begin object lifetime.
dataflow•4mo ago
> Strictly speaking, in the C++ object model, malloc allocates storage but doesn't create objects.

No - strictly speaking, it does create objects. https://en.cppreference.com/w/cpp/memory/c/malloc.html#:~:te...

It gets confusing (to say the least) if you start questioning the details, but the spec does formally intend the objects to be implicitly created.

jagged-chisel•4mo ago
I’m not a C++ dev … Does that mean calling constructors? So a default, parameter-less constructor must exist for the given type, and it will be called N times - right?
dataflow•4mo ago
It's only legal for types that are sufficiently trivial, so the "called constructor" would be trivial. You'll want to follow the links in the page I sent you, it's explained: https://en.cppreference.com/w/cpp/language/classes.html#Impl...
whizzter•4mo ago
Even better and simpler:

  Payload *item = (Payload *)malloc(offsetof(Payload, elements[N]));
The rest of the article does make me wary of a lot of other things that aren't done "per spec"; if you're making your own container they will probably cause unintended bugs in the future.
kazinator•4mo ago
That's really great; it's easy to forget that the second argument of offsetof isn't simply a member name, but a designator expression.
kazinator•4mo ago
Unfortunately, ISO C requires offsetof to calculate a constant; moreover, the right operand is a "member designator", which elements[N] isn't.

However, the traditional (and pretty much the only sensible) ways of defining offsetof do make it work.

N3220 draft:

  offsetof(type, member-designator)
[expands] to an integer constant expression that has type size_t, the value of which is the offset in bytes, to the subobject (designated by member-designator), from the beginning of any object of type type. The type and member designator shall be such that given

  static type t;
then the expression &(t. member-designator) evaluates to an address constant. If the specified type name contains a comma not between matching parentheses or if the specified member is a bit-field, the behavior is undefined.

This doesn't mean I won't use it, but just something to be aware of.

It might not be a bad idea to propose to ISO C that offsetof(type, array[n]) should be required to work, and for non-constant n too.

kazinator•4mo ago
In GCC, __builtin_offsetof explicitly supports the extended syntax for the member designator. When the offsetof macro uses __builtin_offsetof, the capability is a documented extension.
pluto_modadic•4mo ago
is there an implementation of B+ trees that fluidly pulls from disk vs RAM?

e.g., two B+ trees, one in RAM and one on disk, with the RAM one evicted with sieve caching? possibly a very lite WAL?

something that lets you use a B+ tree bigger than RAM, and persist to disk

aidenn0•4mo ago
That's a type of Log-structured merge-tree.
gxt•4mo ago
You can implement a BTree with nodes stored in file-backed memmaps. It's plenty fast for the usual business case.
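A minimal POSIX sketch of that idea; `Node` and `nodes_map` are illustrative names, and a real B-tree would add child pointers, a free list, and explicit durability via msync. The point is just that once nodes live in a file-backed MAP_SHARED mapping, the OS page cache decides which nodes stay in RAM and which are evicted to disk:

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

enum { MAX_KEYS = 255 };

/* A fixed-size on-disk node; layout is illustrative only. */
typedef struct {
    uint16_t nkeys;
    uint8_t  is_leaf;
    int64_t  keys[MAX_KEYS];
    /* ... child offsets / values would follow ... */
} Node;

/* Map nnodes nodes from the file at path, growing the file if
 * needed. Returns NULL on failure. Writes to the returned array
 * go to the page cache and are flushed to the file by the OS. */
Node *nodes_map(const char *path, size_t nnodes) {
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return NULL;
    if (ftruncate(fd, (off_t)(nnodes * sizeof(Node))) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, nnodes * sizeof(Node), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);  /* the mapping keeps the file open underneath */
    return p == MAP_FAILED ? NULL : (Node *)p;
}
```

With this layout, "node id" is just an index into the mapped array, so child links can be stored as integers that remain valid across remaps, which is what makes the structure persist naturally beyond RAM.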
yencabulator•3mo ago
Pretty much every OLTP database contains an implementation of that.

https://www.youtube.com/playlist?list=PLSE8ODhjZXjYMAgsGH-Gt...

aidenn0•4mo ago
This never explains why dynamic fanout is desired. Static fanout is, of course, trivial with templates.