frontpage.

SectorC: A C Compiler in 512 bytes

https://xorvoid.com/sectorc.html
71•valyala•3h ago•14 comments

Brookhaven Lab's RHIC concludes 25-year run with final collisions

https://www.hpcwire.com/off-the-wire/brookhaven-labs-rhic-concludes-25-year-run-with-final-collis...
23•gnufx•2h ago•10 comments

I write games in C (yes, C)

https://jonathanwhiting.com/writing/blog/games_in_c/
119•valyala•3h ago•90 comments

The F Word

http://muratbuffalo.blogspot.com/2026/02/friction.html
27•zdw•3d ago•2 comments

Software factories and the agentic moment

https://factory.strongdm.ai/
81•mellosouls•6h ago•154 comments

Speed up responses with fast mode

https://code.claude.com/docs/en/fast-mode
39•surprisetalk•3h ago•48 comments

Hoot: Scheme on WebAssembly

https://www.spritely.institute/hoot/
142•AlexeyBrin•9h ago•26 comments

Stories from 25 Years of Software Development

https://susam.net/twenty-five-years-of-computing.html
91•vinhnx•6h ago•11 comments

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
848•klaussilveira•23h ago•255 comments

First Proof

https://arxiv.org/abs/2602.05192
62•samasblack•6h ago•50 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
1087•xnx•1d ago•618 comments

Al Lowe on model trains, funny deaths and working with Disney

https://spillhistorie.no/2026/02/06/interview-with-sierra-veteran-al-lowe/
60•thelok•5h ago•9 comments

Reinforcement Learning from Human Feedback

https://rlhfbook.com/
90•onurkanbkrc•8h ago•5 comments

Vocal Guide – belt sing without killing yourself

https://jesperordrup.github.io/vocal-guide/
228•jesperordrup•13h ago•80 comments

Start all of your commands with a comma (2009)

https://rhodesmill.org/brandon/2009/commands-with-comma/
512•theblazehen•3d ago•189 comments

We mourn our craft

https://nolanlawson.com/2026/02/07/we-mourn-our-craft/
317•ColinWright•2h ago•379 comments

Coding agents have replaced every framework I used

https://blog.alaindichiappari.dev/p/software-engineering-is-back
249•alainrk•8h ago•401 comments

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

https://github.com/Momciloo/fun-with-clip-path
25•momciloo•3h ago•4 comments

France's homegrown open source online office suite

https://github.com/suitenumerique
607•nar001•7h ago•266 comments

72M Points of Interest

https://tech.marksblogg.com/overture-places-pois.html
34•marklit•5d ago•6 comments

The AI boom is causing shortages everywhere else

https://www.washingtonpost.com/technology/2026/02/07/ai-spending-economy-shortages/
177•1vuio0pswjnm7•10h ago•246 comments

Selection Rather Than Prediction

https://voratiq.com/blog/selection-rather-than-prediction/
11•languid-photic•3d ago•4 comments

A Fresh Look at IBM 3270 Information Display System

https://www.rs-online.com/designspark/a-fresh-look-at-ibm-3270-information-display-system
45•rbanffy•4d ago•9 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
123•videotopia•4d ago•37 comments

History and Timeline of the Proco Rat Pedal (2021)

https://web.archive.org/web/20211030011207/https://thejhsshow.com/articles/history-and-timeline-o...
20•brudgers•5d ago•4 comments

Show HN: Kappal – CLI to Run Docker Compose YML on Kubernetes for Local Dev

https://github.com/sandys/kappal
28•sandGorgon•2d ago•14 comments

Where did all the starships go?

https://www.datawrapper.de/blog/science-fiction-decline
90•speckx•4d ago•102 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
208•limoce•4d ago•115 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
283•isitcontent•23h ago•38 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
564•todsacerdoti•1d ago•275 comments

A fast, growable array with stable pointers in C

https://danielchasehooper.com/posts/segment_array/
218•ibobev•6mo ago

Comments

zokier•6mo ago
> Today’s computers use only 48 bits of the 64 bits in a pointer

https://en.wikipedia.org/wiki/Intel_5-level_paging introduced in Ice Lake 6 years ago.

But anyway, isn't this just a variant of std::deque? https://en.cppreference.com/w/cpp/container/deque.html

cornstalks•6mo ago
What kind of setups use over 256 TiB of RAM?
reorder9695•6mo ago
"640k ought to be enough for anybody"
bonzini•6mo ago
In practice it's over 64 TiB because kernels often use a quarter of the available addressing space (half of the kernel addressing space) to map the physical addresses (e.g. FFFFC000_12345678 maps physical address 0x12345678). So 48 virtual address bits can be used with up to 2^46 bytes of RAM.
hinkley•6mo ago
And how long has 48-bit addressing been de rigueur? Not so long ago we had processors that could address 40 bits of address space. Or was that 38?
winocm•6mo ago
At least since maybe the DEC Alpha 21264. It could address 48-bits of VA space, but that comes with caveats due to PALcode specific intricacies.

I think VMS (or was it Tru64?) uses this mode, but many other OSes just use 43-bit or 40-bit addressing. Realistically though, I don’t think many users would be using workloads that addressed more than 38-bits worth of contiguous VA space in 1998-1999.

loeg•6mo ago
amd64 has had 48-bit addressing / 4-level paging from the beginning.
hinkley•6mo ago
I must be thinking of intel’s failed 64 bit attempts prior to amd64 winning.
wittystick•6mo ago
Physical addresses may be limited to 40 bits, but 48-bit virtual addresses have been the norm since 4-level paging.
bonzini•5mo ago
You're probably thinking of PAE (36 bit physical address space on 32 bit virtual address space). But Intel for quite some time had a 36 bit physical address space limit on 64-bit consumer processors, even if the virtual address space was 48 bits. I think it only changed 5 years ago or less.
TapamN•6mo ago
It's not necessarily physical RAM. If you memory-map large files, like maybe a large file on a RAID or network share, you could still need that much virtual address space.
Dylan16807•6mo ago
Do many programs want to use that much data but not control when it swaps in and out?
wittystick•6mo ago
No, but 5-level paging is opt-in anyway, so its presence isn't problematic if assuming a 48-bit address space. Linux won't allocate space outside the 48-bits unless you give an address hint to mmap outside the 48-bit range.
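
For illustration, opting in looks something like this (a sketch; assumes a Linux kernel built with 5-level paging, where a hint above the 47-bit user-space boundary is what unlocks the larger range):

  #include <stddef.h>
  #include <sys/mman.h>

  /* The hint above 2^47 opts this one mapping into the 57-bit address
     space; without it the kernel keeps allocations below 2^47. */
  void *map_high(size_t len) {
      return mmap((void *)(1ULL << 48), len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  }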
sestep•6mo ago
Does std::deque support random access?
mwkaufma•6mo ago
Yes, you can see operator[] in the linked reference docs.
ksherlock•6mo ago
The cppreference documentation specifies "Random access - constant O(1)" (same as std::vector). There's a slight overhead from going through 2 pointers instead of 1 and keeping track of how many items are in the first bucket (so you know which bucket to check), but that's a constant and the big O don't care about constants.
mwkaufma•6mo ago
In principle it's not that different from deque, though:

(1) deque uses fixed-size blocks, not increasing-size blocks. (2) deque supports prepending, which adds another level of indirection internally.

sigbottle•6mo ago
You can support prepending by mirroring the allocations, probably? E.g. for the "negative index" case, do an exponential thing in the other direction.

The indexing now involves some legitimate math, which can be annoying from an efficiency perspective, but I think you can still get O(1) with careful allocation of powers of 2.
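
A sketch of the forward direction, at least (assuming the first segment holds 16 elements and each later one doubles; the names are made up):

  #include <stdint.h>

  /* Segment s holds 16 * 2^s elements, so segment s starts at flat
     index (2^s - 1) * 16. One clz recovers s from the index. */
  static void index_to_slot(uint64_t i, unsigned *seg, uint64_t *off) {
      uint64_t biased = (i >> 4) + 1;  /* maps [0, 16) to segment 0 */
      unsigned s = 63u - (unsigned)__builtin_clzll(biased);
      *seg = s;
      *off = i - ((((uint64_t)1 << s) - 1) << 4);
  }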

o11c•6mo ago
That's fine if you only ever add, but is likely to fail if you pop FIFO-style. This too is ultimately fixable but means we can no longer assume "every bucket size doubles".

That said, IMO "stable pointers" is overrated; "minimize copying" is all that's useful.

dwattttt•6mo ago
Stable pointers is a limitation for simplicity's sake. If C were better at tracking pointer invalidation, we'd be better at propagating updated pointers.
sigbottle•6mo ago
I don't know the precise details of how deques are implemented in C++, but going by the most popular Stack Overflow explanation of them, some immediate pitfalls are that the T* map itself sounds unbounded, and if each chunk allocates only a fixed constant size it's probably horrible for fragmentation or overallocation. The indexing also seems dependent on division.

With this powers-of-two approach you can't truly delete from the front of the array, but the number of pointers you store is constant and the memory fragmentation is better. (Though OP never claimed to want to support deque behavior, it shouldn't be that hard to modify, though indexing would have to go through more arithmetic again.)

I haven't used OP's array, but I have been bitten plenty of times by std::deque's memory allocation patterns and had to rewrite with raw arrays and pointer tracking.

fc417fc802•6mo ago
> The indexing also seems dependent on division.

As long as the number of slots in a bucket is a power of two division reduces to right shift (I have no idea what std::deque does though). One of the advantages of not growing exponentially is that you can use a deque the way you would a ring buffer without it attempting to eat the entire address space.
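
I.e. something like this (the bucket size is an assumption):

  #include <stddef.h>

  enum { BUCKET_SHIFT = 9 };  /* 512 elements per bucket */

  /* With power-of-two buckets, division and modulo reduce to a shift
     and a mask. */
  static void split_index(size_t i, size_t *bucket, size_t *offset) {
      *bucket = i >> BUCKET_SHIFT;               /* i / 512 */
      *offset = i & ((1u << BUCKET_SHIFT) - 1);  /* i % 512 */
  }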

forrestthewoods•6mo ago
std::deque's details vary by implementation, and it is largely considered unusable on MSVC.

MSVC uses too small a block size, making it worthless. libc++'s block size is 16 elements or 4096 bytes.

It is generally better to use a container whose implementation details you can actually understand and control.

I would not call it a variant of std::deque myself. Not wrong. But not a helpful observation imho.

penguin_booze•6mo ago
What are all those icons on the diagram that illustrates 5-level paging - snowflakes and triangles?
Retr0id•6mo ago
Huh, seems like a bug in the SVG renderer. The SVG itself looks normal: https://upload.wikimedia.org/wikipedia/commons/2/2b/Page_Tab...
penguin_booze•6mo ago
Thanks for that. I was worried I didn't understand paging anymore. FWIW, I was using Firefox.
Retr0id•6mo ago
The images you see on wikipedia are rendered to a bitmap, so it's an issue on their end somewhere (and you'll see the same regardless of browser)
nly•6mo ago
Unfortunately std::deque is hobbled in the Microsoft implementation. Its block size is such that if T is larger than 8 bytes it devolves to a linked list.

And it can't be fixed due to binary compatibility.

https://github.com/microsoft/STL/issues/147

By contrast the GNU implementation has a block size of 512 bytes

Fortunately in high performance systems the times where you actually want an unbounded queue are limited.

01HNNWZ0MV43FF•6mo ago
Readers might also find `plf::colony` interesting: https://www.plflib.org/colony.htm
variadix•6mo ago
You can also use virtual memory for a stable resizable vector implementation, up to some max length based on how much virtual memory you reserve initially, then commit as required to grow the physical capacity.
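
A minimal sketch of the pattern with POSIX mmap/mprotect (the reserve size and 4 KiB pages are assumptions):

  #include <stddef.h>
  #include <sys/mman.h>

  /* Reserve address space up front, commit pages on demand. The base
     pointer never moves, so element pointers stay stable. */
  enum { PAGE = 4096 };
  #define RESERVE ((size_t)8 << 30)  /* 8 GiB of address space */

  typedef struct { unsigned char *base; size_t committed, used; } stable_buf;

  static int sb_init(stable_buf *b) {
      b->base = mmap(NULL, RESERVE, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      b->committed = b->used = 0;
      return b->base == MAP_FAILED ? -1 : 0;
  }

  static void *sb_grow(stable_buf *b, size_t n) {
      if (b->used + n > b->committed) {
          size_t need = (b->used + n + PAGE - 1) & ~(size_t)(PAGE - 1);
          if (mprotect(b->base, need, PROT_READ | PROT_WRITE) != 0)
              return NULL;  /* commit failed */
          b->committed = need;
      }
      void *p = b->base + b->used;
      b->used += n;
      return p;
  }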
loeg•6mo ago
Yeah, with less runtime overhead, so long as you're ok with the ~4kB minimum allocation size.
IshKebab•6mo ago
I think the optimum would be just a normal vector under 4 kB, then switching to whole pages after that.

Do any vector implementations & allocators actually do this though?

loeg•6mo ago
You lose the in-place resize property if you don't start with an entire page.

As far as "has anyone implemented this?" -- I don't know.

IshKebab•6mo ago
Yeah but it probably makes little difference if your vector is under 4 kB anyway.
fyrn_•6mo ago
The article mentions this alternative, and also points out that it does not work in embedded contexts or with WASM.
ncruces•6mo ago
I've actually used this to implement the linear memory of a Wasm runtime (reserve 4GB and commit as needed) and have faced user complaints (it's in a library that uses a Wasm runtime internally) due to it unexpectedly running into issues, particularly under nested virtualization scenarios.

I've needed to add knobs to configure it, because even a handful of 4GB instances causes issues. I've defaulted to 256MB/instance, and for my own GitHub CI use 32MB/instance to reduce test flakiness.

This is to say: I found the idea that “just reserve address space, it's not an issue if you don't commit it” very flaky in practice, unless you're running on bare-metal Linux.

loeg•6mo ago
Historically the Java runtime did something similar, with similar problems (exacerbated by being older hardware with smaller RAM).

Virtual address space just isn't free.

Dylan16807•6mo ago
On the other hand if we're speaking historically then it was probably using a lot more than a hundredth of a percent of the address space. Or a billionth of the address space, depending on how we're counting.

It doesn't need to be free, but your address space use should be able to go a handful of bits above your physical memory without problems.

Even fully allocating all the page entries, with no large pages, on an architecture with 4KB pages, would only take 8MB to track a 4GB allocation.

cmrdporcupine•6mo ago
Like you, I've played with it many times and am always intrigued by this idea of handling sparsity this way, but I just end up backing out these changes in the end because it never seems to pay off the way my geeky fantasy plays it out.

The LeanStore and Umbra DB buffer pool management systems intrigued me. I even got paid to work on a system like this (buffer pool at an analytical database company). The KISS data structure (https://dl.acm.org/doi/10.1145/2236584.2236587) and the START adaptive radix tree (https://db.in.tum.de/~fent/papers/Self%20Tuning%20Art.pdf?la...) have also intrigued me. I've been very intrigued and almost always disappointed.

In practice performance gains have never really materialized for me ... I do think there are wins to be had here, but only for specialized cases.

And just watch how confused and annoyed your users are when they see an absolutely huge VSS. Sysadmins freaking out at your "insane memory usage" (which is really only on paper not in reality). And the difficulty of profiling or tracking what's going on easily.

loeg•6mo ago
Embedded environments without virtual memory are increasingly rare, and generally not places where you would need a variably sized generic list ADT anyway.
tovej•6mo ago
Very nice! I do wonder if it would be useful to be able to skip even more smaller segments, maybe a ctor argument for the minimum segment size. Or maybe some housekeeping functions to collapse the smallest segments into one.

Mostly, the thing that feels strange is that when using, say, n > 10 segments, the smallest segment will be less than a thousandth of the largest; iterating over the first half of the elements will touch n-1 or n-2 segments (worse cache behaviour), while iterating over the second half will touch only one or two.

Seems like, in most cases, you would want to be able to collapse those earlier segments together.

brunokim•6mo ago
If you do that, you lose those stable pointers.
o11c•6mo ago
Can we really call it an array if it's not contiguous (or at least strided)? Only a small fraction of APIs take an `iovec, iovcnt`-equivalent ...
dhooper•6mo ago
feel free to call it a "levelwise-allocated pile"
jandrese•6mo ago
Yeah, the limitation that it can't be just dumped into anything that expects a C array is a large one. You need to structure your code around the access primitives this project implements.
unwind•6mo ago
Very nice, although I think the level of "trickery" with the macros becomes a bit much. I do understand that is The Way in C (I've written C for 30 years), it's just not something I'd do very often.

Also, from a strictly prose point of view, isn't it strange that the `clz` instruction doesn't actually appear in the 10-instruction disassembly of the indexing function? It feels like it was optimized out by the compiler perhaps due to the index being compile-time known or something, but after the setup and explanation that was a bit jarring to me.

mananaysiempre•6mo ago
The POSIX name for the function is clz() [the C23 name is stdc_leading_zeros(), because that's how the committee names things now, while the GCC intrinsic is __builtin_clz()]. The name of the x86 instruction, on the other hand, is BSR (80386+) or LZCNT (Nehalem+, K10+) depending on what semantics you want for zero inputs (keep in mind that early implementations of BSF/BSR are very slow and take time proportional to the output value). The compiled code uses BSR. (All of these are specified slightly differently, take care if you plan to actually use them.)
unwind•6mo ago
Got it, thanks. I suck at x86, even more than I thought. :/

Edit: it's CLZ on Arm [1], probably what I was looking for.

[1]: https://developer.arm.com/documentation/100069/0610/A32-and-...

mananaysiempre•6mo ago
In that case, I suck at POSIX—I could’ve sworn clz() was a standard or at least a conventional function, but no, that’s fls(), which is furthermore not universal across Unices. Either way, if you don’t feel your knowledge of the x86 instruction set is adequate, there’s always the option of taking an instruction listing and looking up anything that seems unfamiliar. It’s surprisingly effective. (You can either use an old listing[1] or skip all the vector stuff in a new one[2].)

[1] https://web.archive.org/web/20000819070651/http://www.quanta...

[2] https://www.felixcloutier.com/x86/
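
For completeness, the intrinsic route with the zero-input caveat handled (a sketch):

  #include <limits.h>

  /* __builtin_clz is undefined for 0 (BSR leaves its destination
     undefined there), so guard it; here clz(0) is defined as 32. */
  static int clz32(unsigned x) {
      return x ? __builtin_clz(x) : (int)(sizeof x * CHAR_BIT);
  }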

kilpikaarna•6mo ago
> The Way in C

Is it though? (Ab)using C macros so you can write obviously-not-C stuff like (from the example):

SegmentArray(Entity) entities = {0};

Seeing that kind of thing in example C code just makes my hair stand on end because you know it's someone who actually wants to write C++ but for whatever reason has decided to try to implement their thing in C and be clever about it. And I'm going to have to go parse through multiple levels of macro indirection to just understand what the hell is going on.

Seems like a useful data structure, despite the shortcoming that it can't be accessed like a regular array. Normally auto-expanding arrays involve realloc, which is tricky with arena allocation. But jeez, just pass void pointers + size and have it assert if there's a mismatch.

exDM69•6mo ago
> Also, from a strictly prose point of view, isn't it strange that the `clz` instruction

It's using the `bsr` instruction which is similar (but worse). The `lzcnt` instruction in x86_64 is a part of the BMI feature introduced in Intel Haswell. The compiler does not generate these instructions by default so it runs on any x86_64.

If you add `-mbmi` or `-march=haswell` or newer to the compiler command line, you should get `clz`/`lzcnt` instead.

jovial_cavalier•6mo ago
The example code doesn't seem to compile.
seanwessmith•6mo ago
Make sure to create the segment_array.h file. Mine output just fine on a Mac M4.

  Desktop gcc main.c        
  Desktop ./a.out 
  entities[0].a = 1 entities[0].a = 1 entities[1].a = 2
jovial_cavalier•6mo ago
Yes, I made the header. I'm on x86-64.

https://gist.github.com/fpf3/71c72e224e1c82d9a5d37be621def42...

The errors make sense. You can't put a comma-separated initializer into a macro... What even is the symbol `entity`? It's not even clear to me what is meant by that, he doesn't define it anywhere.

edit: Looks like he updated his header since I first tried to compile. Now it works fine but the header looks significantly more complicated.

pfg_•6mo ago
Zig has this as std.SegmentedList, but it can resize the segment array dynamically
andrewla•6mo ago
This is really clever but better to call this a list rather than an array; functions which expect array semantics will simply not work, and there's no way to transparently pass slices of this data structure around.

In the past I've abused virtual memory systems to block off a bunch of pages after my array. This lets you use an array data structure, have guard pages to prevent out of bounds access, and to have stable pointers in the data structure.

hinkley•6mo ago
I believe I've seen problems like this also solved with arena allocators. You have certain very special allocations have an arena unto themselves.
stmw•6mo ago
Same re: virtual memory systems (using guard pages), that is an old idea that works well but it did once produce a really unpleasant bug in production... But that was an unfortunate implementation mishap.
benlwalker•6mo ago
For an expanding array in a 64 bit address space, reserving a big region and mmaping it in as you go is usually the top performing solution by a wide margin. At least on Linux, it is faster to speculatively mmap ahead with MAP_POPULATE rather than relying on page faults, too.

And, if you find you didn't reserve enough address space, Linux has mremap() which can grow the reserved region. Or map the region to two places at once (the original place and a new, larger place).
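
Roughly like this (Linux-only sketch; with flags of 0 the mapping either grows in place, keeping pointers stable, or the call fails):

  #define _GNU_SOURCE
  #include <stddef.h>
  #include <sys/mman.h>

  /* Without MREMAP_MAYMOVE, mremap extends the region in place or
     returns MAP_FAILED if the adjacent address space is taken. */
  static void *grow_in_place(void *base, size_t old_len, size_t new_len) {
      return mremap(base, old_len, new_len, 0);
  }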

OptionOfT•6mo ago
One place I had issues was rapidly allocating space I needed temporarily but then discarding it.

The space I needed was too large to be added to the heap, so I used mmap. Because of the nature of the processing (mmap, process, yeet mmap) I put the system under a lot of pressure. Maintaining the set of mapped blocks and reusing them fixed the issue.

vlovich123•6mo ago
Freeing memory to the OS or doing things like munmap involves a TLB shootdown which by definition is a performance bottleneck. Probably what you ended up experiencing.
OptionOfT•6mo ago
Thanks for that. That was an interesting read!

I ended up in this situation like I end up in most, by just learning as I go.

With this I learned that the libc allocator has a max amount that it'll add to the heap, and how mmap works, and that doing frequent munmap is slow.

Now I learned why it is slow!

vlovich123•6mo ago
FWIW mremap is very expensive as it involves a TLB shootdown.
zoogeny•6mo ago
I think the article buries a significant drawback: the loss of contiguity. It is obviously implied by the design, but I think this approach would have hard-to-define characteristics for things like cache prefetching. The next address is a function of the index, not an easily predictable increment.

One frequent reason to use an array is to iterate the items. In those cases, non-contiguous memory layout is not ideal.

forrestthewoods•6mo ago
I would not call it “non-contiguous”. It’s more like “mostly contiguous”. Which for large amounts data is “amortized contiguous” just like a regular vector has “amortized constant” time to add an element.
dhooper•6mo ago
1. The image at the top of the article makes it clear the segments aren't contiguous

2. iterating a 4 billion item segmented array would have 26 cache misses. Not a big deal.

teo_zero•6mo ago
I think the parent poster meant that a compiler might have a hard time understanding when sa_get(..., i) and sa_get(..., i+1) actually access contiguous memory locations, and will thus stop applying nice optimizations. Conversely, accessing a[i] for all 4 billion items of a regular array will be optimized to specialized instructions, not excluding SIMD or SWAR.
KapKap66•6mo ago
If I understand the article right, you can get around this by first retrieving the segment and its length directly, then accessing the data within the segment like a traditional array instead of going through the accessor functions every time. Should help with the problem a bit.
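
Something like this, roughly (seg_count, seg_base and seg_len are hypothetical accessors, not the article's actual API):

  /* Hoist the segment lookup out of the inner loop so the compiler
     sees a plain contiguous array and can vectorize. */
  for (unsigned s = 0; s < seg_count(&sa); s++) {
      Entity *seg = seg_base(&sa, s);
      size_t  n   = seg_len(&sa, s);
      for (size_t j = 0; j < n; j++)
          process(&seg[j]);  /* contiguous within one segment */
  }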
taminka•6mo ago
i love a nice single header project, it's c23 only tho (bc of typeof)?
listeria•6mo ago
They link to an old article [1] that was featured on HN [2] somewhat recently, in which there's a workaround for older standards with regard to typeof.

[1]: https://danielchasehooper.com/posts/typechecked-generic-c-da...

[2]: https://news.ycombinator.com/item?id=44425461

johnisgood•6mo ago
It is a GNU extension, supported by both GCC and Clang.
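
A toy sketch of what typeof buys in this macro style (hypothetical macro, not the article's header):

  /* The typeof temporary pins down the element type, so pushing an
     incompatible value (wrong struct, wrong pointer type) is rejected
     at compile time instead of silently converting. */
  #define vec_push(v, val)                        \
      do {                                        \
          typeof((v)->data[0]) tmp_ = (val);      \
          (v)->data[(v)->len++] = tmp_;           \
      } while (0)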
gotoeleven•6mo ago
Under what conditions is exponential segment sizing preferable to fixed size segments? Are there any specific algorithms or situations where this is especially good? It seems like the likelihood of large amounts of wasted space is a major downside.
o11c•6mo ago
It's always better - the increase in indexing complexity is negligible, and it completely eliminates resizing of the top-level array. It also reduces the number of calls to `malloc` while keeping capacity proportional to active size.
listeria•6mo ago
They mention using this as the backing array for a power-of-two-sized hash table, but I don't think it would be very useful considering that the hash table won't provide stable pointers, given that you would need to rehash every element as the table grows. Even if you just wanted to reuse the memory, rehashing in-place would be a PITA.
conradludgate•6mo ago
I think they mentioned it's for an arena, where stability is necessary. You might happen to use said arena for a hash table
Lichtso•6mo ago
Another similar data structure, which has a balanced tree (instead of a list) referencing array segments, is the rope: https://en.wikipedia.org/wiki/Rope_(data_structure)

Its main advantages are the O(log n) time complexity for all size changes at any index, meaning you can efficiently insert and delete anywhere, and it is easy to implement copy-on-write version control on top of it.

monkeyelite•6mo ago
A lot of standard libraries have tried to implement ropes for things like strings and it usually is reverted for a simpler structure.
josephg•6mo ago
There's a good reason for that. Almost all strings ever created in programs are either very small, immutable or append-only. Eg, text labels in a user interface, body of a downloaded HTTP request or a templated HTML string, respectively. For these use cases, small string optimisations and resizable vecs are better choices. They're simpler and faster for the operations you actually care about.

The only time I've ever wanted ropes is in text editing - either in an editor or in a CRDT library. They're a good choice for text editing because they let users type anywhere in a document. But that comes at a cost: Rope implementations are very complex (skip lists have similar complexity to a b-tree) and they can be quite memory inefficient too, depending on how they're implemented. They're a bad choice for small strings, immutable strings and append only strings - which as I said, are the most common string types.

Ropes are amazing when you need them. But they don't improve the performance of the average string, or the average program.

monkeyelite•6mo ago
Yes. But the overwhelming consensus is also that complex indirect data structures just don’t end up performing on modern hardware, due to cache and branch prediction.

Only use them when the theoretical algorithmic properties make them the only tool for the job.

OskarS•6mo ago
They have their place. Certainly B-tree data-structures are tremendously useful and usually reasonably cache friendly. And if std::deque weren't busted on MSVC, there are times where it would be very useful. Linked lists have their place as well; a classic example would be an LRU cache, which is usually implemented as a hash table interleaved with a doubly linked list.

But yeah. Contiguous dynamic arrays and hash tables, those are usually what you want.

josephg•6mo ago
That's not at all true though?

If you have a small dataset, yeah, memcpy will outperform a lot of indirect pointer lookups. But that doesn't stay true once you're memcpying around megabytes of data. The trick with indirect data structures on modern hardware is to tune the size of internal nodes and leaf nodes to make the cache misses worth it. For example, binary trees are insanely inefficient on modern hardware because the internal nodes have a size of 2. If you give them a size of 64 or something, they perform much better. (I.e., make a b-tree.) Likewise, a lot of bad tree implementations put just a single item in the leaf nodes. It's much better to have leaves store blocks of 128 items or something. And use memcpy to move data around within the block when needed.

This gets you the best of both worlds.
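
Concretely, the shape I mean is something like this (sizes are illustrative, not tuned):

  /* Wide nodes amortize each cache miss over many keys: internal
     fanout of 64, leaves holding blocks of 128 items shuffled with
     memmove. */
  enum { FANOUT = 64, LEAF_CAP = 128 };

  typedef struct node node;
  struct node {
      int      is_leaf;
      unsigned count;
      union {
          node *child[FANOUT];    /* internal node */
          int   items[LEAF_CAP];  /* leaf block */
      } u;
  };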

I spent about 18 months optimising a text-based CRDT library (diamond types). We published a paper on it. By default, we store the editing history on disk. When you open a document, we reconstruct the document from scratch from a series of edits. After a while, actually applying the stream of edits to a text document became the largest performance cost. Ropes were hugely useful. There's a stack of optimisations we made there to make that replay another 10x faster or so on top of most rope implementations. Using a linear data structure? Forget it. For nontrivial workloads, you 100% want indirect data structures. But you've gotta tune them for modern hardware.

monkeyelite•6mo ago
My comment is an observation about how this gets tried every few years in major libraries and is usually reverted. I agree, there are use cases where these are better. But the pattern tends to be to revert to simpler data structures.
leecommamichael•6mo ago
I’d rather do it in Odin, and I do.
willtemperley•6mo ago
Looks a bit like rust-array-stump [1] which was built to optimise insertions in the middle of an array in computational geometry.

[1] https://github.com/bluenote10/rust-array-stump

kazinator•6mo ago
> In other words [because the access sequence is just 10 instructions], memory will be the bottleneck, not the instructions to calculate where an index is.

Ha, that is wishful thinking. If you do this in a tight loop in which everything is in the L1 cache, the instructions hurt!

"Memory bandwidth is the bottleneck" reasoning applies when you access bulk data without localized repetition.

HelloNurse•6mo ago
Those 10 instructions are for one access, not for a tight loop. A tight loop could be done with a much more complex macro that iterates separately in each segment, amortizing the overhead.
tobyhinloopen•6mo ago
This is amazing, why did I never come up with this
mgaunard•6mo ago
It's called a deque, part of C++'s standard library