Should really give the big-O notation for common operations for comparison of the approaches.
---
There are some variations of Chunk/Bucket arrays which offer different characteristics.
HAT (Hashed Array Trees[1]) are close to what is presented - the index block is always a power of 2 and points to data blocks of the same length - ie, it increases as: 1x1, 2x2, 4x4, 8x8, 16x16, ...
The downside is each time the length reaches a new power of 4 it requires the data to be shuffled about, which may cause latency spikes.
RAOTS (Resizeable Arrays in Optimal Time and Space[2]) is a variation where the index block block is approximately the square root of the length. It increases by adding a new block and resizing only the index block. The old blocks remain part of the structure and don't need reshuffling like HAT, which avoids the latency spikes, but there are more individual allocator calls. However, blocks are always a power of 2 which makes them arena-friendly.
Both of these have O(1) element access and waste only O(√n) space, which makes them quite optimal for uses where memory may be a constraint.
---
In the double-and-copy style array, one trick is we can avoid storing `capacity` and compute it on demand - determine the MSB of the length (using `lzcnt` or `bsr`), and then determine the next power of 2 sufficient to hold that length. In C23, this is `stdc_bit_ceil` (<stdbit.h>).
In the "Xal" structure (which has several other names), we can use `stdc_first_leading_one` to determine the index in the index block, then zero out the leading one bit to determine the index in the bucket.
This structure would be more efficient if processors provided a `log2mask` instruction which computes both the MSB and the remainder after masking the MSB in one instruction. (Similar to how `divmod` instructions work where the quotient and remainder are both computed in one instruction).
sparkie•1mo ago
---
There are some variations of Chunk/Bucket arrays which offer different characteristics.
HAT (Hashed Array Trees[1]) are close to what is presented - the index block is always a power of 2 and points to data blocks of the same length - ie, it increases as: 1x1, 2x2, 4x4, 8x8, 16x16, ...
The downside is each time the length reaches a new power of 4 it requires the data to be shuffled about, which may cause latency spikes.
RAOTS (Resizeable Arrays in Optimal Time and Space[2]) is a variation where the index block block is approximately the square root of the length. It increases by adding a new block and resizing only the index block. The old blocks remain part of the structure and don't need reshuffling like HAT, which avoids the latency spikes, but there are more individual allocator calls. However, blocks are always a power of 2 which makes them arena-friendly.
Both of these have O(1) element access and waste only O(√n) space, which makes them quite optimal for uses where memory may be a constraint.
---
In the double-and-copy style array, one trick is we can avoid storing `capacity` and compute it on demand - determine the MSB of the length (using `lzcnt` or `bsr`), and then determine the next power of 2 sufficient to hold that length. In C23, this is `stdc_bit_ceil` (<stdbit.h>).
In the "Xal" structure (which has several other names), we can use `stdc_first_leading_one` to determine the index in the index block, then zero out the leading one bit to determine the index in the bucket.
This structure would be more efficient if processors provided a `log2mask` instruction which computes both the MSB and the remainder after masking the MSB in one instruction. (Similar to how `divmod` instructions work where the quotient and remainder are both computed in one instruction).
---
[1]: https://en.wikipedia.org/wiki/Hashed_array_tree
[2]:https://cs.uwaterloo.ca/research/tr/1999/09/CS-99-09.pdf