I've been writing ring buffers wrong all these years - https://news.ycombinator.com/item?id=13175832 - Dec 2016 (167 comments)
It is, but, IMO, shouldn’t use the code for “a n-element ring buffer, with n set to 1”, similarly to how an array of booleans in many languages shouldn’t be implemented as “an arrayof Foos, with Foo set to bool”.
C++ has std::bitset and std::vector and Java similarly has BitSet and Array because using the generic code for arrays of bits is too wasteful.
Similarly, a one-element ring buffer is either full or it is empty. Why use two indexes to encode a single boolean?
Notably, this is not the case. C++ std::vector is specialised for bools to pack bits into words, causing an untold array (heh) of headaches.
And "wasteful" is doing a lot of lifting here. In terms of memory usage? Yes. In terms of CPU? The other way around.
That depends on your architecture and access pattern. In case of sequential access, packed bools may perform better due to arithmetic being usually way cheaper than memory operations.
Depending on the element width, you'd have space for different amounts of data in the inline buffer. Sometimes 1, sometimes a few more. Specializing for a one-element inline buffer would be quite complex with limited gains.
In retrospect trying to use that as a running gag for the blog post did not work well without actually giving the full context, but the full context would have been a distraction.
Rather infamously, C++ tried to be clever here and std::vector<bool> is not just a vector-of-bools but instead a totally different vector-ish type that lacks many of the important properties of every other instantiation of std::vector. Yes, a lot of the time you want the space efficiency of a dynamic bitset, rather than wasting an extra 7 bits per element. But also quite often you do want the behavior of a "real" std::vector for true/false values, and then you have to work around it manually (usually via std::vector<uint8_t> or similar) to get the expected behavior.
The author says that non-power-of-two is not possible, but I'm pretty sure it is if you use a conditional instead of integer modulus.
I first learnt of this technique from Phil Burk, we've been using it in PortAudio forever. The technique is also widely known in FPGA/hardware circles, see:
"Simulation and Synthesis Techniques for Asynchronous FIFO Design", Clifford E. Cummings, Sunburst Design, Inc.
https://twins.ee.nctu.edu.tw/courses/ip_core_04/resource_pdf...
I first encountered this structure at a summer internship at a company making data switches.
Intel is still 64 byte cache lines as they have been for quite a long time but they also do some shenanigans on the bus where they try to fetch two lines when you ask for one. So there’s ostensibly some benefit of aligning data particularly on linear scans to 128 byte alignment for cold cache access.
It feels like 90% swe jobs these days are about writing CRUD wrappers.
codeworse•2d ago
mrcode007•1h ago
There is one more mechanism that allows implementing ring buffers without having to compare head and tail buffers at all (and doesn’t rely on counters or empty/full flags etc) that piggybacks on the cache consistency protocol
spockz•1h ago
dooglius•47m ago
wat10000•45m ago
In this sense, the hardware locks used for atomic instructions don't really count, because they're implemented such that they can only be held for a brief, well defined time. There's no equivalent to suspending a thread while it holds a lock, causing all other threads to wait for an arbitrary amount of time.